# Automatic Lighthouse Scanning System

This document describes the automatic Lighthouse scanning system that has been integrated into your website monitoring application.

## Overview

The automatic scanning system provides:

- **Scheduled Scans**: Periodic scans based on user-configured schedules
- **Change Detection**: Automatic scans triggered when website content changes
- **Subscription Limits**: Respects user subscription tiers and rate limits
- **Webhook Support**: External triggers for website changes
- **Comprehensive UI**: User-friendly interface for managing scan schedules

## System Architecture

### Core Components

1. **LighthouseScanner** (`src/services/lighthouseScanner.ts`)
   - Handles core scanning logic
   - Manages change detection
   - Enforces subscription limits
   - Simulates Lighthouse scans

2. **ScanScheduler** (`src/services/scanScheduler.ts`)
   - Manages scheduled scans
   - Processes change detection
   - Orchestrates scan execution

3. **Cron Handler** (`src/app/api/cron/scan/route.ts`)
   - Main entry point for automated scans
   - Supports different scan modes
   - Provides scan statistics

4. **Webhook Handler** (`src/app/api/webhooks/website-change/route.ts`)
   - Receives external change notifications
   - Triggers high-priority scans
   - Validates subscription limits

5. **ScanScheduleManager** (`src/components/dashboard/ScanScheduleManager.tsx`)
   - User interface for managing scan schedules
   - Displays usage statistics
   - Allows manual scan triggers

## Features

### Scheduled Scanning

- **Frequency Options**: Hourly, daily, weekly, monthly
- **Device Types**: Desktop and/or mobile
- **Categories**: Performance, accessibility, SEO, best practices
- **Subscription Tiers**: Different limits per tier

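As a rough illustration of how a frequency setting could translate into a "is this website due for a scan?" decision, here is a minimal sketch. The names `ScanFrequency`, `INTERVAL_MS`, and `isScanDue` are hypothetical and not taken from `scanScheduler.ts`, and treating "monthly" as a fixed 30 days is an assumption made for simplicity.

```typescript
// Hypothetical names; not taken from src/services/scanScheduler.ts.
type ScanFrequency = "hourly" | "daily" | "weekly" | "monthly";

const HOUR_MS = 60 * 60 * 1000;

// Minimum time between scans for each frequency.
// "monthly" is approximated as 30 days (an assumption).
const INTERVAL_MS: Record<ScanFrequency, number> = {
  hourly: HOUR_MS,
  daily: 24 * HOUR_MS,
  weekly: 7 * 24 * HOUR_MS,
  monthly: 30 * 24 * HOUR_MS,
};

// A scan is due if the website has never been scanned,
// or if the configured interval has elapsed since the last scan.
function isScanDue(
  lastScanAt: Date | null,
  frequency: ScanFrequency,
  now: Date
): boolean {
  if (lastScanAt === null) return true;
  return now.getTime() - lastScanAt.getTime() >= INTERVAL_MS[frequency];
}
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test.
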
### Change Detection

- **Content Hashing**: Detects changes in website content by comparing content hashes
- **Automatic Triggers**: High-priority scans run when changes are detected
- **Subscription Validation**: Only available for certain tiers

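Content hashing can be sketched as follows. This is only an illustration: the choice of SHA-256, the whitespace normalization step, and the names `computeContentHash` and `hasContentChanged` are all assumptions, not details confirmed by `lighthouseScanner.ts`.

```typescript
import { createHash } from "node:crypto";

// Hash the page content after collapsing runs of whitespace, so that
// formatting-only differences do not register as changes (an assumption).
function computeContentHash(content: string): string {
  const normalized = content.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

// Compare freshly fetched content against the hash stored for the page.
// A page with no stored hash (null) is treated as changed.
function hasContentChanged(storedHash: string | null, content: string): boolean {
  return storedHash !== computeContentHash(content);
}
```

A scheduler could call `hasContentChanged` with the hash stored in the `pages` table and queue a high-priority scan when it returns `true`.
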
### Subscription Management

- **Daily Limits**: Maximum scans per day
- **Monthly Limits**: Maximum scans per month
- **Feature Access**: Different capabilities per tier
- **Usage Tracking**: Real-time usage monitoring

### Webhook Integration

- **External Triggers**: Receive change notifications from external systems
- **Validation**: Verify subscription and limits
- **Audit Logging**: Track all webhook activities

## Database Schema

The system uses several new tables:

### Core Tables

- `scans`: Main scan records
- `scan_results`: Detailed scan results
- `pages`: Website pages with content hashes
- `metric_values`: Individual metric values
- `resource_analysis`: Resource usage analysis

### Configuration Tables

- `metric_definitions`: Available metrics
- `alert_configurations`: Alert settings
- `subscription_limits`: Tier-based limits

### Audit Tables

- `audit_logs`: System activity logging
- `crawl_queue`: Crawl job queue
- `crawl_sessions`: Crawl session tracking

## API Endpoints

### Cron Endpoints

```
POST /api/cron/scan?mode=all               # Full scan (scheduled + change detection)
POST /api/cron/scan?mode=scheduled         # Scheduled scans only
POST /api/cron/scan?mode=change_detection  # Change detection only
```

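The three modes could be interpreted along these lines. This is a sketch under assumptions: `parseScanMode` and `jobsForMode` are hypothetical helpers, and falling back to `all` when no `mode` parameter is given is a guess rather than documented behavior.

```typescript
// Hypothetical helpers; not taken from src/app/api/cron/scan/route.ts.
type ScanMode = "all" | "scheduled" | "change_detection";

const SCAN_MODES: readonly ScanMode[] = ["all", "scheduled", "change_detection"];

// Interpret the raw `mode` query parameter.
// Defaulting to "all" when the parameter is absent is an assumption;
// unknown values yield null so the caller can reject the request.
function parseScanMode(raw: string | null): ScanMode | null {
  if (raw === null) return "all";
  return SCAN_MODES.indexOf(raw as ScanMode) !== -1 ? (raw as ScanMode) : null;
}

// Decide which jobs a given mode should run.
function jobsForMode(mode: ScanMode): { scheduled: boolean; changeDetection: boolean } {
  return {
    scheduled: mode === "all" || mode === "scheduled",
    changeDetection: mode === "all" || mode === "change_detection",
  };
}
```

In a route handler, the raw value would typically come from `new URL(request.url).searchParams.get("mode")`.
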
### Webhook Endpoints

```
POST /api/webhooks/website-change  # External change notifications
```

### Manual Endpoints

```
POST /api/cron/scan  # Manual scan trigger (authenticated)
```

## Configuration

### Environment Variables

```env
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
```

### Subscription Tiers

- **Free**: 10 scans/day, 100 scans/month
- **Pro**: 50 scans/day, 500 scans/month
- **Enterprise**: 200 scans/day, 2000 scans/month

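To make the tier limits concrete, the check before starting a scan might look like the sketch below. The numbers are copied from the tier list above; `getSubscriptionLimits` is named in this document, but its signature here is a guess, and `canRunScan`, `ScanLimits`, and `ScanUsage` are hypothetical.

```typescript
type Tier = "free" | "pro" | "enterprise";

interface ScanLimits {
  daily: number;
  monthly: number;
}

interface ScanUsage {
  today: number;
  thisMonth: number;
}

// Limits copied from the documented subscription tiers.
const LIMITS: Record<Tier, ScanLimits> = {
  free: { daily: 10, monthly: 100 },
  pro: { daily: 50, monthly: 500 },
  enterprise: { daily: 200, monthly: 2000 },
};

// Assumed signature; the real method may read from `subscription_limits`.
function getSubscriptionLimits(tier: Tier): ScanLimits {
  return LIMITS[tier];
}

// A scan is allowed only while both the daily and monthly counts
// are still below the tier's limits.
function canRunScan(tier: Tier, usage: ScanUsage): boolean {
  const limits = getSubscriptionLimits(tier);
  return usage.today < limits.daily && usage.thisMonth < limits.monthly;
}
```

For example, a Free user with 10 scans already run today would be refused even if the monthly count is well under 100.
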
## Usage

### Setting Up Automated Scans

1. **Deploy the Application**

   ```bash
   # Deploy to Vercel (recommended)
   vercel --prod

   # Or deploy to your preferred platform
   ```

2. **Set Up Cron Jobs**

   ```bash
   # Run the setup script
   ./scripts/setup-cron.sh

   # Or follow the manual setup guide: docs/cron-setup-guide.md
   ```

3. **Configure Database**

   ```sql
   -- Run the setup script from a psql session (\i is a psql meta-command)
   \i setup-database.sql
   ```

### Managing Scan Schedules

1. **Access the Dashboard**
   - Navigate to `/dashboard/websites`
   - Click on a website to view details
   - Find the "Scan Schedule Management" section

2. **Configure Settings**
   - Toggle automatic scanning on/off
   - Set scan frequency (hourly, daily, weekly, monthly)
   - Choose device types (desktop, mobile)
   - Select scan categories

3. **Monitor Usage**
   - View daily and monthly scan usage
   - Check against subscription limits
   - Trigger manual scans when needed

### Webhook Integration

1. **Set Up External Monitoring**
   - Configure your external system to detect website changes
   - Send POST requests to `/api/webhooks/website-change`

2. **Webhook Payload**

   ```json
   {
     "websiteId": "website-uuid",
     "url": "https://example.com/changed-page",
     "changeType": "content_update",
     "contentHash": "new-content-hash",
     "metadata": {
       "source": "external-system",
       "timestamp": "2024-01-01T00:00:00Z"
     }
   }
   ```

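A receiver could validate incoming payloads with a type guard like the one sketched here. The field names follow the example payload; which fields are required, the `http(s)` URL check, and the names `WebsiteChangePayload` and `isWebsiteChangePayload` are assumptions rather than the webhook handler's actual rules.

```typescript
// Shape based on the example payload; optionality is an assumption.
interface WebsiteChangePayload {
  websiteId: string;
  url: string;
  changeType: string;
  contentHash: string;
  metadata?: { source?: string; timestamp?: string };
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

// Type guard for a parsed JSON body: every top-level string field must be
// present and non-empty, `url` must be http(s), and `metadata`, if given,
// must be an object.
function isWebsiteChangePayload(body: unknown): body is WebsiteChangePayload {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  const url = b["url"];
  if (!isNonEmptyString(url) || !/^https?:\/\//.test(url)) return false;
  if (!isNonEmptyString(b["websiteId"])) return false;
  if (!isNonEmptyString(b["changeType"])) return false;
  if (!isNonEmptyString(b["contentHash"])) return false;
  const metadata = b["metadata"];
  return metadata === undefined || (typeof metadata === "object" && metadata !== null);
}
```

Rejecting malformed bodies before any database work keeps invalid requests from counting against subscription limits.
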
## Monitoring and Troubleshooting

### Check System Status

```bash
# Test the cron endpoint
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"

# Check database logs
# (DATABASE_URL is a placeholder for your Postgres connection string)
psql "$DATABASE_URL" -c "SELECT * FROM audit_logs ORDER BY created_at DESC LIMIT 10;"
```

### Common Issues

1. **Scans Not Running**
   - Check cron job configuration
   - Verify database connection
   - Review subscription limits

2. **Change Detection Not Working**
   - Ensure subscription tier supports change detection
   - Check webhook endpoint accessibility
   - Verify content hash computation

3. **Performance Issues**
   - Monitor scan frequency
   - Check database performance
   - Review resource usage

## Development

### Adding New Metrics

1. Update `metric_definitions` table
2. Modify `LighthouseScanner` class
3. Update UI components

### Customizing Scan Logic

1. Modify `performScan` method in `LighthouseScanner`
2. Update `runLighthouse` simulation
3. Adjust result processing

### Extending Subscription Tiers

1. Update `getSubscriptionLimits` method
2. Modify database schema
3. Update UI components

## Security Considerations

- **Authentication**: Manual endpoints require user authentication
- **Rate Limiting**: Built-in subscription-based limits
- **Input Validation**: All webhook inputs are validated
- **Audit Logging**: All activities are logged for security

## Performance Optimization

- **Batch Processing**: Multiple websites processed efficiently
- **Error Recovery**: Failed scans don't affect the system
- **Resource Management**: Controlled resource usage
- **Caching**: Optimized database queries

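One way to get the error isolation described under Batch Processing and Error Recovery is to catch failures per website so the rest of the batch still runs. The sketch below is illustrative only: `runBatch` and `BatchResult` are hypothetical names, `scan` is a stand-in callback for the real scan call, and the loop is synchronous for brevity, whereas a real scanner would likely be asynchronous (for instance built on `Promise.allSettled`).

```typescript
interface BatchResult {
  succeeded: string[];
  failed: { websiteId: string; error: string }[];
}

// Run `scan` for each website, recording failures instead of rethrowing,
// so one failing website does not stop the remaining scans.
// `scan` is a hypothetical callback standing in for the real scan call.
function runBatch(
  websiteIds: string[],
  scan: (websiteId: string) => void
): BatchResult {
  const result: BatchResult = { succeeded: [], failed: [] };
  for (const websiteId of websiteIds) {
    try {
      scan(websiteId);
      result.succeeded.push(websiteId);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      result.failed.push({ websiteId, error: message });
    }
  }
  return result;
}
```

The returned `failed` list gives the caller what it needs to write audit log entries or retry later.
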
## Support

For issues or questions:

1. Check the troubleshooting section
2. Review application logs
3. Verify database setup
4. Test endpoints manually
5. Check subscription configuration

## Future Enhancements

- **Real-time Notifications**: Push notifications for scan results
- **Advanced Analytics**: Detailed performance insights
- **Custom Metrics**: User-defined performance metrics
- **Integration APIs**: Third-party service integrations
- **Machine Learning**: Predictive performance analysis