# Automatic Lighthouse Scanning System
This document describes the automatic Lighthouse scanning system integrated into the website monitoring application.
## Overview
The automatic scanning system provides:
- **Scheduled Scans**: Periodic scans based on user-configured schedules
- **Change Detection**: Automatic scans triggered when website content changes
- **Subscription Limits**: Respects user subscription tiers and rate limits
- **Webhook Support**: External triggers for website changes
- **Comprehensive UI**: User-friendly interface for managing scan schedules
## System Architecture
### Core Components
1. **LighthouseScanner** (`src/services/lighthouseScanner.ts`)
   - Handles core scanning logic
   - Manages change detection
   - Enforces subscription limits
   - Simulates Lighthouse scans
2. **ScanScheduler** (`src/services/scanScheduler.ts`)
   - Manages scheduled scans
   - Processes change detection
   - Orchestrates scan execution
3. **Cron Handler** (`src/app/api/cron/scan/route.ts`)
   - Main entry point for automated scans
   - Supports different scan modes
   - Provides scan statistics
4. **Webhook Handler** (`src/app/api/webhooks/website-change/route.ts`)
   - Receives external change notifications
   - Triggers high-priority scans
   - Validates subscription limits
5. **ScanScheduleManager** (`src/components/dashboard/ScanScheduleManager.tsx`)
   - User interface for managing scan schedules
   - Displays usage statistics
   - Allows manual scan triggers
## Features
### Scheduled Scanning
- **Frequency Options**: Hourly, daily, weekly, monthly
- **Device Types**: Desktop and/or mobile
- **Categories**: Performance, accessibility, SEO, best practices
- **Subscription Tiers**: Different limits per tier
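
The frequency options above imply some rule for deciding when a website is next due for a scan. The following is a minimal sketch of one such rule, not the actual `ScanScheduler` logic; `ScanFrequency`, `nextRunAt`, and `isDue` are hypothetical names introduced for illustration.

```typescript
// Hypothetical names: these types and functions are illustrative, not taken from the codebase.
type ScanFrequency = "hourly" | "daily" | "weekly" | "monthly";

/** Returns the time at which the next scan becomes due after `lastRun`. */
function nextRunAt(lastRun: Date, frequency: ScanFrequency): Date {
  const next = new Date(lastRun.getTime());
  switch (frequency) {
    case "hourly":
      next.setUTCHours(next.getUTCHours() + 1);
      break;
    case "daily":
      next.setUTCDate(next.getUTCDate() + 1);
      break;
    case "weekly":
      next.setUTCDate(next.getUTCDate() + 7);
      break;
    case "monthly":
      // Date rolls over into the following month when the target month is shorter.
      next.setUTCMonth(next.getUTCMonth() + 1);
      break;
  }
  return next;
}

/** A website that has never been scanned (lastRun === null) is treated as due. */
function isDue(lastRun: Date | null, frequency: ScanFrequency, now: Date): boolean {
  return lastRun === null || nextRunAt(lastRun, frequency).getTime() <= now.getTime();
}
```

Working in UTC keeps the result independent of the server's time zone, which matters when the cron handler runs on a hosted platform.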
### Change Detection
- **Content Hashing**: Detects changes in website content
- **Automatic Triggers**: High-priority scans when changes are detected
- **Subscription Validation**: Only available for certain tiers
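
Content hashing can be sketched as follows. This is an assumption about how the comparison might work, not the implementation in `LighthouseScanner`: the choice of SHA-256, the whitespace normalization, and the names `contentHash` and `hasChanged` are all illustrative.

```typescript
import { createHash } from "node:crypto";

/**
 * Hashes page content after collapsing whitespace, so that formatting-only
 * differences do not count as a change. (Normalization is an assumption.)
 */
function contentHash(html: string): string {
  const normalized = html.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

/**
 * Compares the stored hash with the hash of freshly fetched content.
 * A page with no stored hash (null) is reported as changed.
 */
function hasChanged(previousHash: string | null, html: string): boolean {
  return previousHash !== contentHash(html);
}
```

Storing only the hex digest alongside each page record keeps the comparison cheap; the page body itself does not need to be retained.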
### Subscription Management
- **Daily Limits**: Maximum scans per day
- **Monthly Limits**: Maximum scans per month
- **Feature Access**: Different capabilities per tier
- **Usage Tracking**: Real-time usage monitoring
### Webhook Integration
- **External Triggers**: Receive change notifications from external systems
- **Validation**: Verify subscription and limits
- **Audit Logging**: Track all webhook activities
## Database Schema
The system uses several new tables:
### Core Tables
- `scans`: Main scan records
- `scan_results`: Detailed scan results
- `pages`: Website pages with content hashes
- `metric_values`: Individual metric values
- `resource_analysis`: Resource usage analysis
### Configuration Tables
- `metric_definitions`: Available metrics
- `alert_configurations`: Alert settings
- `subscription_limits`: Tier-based limits
### Audit Tables
- `audit_logs`: System activity logging
- `crawl_queue`: Crawl job queue
- `crawl_sessions`: Crawl session tracking
## API Endpoints
### Cron Endpoints
```
POST /api/cron/scan?mode=all # Full scan (scheduled + change detection)
POST /api/cron/scan?mode=scheduled # Scheduled scans only
POST /api/cron/scan?mode=change_detection # Change detection only
```
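
A handler for these endpoints needs to interpret the `mode` query parameter. The sketch below shows one way to parse it into a typed value; `ScanMode`, `ModeResult`, and `parseScanMode` are hypothetical names, and how the real cron handler treats a missing or unknown mode is not specified here, so the caller is left to decide.

```typescript
// Hypothetical helper: illustrates parsing the documented `mode` values only.
type ScanMode = "all" | "scheduled" | "change_detection";

type ModeResult =
  | { kind: "mode"; mode: ScanMode }
  | { kind: "missing" }
  | { kind: "invalid"; value: string };

function isScanMode(value: string): value is ScanMode {
  return value === "all" || value === "scheduled" || value === "change_detection";
}

/** Parses `mode` from an absolute request URL. */
function parseScanMode(requestUrl: string): ModeResult {
  const raw = new URL(requestUrl).searchParams.get("mode");
  if (raw === null) return { kind: "missing" };
  return isScanMode(raw) ? { kind: "mode", mode: raw } : { kind: "invalid", value: raw };
}
```

Returning a tagged result rather than a default keeps the choice between rejecting a bad request and falling back to a full scan explicit at the call site.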
### Webhook Endpoints
```
POST /api/webhooks/website-change # External change notifications
```
### Manual Endpoints
```
POST /api/cron/scan # Manual scan trigger (authenticated)
```
## Configuration
### Environment Variables
```env
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
```
### Subscription Tiers
- **Free**: 10 scans/day, 100 scans/month
- **Pro**: 50 scans/day, 500 scans/month
- **Enterprise**: 200 scans/day, 2000 scans/month
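
As an illustration of how these limits might be enforced, the sketch below hard-codes the tier values listed above and checks current usage against them. In the real system the limits come from `getSubscriptionLimits` and the `subscription_limits` table; the `TIER_LIMITS` constant, the `Usage` shape, and `canScan` are assumptions made for this example.

```typescript
// Illustrative stand-in for the tier limits listed in this section.
type Tier = "free" | "pro" | "enterprise";

interface ScanLimits {
  daily: number;
  monthly: number;
}

const TIER_LIMITS: Record<Tier, ScanLimits> = {
  free: { daily: 10, monthly: 100 },
  pro: { daily: 50, monthly: 500 },
  enterprise: { daily: 200, monthly: 2000 },
};

// Hypothetical usage counters; the real schema may differ.
interface Usage {
  scansToday: number;
  scansThisMonth: number;
}

/** Returns whether one more scan is allowed, with a reason when it is not. */
function canScan(tier: Tier, usage: Usage): { allowed: boolean; reason?: string } {
  const limits = TIER_LIMITS[tier];
  if (usage.scansToday >= limits.daily) {
    return { allowed: false, reason: `daily limit of ${limits.daily} reached` };
  }
  if (usage.scansThisMonth >= limits.monthly) {
    return { allowed: false, reason: `monthly limit of ${limits.monthly} reached` };
  }
  return { allowed: true };
}
```

For example, a Free-tier user with 10 scans today is refused, while a Pro-tier user with the same usage is still allowed.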
## Usage
### Setting Up Automated Scans
1. **Deploy the Application**
   ```bash
   # Deploy to Vercel (recommended)
   vercel --prod
   # Or deploy to your preferred platform
   ```
2. **Set Up Cron Jobs**
   ```bash
   # Run the setup script
   ./scripts/setup-cron.sh
   # Or follow the manual setup guide
   # docs/cron-setup-guide.md
   ```
3. **Configure Database**
   ```sql
   -- Run the setup script from a psql session (\i is a psql meta-command)
   \i setup-database.sql
   ```
### Managing Scan Schedules
1. **Access the Dashboard**
   - Navigate to `/dashboard/websites`
   - Click on a website to view details
   - Find the "Scan Schedule Management" section
2. **Configure Settings**
   - Toggle automatic scanning on/off
   - Set scan frequency (hourly, daily, weekly, monthly)
   - Choose device types (desktop, mobile)
   - Select scan categories
3. **Monitor Usage**
   - View daily and monthly scan usage
   - Check against subscription limits
   - Trigger manual scans when needed
### Webhook Integration
1. **Set Up External Monitoring**
   - Configure your external system to detect website changes
   - Send POST requests to `/api/webhooks/website-change`
2. **Webhook Payload**
   ```json
   {
     "websiteId": "website-uuid",
     "url": "https://example.com/changed-page",
     "changeType": "content_update",
     "contentHash": "new-content-hash",
     "metadata": {
       "source": "external-system",
       "timestamp": "2024-01-01T00:00:00Z"
     }
   }
   ```
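
On the receiving side, the payload above could be validated before a scan is triggered. The sketch below mirrors the field names from the example payload; which fields are required, the error messages, and the names `WebsiteChangePayload` and `parseWebsiteChange` are assumptions, not the webhook handler's actual rules.

```typescript
// Shape taken from the example payload; required/optional split is an assumption.
interface WebsiteChangePayload {
  websiteId: string;
  url: string;
  changeType: string;
  contentHash: string;
  metadata?: { source?: string; timestamp?: string };
}

type ParseResult =
  | { ok: true; payload: WebsiteChangePayload }
  | { ok: false; errors: string[] };

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

/** Validates an already JSON-parsed request body and collects all errors. */
function parseWebsiteChange(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, errors: ["body must be a JSON object"] };
  }
  const b = body as Record<string, unknown>;
  const errors: string[] = [];
  for (const field of ["websiteId", "url", "changeType", "contentHash"]) {
    if (!isNonEmptyString(b[field])) errors.push(`${field} must be a non-empty string`);
  }
  const url = b["url"];
  if (isNonEmptyString(url)) {
    try {
      const protocol = new URL(url).protocol;
      if (protocol !== "http:" && protocol !== "https:") errors.push("url must use http or https");
    } catch {
      errors.push("url must be an absolute URL");
    }
  }
  if (errors.length > 0) return { ok: false, errors };
  const metadata = b["metadata"];
  return {
    ok: true,
    payload: {
      websiteId: b["websiteId"] as string,
      url: url as string,
      changeType: b["changeType"] as string,
      contentHash: b["contentHash"] as string,
      // Metadata is passed through unchecked in this sketch.
      metadata:
        typeof metadata === "object" && metadata !== null
          ? (metadata as WebsiteChangePayload["metadata"])
          : undefined,
    },
  };
}
```

Collecting every error instead of failing on the first one gives the external system a complete picture of what to fix in a single response.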
## Monitoring and Troubleshooting
### Check System Status
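```bash
# Test the cron endpoint
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
# Check the most recent audit log entries
# (assumes psql is installed and DATABASE_URL holds your connection string)
psql "$DATABASE_URL" -c "SELECT * FROM audit_logs ORDER BY created_at DESC LIMIT 10;"
```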
### Common Issues
1. **Scans Not Running**
   - Check cron job configuration
   - Verify database connection
   - Review subscription limits
2. **Change Detection Not Working**
   - Ensure subscription tier supports change detection
   - Check webhook endpoint accessibility
   - Verify content hash computation
3. **Performance Issues**
   - Monitor scan frequency
   - Check database performance
   - Review resource usage
## Development
### Adding New Metrics
1. Update `metric_definitions` table
2. Modify `LighthouseScanner` class
3. Update UI components
### Customizing Scan Logic
1. Modify `performScan` method in `LighthouseScanner`
2. Update `runLighthouse` simulation
3. Adjust result processing
### Extending Subscription Tiers
1. Update `getSubscriptionLimits` method
2. Modify database schema
3. Update UI components
## Security Considerations
- **Authentication**: Manual endpoints require user authentication
- **Rate Limiting**: Built-in subscription-based limits
- **Input Validation**: All webhook inputs are validated
- **Audit Logging**: All activities are logged for security
## Performance Optimization
- **Batch Processing**: Multiple websites processed efficiently
- **Error Recovery**: Failed scans don't affect the system
- **Resource Management**: Controlled resource usage
- **Caching**: Optimized database queries
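
Batch processing with error recovery could look roughly like the sketch below, which scans websites in fixed-size batches and records failures without aborting the run. This is an illustration under assumptions, not the `ScanScheduler` implementation: `runInBatches`, `ScanOutcome`, and the `scan` callback are hypothetical.

```typescript
// Hypothetical outcome record for one website in a batch run.
interface ScanOutcome {
  websiteId: string;
  ok: boolean;
  error?: string;
}

/**
 * Runs `scan` for each website, `batchSize` at a time. A failed scan is
 * recorded in the result and does not stop the remaining scans.
 */
async function runInBatches(
  websiteIds: string[],
  batchSize: number,
  scan: (websiteId: string) => Promise<void>,
): Promise<ScanOutcome[]> {
  if (batchSize < 1) throw new RangeError("batchSize must be at least 1");
  const outcomes: ScanOutcome[] = [];
  for (let i = 0; i < websiteIds.length; i += batchSize) {
    const batch = websiteIds.slice(i, i + batchSize);
    // The async wrapper turns synchronous throws into rejections as well.
    const settled = await Promise.allSettled(batch.map(async (id) => scan(id)));
    settled.forEach((result, j) => {
      outcomes.push(
        result.status === "fulfilled"
          ? { websiteId: batch[j], ok: true }
          : { websiteId: batch[j], ok: false, error: String(result.reason) },
      );
    });
  }
  return outcomes;
}
```

The batch size caps how many scans run concurrently, which is one simple way to keep resource usage controlled.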
## Support
For issues or questions:
1. Check the troubleshooting section
2. Review application logs
3. Verify database setup
4. Test endpoints manually
5. Check subscription configuration
## Future Enhancements
- **Real-time Notifications**: Push notifications for scan results
- **Advanced Analytics**: Detailed performance insights
- **Custom Metrics**: User-defined performance metrics
- **Integration APIs**: Third-party service integrations
- **Machine Learning**: Predictive performance analysis