# Automatic Lighthouse Scanning System

This document describes the automatic Lighthouse scanning system that has been integrated into your website monitoring application.

## Overview

The automatic scanning system provides:

- **Scheduled Scans**: Periodic scans based on user-configured schedules
- **Change Detection**: Automatic scans triggered when website content changes
- **Subscription Limits**: Respects user subscription tiers and rate limits
- **Webhook Support**: External triggers for website changes
- **Comprehensive UI**: User-friendly interface for managing scan schedules

## System Architecture

### Core Components

1. **LighthouseScanner** (`src/services/lighthouseScanner.ts`)
   - Handles core scanning logic
   - Manages change detection
   - Enforces subscription limits
   - Simulates Lighthouse scans

2. **ScanScheduler** (`src/services/scanScheduler.ts`)
   - Manages scheduled scans
   - Processes change detection
   - Orchestrates scan execution

3. **Cron Handler** (`src/app/api/cron/scan/route.ts`)
   - Main entry point for automated scans
   - Supports different scan modes
   - Provides scan statistics

4. **Webhook Handler** (`src/app/api/webhooks/website-change/route.ts`)
   - Receives external change notifications
   - Triggers high-priority scans
   - Validates subscription limits

5. **ScanScheduleManager** (`src/components/dashboard/ScanScheduleManager.tsx`)
   - User interface for managing scan schedules
   - Displays usage statistics
   - Allows manual scan triggers

## Features

### Scheduled Scanning

- **Frequency Options**: Hourly, daily, weekly, monthly
- **Device Types**: Desktop and/or mobile
- **Categories**: Performance, accessibility, SEO, best practices
- **Subscription Tiers**: Different limits per tier

### Change Detection

- **Content Hashing**: Detects changes in website content
- **Automatic Triggers**: High-priority scans when changes are detected
- **Subscription Validation**: Only available for certain tiers

### Subscription Management

- **Daily Limits**: Maximum scans per day
- **Monthly Limits**: Maximum scans per month
- **Feature Access**: Different capabilities per tier
- **Usage Tracking**: Real-time usage monitoring

### Webhook Integration

- **External Triggers**: Receive change notifications from external systems
- **Validation**: Verify subscription and limits
- **Audit Logging**: Track all webhook activities

## Database Schema

The system uses several new tables:

### Core Tables

- `scans`: Main scan records
- `scan_results`: Detailed scan results
- `pages`: Website pages with content hashes
- `metric_values`: Individual metric values
- `resource_analysis`: Resource usage analysis

### Configuration Tables

- `metric_definitions`: Available metrics
- `alert_configurations`: Alert settings
- `subscription_limits`: Tier-based limits

### Audit Tables

- `audit_logs`: System activity logging
- `crawl_queue`: Crawl job queue
- `crawl_sessions`: Crawl session tracking

## API Endpoints

### Cron Endpoints

```
POST /api/cron/scan?mode=all               # Full scan (scheduled + change detection)
POST /api/cron/scan?mode=scheduled         # Scheduled scans only
POST /api/cron/scan?mode=change_detection  # Change detection only
```

### Webhook Endpoints

```
POST /api/webhooks/website-change  # External change notifications
```

### Manual Endpoints

```
POST /api/cron/scan  # Manual scan trigger (authenticated)
```

## Configuration

### Environment Variables

```env
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
```

### Subscription Tiers

- **Free**: 10 scans/day, 100 scans/month
- **Pro**: 50 scans/day, 500 scans/month
- **Enterprise**: 200 scans/day, 2000 scans/month

## Usage

### Setting Up Automated Scans

1. **Deploy the Application**

   ```bash
   # Deploy to Vercel (recommended)
   vercel --prod

   # Or deploy to your preferred platform
   ```

2. **Set Up Cron Jobs**

   ```bash
   # Run the setup script
   ./scripts/setup-cron.sh

   # Or follow the manual setup guide
   # docs/cron-setup-guide.md
   ```

3. **Configure Database**

   ```sql
   -- Run the setup script (from a psql session)
   \i setup-database.sql
   ```

### Managing Scan Schedules

1. **Access the Dashboard**
   - Navigate to `/dashboard/websites`
   - Click on a website to view details
   - Find the "Scan Schedule Management" section

2. **Configure Settings**
   - Toggle automatic scanning on/off
   - Set scan frequency (hourly, daily, weekly, monthly)
   - Choose device types (desktop, mobile)
   - Select scan categories

3. **Monitor Usage**
   - View daily and monthly scan usage
   - Check against subscription limits
   - Trigger manual scans when needed

### Webhook Integration

1. **Set Up External Monitoring**
   - Configure your external system to detect website changes
   - Send POST requests to `/api/webhooks/website-change`

2. **Webhook Payload**

   ```json
   {
     "websiteId": "website-uuid",
     "url": "https://example.com/changed-page",
     "changeType": "content_update",
     "contentHash": "new-content-hash",
     "metadata": {
       "source": "external-system",
       "timestamp": "2024-01-01T00:00:00Z"
     }
   }
   ```

## Monitoring and Troubleshooting

### Check System Status

```bash
# Test the cron endpoint
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"

# Check database logs (run this query in your SQL client, not in the shell):
# SELECT * FROM audit_logs ORDER BY created_at DESC LIMIT 10;
```

### Common Issues

1. **Scans Not Running**
   - Check cron job configuration
   - Verify database connection
   - Review subscription limits

2. **Change Detection Not Working**
   - Ensure subscription tier supports change detection
   - Check webhook endpoint accessibility
   - Verify content hash computation

3. **Performance Issues**
   - Monitor scan frequency
   - Check database performance
   - Review resource usage

## Development

### Adding New Metrics

1. Update `metric_definitions` table
2. Modify `LighthouseScanner` class
3. Update UI components

### Customizing Scan Logic

1. Modify `performScan` method in `LighthouseScanner`
2. Update `runLighthouse` simulation
3. Adjust result processing

### Extending Subscription Tiers

1. Update `getSubscriptionLimits` method
2. Modify database schema
3. Update UI components

## Security Considerations

- **Authentication**: Manual endpoints require user authentication
- **Rate Limiting**: Built-in subscription-based limits
- **Input Validation**: All webhook inputs are validated
- **Audit Logging**: All activities are logged for security

## Performance Optimization

- **Batch Processing**: Multiple websites processed efficiently
- **Error Recovery**: Failed scans don't affect the system
- **Resource Management**: Controlled resource usage
- **Caching**: Optimized database queries

## Support

For issues or questions:

1. Check the troubleshooting section
2. Review application logs
3. Verify database setup
4. Test endpoints manually
5. Check subscription configuration

## Future Enhancements

- **Real-time Notifications**: Push notifications for scan results
- **Advanced Analytics**: Detailed performance insights
- **Custom Metrics**: User-defined performance metrics
- **Integration APIs**: Third-party service integrations
- **Machine Learning**: Predictive performance analysis
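
## Example: Checking Subscription Limits

The daily and monthly limits listed under Subscription Tiers could be enforced with a check along the following lines. This is only a minimal sketch: `TIER_LIMITS`, `Usage`, and `canScan` are hypothetical names chosen for illustration and are not taken from `LighthouseScanner` or its `getSubscriptionLimits` method. The sketch also assumes that usage counts have already been read from the database.

```typescript
// Hypothetical sketch: these names are illustrative, not the project's actual API.
type Tier = "free" | "pro" | "enterprise";

interface TierLimits {
  daily: number;
  monthly: number;
}

// Limits copied from the "Subscription Tiers" section of this document.
const TIER_LIMITS: Record<Tier, TierLimits> = {
  free: { daily: 10, monthly: 100 },
  pro: { daily: 50, monthly: 500 },
  enterprise: { daily: 200, monthly: 2000 },
};

// Assumed shape of the usage counters loaded for one user.
interface Usage {
  scansToday: number;
  scansThisMonth: number;
}

interface ScanDecision {
  allowed: boolean;
  reason?: string;
}

// Returns whether one more scan fits within both the daily and monthly limits.
function canScan(tier: Tier, usage: Usage): ScanDecision {
  const limits = TIER_LIMITS[tier];
  if (usage.scansToday >= limits.daily) {
    return { allowed: false, reason: `Daily limit of ${limits.daily} scans reached` };
  }
  if (usage.scansThisMonth >= limits.monthly) {
    return { allowed: false, reason: `Monthly limit of ${limits.monthly} scans reached` };
  }
  return { allowed: true };
}
```

A scheduler or webhook handler could call `canScan` before queueing a scan and write the returned `reason` to `audit_logs` when a scan is rejected. Checking the daily limit first means the more frequently hit limit is the one reported.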