# Cron Job Setup Guide for Automatic Lighthouse Scanning

This guide explains how to set up automated cron jobs that run the Lighthouse scanning system for your website monitoring application.

## Overview

The automatic scanning system includes:

- **Scheduled Scans**: Periodic scans based on user-configured schedules
- **Change Detection**: Automatic scans triggered when website content changes
- **Subscription Limits**: Respects user subscription tiers and rate limits

## Prerequisites

1. **Environment Variables**: Ensure your `.env` file has the required Supabase configuration:

   ```env
   NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
   NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
   SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
   ```

2. **Database Setup**: Make sure all required tables are created using the `setup-database.sql` script.
3. **Deployed Application**: Your Next.js application should be deployed and accessible via HTTPS.

## Cron Job Configuration

### Option 1: Using Vercel Cron Jobs (Recommended)

If you're deploying on Vercel, you can use its built-in cron job feature:

1. **Create a `vercel.json` file** in your project root:

   ```json
   {
     "crons": [
       {
         "path": "/api/cron/scan?mode=all",
         "schedule": "0 */6 * * *"
       }
     ]
   }
   ```

2. **Schedule Explanation**:
   - `0 */6 * * *` = Every 6 hours
   - `0 */4 * * *` = Every 4 hours
   - `0 */2 * * *` = Every 2 hours
   - `0 * * * *` = Every hour
   - `*/15 * * * *` = Every 15 minutes
3. **Deploy to Vercel**: The cron jobs start running automatically after deployment. Note that Vercel invokes cron paths with an HTTP `GET` request, so the route must accept `GET` as well as `POST` for this option to work.

### Option 2: Using External Cron Services

#### A. Cron-job.org (Free)

1. Go to [cron-job.org](https://cron-job.org)
2. Create an account and add a new cron job
3. Set the URL to: `https://your-domain.com/api/cron/scan?mode=all`
4. Configure the schedule (recommended: every 6 hours)
5. Enable monitoring and notifications

#### B. EasyCron (Free tier available)

1. Go to [easycron.com](https://easycron.com)
2. Create an account and add a new cron job
3. Set the URL to: `https://your-domain.com/api/cron/scan?mode=all`
4. Configure the schedule
5. Set up email notifications for failures

#### C. GitHub Actions (Free for public repos)

1. Create `.github/workflows/cron-scan.yml`:

   ```yaml
   name: Lighthouse Scan Cron Job

   on:
     schedule:
       - cron: '0 */6 * * *' # Every 6 hours

   jobs:
     scan:
       runs-on: ubuntu-latest
       steps:
         - name: Trigger Scan
           # --fail makes the step fail when the endpoint returns an HTTP error
           run: |
             curl --fail -X POST "https://your-domain.com/api/cron/scan?mode=all"
   ```

### Option 3: Server Cron Jobs (VPS/Dedicated Server)

If you're running on a VPS or dedicated server:

1. **SSH into your server**
2. **Edit crontab**: `crontab -e`
3. **Add the cron job** (pick one of the two schedules):

   ```bash
   # Run every 6 hours
   0 */6 * * * curl -X POST "https://your-domain.com/api/cron/scan?mode=all"

   # Or run every hour
   0 * * * * curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
   ```

## API Endpoints

The cron system provides several endpoints for different scan modes:

### 1. Full Scan (Recommended for cron jobs)

```
POST /api/cron/scan?mode=all
```

- Runs both scheduled scans and change detection
- Respects subscription limits
- Returns scan statistics

### 2. Scheduled Scans Only

```
POST /api/cron/scan?mode=scheduled
```

- Only runs scans based on user-configured schedules
- Useful for testing or specific use cases

### 3. Change Detection Only

```
POST /api/cron/scan?mode=change_detection
```

- Only checks for website changes and triggers scans
- Can be run more frequently than full scans

### 4. Manual Scan Trigger

```
POST /api/cron/scan
```

- Triggers a scan for a specific website
- Requires authentication
- Used by the ScanScheduleManager component

## Monitoring and Logging

### 1. Check Cron Job Status

You can check whether your cron jobs are working by:

1. **Checking the API response**:

   ```bash
   curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
   ```

2. **Expected response**:

   ```json
   {
     "success": true,
     "message": "Scan processing completed",
     "statistics": {
       "scheduledScansProcessed": 5,
       "changeDetectionChecks": 10,
       "scansTriggered": 3,
       "errors": 0
     }
   }
   ```

### 2. Database Logs

Check the `audit_logs` table for scan activities:

```sql
SELECT *
FROM audit_logs
WHERE action_type IN ('scan_scheduled', 'scan_triggered', 'change_detected')
ORDER BY created_at DESC
LIMIT 10;
```

### 3. Error Monitoring

Set up monitoring for:

- HTTP 500 errors on the cron endpoint
- Database connection failures
- Subscription limit violations

## Testing Your Setup

### 1. Manual Test

```bash
# Test the cron endpoint manually
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
```

### 2. Check Database

```sql
-- Check if scans are being created
SELECT * FROM scans ORDER BY created_at DESC LIMIT 5;

-- Check if scan results are being saved
SELECT * FROM scan_results ORDER BY created_at DESC LIMIT 5;
```

### 3. Monitor Logs

Check your application logs for any errors or warnings related to the scanning process.

## Troubleshooting

### Common Issues

1. **Cron job not running**:
   - Check if the URL is accessible
   - Verify HTTPS is working
   - Check server logs for errors
2. **No scans being triggered**:
   - Verify database tables exist
   - Check subscription tier configuration
   - Ensure websites have scan schedules configured
3. **Rate limiting issues**:
   - Check subscription limits in the database
   - Verify the `subscription_limits` table has correct data
4. **Authentication errors**:
   - Verify `SUPABASE_SERVICE_ROLE_KEY` is set correctly
   - Check if the service role has proper permissions

### Debug Mode

Enable debug logging by setting:

```env
TASKMASTER_LOG_LEVEL=debug
```

This provides more detailed logs about the scanning process.

## Security Considerations

1. **API Protection**: The cron examples in this guide call the endpoint without credentials. Consider adding authentication to the cron endpoint so that third parties cannot trigger scans.
2. **Rate Limiting**: The system already includes subscription-based rate limiting
3. **Error Handling**: Failed scans are logged and don't affect the overall system
4. **Data Privacy**: Only scan websites that users have explicitly added

## Performance Optimization

1. **Scan Frequency**: Start with every 6 hours, then adjust based on usage
2. **Batch Processing**: The system processes multiple websites in batches
3. **Error Recovery**: Failed scans are retried automatically
4. **Resource Usage**: Monitor server resources during scan execution

## Next Steps

1. **Set up the cron job** using one of the methods above
2. **Test the system** with a few websites
3. **Monitor performance** and adjust scan frequency as needed
4. **Set up alerts** for cron job failures
5. **Configure webhooks** for external change detection triggers

## Support

If you encounter issues:

1. Check the troubleshooting section above
2. Review application logs
3. Verify database setup
4. Test the API endpoint manually
5. Check subscription configuration
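
When testing the endpoint manually or wiring up failure alerts, it can help to turn the "Expected response" body into a pass/fail decision. The following is a minimal TypeScript sketch of that check, assuming the response has exactly the shape shown in this guide. `findScanProblems` and the two interfaces are hypothetical names introduced for illustration; they are not part of the application.

```typescript
// Shape of the JSON body shown under "Expected response" in this guide.
interface ScanStatistics {
  scheduledScansProcessed: number;
  changeDetectionChecks: number;
  scansTriggered: number;
  errors: number;
}

interface ScanResponse {
  success: boolean;
  message: string;
  statistics: ScanStatistics;
}

// Hypothetical helper (an assumption, not application code): returns a list of
// human-readable problems. An empty list means the run looks healthy.
function findScanProblems(httpStatus: number, body: unknown): string[] {
  const problems: string[] = [];

  // Any non-2xx status counts as a failed cron run.
  if (httpStatus < 200 || httpStatus >= 300) {
    problems.push(`unexpected HTTP status ${httpStatus}`);
  }

  // Without a JSON object there is nothing further to inspect.
  if (typeof body !== "object" || body === null) {
    problems.push("response body is not a JSON object");
    return problems;
  }

  const response = body as Partial<ScanResponse>;
  if (response.success !== true) {
    problems.push("success flag is not true");
  }

  const stats = response.statistics;
  if (typeof stats !== "object" || stats === null) {
    problems.push("statistics object is missing");
  } else if (typeof stats.errors !== "number") {
    problems.push("statistics.errors is missing");
  } else if (stats.errors > 0) {
    problems.push(`${stats.errors} scan error(s) reported`);
  }

  return problems;
}

// Usage with the sample body from this guide.
const sampleBody = {
  success: true,
  message: "Scan processing completed",
  statistics: {
    scheduledScansProcessed: 5,
    changeDetectionChecks: 10,
    scansTriggered: 3,
    errors: 0,
  },
};
console.log(findScanProblems(200, sampleBody)); // prints []
```

In practice the status code and body would come from the same request the `curl` commands make. A monitoring script could send an alert whenever the returned list is non-empty.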