Cron Job Setup Guide for Automatic Lighthouse Scanning

This guide explains how to set up automated cron jobs that run Lighthouse scans for your website monitoring application.

Overview

The automatic scanning system includes:

  • Scheduled Scans: Periodic scans based on user-configured schedules
  • Change Detection: Automatic scans triggered when website content changes
  • Subscription Limits: Respects user subscription tiers and rate limits

Prerequisites

  1. Environment Variables: Ensure your .env file has the required Supabase configuration:

    NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
    NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
    SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
    
  2. Database Setup: Make sure all required tables are created using the setup-database.sql script.

  3. Deployed Application: Your Next.js application should be deployed and accessible via HTTPS.
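
A missing environment variable is an easy prerequisite to overlook, so a small startup check can help. The following is a minimal sketch, not part of the application: `findMissingEnvVars` is a hypothetical helper, and only the variable names come from the list above.

```typescript
// Variable names taken from the Prerequisites list above.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
];

// Hypothetical helper: returns the names of required variables that are
// unset or blank in the given environment object.
function findMissingEnvVars(env: { [name: string]: string | undefined }): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js context this could be called as `findMissingEnvVars(process.env)` before the first scan runs, logging or throwing if the returned list is not empty.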

Cron Job Configuration

Option 1: Using Vercel Cron Jobs

If you're deploying on Vercel, you can use their built-in cron job feature:

  1. Create a vercel.json file in your project root:

    {
      "crons": [
        {
          "path": "/api/cron/scan?mode=all",
          "schedule": "0 */6 * * *"
        }
      ]
    }
    
  2. Schedule Explanation:

    • 0 */6 * * * = Every 6 hours
    • 0 */4 * * * = Every 4 hours
    • 0 */2 * * * = Every 2 hours
    • 0 * * * * = Every hour
    • */15 * * * * = Every 15 minutes
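  3. Deploy to Vercel: The cron jobs start running automatically after deployment. Vercel evaluates schedules in UTC and invokes the configured path with an HTTP GET request, so the route must handle GET in addition to the POST requests used elsewhere in this guide.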
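
To compare schedules by how often they fire, the expressions listed above can be converted into runs per day. This is a rough sketch under stated assumptions: `runsPerDay` and `countMatches` are hypothetical helpers that only understand the simple shapes shown in this guide (a fixed number, `*`, or `*/N` in the minute and hour fields). It is not a general cron parser.

```typescript
// Hypothetical helper: estimates daily runs for the simple schedule shapes
// listed above. Returns null for anything it does not understand.
function runsPerDay(schedule: string): number | null {
  const fields = schedule.trim().split(/\s+/);
  if (fields.length !== 5) return null;
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields;
  // Only schedules that run every day are handled in this sketch.
  if (dayOfMonth !== "*" || month !== "*" || dayOfWeek !== "*") return null;

  const runsPerHour = countMatches(minute, 60);
  const hoursPerDay = countMatches(hour, 24);
  if (runsPerHour === null || hoursPerDay === null) return null;
  return runsPerHour * hoursPerDay;
}

// Counts how many values in [0, range) a single cron field matches.
function countMatches(field: string, range: number): number | null {
  if (field === "*") return range;
  if (/^\d+$/.test(field)) return Number(field) < range ? 1 : null;
  const step = /^\*\/(\d+)$/.exec(field);
  if (step) {
    const n = Number(step[1]);
    // "*/N" matches 0, N, 2N, ... below the range.
    return n > 0 ? Math.ceil(range / n) : null;
  }
  return null;
}
```

For example, `runsPerDay("0 */6 * * *")` evaluates to 4 and `runsPerDay("*/15 * * * *")` to 96, which gives a quick sense of how much scan load each schedule implies.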

Option 2: Using External Cron Services

A. Cron-job.org (Free)

  1. Go to cron-job.org
  2. Create an account and add a new cron job
  3. Set the URL to https://your-domain.com/api/cron/scan?mode=all and the request method to POST (cron services typically default to GET)
  4. Configure the schedule (recommended: every 6 hours)
  5. Enable monitoring and notifications

B. EasyCron (Free tier available)

  1. Go to easycron.com
  2. Create an account and add a new cron job
  3. Set the URL to https://your-domain.com/api/cron/scan?mode=all and the request method to POST (cron services typically default to GET)
  4. Configure the schedule
  5. Set up email notifications for failures

C. GitHub Actions (Free for public repos)

  1. Create .github/workflows/cron-scan.yml:
    name: Lighthouse Scan Cron Job
    
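    on:
      schedule:
        - cron: '0 */6 * * *'  # Every 6 hours (UTC)
      workflow_dispatch:       # Also allow manual runs from the Actions tab
    
    jobs:
      scan:
        runs-on: ubuntu-latest
        steps:
          - name: Trigger Scan
            run: |
              # --fail makes the step fail on HTTP 4xx/5xx responses
              curl --fail --silent --show-error -X POST "https://your-domain.com/api/cron/scan?mode=all"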
    

Option 3: Server Cron Jobs (VPS/Dedicated Server)

If you're running on a VPS or dedicated server:

  1. SSH into your server
  2. Edit crontab: crontab -e
  3. Add the cron job:
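    # Run every 6 hours
    0 */6 * * * curl --fail --silent --show-error -X POST "https://your-domain.com/api/cron/scan?mode=all"
    
    # Or run every hour instead (enable only one of the two entries)
    # 0 * * * * curl --fail --silent --show-error -X POST "https://your-domain.com/api/cron/scan?mode=all"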
    

API Endpoints

The cron system exposes a single route, /api/cron/scan, whose behavior depends on the mode query parameter:

1. Full Scan

POST /api/cron/scan?mode=all
  • Runs both scheduled scans and change detection
  • Respects subscription limits
  • Returns scan statistics

2. Scheduled Scans Only

POST /api/cron/scan?mode=scheduled
  • Only runs scans based on user-configured schedules
  • Useful for testing or specific use cases

3. Change Detection Only

POST /api/cron/scan?mode=change_detection
  • Only checks for website changes and triggers scans
  • Can be run more frequently than full scans

4. Manual Scan Trigger

POST /api/cron/scan
  • Triggers a scan for a specific website
  • Requires authentication
  • Used by the ScanScheduleManager component
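
The mapping from the `mode` query parameter to the work performed can be sketched as a small lookup. This is an illustration only: `ScanTask` and `resolveScanTasks` are hypothetical names, and the actual route handler may be organized differently. Only the three mode values come from the endpoint list above.

```typescript
// Hypothetical task names, mirroring the documented mode values.
type ScanTask = "scheduled" | "change_detection";

// Hypothetical helper: maps the `mode` query parameter to the tasks a cron
// run should perform. In a route handler the value would typically come from
// new URL(request.url).searchParams.get("mode").
function resolveScanTasks(mode: string | null): ScanTask[] | null {
  switch (mode) {
    case "all":
      return ["scheduled", "change_detection"];
    case "scheduled":
      return ["scheduled"];
    case "change_detection":
      return ["change_detection"];
    default:
      // No mode or an unknown mode: not a cron run. In this sketch the
      // caller decides whether to treat it as a manual trigger or reject it.
      return null;
  }
}
```

Returning `null` for a missing or unknown mode keeps the cron path separate from the authenticated manual trigger, which uses the same route without a `mode` parameter.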

Monitoring and Logging

1. Check Cron Job Status

To confirm that your cron jobs are working:

  1. Call the endpoint and inspect the response:

    curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
    
  2. Example response (the counts will vary between runs):

    {
      "success": true,
      "message": "Scan processing completed",
      "statistics": {
        "scheduledScansProcessed": 5,
        "changeDetectionChecks": 10,
        "scansTriggered": 3,
        "errors": 0
      }
    }
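
A monitoring script can validate this response shape instead of only checking the HTTP status. The sketch below assumes the body always has the fields shown in the example above. The type names and the helpers `parseCronScanResponse` and `isHealthyRun` are hypothetical.

```typescript
// Shape inferred from the example response above.
interface ScanStatistics {
  scheduledScansProcessed: number;
  changeDetectionChecks: number;
  scansTriggered: number;
  errors: number;
}

interface CronScanResponse {
  success: boolean;
  message: string;
  statistics: ScanStatistics;
}

function isRecord(value: unknown): value is { [key: string]: unknown } {
  return typeof value === "object" && value !== null;
}

// Hypothetical helper: parses a response body and returns null if it is not
// valid JSON or does not match the expected shape.
function parseCronScanResponse(body: string): CronScanResponse | null {
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch (error) {
    return null;
  }
  if (!isRecord(data)) return null;
  const success = data.success;
  const message = data.message;
  const stats = data.statistics;
  if (typeof success !== "boolean" || typeof message !== "string" || !isRecord(stats)) return null;

  const { scheduledScansProcessed, changeDetectionChecks, scansTriggered, errors } = stats;
  if (
    typeof scheduledScansProcessed !== "number" ||
    typeof changeDetectionChecks !== "number" ||
    typeof scansTriggered !== "number" ||
    typeof errors !== "number"
  ) {
    return null;
  }
  return {
    success,
    message,
    statistics: { scheduledScansProcessed, changeDetectionChecks, scansTriggered, errors },
  };
}

// A run counts as healthy here when it reports success and zero errors.
function isHealthyRun(response: CronScanResponse): boolean {
  return response.success && response.statistics.errors === 0;
}
```

Treating any nonzero `errors` count as unhealthy is a deliberately strict choice for this sketch. A real alerting rule might tolerate occasional failures.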
    

2. Database Logs

Check the audit_logs table for scan activities:

SELECT * FROM audit_logs 
WHERE action_type IN ('scan_scheduled', 'scan_triggered', 'change_detected')
ORDER BY created_at DESC 
LIMIT 10;

3. Error Monitoring

Set up monitoring for:

  • HTTP 500 errors on the cron endpoint
  • Database connection failures
  • Subscription limit violations

Testing Your Setup

1. Manual Test

# Test the cron endpoint manually
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"

2. Check Database

-- Check if scans are being created
SELECT * FROM scans ORDER BY created_at DESC LIMIT 5;

-- Check if scan results are being saved
SELECT * FROM scan_results ORDER BY created_at DESC LIMIT 5;

3. Monitor Logs

Check your application logs for any errors or warnings related to the scanning process.

Troubleshooting

Common Issues

  1. Cron job not running:

    • Check if the URL is accessible
    • Verify HTTPS is working
    • Check server logs for errors
  2. No scans being triggered:

    • Verify database tables exist
    • Check subscription tier configuration
    • Ensure websites have scan schedules configured
  3. Rate limiting issues:

    • Check subscription limits in the database
    • Verify the subscription_limits table has correct data
  4. Authentication errors:

    • Verify SUPABASE_SERVICE_ROLE_KEY is set correctly
    • Check if the service role has proper permissions

Debug Mode

Enable debug logging by setting:

TASKMASTER_LOG_LEVEL=debug

This will provide more detailed logs about the scanning process.

Security Considerations

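  1. API Protection: As documented in this guide, the cron endpoint is called without credentials, so anyone who knows the URL can trigger scans. Before exposing it publicly, require a shared secret (for example, a bearer token checked in the route handler)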
  2. Rate Limiting: The system already includes subscription-based rate limiting
  3. Error Handling: Failed scans are logged and don't affect the overall system
  4. Data Privacy: Only scan websites that users have explicitly added
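
One way to add the API protection mentioned in item 1 is a shared secret sent as a bearer token. The sketch below is an assumption, not the application's current behavior: `isAuthorizedCronRequest` and `constantTimeEquals` are hypothetical helpers, and the secret is assumed to live in an environment variable such as `CRON_SECRET`, the name Vercel uses when it attaches an `Authorization: Bearer` header to cron invocations.

```typescript
// Compares two strings without returning early on the first mismatch, so the
// comparison time does not reveal how many leading characters matched.
function constantTimeEquals(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    // charCodeAt returns NaN past the end of a string; bitwise operators
    // coerce NaN to 0, and the length check above already flags a mismatch.
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Hypothetical helper: returns true only when a secret is configured and the
// Authorization header carries exactly "Bearer <secret>".
function isAuthorizedCronRequest(
  authorizationHeader: string | null,
  secret: string | undefined
): boolean {
  if (!secret || !authorizationHeader) return false;
  return constantTimeEquals(authorizationHeader, "Bearer " + secret);
}
```

A route handler would respond with HTTP 401 when this returns false. Callers such as curl would then add `-H "Authorization: Bearer <secret>"` to the requests shown earlier. In Node.js, `crypto.timingSafeEqual` is the standard alternative to the hand-written comparison.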

Performance Optimization

  1. Scan Frequency: Start with every 6 hours, adjust based on usage
  2. Batch Processing: The system processes multiple websites in batches
  3. Error Recovery: Failed scans are retried automatically
  4. Resource Usage: Monitor server resources during scan execution
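
The batch processing in item 2 can be pictured as splitting the list of websites into fixed-size groups and handling one group at a time. This is only an illustration of the idea: `chunk` is a hypothetical helper, and the batch size the system actually uses is not stated in this guide.

```typescript
// Hypothetical helper: splits a list into consecutive batches of at most
// `size` items, preserving order.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Smaller batches lower peak memory and CPU use during a run at the cost of a longer total run time, which is the trade-off to watch when monitoring resource usage.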

Next Steps

  1. Set up the cron job using one of the methods above
  2. Test the system with a few websites
  3. Monitor performance and adjust scan frequency as needed
  4. Set up alerts for cron job failures
  5. Configure webhooks for external change detection triggers

Support

If you encounter issues:

  1. Check the troubleshooting section above
  2. Review application logs
  3. Verify database setup
  4. Test the API endpoint manually
  5. Check subscription configuration