Cron Job Setup Guide for Automatic Lighthouse Scanning
This guide will help you set up automated cron jobs to run the Lighthouse scanning system for your website monitoring application.
Overview
The automatic scanning system includes:
- Scheduled Scans: Periodic scans based on user-configured schedules
- Change Detection: Automatic scans triggered when website content changes
- Subscription Limits: Respects user subscription tiers and rate limits
Prerequisites
- Environment Variables: Ensure your .env file has the required Supabase configuration:
    NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
    NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
    SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
- Database Setup: Make sure all required tables are created using the setup-database.sql script.
- Deployed Application: Your Next.js application should be deployed and accessible via HTTPS.
Cron Job Configuration
Option 1: Using Vercel Cron Jobs (Recommended)
If you're deploying on Vercel, you can use their built-in cron job feature:
- Create a vercel.json file in your project root:
    {
      "crons": [
        {
          "path": "/api/cron/scan?mode=all",
          "schedule": "0 */6 * * *"
        }
      ]
    }
- Schedule Explanation:
    0 */6 * * * = Every 6 hours
    0 */4 * * * = Every 4 hours
    0 */2 * * * = Every 2 hours
    0 * * * * = Every hour
    */15 * * * * = Every 15 minutes
- Deploy to Vercel: The cron jobs will automatically start working after deployment. Note that Vercel invokes cron paths with an HTTP GET request, so the route must accept GET as well as POST for this option to work.
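To make the schedule strings above concrete, the following sketch lists the hours at which an expression of the form `0 */N * * *` fires. The helper name `firingHours` is hypothetical and is not part of the application; it handles only the "every N hours, on the hour" pattern, not general cron syntax.

```typescript
// Hypothetical helper, not part of the application.
// Lists the hours (0-23) at which a schedule of the form "0 */N * * *"
// or "0 * * * *" fires. General cron syntax is deliberately not supported.
function firingHours(schedule: string): number[] {
  const fields = schedule.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error("expected five fields: minute hour day month weekday");
  }
  const minute = fields[0];
  const hour = fields[1];
  if (minute !== "0" || fields[2] !== "*" || fields[3] !== "*" || fields[4] !== "*") {
    throw new Error("this sketch only handles schedules of the form 0 */N * * *");
  }

  // A bare "*" in the hour field means every hour, i.e. a step of 1.
  let step = 1;
  if (hour !== "*") {
    const match = /^\*\/(\d+)$/.exec(hour);
    if (match === null) {
      throw new Error("unsupported hour field: " + hour);
    }
    step = parseInt(match[1], 10);
  }
  if (step < 1 || step > 23) {
    throw new Error("hour step must be between 1 and 23");
  }

  // Cron steps count from 0 within the field's range, so */6 gives 0, 6, 12, 18.
  const hours: number[] = [];
  for (let h = 0; h < 24; h += step) {
    hours.push(h);
  }
  return hours;
}
```

For example, `firingHours("0 */6 * * *")` returns `[0, 6, 12, 18]`. Vercel and GitHub Actions evaluate schedules in UTC, while a server crontab uses the server's local time zone.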
Option 2: Using External Cron Services
A. Cron-job.org (Free)
- Go to cron-job.org
- Create an account and add a new cron job
- Set the URL to: https://your-domain.com/api/cron/scan?mode=all
- Configure the schedule (recommended: every 6 hours)
- Enable monitoring and notifications
B. EasyCron (Free tier available)
- Go to easycron.com
- Create an account and add a new cron job
- Set the URL to: https://your-domain.com/api/cron/scan?mode=all
- Configure the schedule
- Set up email notifications for failures
C. GitHub Actions (Free for public repos)
- Create .github/workflows/cron-scan.yml:
    name: Lighthouse Scan Cron Job
    on:
      schedule:
        - cron: '0 */6 * * *' # Every 6 hours
    jobs:
      scan:
        runs-on: ubuntu-latest
        steps:
          - name: Trigger Scan
            run: |
              curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
Option 3: Server Cron Jobs (VPS/Dedicated Server)
If you're running on a VPS or dedicated server:
- SSH into your server
- Edit crontab: crontab -e
- Add the cron job:
    # Run every 6 hours
    0 */6 * * * curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
    # Or run every hour
    0 * * * * curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
API Endpoints
The cron system provides several endpoints for different scan modes:
1. Full Scan (Recommended for cron jobs)
POST /api/cron/scan?mode=all
- Runs both scheduled scans and change detection
- Respects subscription limits
- Returns scan statistics
2. Scheduled Scans Only
POST /api/cron/scan?mode=scheduled
- Only runs scans based on user-configured schedules
- Useful for testing or specific use cases
3. Change Detection Only
POST /api/cron/scan?mode=change_detection
- Only checks for website changes and triggers scans
- Can be run more frequently than full scans
4. Manual Scan Trigger
POST /api/cron/scan
- Triggers a scan for a specific website
- Requires authentication
- Used by the ScanScheduleManager component
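The four endpoints above differ only in the `mode` query parameter, which can be sketched as a small type plus two helpers. This is an illustrative sketch, not the application's actual route code: the names `ScanMode`, `parseScanMode`, and `buildScanUrl` are hypothetical, and treating a missing `mode` as the manual trigger is an assumption based on the endpoint list above.

```typescript
// Hypothetical names; the real route handler may be structured differently.
type ScanMode = "all" | "scheduled" | "change_detection";

const SCAN_MODES: ScanMode[] = ["all", "scheduled", "change_detection"];

// Server side: interpret the raw "mode" query value.
// Assumption: a missing or empty value means the manual scan trigger.
function parseScanMode(raw: string | null | undefined): ScanMode | "manual" {
  if (raw === null || raw === undefined || raw === "") {
    return "manual";
  }
  for (const mode of SCAN_MODES) {
    if (raw === mode) {
      return mode;
    }
  }
  throw new Error("unknown scan mode: " + raw);
}

// Client side: build the URL a cron service or script would POST to.
function buildScanUrl(baseUrl: string, mode?: ScanMode): string {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const path = root + "/api/cron/scan";
  return mode === undefined ? path : path + "?mode=" + encodeURIComponent(mode);
}
```

Rejecting unknown modes with an error, rather than silently falling back to a full scan, keeps a typo in a cron configuration from triggering more work than intended.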
Monitoring and Logging
1. Check Cron Job Status
You can monitor if your cron jobs are working by:
- Checking the API response:
    curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
- Expected response:
    {
      "success": true,
      "message": "Scan processing completed",
      "statistics": {
        "scheduledScansProcessed": 5,
        "changeDetectionChecks": 10,
        "scansTriggered": 3,
        "errors": 0
      }
    }
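If you check the response from a script rather than by eye, a type guard for the response shape shown above may help. This is a minimal sketch: the interface and function names are hypothetical, and the field list is taken only from the sample response in this guide, so the real payload may carry additional fields.

```typescript
// Shape inferred from the sample response in this guide (assumption).
interface ScanStatistics {
  scheduledScansProcessed: number;
  changeDetectionChecks: number;
  scansTriggered: number;
  errors: number;
}

interface ScanResponse {
  success: boolean;
  message: string;
  statistics: ScanStatistics;
}

// Returns true only when every field from the sample is present with the
// expected primitive type. Extra fields are ignored.
function isScanResponse(value: unknown): value is ScanResponse {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const body = value as Record<string, unknown>;
  if (typeof body["success"] !== "boolean" || typeof body["message"] !== "string") {
    return false;
  }
  const stats = body["statistics"];
  if (typeof stats !== "object" || stats === null) {
    return false;
  }
  const counters = stats as Record<string, unknown>;
  const required = [
    "scheduledScansProcessed",
    "changeDetectionChecks",
    "scansTriggered",
    "errors",
  ];
  for (const key of required) {
    if (typeof counters[key] !== "number") {
      return false;
    }
  }
  return true;
}
```

A caller would typically pass the parsed JSON body to `isScanResponse` and treat a `false` result as a failed run.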
2. Database Logs
Check the audit_logs table for scan activities:
SELECT * FROM audit_logs
WHERE action_type IN ('scan_scheduled', 'scan_triggered', 'change_detected')
ORDER BY created_at DESC
LIMIT 10;
3. Error Monitoring
Set up monitoring for:
- HTTP 500 errors on the cron endpoint
- Database connection failures
- Subscription limit violations
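One way to turn these checks into an alerting rule is to classify each cron invocation from its HTTP status and the `errors` counter in the response statistics. The sketch below is an assumption about how you might do this, not something the application provides; `classifyRun` and the outcome names are hypothetical.

```typescript
type RunOutcome = "ok" | "partial_failure" | "failed";

// Hypothetical classifier for one cron invocation.
// httpStatus: status code returned by the cron endpoint.
// errorCount: the "errors" value from the response statistics, or null when
//             the body was missing or could not be parsed.
function classifyRun(httpStatus: number, errorCount: number | null): RunOutcome {
  // Any non-2xx status (including HTTP 500) means the run itself failed.
  if (httpStatus < 200 || httpStatus >= 300) {
    return "failed";
  }
  // A 2xx response without readable statistics is also treated as a failure.
  if (errorCount === null) {
    return "failed";
  }
  // The run completed, but some individual scans reported errors.
  return errorCount > 0 ? "partial_failure" : "ok";
}
```

Alerting immediately on `"failed"` and only on repeated `"partial_failure"` results is one reasonable policy, since the guide states that individual failed scans do not affect the overall system.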
Testing Your Setup
1. Manual Test
# Test the cron endpoint manually
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
2. Check Database
-- Check if scans are being created
SELECT * FROM scans ORDER BY created_at DESC LIMIT 5;
-- Check if scan results are being saved
SELECT * FROM scan_results ORDER BY created_at DESC LIMIT 5;
3. Monitor Logs
Check your application logs for any errors or warnings related to the scanning process.
Troubleshooting
Common Issues
- Cron job not running:
  - Check if the URL is accessible
  - Verify HTTPS is working
  - Check server logs for errors
- No scans being triggered:
  - Verify database tables exist
  - Check subscription tier configuration
  - Ensure websites have scan schedules configured
- Rate limiting issues:
  - Check subscription limits in the database
  - Verify the subscription_limits table has correct data
- Authentication errors:
  - Verify SUPABASE_SERVICE_ROLE_KEY is set correctly
  - Check if the service role has proper permissions
Debug Mode
Enable debug logging by setting:
TASKMASTER_LOG_LEVEL=debug
This will provide more detailed logs about the scanning process.
Security Considerations
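- API Protection: The curl examples in this guide call the cron endpoint without credentials. If the route does not perform its own check, anyone who knows the URL can trigger scans, so consider requiring a shared secret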
- Rate Limiting: The system already includes subscription-based rate limiting
- Error Handling: Failed scans are logged and don't affect the overall system
- Data Privacy: Only scan websites that users have explicitly added
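As a sketch of the API-protection point, the cron route could require a shared secret in the `Authorization` header. Everything here is an assumption rather than existing application code: the function names are hypothetical, and the `Bearer <secret>` format is chosen because Vercel sends that header on cron invocations when a `CRON_SECRET` environment variable is defined.

```typescript
// Compares two strings without returning at the first mismatch, so the time
// taken does not reveal how many leading characters matched.
function safeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const x = i < a.length ? a.charCodeAt(i) : 0;
    const y = i < b.length ? b.charCodeAt(i) : 0;
    diff |= x ^ y;
  }
  return diff === 0;
}

// Hypothetical check for the cron route.
// authorizationHeader: value of the request's Authorization header, or null.
// secret: the configured shared secret (for example from a CRON_SECRET
//         environment variable; that variable name is an assumption here).
function isAuthorizedCronRequest(
  authorizationHeader: string | null,
  secret: string | undefined
): boolean {
  // Fail closed: with no secret configured, reject every request.
  if (secret === undefined || secret === "") {
    return false;
  }
  if (authorizationHeader === null) {
    return false;
  }
  return safeEqual(authorizationHeader, "Bearer " + secret);
}
```

With a check like this in place, the curl-based options would need an extra `-H "Authorization: Bearer <secret>"` argument, with the secret stored in the cron service's or CI provider's secret store rather than in the command itself.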
Performance Optimization
- Scan Frequency: Start with every 6 hours, adjust based on usage
- Batch Processing: The system processes multiple websites in batches
- Error Recovery: Failed scans are retried automatically
- Resource Usage: Monitor server resources during scan execution
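This guide does not describe how batching or retries are implemented, so the following is only a generic sketch of the two ideas: splitting a list of websites into fixed-size batches, and spacing retries with capped exponential backoff. The function names, batch size, and delay values are all hypothetical.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) {
    throw new Error("batch size must be at least 1");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Delay before retry number `attempt` (0-based): baseMs, 2*baseMs, 4*baseMs,
// and so on, never exceeding maxMs.
function backoffDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}
```

For example, `chunk(["a", "b", "c", "d", "e"], 2)` yields three batches of sizes 2, 2, and 1, and `backoffDelayMs(3, 1000, 60000)` gives 8000. Smaller batches reduce peak resource usage at the cost of a longer total run, which is worth weighing against the scan frequency you choose.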
Next Steps
- Set up the cron job using one of the methods above
- Test the system with a few websites
- Monitor performance and adjust scan frequency as needed
- Set up alerts for cron job failures
- Configure webhooks for external change detection triggers
Support
If you encounter issues:
- Check the troubleshooting section above
- Review application logs
- Verify database setup
- Test the API endpoint manually
- Check subscription configuration