cloudlense/website-monitoring-frontend/docs/cron-setup-guide.md
Dennis 14a32bdc0d feat: initialize monorepo with full dev team best practices
- Unified monorepo with backend (Express), frontend (Next.js), and devops
- Backend: ESLint, Prettier, Jest tests (3 passing), health endpoint, .env.example
- Frontend: Fixed build errors, fixed all lint errors (0 remaining), tests passing
- DevOps: Docker Compose with PostgreSQL, backend, frontend + healthchecks
- CI/CD: 3 GitHub Actions workflows (backend, frontend, docker integration)
- DX: Husky pre-commit hooks with smart change detection
- Docs: Root README with architecture, CONTRIBUTING.md, PR template

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-06 00:05:50 +01:00

# Cron Job Setup Guide for Automatic Lighthouse Scanning
This guide explains how to set up cron jobs that run the Lighthouse scanning system for your website monitoring application on a schedule.
## Overview
The automatic scanning system includes:
- **Scheduled Scans**: Periodic scans based on user-configured schedules
- **Change Detection**: Automatic scans triggered when website content changes
- **Subscription Limits**: Respects user subscription tiers and rate limits
## Prerequisites
1. **Environment Variables**: Ensure your `.env` file has the required Supabase configuration:
```env
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
```
2. **Database Setup**: Make sure all required tables are created using the `setup-database.sql` script.
3. **Deployed Application**: Your Next.js application should be deployed and accessible via HTTPS.
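A missing variable usually surfaces only when the first scheduled scan fails, so it can help to check the configuration up front. The following is a minimal sketch of such a check: `findMissingEnvVars` is a hypothetical helper, not part of the application, and the variable names are the ones listed above.

```typescript
// Names taken from the Prerequisites list above.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
];

// Hypothetical helper: returns the required variables that are unset or blank.
function findMissingEnvVars(
  env: { [name: string]: string | undefined },
  required: string[] = REQUIRED_ENV_VARS
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Example with an in-memory object; in the app you would pass process.env.
const missingVars = findMissingEnvVars({
  NEXT_PUBLIC_SUPABASE_URL: "https://example.supabase.co",
  NEXT_PUBLIC_SUPABASE_ANON_KEY: "",
});
console.log(missingVars);
```

In this example the anon key is blank and the service role key is absent, so both names are reported.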
## Cron Job Configuration
### Option 1: Using Vercel Cron Jobs (Recommended)
If you're deploying on Vercel, you can use their built-in cron job feature:
1. **Create a `vercel.json` file** in your project root:
```json
{
  "crons": [
    {
      "path": "/api/cron/scan?mode=all",
      "schedule": "0 */6 * * *"
    }
  ]
}
```
2. **Schedule Explanation**:
- `0 */6 * * *` = Every 6 hours
- `0 */4 * * *` = Every 4 hours
- `0 */2 * * *` = Every 2 hours
- `0 * * * *` = Every hour
- `*/15 * * * *` = Every 15 minutes
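3. **Deploy to Vercel**: Cron jobs start running automatically after a production deployment. Vercel invokes the configured path with an HTTP GET request, so the route must accept GET in addition to the POST requests used in the other options.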
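The step syntax in the schedules above is aligned to the clock, not to the time of deployment: `*/6` in the hour field matches hours 0, 6, 12, and 18. The sketch below illustrates this for a single cron field. `expandCronField` is a hypothetical helper written for this guide, and it supports only `*`, `*/n`, and plain numbers, not ranges or lists.

```typescript
// Expands one cron field of the form "*", "*/n", or a single number into
// the concrete values it matches within [min, max].
function expandCronField(field: string, min: number, max: number): number[] {
  const values: number[] = [];
  if (field === "*") {
    for (let v = min; v <= max; v++) values.push(v);
    return values;
  }
  const stepMatch = /^\*\/(\d+)$/.exec(field);
  if (stepMatch) {
    const step = parseInt(stepMatch[1], 10);
    if (step < 1) throw new Error(`Invalid step in cron field: ${field}`);
    // Steps are counted from the start of the range, not from "now".
    for (let v = min; v <= max; v += step) values.push(v);
    return values;
  }
  if (/^\d+$/.test(field)) {
    const value = parseInt(field, 10);
    if (value < min || value > max) throw new Error(`Out of range: ${field}`);
    return [value];
  }
  throw new Error(`Unsupported cron field: ${field}`);
}

// Hour field of "0 */6 * * *" (hours run from 0 to 23).
console.log(expandCronField("*/6", 0, 23)); // [ 0, 6, 12, 18 ]
// Minute field of "*/15 * * * *" (minutes run from 0 to 59).
console.log(expandCronField("*/15", 0, 59)); // [ 0, 15, 30, 45 ]
```

Hosted schedulers such as Vercel evaluate these expressions in UTC, so a six-hour schedule fires at 00:00, 06:00, 12:00, and 18:00 UTC.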
### Option 2: Using External Cron Services
#### A. Cron-job.org (Free)
1. Go to [cron-job.org](https://cron-job.org)
2. Create an account and add a new cron job
3. Set the URL to: `https://your-domain.com/api/cron/scan?mode=all`
4. Configure the schedule (recommended: every 6 hours)
5. Enable monitoring and notifications
#### B. EasyCron (Free tier available)
1. Go to [easycron.com](https://easycron.com)
2. Create an account and add a new cron job
3. Set the URL to: `https://your-domain.com/api/cron/scan?mode=all`
4. Configure the schedule
5. Set up email notifications for failures
#### C. GitHub Actions (Free for public repos)
1. Create `.github/workflows/cron-scan.yml`:
```yaml
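name: Lighthouse Scan Cron Job
on:
  schedule:
    - cron: '0 */6 * * *' # Every 6 hours (GitHub evaluates schedules in UTC)
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger Scan
        run: |
          # --fail makes the step fail on HTTP 4xx/5xx responses
          curl --fail --silent --show-error -X POST "https://your-domain.com/api/cron/scan?mode=all"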
```
### Option 3: Server Cron Jobs (VPS/Dedicated Server)
If you're running on a VPS or dedicated server:
1. **SSH into your server**
2. **Edit crontab**: `crontab -e`
3. **Add the cron job**:
```bash
# Run every 6 hours
0 */6 * * * curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
# Or run every hour
0 * * * * curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
```
## API Endpoints
The cron system provides several endpoints for different scan modes:
### 1. Full Scan (Recommended for cron jobs)
```
POST /api/cron/scan?mode=all
```
- Runs both scheduled scans and change detection
- Respects subscription limits
- Returns scan statistics
### 2. Scheduled Scans Only
```
POST /api/cron/scan?mode=scheduled
```
- Only runs scans based on user-configured schedules
- Useful for testing or specific use cases
### 3. Change Detection Only
```
POST /api/cron/scan?mode=change_detection
```
- Only checks for website changes and triggers scans
- Can be run more frequently than full scans
### 4. Manual Scan Trigger
```
POST /api/cron/scan
```
- Triggers a scan for a specific website
- Requires authentication
- Used by the ScanScheduleManager component
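The four endpoints above differ only in the `mode` query parameter. As a rough illustration of how a handler might tell them apart, here is a minimal sketch. `parseScanMode` and the `ScanMode` type are hypothetical names, and rejecting unknown modes is an assumption, not documented behavior of the application.

```typescript
type ScanMode = "all" | "scheduled" | "change_detection" | "manual";

// Hypothetical helper: maps a request URL to one of the documented modes.
// Returns null for an unrecognized mode so the caller can reject the request.
function parseScanMode(requestUrl: string): ScanMode | null {
  const mode = new URL(requestUrl).searchParams.get("mode");
  if (mode === null) return "manual"; // no mode parameter: manual scan trigger
  switch (mode) {
    case "all":
      return "all";
    case "scheduled":
      return "scheduled";
    case "change_detection":
      return "change_detection";
    default:
      return null; // assumption: unknown modes are treated as a client error
  }
}

console.log(parseScanMode("https://your-domain.com/api/cron/scan?mode=all"));
console.log(parseScanMode("https://your-domain.com/api/cron/scan"));
console.log(parseScanMode("https://your-domain.com/api/cron/scan?mode=typo"));
```

Returning `null` for an unrecognized mode lets the caller answer with a client error instead of silently running a full scan.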
## Monitoring and Logging
### 1. Check Cron Job Status
To check whether your cron jobs are working:
1. **Call the endpoint and check the API response**:
```bash
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
```
2. **Expected response**:
```json
{
  "success": true,
  "message": "Scan processing completed",
  "statistics": {
    "scheduledScansProcessed": 5,
    "changeDetectionChecks": 10,
    "scansTriggered": 3,
    "errors": 0
  }
}
```
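A monitoring script can check this response automatically instead of relying on a manual read. The sketch below types the response shown above and flags anything that is not a clean run. The interface and function names are hypothetical, and treating any non-zero `errors` count as unhealthy is an assumption you may want to relax.

```typescript
// Field names mirror the expected response shown above.
interface ScanStatistics {
  scheduledScansProcessed: number;
  changeDetectionChecks: number;
  scansTriggered: number;
  errors: number;
}

interface ScanResponse {
  success: boolean;
  message: string;
  statistics: ScanStatistics;
}

// Hypothetical check: true only when the body reports success and its
// statistics contain a numeric error count of zero.
function isHealthyScanResponse(body: unknown): boolean {
  if (typeof body !== "object" || body === null) return false;
  const candidate = body as Partial<ScanResponse>;
  const stats = candidate.statistics;
  return (
    candidate.success === true &&
    typeof stats === "object" &&
    stats !== null &&
    typeof stats.errors === "number" &&
    stats.errors === 0
  );
}

const sampleResponse = {
  success: true,
  message: "Scan processing completed",
  statistics: {
    scheduledScansProcessed: 5,
    changeDetectionChecks: 10,
    scansTriggered: 3,
    errors: 0,
  },
};
console.log(isHealthyScanResponse(sampleResponse)); // true
```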
### 2. Database Logs
Check the `audit_logs` table for scan activities:
```sql
SELECT * FROM audit_logs
WHERE action_type IN ('scan_scheduled', 'scan_triggered', 'change_detected')
ORDER BY created_at DESC
LIMIT 10;
```
### 3. Error Monitoring
Set up monitoring for:
- HTTP 500 errors on the cron endpoint
- Database connection failures
- Subscription limit violations
## Testing Your Setup
### 1. Manual Test
```bash
# Test the cron endpoint manually
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
```
### 2. Check Database
```sql
-- Check if scans are being created
SELECT * FROM scans ORDER BY created_at DESC LIMIT 5;
-- Check if scan results are being saved
SELECT * FROM scan_results ORDER BY created_at DESC LIMIT 5;
```
### 3. Monitor Logs
Check your application logs for any errors or warnings related to the scanning process.
## Troubleshooting
### Common Issues
1. **Cron job not running**:
- Check if the URL is accessible
- Verify HTTPS is working
- Check server logs for errors
2. **No scans being triggered**:
- Verify database tables exist
- Check subscription tier configuration
- Ensure websites have scan schedules configured
3. **Rate limiting issues**:
- Check subscription limits in the database
- Verify the `subscription_limits` table has correct data
4. **Authentication errors**:
- Verify `SUPABASE_SERVICE_ROLE_KEY` is set correctly
- Check if the service role has proper permissions
### Debug Mode
Enable debug logging by setting:
```env
TASKMASTER_LOG_LEVEL=debug
```
This will provide more detailed logs about the scanning process.
## Security Considerations
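1. **API Protection**: The examples in this guide call the cron endpoint without credentials, so anyone who knows the URL can trigger scans. Protect the endpoint with a shared secret. On Vercel, setting a `CRON_SECRET` environment variable makes cron invocations send it in an `Authorization: Bearer` header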
2. **Rate Limiting**: The system already includes subscription-based rate limiting
3. **Error Handling**: Failed scans are logged and don't affect the overall system
4. **Data Privacy**: Only scan websites that users have explicitly added
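One common way to protect a cron endpoint is a shared secret sent in the `Authorization` header. The sketch below shows the check itself. `isAuthorizedCronRequest` and `constantTimeEquals` are hypothetical helpers, and the `Bearer <secret>` format follows Vercel's `CRON_SECRET` convention. It is not confirmed that the application implements this.

```typescript
// Compares two strings of equal length without exiting early on the first
// mismatch, so timing does not reveal how many leading characters matched.
// Strings of different length are rejected immediately.
function constantTimeEquals(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Hypothetical guard: accepts only "Authorization: Bearer <secret>".
// A missing or empty secret rejects every request, so a misconfigured
// deployment fails closed instead of leaving the endpoint open.
function isAuthorizedCronRequest(
  authorizationHeader: string | null,
  secret: string | undefined
): boolean {
  if (!secret || !authorizationHeader) return false;
  return constantTimeEquals(authorizationHeader, `Bearer ${secret}`);
}

console.log(isAuthorizedCronRequest("Bearer s3cret", "s3cret")); // true
console.log(isAuthorizedCronRequest("Bearer wrong!", "s3cret")); // false
console.log(isAuthorizedCronRequest(null, "s3cret")); // false
```

With external schedulers, the same header can be added to the request, for example with `curl -H "Authorization: Bearer <secret>"`.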
## Performance Optimization
1. **Scan Frequency**: Start with a scan every 6 hours, then adjust based on usage
2. **Batch Processing**: The system processes multiple websites in batches
3. **Error Recovery**: Failed scans are retried automatically
4. **Resource Usage**: Monitor server resources during scan execution
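To make the batch processing point concrete, the sketch below splits a list of websites into fixed-size batches. `toBatches` is a hypothetical helper and the batch size of 2 is arbitrary. The application's actual batching logic and batch size are not documented here.

```typescript
// Splits a list into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new Error("Batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Five websites in batches of two: the last batch holds the remainder.
const websites = ["a.example", "b.example", "c.example", "d.example", "e.example"];
console.log(toBatches(websites, 2));
```

Smaller batches keep peak memory and CPU usage lower during a run, at the cost of a longer total scan time.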
## Next Steps
1. **Set up the cron job** using one of the methods above
2. **Test the system** with a few websites
3. **Monitor performance** and adjust scan frequency as needed
4. **Set up alerts** for cron job failures
5. **Configure webhooks** for external change detection triggers
## Support
If you encounter issues:
1. Check the troubleshooting section above
2. Review application logs
3. Verify database setup
4. Test the API endpoint manually
5. Check subscription configuration