# Cron Job Setup Guide for Automatic Lighthouse Scanning

This guide will help you set up automated cron jobs to run the Lighthouse scanning system for your website monitoring application.

## Overview

The automatic scanning system includes:

- **Scheduled Scans**: Periodic scans based on user-configured schedules
- **Change Detection**: Automatic scans triggered when website content changes
- **Subscription Limits**: Respects user subscription tiers and rate limits

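This guide does not describe how change detection is implemented. As a rough illustration only, one common approach is to store a fingerprint of each page and compare it with a fresh one on the next check. The helper names `contentFingerprint` and `hasChanged` are hypothetical and are not taken from the application:

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: fingerprint a page body so two fetches can be compared.
function contentFingerprint(html: string): string {
  // Collapse whitespace runs so purely cosmetic reformatting is not counted as a change.
  const normalized = html.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

// Returns true when the stored fingerprint differs from the freshly computed one.
function hasChanged(previousFingerprint: string | null, currentHtml: string): boolean {
  if (previousFingerprint === null) return true; // first observation: treat as changed
  return previousFingerprint !== contentFingerprint(currentHtml);
}
```

A hash comparison like this flags any byte-level difference after whitespace normalization, so pages with timestamps or rotating content would need extra filtering before hashing.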
## Prerequisites

1. **Environment Variables**: Ensure your `.env` file has the required Supabase configuration:

   ```env
   NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
   NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
   SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
   ```

2. **Database Setup**: Make sure all required tables are created using the `setup-database.sql` script.

3. **Deployed Application**: Your Next.js application should be deployed and accessible via HTTPS.

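A missing variable is easy to overlook until the first scan fails. The following is a minimal sketch of a startup check, assuming the three variable names listed above; the helper `missingEnvVars` is hypothetical and not part of the application:

```typescript
// Variable names come from the Prerequisites list; the helper itself is hypothetical.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
] as const;

// Returns the names of required variables that are unset or blank.
function missingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js process you would call `missingEnvVars(process.env)` and stop with a clear error message if the returned array is not empty.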
## Cron Job Configuration

### Option 1: Using Vercel Cron Jobs (Recommended)

If you're deploying on Vercel, you can use their built-in cron job feature:

1. **Create a `vercel.json` file** in your project root:

   ```json
   {
     "crons": [
       {
         "path": "/api/cron/scan?mode=all",
         "schedule": "0 */6 * * *"
       }
     ]
   }
   ```

2. **Schedule Explanation**:
   - `0 */6 * * *` = Every 6 hours
   - `0 */4 * * *` = Every 4 hours
   - `0 */2 * * *` = Every 2 hours
   - `0 * * * *` = Every hour
   - `*/15 * * * *` = Every 15 minutes

3. **Deploy to Vercel**: The cron jobs start running automatically after a production deployment. Vercel invokes the configured path with an HTTP `GET` request, so the route must handle `GET` in addition to the `POST` calls used elsewhere in this guide.

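To make the `*/N` notation in the schedule list concrete, here is a small sketch that expands the hour field of a `0 */N * * *` expression into the hours at which it fires. The function name `hoursForStep` is made up for this example:

```typescript
// Expands the hour field of a "0 */N * * *" schedule into the hours (0-23) at which it fires.
function hoursForStep(step: number): number[] {
  if (!Number.isInteger(step) || step < 1 || step > 23) {
    throw new RangeError("step must be an integer between 1 and 23");
  }
  const hours: number[] = [];
  for (let hour = 0; hour < 24; hour += step) {
    hours.push(hour);
  }
  return hours;
}
```

For example, `hoursForStep(6)` yields `[0, 6, 12, 18]`. A step that does not divide 24 evenly, such as 5, yields `[0, 5, 10, 15, 20]`, so the gap between the last run of one day and the first run of the next is shorter than the others. Vercel and GitHub Actions evaluate schedules in UTC, while a server crontab typically uses the server's local time zone.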
### Option 2: Using External Cron Services

#### A. Cron-job.org (Free)

1. Go to [cron-job.org](https://cron-job.org)
2. Create an account and add a new cron job
3. Set the URL to: `https://your-domain.com/api/cron/scan?mode=all`
4. Configure the schedule (recommended: every 6 hours)
5. Enable monitoring and notifications

#### B. EasyCron (Free tier available)

1. Go to [easycron.com](https://easycron.com)
2. Create an account and add a new cron job
3. Set the URL to: `https://your-domain.com/api/cron/scan?mode=all`
4. Configure the schedule
5. Set up email notifications for failures

#### C. GitHub Actions (Free for public repos)

1. Create `.github/workflows/cron-scan.yml`:

   ```yaml
   name: Lighthouse Scan Cron Job

   on:
     schedule:
       - cron: '0 */6 * * *' # Every 6 hours (UTC)

   jobs:
     scan:
       runs-on: ubuntu-latest
       steps:
         - name: Trigger Scan
           # --fail makes curl exit non-zero on HTTP errors, so the job is marked as failed
           run: |
             curl --fail -X POST "https://your-domain.com/api/cron/scan?mode=all"
   ```

### Option 3: Server Cron Jobs (VPS/Dedicated Server)

If you're running on a VPS or dedicated server:

1. **SSH into your server**
2. **Edit crontab**: `crontab -e`
3. **Add the cron job** (keep only one of the two entries below):

   ```bash
   # Run every 6 hours
   0 */6 * * * curl -X POST "https://your-domain.com/api/cron/scan?mode=all"

   # Or run every hour
   0 * * * * curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
   ```

## API Endpoints

The cron system provides several endpoints for different scan modes:

### 1. Full Scan (Recommended for cron jobs)

```
POST /api/cron/scan?mode=all
```

- Runs both scheduled scans and change detection
- Respects subscription limits
- Returns scan statistics

### 2. Scheduled Scans Only

```
POST /api/cron/scan?mode=scheduled
```

- Only runs scans based on user-configured schedules
- Useful for testing or specific use cases

### 3. Change Detection Only

```
POST /api/cron/scan?mode=change_detection
```

- Only checks for website changes and triggers scans
- Can be run more frequently than full scans

### 4. Manual Scan Trigger

```
POST /api/cron/scan
```

- Triggers a scan for a specific website
- Requires authentication
- Used by the ScanScheduleManager component

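All four endpoints share one path and differ only in the `mode` query parameter. The sketch below shows one way a handler could read that parameter. The three mode values come from the list above, while the type and the function `parseScanMode` are illustrative assumptions and not the application's actual code:

```typescript
// The three mode values are taken from the endpoint list; everything else is illustrative.
type ScanMode = "all" | "scheduled" | "change_detection";

const SCAN_MODES: readonly ScanMode[] = ["all", "scheduled", "change_detection"];

// Returns the requested mode, or null when `mode` is absent or not one of the known values.
function parseScanMode(requestUrl: string): ScanMode | null {
  const mode = new URL(requestUrl).searchParams.get("mode");
  return SCAN_MODES.find((candidate) => candidate === mode) ?? null;
}
```

Checking against an explicit list means a typo such as `mode=schedule` is caught instead of silently falling through to another code path.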
## Monitoring and Logging

### 1. Check Cron Job Status

To check that your cron jobs are working:

1. **Call the endpoint manually and inspect the API response**:

   ```bash
   curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
   ```

2. **Compare it with the expected response**:

   ```json
   {
     "success": true,
     "message": "Scan processing completed",
     "statistics": {
       "scheduledScansProcessed": 5,
       "changeDetectionChecks": 10,
       "scansTriggered": 3,
       "errors": 0
     }
   }
   ```

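If you check the response from a script rather than by eye, it helps to validate its shape first. This is a minimal sketch based only on the fields in the sample response above; the interface and function names are assumptions made for illustration:

```typescript
// Field names mirror the sample response; the type and function names are hypothetical.
interface ScanStatistics {
  scheduledScansProcessed: number;
  changeDetectionChecks: number;
  scansTriggered: number;
  errors: number;
}

interface CronScanResponse {
  success: boolean;
  message: string;
  statistics: ScanStatistics;
}

const STATISTIC_KEYS = ["scheduledScansProcessed", "changeDetectionChecks", "scansTriggered", "errors"];

// Type guard: true only when the value has every field of the sample response with the right type.
function isCronScanResponse(value: unknown): value is CronScanResponse {
  if (typeof value !== "object" || value === null) return false;
  const body = value as Record<string, unknown>;
  const stats = body.statistics;
  if (typeof stats !== "object" || stats === null) return false;
  const statsRecord = stats as Record<string, unknown>;
  return (
    typeof body.success === "boolean" &&
    typeof body.message === "string" &&
    STATISTIC_KEYS.every((key) => typeof statsRecord[key] === "number")
  );
}

// A run counts as healthy only if it reports success and zero errors.
function isHealthyRun(value: unknown): boolean {
  return isCronScanResponse(value) && value.success && value.statistics.errors === 0;
}
```

Passing the parsed JSON body to `isHealthyRun` gives a single boolean that an alerting script can act on.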
### 2. Database Logs

Check the `audit_logs` table for scan activities:

```sql
SELECT * FROM audit_logs
WHERE action_type IN ('scan_scheduled', 'scan_triggered', 'change_detected')
ORDER BY created_at DESC
LIMIT 10;
```

### 3. Error Monitoring

Set up monitoring for:

- HTTP 500 errors on the cron endpoint
- Database connection failures
- Subscription limit violations

## Testing Your Setup

### 1. Manual Test

```bash
# Test the cron endpoint manually
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
```

### 2. Check Database

```sql
-- Check if scans are being created
SELECT * FROM scans ORDER BY created_at DESC LIMIT 5;

-- Check if scan results are being saved
SELECT * FROM scan_results ORDER BY created_at DESC LIMIT 5;
```

### 3. Monitor Logs

Check your application logs for any errors or warnings related to the scanning process.

## Troubleshooting

### Common Issues

1. **Cron job not running**:
   - Check if the URL is accessible
   - Verify HTTPS is working
   - Check server logs for errors

2. **No scans being triggered**:
   - Verify database tables exist
   - Check subscription tier configuration
   - Ensure websites have scan schedules configured

3. **Rate limiting issues**:
   - Check subscription limits in the database
   - Verify the `subscription_limits` table has correct data

4. **Authentication errors**:
   - Verify `SUPABASE_SERVICE_ROLE_KEY` is set correctly
   - Check if the service role has proper permissions

### Debug Mode

Enable debug logging by setting:

```env
TASKMASTER_LOG_LEVEL=debug
```

This will provide more detailed logs about the scanning process.

## Security Considerations

1. **API Protection**: The `curl` examples in this guide call the cron endpoint without credentials, so anyone who knows the URL could trigger scans. Consider requiring a shared secret (for example, in an `Authorization` header) on the mode-based cron requests.
2. **Rate Limiting**: The system already includes subscription-based rate limiting
3. **Error Handling**: Failed scans are logged and don't affect the overall system
4. **Data Privacy**: Only scan websites that users have explicitly added

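One way to protect the endpoint is a shared-secret check on the `Authorization` header. On Vercel, setting a `CRON_SECRET` environment variable causes cron invocations to include it as a bearer token; for other schedulers you would add the header yourself. The function `isAuthorizedCronRequest` below is a hypothetical sketch, not code from the application:

```typescript
import { Buffer } from "node:buffer";
import { timingSafeEqual } from "node:crypto";

// Hypothetical check: accepts only "Authorization: Bearer <secret>" with the configured secret.
function isAuthorizedCronRequest(
  authorizationHeader: string | null,
  secret: string | undefined,
): boolean {
  // Fail closed: with no configured secret or no header, nothing is authorized.
  if (!secret || !authorizationHeader) return false;
  const expected = Buffer.from(`Bearer ${secret}`, "utf8");
  const received = Buffer.from(authorizationHeader, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

A route handler would call this with the incoming header and the secret from its environment, and respond with HTTP 401 when it returns `false`. With `curl`, the matching header can be sent using `-H "Authorization: Bearer $CRON_SECRET"`.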
## Performance Optimization

1. **Scan Frequency**: Start with every 6 hours, adjust based on usage
2. **Batch Processing**: The system processes multiple websites in batches
3. **Error Recovery**: Failed scans are retried automatically
4. **Resource Usage**: Monitor server resources during scan execution

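The guide does not show how the system's batching works internally. As a generic illustration of the idea, the sketch below splits a list of websites into fixed-size batches and runs each batch concurrently while counting failures instead of aborting. The names `chunk` and `scanInBatches` and the batch size are assumptions:

```typescript
// Splits items into consecutive groups of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Runs `scan` over every item, one batch at a time.
// A failure in one scan is counted but does not stop the remaining scans.
async function scanInBatches<T>(
  items: readonly T[],
  size: number,
  scan: (item: T) => Promise<void>,
): Promise<{ succeeded: number; failed: number }> {
  let succeeded = 0;
  let failed = 0;
  for (const batch of chunk(items, size)) {
    const results = await Promise.allSettled(batch.map((item) => scan(item)));
    for (const result of results) {
      if (result.status === "fulfilled") {
        succeeded += 1;
      } else {
        failed += 1;
      }
    }
  }
  return { succeeded, failed };
}
```

Keeping the batch size small limits how many Lighthouse runs execute at once, which is the main lever for controlling resource usage during a scan cycle.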
## Next Steps

1. **Set up the cron job** using one of the methods above
2. **Test the system** with a few websites
3. **Monitor performance** and adjust scan frequency as needed
4. **Set up alerts** for cron job failures
5. **Configure webhooks** for external change detection triggers

## Support

If you encounter issues:

1. Check the troubleshooting section above
2. Review application logs
3. Verify database setup
4. Test the API endpoint manually
5. Check subscription configuration