refactor: flatten monorepo structure to backend/ frontend/ devops/
Rename subdirectories for a cleaner single-repo layout:
- website-monitoring-backend/ → backend/
- website-monitoring-frontend/ → frontend/
- website-monitoring-devops/ → devops/

Update all references in package.json scripts, CI workflows, docker-compose, pre-commit hooks, and documentation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@@ -0,0 +1,261 @@
# Automatic Lighthouse Scanning System

This document describes the automatic Lighthouse scanning system that has been integrated into your website monitoring application.

## Overview

The automatic scanning system provides:
- **Scheduled Scans**: Periodic scans based on user-configured schedules
- **Change Detection**: Automatic scans triggered when website content changes
- **Subscription Limits**: Respects user subscription tiers and rate limits
- **Webhook Support**: External triggers for website changes
- **Comprehensive UI**: User-friendly interface for managing scan schedules

## System Architecture

### Core Components

1. **LighthouseScanner** (`src/services/lighthouseScanner.ts`)
   - Handles core scanning logic
   - Manages change detection
   - Enforces subscription limits
   - Simulates Lighthouse scans

2. **ScanScheduler** (`src/services/scanScheduler.ts`)
   - Manages scheduled scans
   - Processes change detection
   - Orchestrates scan execution

3. **Cron Handler** (`src/app/api/cron/scan/route.ts`)
   - Main entry point for automated scans
   - Supports different scan modes
   - Provides scan statistics

4. **Webhook Handler** (`src/app/api/webhooks/website-change/route.ts`)
   - Receives external change notifications
   - Triggers high-priority scans
   - Validates subscription limits

5. **ScanScheduleManager** (`src/components/dashboard/ScanScheduleManager.tsx`)
   - User interface for managing scan schedules
   - Displays usage statistics
   - Allows manual scan triggers

## Features

### Scheduled Scanning
- **Frequency Options**: Hourly, daily, weekly, monthly
- **Device Types**: Desktop and/or mobile
- **Categories**: Performance, accessibility, SEO, best practices
- **Subscription Tiers**: Different limits per tier

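The frequency options above imply a "is this website due for a scan?" check on every cron run. The following is a minimal sketch of such a check, not the actual `ScanScheduler` logic: the names `ScanFrequency`, `INTERVAL_MS`, and `isScanDue` are hypothetical, and treating "monthly" as a fixed 30-day interval is an assumption.

```typescript
// Hypothetical names throughout; the real ScanScheduler may work differently.
type ScanFrequency = "hourly" | "daily" | "weekly" | "monthly";

const HOUR_MS = 60 * 60 * 1000;

// Minimum time between two scheduled scans for each frequency.
// Assumption: "monthly" is approximated as a fixed 30-day interval.
const INTERVAL_MS: Record<ScanFrequency, number> = {
  hourly: HOUR_MS,
  daily: 24 * HOUR_MS,
  weekly: 7 * 24 * HOUR_MS,
  monthly: 30 * 24 * HOUR_MS,
};

// Returns true when the website has never been scanned or its interval has elapsed.
function isScanDue(
  lastScanAt: Date | null,
  frequency: ScanFrequency,
  now: Date = new Date(),
): boolean {
  if (lastScanAt === null) return true;
  return now.getTime() - lastScanAt.getTime() >= INTERVAL_MS[frequency];
}

const checkTime = new Date("2024-01-02T00:00:00Z");
console.log(isScanDue(new Date("2024-01-01T00:00:00Z"), "daily", checkTime)); // true
console.log(isScanDue(new Date("2024-01-01T12:00:00Z"), "daily", checkTime)); // false
```

Passing `now` as a parameter keeps the check deterministic, which makes it easy to unit test without mocking the clock.
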
### Change Detection
- **Content Hashing**: Detects changes in website content
- **Automatic Triggers**: High-priority scans when changes detected
- **Subscription Validation**: Only available for certain tiers

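Content hashing can be illustrated with a short sketch. It assumes SHA-256 over the raw page content; the document does not say which digest or normalisation `LighthouseScanner` uses, and the function names `computeContentHash` and `hasContentChanged` are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hashes page content with SHA-256 and returns the hex digest.
// Assumption: the real system may normalise the HTML first or use another digest.
function computeContentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// A page counts as changed when no hash is stored yet or the stored hash differs.
function hasContentChanged(storedHash: string | null, content: string): boolean {
  return storedHash === null || storedHash !== computeContentHash(content);
}

const storedHash = computeContentHash("<h1>Hello</h1>");
console.log(hasContentChanged(storedHash, "<h1>Hello</h1>"));        // false
console.log(hasContentChanged(storedHash, "<h1>Hello, world</h1>")); // true
```

Hashing raw HTML flags every byte-level difference, including timestamps or rotating ad markup, so a production version would probably strip volatile parts of the page before hashing.
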
### Subscription Management
- **Daily Limits**: Maximum scans per day
- **Monthly Limits**: Maximum scans per month
- **Feature Access**: Different capabilities per tier
- **Usage Tracking**: Real-time usage monitoring

### Webhook Integration
- **External Triggers**: Receive change notifications from external systems
- **Validation**: Verify subscription and limits
- **Audit Logging**: Track all webhook activities

## Database Schema

The system uses several new tables:

### Core Tables
- `scans`: Main scan records
- `scan_results`: Detailed scan results
- `pages`: Website pages with content hashes
- `metric_values`: Individual metric values
- `resource_analysis`: Resource usage analysis

### Configuration Tables
- `metric_definitions`: Available metrics
- `alert_configurations`: Alert settings
- `subscription_limits`: Tier-based limits

### Audit Tables
- `audit_logs`: System activity logging
- `crawl_queue`: Crawl job queue
- `crawl_sessions`: Crawl session tracking

## API Endpoints

### Cron Endpoints
```
POST /api/cron/scan?mode=all              # Full scan (scheduled + change detection)
POST /api/cron/scan?mode=scheduled        # Scheduled scans only
POST /api/cron/scan?mode=change_detection # Change detection only
```

### Webhook Endpoints
```
POST /api/webhooks/website-change         # External change notifications
```

### Manual Endpoints
```
POST /api/cron/scan                       # Manual scan trigger (authenticated)
```

## Configuration

### Environment Variables
```env
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
```

### Subscription Tiers
- **Free**: 10 scans/day, 100 scans/month
- **Pro**: 50 scans/day, 500 scans/month
- **Enterprise**: 200 scans/day, 2000 scans/month

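The tier limits above can be expressed as a small lookup plus a guard that runs before each scan. This is a sketch only: the numbers come from the list above, while `TIER_LIMITS`, `ScanUsage`, and `canRunScan` are hypothetical names, and the real limits are described as living in the `subscription_limits` table rather than in code.

```typescript
type Tier = "free" | "pro" | "enterprise";

interface ScanLimits {
  perDay: number;
  perMonth: number;
}

interface ScanUsage {
  today: number;
  thisMonth: number;
}

// Limits copied from the subscription tier list in this document.
const TIER_LIMITS: Record<Tier, ScanLimits> = {
  free: { perDay: 10, perMonth: 100 },
  pro: { perDay: 50, perMonth: 500 },
  enterprise: { perDay: 200, perMonth: 2000 },
};

// A scan is allowed only while both the daily and the monthly counters are under their limits.
function canRunScan(tier: Tier, usage: ScanUsage): boolean {
  const limits = TIER_LIMITS[tier];
  return usage.today < limits.perDay && usage.thisMonth < limits.perMonth;
}

console.log(canRunScan("free", { today: 9, thisMonth: 40 }));  // true
console.log(canRunScan("free", { today: 10, thisMonth: 40 })); // false (daily limit reached)
console.log(canRunScan("pro", { today: 3, thisMonth: 500 }));  // false (monthly limit reached)
```
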
## Usage

### Setting Up Automated Scans

1. **Deploy the Application**
   ```bash
   # Deploy to Vercel (recommended)
   vercel --prod

   # Or deploy to your preferred platform
   ```

2. **Set Up Cron Jobs**
   ```bash
   # Run the setup script
   ./scripts/setup-cron.sh

   # Or follow the manual setup guide
   # docs/cron-setup-guide.md
   ```

3. **Configure Database**
   ```sql
   -- Run the setup script from a psql session (\i is a psql meta-command, not SQL)
   \i setup-database.sql
   ```

### Managing Scan Schedules

1. **Access the Dashboard**
   - Navigate to `/dashboard/websites`
   - Click on a website to view details
   - Find the "Scan Schedule Management" section

2. **Configure Settings**
   - Toggle automatic scanning on/off
   - Set scan frequency (hourly, daily, weekly, monthly)
   - Choose device types (desktop, mobile)
   - Select scan categories

3. **Monitor Usage**
   - View daily and monthly scan usage
   - Check against subscription limits
   - Trigger manual scans when needed

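The daily and monthly usage figures shown in the dashboard could be derived from scan timestamps along these lines. This is an illustrative sketch, not the component's actual code: `summarizeUsage` and `UsageSummary` are hypothetical names, and counting by UTC calendar day and month is an assumption.

```typescript
interface UsageSummary {
  today: number;
  thisMonth: number;
}

// Counts scans that started on the same UTC day and in the same UTC month as `now`.
// Assumption: usage windows follow UTC calendar boundaries.
function summarizeUsage(scanTimes: Date[], now: Date = new Date()): UsageSummary {
  const summary: UsageSummary = { today: 0, thisMonth: 0 };
  for (const t of scanTimes) {
    const sameMonth =
      t.getUTCFullYear() === now.getUTCFullYear() && t.getUTCMonth() === now.getUTCMonth();
    if (!sameMonth) continue;
    summary.thisMonth += 1;
    if (t.getUTCDate() === now.getUTCDate()) summary.today += 1;
  }
  return summary;
}

const usage = summarizeUsage(
  [
    new Date("2024-03-15T08:00:00Z"),
    new Date("2024-03-15T20:30:00Z"),
    new Date("2024-03-02T10:00:00Z"),
    new Date("2024-02-28T10:00:00Z"),
  ],
  new Date("2024-03-15T23:00:00Z"),
);
console.log(usage); // { today: 2, thisMonth: 3 }
```
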
### Webhook Integration

1. **Set Up External Monitoring**
   - Configure your external system to detect website changes
   - Send POST requests to `/api/webhooks/website-change`

2. **Webhook Payload**
   ```json
   {
     "websiteId": "website-uuid",
     "url": "https://example.com/changed-page",
     "changeType": "content_update",
     "contentHash": "new-content-hash",
     "metadata": {
       "source": "external-system",
       "timestamp": "2024-01-01T00:00:00Z"
     }
   }
   ```

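A handler receiving the payload above would normally validate it before triggering a scan. The sketch below shows one way to do that with a hand-written parser. Which fields are required is an assumption (here `websiteId`, `url`, and `changeType`), and `parseWebsiteChangePayload` is a hypothetical name rather than the handler's real function.

```typescript
interface WebsiteChangePayload {
  websiteId: string;
  url: string;
  changeType: string;
  contentHash?: string;
  metadata?: Record<string, unknown>;
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

// Returns the typed payload, or null when a field is missing or malformed.
// Assumption: websiteId, url and changeType are required; the rest is optional.
function parseWebsiteChangePayload(body: unknown): WebsiteChangePayload | null {
  if (typeof body !== "object" || body === null) return null;
  const { websiteId, url, changeType, contentHash, metadata } = body as Record<string, unknown>;
  if (!isNonEmptyString(websiteId) || !isNonEmptyString(url) || !isNonEmptyString(changeType)) {
    return null;
  }
  try {
    const parsed = new URL(url);
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return null;
  } catch {
    return null; // not an absolute URL
  }
  let hash: string | undefined;
  if (contentHash !== undefined) {
    if (typeof contentHash !== "string") return null;
    hash = contentHash;
  }
  let meta: Record<string, unknown> | undefined;
  if (metadata !== undefined) {
    if (typeof metadata !== "object" || metadata === null) return null;
    meta = metadata as Record<string, unknown>;
  }
  return { websiteId, url, changeType, contentHash: hash, metadata: meta };
}

const accepted = parseWebsiteChangePayload({
  websiteId: "website-uuid",
  url: "https://example.com/changed-page",
  changeType: "content_update",
  contentHash: "new-content-hash",
});
console.log(accepted !== null);                                             // true
console.log(parseWebsiteChangePayload({ url: "https://example.com" }));     // null
```

Returning `null` instead of throwing lets the route respond with a 400 status in one place, without wrapping the parser in a try/catch.
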
## Monitoring and Troubleshooting

### Check System Status
```bash
# Test the cron endpoint
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"

# Check database logs (this is SQL: run it in psql or your database's SQL editor, not in the shell)
# SELECT * FROM audit_logs ORDER BY created_at DESC LIMIT 10;
```

### Common Issues

1. **Scans Not Running**
   - Check cron job configuration
   - Verify database connection
   - Review subscription limits

2. **Change Detection Not Working**
   - Ensure subscription tier supports change detection
   - Check webhook endpoint accessibility
   - Verify content hash computation

3. **Performance Issues**
   - Monitor scan frequency
   - Check database performance
   - Review resource usage

## Development

### Adding New Metrics
1. Update `metric_definitions` table
2. Modify `LighthouseScanner` class
3. Update UI components

### Customizing Scan Logic
1. Modify `performScan` method in `LighthouseScanner`
2. Update `runLighthouse` simulation
3. Adjust result processing

### Extending Subscription Tiers
1. Update `getSubscriptionLimits` method
2. Modify database schema
3. Update UI components

## Security Considerations

- **Authentication**: Manual endpoints require user authentication
- **Rate Limiting**: Built-in subscription-based limits
- **Input Validation**: All webhook inputs are validated
- **Audit Logging**: All activities are logged for security

## Performance Optimization

- **Batch Processing**: Multiple websites processed efficiently
- **Error Recovery**: Failed scans don't affect the system
- **Resource Management**: Controlled resource usage
- **Caching**: Optimized database queries

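Batch processing and error isolation can be combined in a few lines. The sketch below is one possible shape, assuming scans are independent async calls. `chunk`, `scanInBatches`, and the batch size of 5 are hypothetical and not taken from the scanner's source.

```typescript
// Splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

interface BatchOutcome {
  succeeded: string[];
  failed: string[];
}

// Runs `scan` for every website, one batch at a time. Promise.allSettled waits for
// every scan in the batch, so one rejected scan does not cancel the others.
async function scanInBatches(
  websiteIds: string[],
  scan: (websiteId: string) => Promise<void>,
  batchSize = 5,
): Promise<BatchOutcome> {
  const outcome: BatchOutcome = { succeeded: [], failed: [] };
  for (const batch of chunk(websiteIds, batchSize)) {
    const results = await Promise.allSettled(batch.map((id) => scan(id)));
    results.forEach((result, index) => {
      (result.status === "fulfilled" ? outcome.succeeded : outcome.failed).push(batch[index]);
    });
  }
  return outcome;
}

// Usage with a fake scan that fails for one website.
scanInBatches(["a", "b", "c"], async (id) => {
  if (id === "b") throw new Error("simulated failure");
}, 2).then((outcome) => console.log(outcome));
// { succeeded: [ 'a', 'c' ], failed: [ 'b' ] }
```

Running batches sequentially caps concurrency at `batchSize`, which is a simple way to keep resource usage bounded during a cron run.
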
## Support

For issues or questions:
1. Check the troubleshooting section
2. Review application logs
3. Verify database setup
4. Test endpoints manually
5. Check subscription configuration

## Future Enhancements

- **Real-time Notifications**: Push notifications for scan results
- **Advanced Analytics**: Detailed performance insights
- **Custom Metrics**: User-defined performance metrics
- **Integration APIs**: Third-party service integrations
- **Machine Learning**: Predictive performance analysis
@@ -0,0 +1,260 @@
# Cron Job Setup Guide for Automatic Lighthouse Scanning

This guide will help you set up automated cron jobs to run the Lighthouse scanning system for your website monitoring application.

## Overview

The automatic scanning system includes:
- **Scheduled Scans**: Periodic scans based on user-configured schedules
- **Change Detection**: Automatic scans triggered when website content changes
- **Subscription Limits**: Respects user subscription tiers and rate limits

## Prerequisites

1. **Environment Variables**: Ensure your `.env` file has the required Supabase configuration:
   ```env
   NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
   NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
   SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
   ```

2. **Database Setup**: Make sure all required tables are created using the `setup-database.sql` script.

3. **Deployed Application**: Your Next.js application should be deployed and accessible via HTTPS.

## Cron Job Configuration

### Option 1: Using Vercel Cron Jobs (Recommended)

If you're deploying on Vercel, you can use their built-in cron job feature:

1. **Create a `vercel.json` file** in your project root:
   ```json
   {
     "crons": [
       {
         "path": "/api/cron/scan?mode=all",
         "schedule": "0 */6 * * *"
       }
     ]
   }
   ```

2. **Schedule Explanation**:
   - `0 */6 * * *` = Every 6 hours
   - `0 */4 * * *` = Every 4 hours
   - `0 */2 * * *` = Every 2 hours
   - `0 * * * *` = Every hour
   - `*/15 * * * *` = Every 15 minutes

3. **Deploy to Vercel**: The cron jobs start running automatically after deployment. Note that Vercel invokes cron paths with HTTP GET requests, while this guide documents the scan endpoint as POST, so the route must also accept GET for this option to work.

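The schedule strings in step 2 are five-field cron expressions (minute, hour, day of month, month, day of week). To make the `*/n` step syntax concrete, here is a small sketch that expands a single field into the values it matches. `expandCronField` is a hypothetical helper written for this guide and deliberately handles only `*`, `*/n`, and a single number.

```typescript
// Expands one cron field of the form "*", "*/n", or a single number into the values it matches.
// Simplification: lists ("1,2") and ranges ("1-5") are not handled in this sketch.
function expandCronField(field: string, min: number, max: number): number[] {
  const all = Array.from({ length: max - min + 1 }, (_, i) => min + i);
  if (field === "*") return all;
  const step = /^\*\/(\d+)$/.exec(field);
  if (step) {
    const n = Number(step[1]);
    if (n < 1) throw new RangeError(`invalid step in "${field}"`);
    return all.filter((value) => (value - min) % n === 0);
  }
  if (/^\d+$/.test(field)) {
    const value = Number(field);
    if (value < min || value > max) throw new RangeError(`"${field}" is outside ${min}-${max}`);
    return [value];
  }
  throw new SyntaxError(`unsupported cron field "${field}"`);
}

// "0 */6 * * *": minute field "0", hour field "*/6"
console.log(expandCronField("0", 0, 59));    // [ 0 ]
console.log(expandCronField("*/6", 0, 23));  // [ 0, 6, 12, 18 ]
// "*/15 * * * *": minute field "*/15"
console.log(expandCronField("*/15", 0, 59)); // [ 0, 15, 30, 45 ]
```

Read this way, `0 */6 * * *` fires at minute 0 of hours 0, 6, 12, and 18, which is what "every 6 hours" means in practice.
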
### Option 2: Using External Cron Services

#### A. Cron-job.org (Free)

1. Go to [cron-job.org](https://cron-job.org)
2. Create an account and add a new cron job
3. Set the URL to `https://your-domain.com/api/cron/scan?mode=all` and, because the endpoint expects POST, set the request method to POST
4. Configure the schedule (recommended: every 6 hours)
5. Enable monitoring and notifications

#### B. EasyCron (Free tier available)

1. Go to [easycron.com](https://easycron.com)
2. Create an account and add a new cron job
3. Set the URL to `https://your-domain.com/api/cron/scan?mode=all` and the HTTP method to POST
4. Configure the schedule
5. Set up email notifications for failures

#### C. GitHub Actions (Free for public repos)

1. Create `.github/workflows/cron-scan.yml`:
   ```yaml
   name: Lighthouse Scan Cron Job

   on:
     schedule:
       - cron: '0 */6 * * *'  # Every 6 hours

   jobs:
     scan:
       runs-on: ubuntu-latest
       steps:
         - name: Trigger Scan
           run: |
             # --fail makes the step fail on HTTP 4xx/5xx responses
             curl --fail -X POST "https://your-domain.com/api/cron/scan?mode=all"
   ```

### Option 3: Server Cron Jobs (VPS/Dedicated Server)

If you're running on a VPS or dedicated server:

1. **SSH into your server**
2. **Edit crontab**: `crontab -e`
3. **Add the cron job**:
   ```bash
   # Run every 6 hours
   0 */6 * * * curl -X POST "https://your-domain.com/api/cron/scan?mode=all"

   # Or run every hour
   0 * * * * curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
   ```

## API Endpoints

The cron system provides several endpoints for different scan modes:

### 1. Full Scan (Recommended for cron jobs)
```
POST /api/cron/scan?mode=all
```
- Runs both scheduled scans and change detection
- Respects subscription limits
- Returns scan statistics

### 2. Scheduled Scans Only
```
POST /api/cron/scan?mode=scheduled
```
- Only runs scans based on user-configured schedules
- Useful for testing or specific use cases

### 3. Change Detection Only
```
POST /api/cron/scan?mode=change_detection
```
- Only checks for website changes and triggers scans
- Can be run more frequently than full scans

### 4. Manual Scan Trigger
```
POST /api/cron/scan
```
- Triggers a scan for a specific website
- Requires authentication
- Used by the ScanScheduleManager component

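The four endpoints above share one path and differ only in the `mode` query parameter. A handler could route between them roughly as follows. This is a sketch under the assumption that a missing `mode` means the manual trigger. `parseScanMode` and `ModeResult` are hypothetical names, not the route's actual code.

```typescript
type ScanMode = "all" | "scheduled" | "change_detection";

type ModeResult =
  | { kind: "cron"; mode: ScanMode }
  | { kind: "manual" }
  | { kind: "invalid"; value: string };

const SCAN_MODES: readonly ScanMode[] = ["all", "scheduled", "change_detection"];

// Reads the `mode` query parameter from an absolute request URL.
// Assumption: no `mode` parameter means the authenticated manual trigger (endpoint 4).
function parseScanMode(requestUrl: string): ModeResult {
  const value = new URL(requestUrl).searchParams.get("mode");
  if (value === null) return { kind: "manual" };
  const mode = SCAN_MODES.find((m) => m === value);
  return mode ? { kind: "cron", mode } : { kind: "invalid", value };
}

console.log(parseScanMode("https://your-domain.com/api/cron/scan?mode=all"));
// { kind: 'cron', mode: 'all' }
console.log(parseScanMode("https://your-domain.com/api/cron/scan"));
// { kind: 'manual' }
console.log(parseScanMode("https://your-domain.com/api/cron/scan?mode=everything"));
// { kind: 'invalid', value: 'everything' }
```

Reporting unknown modes as `invalid` rather than silently falling back to `all` makes a typo in a cron URL show up as a 400 response instead of an unexpected full scan.
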
## Monitoring and Logging

### 1. Check Cron Job Status

You can monitor if your cron jobs are working by:

1. **Checking the API response**:
   ```bash
   curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
   ```

2. **Expected response**:
   ```json
   {
     "success": true,
     "message": "Scan processing completed",
     "statistics": {
       "scheduledScansProcessed": 5,
       "changeDetectionChecks": 10,
       "scansTriggered": 3,
       "errors": 0
     }
   }
   ```

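A monitoring script can turn the response above into a pass/fail signal. The sketch below types the response exactly as shown in the example and flags runs that report failures. `findProblems` is a hypothetical helper, and treating any non-zero `errors` count as a problem is an assumption about what you want to alert on.

```typescript
interface ScanStatistics {
  scheduledScansProcessed: number;
  changeDetectionChecks: number;
  scansTriggered: number;
  errors: number;
}

interface CronScanResponse {
  success: boolean;
  message: string;
  statistics: ScanStatistics;
}

// Returns a list of problems; an empty list means the run looks healthy.
function findProblems(response: CronScanResponse): string[] {
  const problems: string[] = [];
  if (!response.success) problems.push(`run failed: ${response.message}`);
  if (response.statistics.errors > 0) {
    problems.push(`${response.statistics.errors} scan error(s) reported`);
  }
  return problems;
}

const sampleResponse: CronScanResponse = JSON.parse(`{
  "success": true,
  "message": "Scan processing completed",
  "statistics": {
    "scheduledScansProcessed": 5,
    "changeDetectionChecks": 10,
    "scansTriggered": 3,
    "errors": 0
  }
}`);
console.log(findProblems(sampleResponse)); // []
```
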
### 2. Database Logs

Check the `audit_logs` table for scan activities:
```sql
SELECT * FROM audit_logs
WHERE action_type IN ('scan_scheduled', 'scan_triggered', 'change_detected')
ORDER BY created_at DESC
LIMIT 10;
```

### 3. Error Monitoring

Set up monitoring for:
- HTTP 500 errors on the cron endpoint
- Database connection failures
- Subscription limit violations

## Testing Your Setup

### 1. Manual Test
```bash
# Test the cron endpoint manually
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
```

### 2. Check Database
```sql
-- Check if scans are being created
SELECT * FROM scans ORDER BY created_at DESC LIMIT 5;

-- Check if scan results are being saved
SELECT * FROM scan_results ORDER BY created_at DESC LIMIT 5;
```

### 3. Monitor Logs
Check your application logs for any errors or warnings related to the scanning process.

## Troubleshooting

### Common Issues

1. **Cron job not running**:
   - Check if the URL is accessible
   - Verify HTTPS is working
   - Check server logs for errors

2. **No scans being triggered**:
   - Verify database tables exist
   - Check subscription tier configuration
   - Ensure websites have scan schedules configured

3. **Rate limiting issues**:
   - Check subscription limits in the database
   - Verify the `subscription_limits` table has correct data

4. **Authentication errors**:
   - Verify `SUPABASE_SERVICE_ROLE_KEY` is set correctly
   - Check if the service role has proper permissions

### Debug Mode

Enable debug logging by setting:
```env
TASKMASTER_LOG_LEVEL=debug
```

This will provide more detailed logs about the scanning process.

## Security Considerations

1. **API Protection**: The cron examples in this guide call the endpoint without credentials, so anyone who knows the URL can trigger a scan run. Consider requiring a shared secret or similar authentication on the cron endpoint
2. **Rate Limiting**: The system already includes subscription-based rate limiting
3. **Error Handling**: Failed scans are logged and don't affect the overall system
4. **Data Privacy**: Only scan websites that users have explicitly added

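One common way to protect a cron endpoint is a shared secret sent as a bearer token. The sketch below shows only the comparison step and is not part of the existing codebase: `isAuthorizedCronRequest` is a hypothetical helper, and where the secret is stored (for example an environment variable) is left to the deployment.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

function sha256(value: string): Buffer {
  return createHash("sha256").update(value, "utf8").digest();
}

// Checks an "Authorization: Bearer <secret>" header against the expected secret.
// Both sides are hashed first so timingSafeEqual always receives equal-length buffers.
function isAuthorizedCronRequest(
  authorizationHeader: string | null,
  expectedSecret: string,
): boolean {
  if (!authorizationHeader || expectedSecret.length === 0) return false;
  const prefix = "Bearer ";
  if (!authorizationHeader.startsWith(prefix)) return false;
  const presented = authorizationHeader.slice(prefix.length);
  return timingSafeEqual(sha256(presented), sha256(expectedSecret));
}

// Hypothetical secret used only for this example.
const exampleSecret = "example-cron-secret";
console.log(isAuthorizedCronRequest("Bearer example-cron-secret", exampleSecret)); // true
console.log(isAuthorizedCronRequest("Bearer wrong", exampleSecret));               // false
console.log(isAuthorizedCronRequest(null, exampleSecret));                         // false
```

With a check like this in place, the `curl` commands in this guide would need an extra `-H "Authorization: Bearer <secret>"` argument. A constant-time comparison avoids leaking how many leading characters of a guess were correct.
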
## Performance Optimization

1. **Scan Frequency**: Start with every 6 hours, adjust based on usage
2. **Batch Processing**: The system processes multiple websites in batches
3. **Error Recovery**: Failed scans are retried automatically
4. **Resource Usage**: Monitor server resources during scan execution

## Next Steps

1. **Set up the cron job** using one of the methods above
2. **Test the system** with a few websites
3. **Monitor performance** and adjust scan frequency as needed
4. **Set up alerts** for cron job failures
5. **Configure webhooks** for external change detection triggers

## Support

If you encounter issues:
1. Check the troubleshooting section above
2. Review application logs
3. Verify database setup
4. Test the API endpoint manually
5. Check subscription configuration