refactor: flatten monorepo structure to backend/ frontend/ devops/

Rename subdirectories for a cleaner single-repo layout:
- website-monitoring-backend/  → backend/
- website-monitoring-frontend/ → frontend/
- website-monitoring-devops/   → devops/

Update all references in package.json scripts, CI workflows,
docker-compose, pre-commit hooks, and documentation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Dennis
2026-03-07 00:25:29 +01:00
parent 4607af8def
commit 50e25e3ee8
253 changed files with 54 additions and 51 deletions
@@ -0,0 +1,261 @@
# Automatic Lighthouse Scanning System
This document describes the automatic Lighthouse scanning system that has been integrated into your website monitoring application.
## Overview
The automatic scanning system provides:
- **Scheduled Scans**: Periodic scans based on user-configured schedules
- **Change Detection**: Automatic scans triggered when website content changes
- **Subscription Limits**: Respects user subscription tiers and rate limits
- **Webhook Support**: External triggers for website changes
- **Comprehensive UI**: User-friendly interface for managing scan schedules
## System Architecture
### Core Components
1. **LighthouseScanner** (`src/services/lighthouseScanner.ts`)
- Handles core scanning logic
- Manages change detection
- Enforces subscription limits
- Runs simulated Lighthouse scans (the `runLighthouse` step is a simulation)
2. **ScanScheduler** (`src/services/scanScheduler.ts`)
- Manages scheduled scans
- Processes change detection
- Orchestrates scan execution
3. **Cron Handler** (`src/app/api/cron/scan/route.ts`)
- Main entry point for automated scans
- Supports different scan modes
- Provides scan statistics
4. **Webhook Handler** (`src/app/api/webhooks/website-change/route.ts`)
- Receives external change notifications
- Triggers high-priority scans
- Validates subscription limits
5. **ScanScheduleManager** (`src/components/dashboard/ScanScheduleManager.tsx`)
- User interface for managing scan schedules
- Displays usage statistics
- Allows manual scan triggers
## Features
### Scheduled Scanning
- **Frequency Options**: Hourly, daily, weekly, monthly
- **Device Types**: Desktop and/or mobile
- **Categories**: Performance, accessibility, SEO, best practices
- **Subscription Tiers**: Different limits per tier
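The frequency options above imply a "is this website due for a scan?" check on each cron run. The following is a minimal sketch of such a check, not the actual `ScanScheduler` logic: the names `ScanFrequency` and `isScanDue` are hypothetical, and treating "monthly" as 30 days is an assumption.

```typescript
// Hypothetical names for illustration; not taken from scanScheduler.ts.
type ScanFrequency = "hourly" | "daily" | "weekly" | "monthly";

const HOUR_MS = 60 * 60 * 1000;

// Assumption: "monthly" is approximated as 30 days.
const INTERVAL_MS: Record<ScanFrequency, number> = {
  hourly: HOUR_MS,
  daily: 24 * HOUR_MS,
  weekly: 7 * 24 * HOUR_MS,
  monthly: 30 * 24 * HOUR_MS,
};

// A website that has never been scanned (lastScanAt === null) is always due.
function isScanDue(
  lastScanAt: Date | null,
  frequency: ScanFrequency,
  now: Date = new Date(),
): boolean {
  if (lastScanAt === null) return true;
  return now.getTime() - lastScanAt.getTime() >= INTERVAL_MS[frequency];
}
```

Passing `now` as a parameter keeps the function deterministic, which makes it easy to test without mocking the clock.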
### Change Detection
- **Content Hashing**: Detects changes in website content
- **Automatic Triggers**: High-priority scans when changes are detected
- **Subscription Validation**: Only available for certain tiers
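Content hashing can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `lighthouseScanner.ts`: the helper names are hypothetical, SHA-256 is an assumed choice of hash, and collapsing whitespace before hashing is an assumed normalization step.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hash page content so two fetches can be compared cheaply.
function computeContentHash(html: string): string {
  // Assumption: collapse whitespace so formatting-only changes do not count as changes.
  const normalized = html.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

// A page with no stored hash (previousHash === null) is reported as changed.
function hasContentChanged(previousHash: string | null, html: string): boolean {
  return previousHash !== computeContentHash(html);
}
```

The stored value would correspond to the content hash kept per page in the `pages` table; how the real system normalizes content before hashing is not specified in this document.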
### Subscription Management
- **Daily Limits**: Maximum scans per day
- **Monthly Limits**: Maximum scans per month
- **Feature Access**: Different capabilities per tier
- **Usage Tracking**: Real-time usage monitoring
### Webhook Integration
- **External Triggers**: Receive change notifications from external systems
- **Validation**: Verify subscription and limits
- **Audit Logging**: Track all webhook activities
## Database Schema
The system uses several new tables:
### Core Tables
- `scans`: Main scan records
- `scan_results`: Detailed scan results
- `pages`: Website pages with content hashes
- `metric_values`: Individual metric values
- `resource_analysis`: Resource usage analysis
### Configuration Tables
- `metric_definitions`: Available metrics
- `alert_configurations`: Alert settings
- `subscription_limits`: Tier-based limits
### Audit Tables
- `audit_logs`: System activity logging
- `crawl_queue`: Crawl job queue
- `crawl_sessions`: Crawl session tracking
## API Endpoints
### Cron Endpoints
```
POST /api/cron/scan?mode=all # Full scan (scheduled + change detection)
POST /api/cron/scan?mode=scheduled # Scheduled scans only
POST /api/cron/scan?mode=change_detection # Change detection only
```
### Webhook Endpoints
```
POST /api/webhooks/website-change # External change notifications
```
### Manual Endpoints
```
POST /api/cron/scan # Manual scan trigger (authenticated)
```
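The `mode` query parameter used by these endpoints could be parsed along the following lines. This is a sketch only: `ScanMode` and `parseScanMode` are hypothetical names, and rejecting unknown modes with an error is an assumption about the desired behavior.

```typescript
// Hypothetical names; the modes themselves come from the endpoint list above.
type ScanMode = "all" | "scheduled" | "change_detection";

const SCAN_MODES: readonly string[] = ["all", "scheduled", "change_detection"];

// Returns null when no mode is present (the manual trigger form of the endpoint).
// Assumption: an unrecognized mode is an error rather than a silent fallback.
function parseScanMode(requestUrl: string): ScanMode | null {
  const raw = new URL(requestUrl).searchParams.get("mode");
  if (raw === null) return null;
  if (SCAN_MODES.indexOf(raw) !== -1) return raw as ScanMode;
  throw new Error(`Unsupported scan mode: ${raw}`);
}
```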
## Configuration
### Environment Variables
```env
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
```
### Subscription Tiers
- **Free**: 10 scans/day, 100 scans/month
- **Pro**: 50 scans/day, 500 scans/month
- **Enterprise**: 200 scans/day, 2000 scans/month
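A limit check against these tiers might look like the sketch below. The numbers are the ones listed above; everything else is an assumption. In particular, the real system reads limits from the `subscription_limits` table (via `getSubscriptionLimits`), whereas this example hardcodes them, and the names `Tier`, `ScanUsage`, and `canRunScan` are hypothetical.

```typescript
// Hypothetical types for illustration.
type Tier = "free" | "pro" | "enterprise";

interface ScanLimits {
  daily: number;
  monthly: number;
}

interface ScanUsage {
  scansToday: number;
  scansThisMonth: number;
}

// Values copied from the tier list above; hardcoded here only for the example.
const LIMITS: Record<Tier, ScanLimits> = {
  free: { daily: 10, monthly: 100 },
  pro: { daily: 50, monthly: 500 },
  enterprise: { daily: 200, monthly: 2000 },
};

// A scan is allowed only while both the daily and the monthly budget have room.
function canRunScan(tier: Tier, usage: ScanUsage): boolean {
  const limits = LIMITS[tier];
  return usage.scansToday < limits.daily && usage.scansThisMonth < limits.monthly;
}
```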
## Usage
### Setting Up Automated Scans
1. **Deploy the Application**
```bash
# Deploy to Vercel (recommended)
vercel --prod
# Or deploy to your preferred platform
```
2. **Set Up Cron Jobs**
```bash
# Run the setup script
./scripts/setup-cron.sh
# Or follow the manual setup guide
# docs/cron-setup-guide.md
```
3. **Configure Database**
```sql
-- Run the setup script
\i setup-database.sql
```
### Managing Scan Schedules
1. **Access the Dashboard**
- Navigate to `/dashboard/websites`
- Click on a website to view details
- Find the "Scan Schedule Management" section
2. **Configure Settings**
- Toggle automatic scanning on/off
- Set scan frequency (hourly, daily, weekly, monthly)
- Choose device types (desktop, mobile)
- Select scan categories
3. **Monitor Usage**
- View daily and monthly scan usage
- Check against subscription limits
- Trigger manual scans when needed
### Webhook Integration
1. **Set Up External Monitoring**
- Configure your external system to detect website changes
- Send POST requests to `/api/webhooks/website-change`
2. **Webhook Payload**
```json
{
  "websiteId": "website-uuid",
  "url": "https://example.com/changed-page",
  "changeType": "content_update",
  "contentHash": "new-content-hash",
  "metadata": {
    "source": "external-system",
    "timestamp": "2024-01-01T00:00:00Z"
  }
}
```
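A receiver could validate this payload with a type guard along these lines. This is a sketch, not the handler in `route.ts`: the interface and function names are hypothetical, and which fields are required (all four top-level strings, with `metadata` optional) is an assumption, since the document does not say.

```typescript
// Hypothetical shape mirroring the example payload above.
interface WebsiteChangePayload {
  websiteId: string;
  url: string;
  changeType: string;
  contentHash: string;
  metadata?: { source?: string; timestamp?: string };
}

// Assumption: the four top-level string fields are required and url must be http(s).
function isWebsiteChangePayload(value: unknown): value is WebsiteChangePayload {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const hasStrings = ["websiteId", "url", "changeType", "contentHash"].every(
    (key) => typeof v[key] === "string" && (v[key] as string).length > 0,
  );
  if (!hasStrings) return false;
  try {
    // new URL throws on strings that are not absolute URLs.
    const protocol = new URL(v["url"] as string).protocol;
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false;
  }
}
```

Rejecting malformed payloads before any database access keeps the subscription and limit checks from running on garbage input.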
## Monitoring and Troubleshooting
### Check System Status
```bash
# Test the cron endpoint
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
# Check database logs: run this query in psql or the Supabase SQL editor, not in the shell
#   SELECT * FROM audit_logs ORDER BY created_at DESC LIMIT 10;
```
### Common Issues
1. **Scans Not Running**
- Check cron job configuration
- Verify database connection
- Review subscription limits
2. **Change Detection Not Working**
- Ensure subscription tier supports change detection
- Check webhook endpoint accessibility
- Verify content hash computation
3. **Performance Issues**
- Monitor scan frequency
- Check database performance
- Review resource usage
## Development
### Adding New Metrics
1. Update `metric_definitions` table
2. Modify `LighthouseScanner` class
3. Update UI components
### Customizing Scan Logic
1. Modify `performScan` method in `LighthouseScanner`
2. Update `runLighthouse` simulation
3. Adjust result processing
### Extending Subscription Tiers
1. Update `getSubscriptionLimits` method
2. Modify database schema
3. Update UI components
## Security Considerations
- **Authentication**: Manual endpoints require user authentication
- **Rate Limiting**: Built-in subscription-based limits
- **Input Validation**: All webhook inputs are validated
- **Audit Logging**: All activities are logged for security
## Performance Optimization
- **Batch Processing**: Multiple websites processed efficiently
- **Error Recovery**: Failed scans don't affect the system
- **Resource Management**: Controlled resource usage
- **Caching**: Optimized database queries
## Support
For issues or questions:
1. Check the troubleshooting section
2. Review application logs
3. Verify database setup
4. Test endpoints manually
5. Check subscription configuration
## Future Enhancements
- **Real-time Notifications**: Push notifications for scan results
- **Advanced Analytics**: Detailed performance insights
- **Custom Metrics**: User-defined performance metrics
- **Integration APIs**: Third-party service integrations
- **Machine Learning**: Predictive performance analysis
@@ -0,0 +1,260 @@
# Cron Job Setup Guide for Automatic Lighthouse Scanning
This guide will help you set up automated cron jobs to run the Lighthouse scanning system for your website monitoring application.
## Overview
The automatic scanning system includes:
- **Scheduled Scans**: Periodic scans based on user-configured schedules
- **Change Detection**: Automatic scans triggered when website content changes
- **Subscription Limits**: Respects user subscription tiers and rate limits
## Prerequisites
1. **Environment Variables**: Ensure your `.env` file has the required Supabase configuration:
```env
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
```
2. **Database Setup**: Make sure all required tables are created using the `setup-database.sql` script.
3. **Deployed Application**: Your Next.js application should be deployed and accessible via HTTPS.
## Cron Job Configuration
### Option 1: Using Vercel Cron Jobs (Recommended)
If you're deploying on Vercel, you can use their built-in cron job feature:
1. **Create a `vercel.json` file** in your project root:
```json
{
  "crons": [
    {
      "path": "/api/cron/scan?mode=all",
      "schedule": "0 */6 * * *"
    }
  ]
}
```
2. **Schedule Explanation**:
- `0 */6 * * *` = Every 6 hours
- `0 */4 * * *` = Every 4 hours
- `0 */2 * * *` = Every 2 hours
- `0 * * * *` = Every hour
- `*/15 * * * *` = Every 15 minutes
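3. **Deploy to Vercel**: The cron jobs start running automatically after a production deployment. Note that Vercel invokes cron paths with HTTP GET requests, so the route must also handle GET; the `POST` examples in this guide apply to the other options.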
### Option 2: Using External Cron Services
#### A. Cron-job.org (Free)
1. Go to [cron-job.org](https://cron-job.org)
2. Create an account and add a new cron job
3. Set the URL to: `https://your-domain.com/api/cron/scan?mode=all`
4. Configure the schedule (recommended: every 6 hours)
5. Enable monitoring and notifications
#### B. EasyCron (Free tier available)
1. Go to [easycron.com](https://easycron.com)
2. Create an account and add a new cron job
3. Set the URL to: `https://your-domain.com/api/cron/scan?mode=all`
4. Configure the schedule
5. Set up email notifications for failures
#### C. GitHub Actions (Free for public repos)
1. Create `.github/workflows/cron-scan.yml`:
```yaml
name: Lighthouse Scan Cron Job
on:
  schedule:
    - cron: '0 */6 * * *' # Every 6 hours
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger Scan
        run: |
          curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
```
### Option 3: Server Cron Jobs (VPS/Dedicated Server)
If you're running on a VPS or dedicated server:
1. **SSH into your server**
2. **Edit crontab**: `crontab -e`
3. **Add the cron job**:
```bash
# Run every 6 hours
0 */6 * * * curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
# Or run every hour
0 * * * * curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
```
## API Endpoints
The cron system provides several endpoints for different scan modes:
### 1. Full Scan (Recommended for cron jobs)
```
POST /api/cron/scan?mode=all
```
- Runs both scheduled scans and change detection
- Respects subscription limits
- Returns scan statistics
### 2. Scheduled Scans Only
```
POST /api/cron/scan?mode=scheduled
```
- Only runs scans based on user-configured schedules
- Useful for testing or specific use cases
### 3. Change Detection Only
```
POST /api/cron/scan?mode=change_detection
```
- Only checks for website changes and triggers scans
- Can be run more frequently than full scans
### 4. Manual Scan Trigger
```
POST /api/cron/scan
```
- Triggers a scan for a specific website
- Requires authentication
- Used by the ScanScheduleManager component
## Monitoring and Logging
### 1. Check Cron Job Status
You can monitor if your cron jobs are working by:
1. **Checking the API response**:
```bash
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
```
2. **Expected response**:
```json
{
  "success": true,
  "message": "Scan processing completed",
  "statistics": {
    "scheduledScansProcessed": 5,
    "changeDetectionChecks": 10,
    "scansTriggered": 3,
    "errors": 0
  }
}
```
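If you monitor the endpoint from a script, the response can be checked programmatically. The sketch below assumes the response always has the shape shown in the example; the interface and function names are hypothetical, and treating any non-zero `errors` count as unhealthy is an assumption.

```typescript
// Hypothetical types mirroring the example response above.
interface ScanStatistics {
  scheduledScansProcessed: number;
  changeDetectionChecks: number;
  scansTriggered: number;
  errors: number;
}

interface CronScanResponse {
  success: boolean;
  message: string;
  statistics: ScanStatistics;
}

// Parses the raw body and fails loudly if the expected fields are missing.
function parseCronScanResponse(json: string): CronScanResponse {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Unexpected response shape from /api/cron/scan");
  }
  const body = parsed as Partial<CronScanResponse>;
  if (typeof body.success !== "boolean" || typeof body.statistics?.errors !== "number") {
    throw new Error("Unexpected response shape from /api/cron/scan");
  }
  return body as CronScanResponse;
}

// Assumption: a run counts as healthy only if it succeeded with zero errors.
function isHealthyRun(body: CronScanResponse): boolean {
  return body.success && body.statistics.errors === 0;
}
```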
### 2. Database Logs
Check the `audit_logs` table for scan activities:
```sql
SELECT * FROM audit_logs
WHERE action_type IN ('scan_scheduled', 'scan_triggered', 'change_detected')
ORDER BY created_at DESC
LIMIT 10;
```
### 3. Error Monitoring
Set up monitoring for:
- HTTP 500 errors on the cron endpoint
- Database connection failures
- Subscription limit violations
## Testing Your Setup
### 1. Manual Test
```bash
# Test the cron endpoint manually
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
```
### 2. Check Database
```sql
-- Check if scans are being created
SELECT * FROM scans ORDER BY created_at DESC LIMIT 5;
-- Check if scan results are being saved
SELECT * FROM scan_results ORDER BY created_at DESC LIMIT 5;
```
### 3. Monitor Logs
Check your application logs for any errors or warnings related to the scanning process.
## Troubleshooting
### Common Issues
1. **Cron job not running**:
- Check if the URL is accessible
- Verify HTTPS is working
- Check server logs for errors
2. **No scans being triggered**:
- Verify database tables exist
- Check subscription tier configuration
- Ensure websites have scan schedules configured
3. **Rate limiting issues**:
- Check subscription limits in the database
- Verify the `subscription_limits` table has correct data
4. **Authentication errors**:
- Verify `SUPABASE_SERVICE_ROLE_KEY` is set correctly
- Check if the service role has proper permissions
### Debug Mode
Enable debug logging by setting:
```env
TASKMASTER_LOG_LEVEL=debug
```
This will provide more detailed logs about the scanning process.
## Security Considerations
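1. **API Protection**: The cron endpoint is called without credentials in the examples in this guide, so anyone who knows the URL could trigger scans. Require a shared secret (for example, a bearer token checked in the route handler) before exposing it publicly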
2. **Rate Limiting**: The system already includes subscription-based rate limiting
3. **Error Handling**: Failed scans are logged and don't affect the overall system
4. **Data Privacy**: Only scan websites that users have explicitly added
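One way to protect the cron endpoint is a shared-secret check on the `Authorization` header. The sketch below is an assumption about how this could be done, not something the current route is documented to do: `isAuthorizedCronRequest` is a hypothetical helper, and the `Bearer <secret>` header format is an assumed convention. Vercel's cron documentation describes a `CRON_SECRET` environment variable that follows this bearer-token pattern; verify the details there before relying on it.

```typescript
import { timingSafeEqual } from "node:crypto";

// Hypothetical helper: compares the Authorization header against "Bearer <secret>".
function isAuthorizedCronRequest(
  authorizationHeader: string | null,
  secret: string,
): boolean {
  // An unset secret must never authorize anything.
  if (!authorizationHeader || secret.length === 0) return false;
  const expected = Buffer.from(`Bearer ${secret}`, "utf8");
  const received = Buffer.from(authorizationHeader, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

In a route handler this would be called with `request.headers.get("authorization")` and the secret read from the environment, returning a 401 response when the check fails. External schedulers such as the `curl` examples would then need to send the same header.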
## Performance Optimization
1. **Scan Frequency**: Start with every 6 hours, adjust based on usage
2. **Batch Processing**: The system processes multiple websites in batches
3. **Error Recovery**: Failed scans are retried automatically
4. **Resource Usage**: Monitor server resources during scan execution
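Batch processing with isolated failures can be sketched as follows. This illustrates the general technique only: `processInBatches` is a hypothetical helper, the batch size is a caller-chosen assumption, and the sketch records failures without retrying them, so it does not implement the automatic retry mentioned above.

```typescript
interface BatchResult<T, R> {
  succeeded: R[];
  failed: { item: T; reason: unknown }[];
}

// Runs worker over items in fixed-size batches; one failing item does not abort the rest.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<BatchResult<T, R>> {
  if (batchSize < 1) throw new Error("batchSize must be at least 1");
  const result: BatchResult<T, R> = { succeeded: [], failed: [] };
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // The async wrapper turns synchronous throws from worker into rejections.
    const settled = await Promise.allSettled(batch.map(async (item) => worker(item)));
    settled.forEach((outcome, index) => {
      if (outcome.status === "fulfilled") result.succeeded.push(outcome.value);
      else result.failed.push({ item: batch[index], reason: outcome.reason });
    });
  }
  return result;
}
```

Items within a batch run concurrently while batches run one after another, which caps the number of scans in flight at `batchSize`.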
## Next Steps
1. **Set up the cron job** using one of the methods above
2. **Test the system** with a few websites
3. **Monitor performance** and adjust scan frequency as needed
4. **Set up alerts** for cron job failures
5. **Configure webhooks** for external change detection triggers
## Support
If you encounter issues:
1. Check the troubleshooting section above
2. Review application logs
3. Verify database setup
4. Test the API endpoint manually
5. Check subscription configuration