Automatic Lighthouse Scanning System
This document describes the automatic Lighthouse scanning system that has been integrated into your website monitoring application.
Overview
The automatic scanning system provides:
- Scheduled Scans: Periodic scans based on user-configured schedules
- Change Detection: Automatic scans triggered when website content changes
- Subscription Limits: Respects user subscription tiers and rate limits
- Webhook Support: External triggers for website changes
- Comprehensive UI: User-friendly interface for managing scan schedules
System Architecture
Core Components
- LighthouseScanner (src/services/lighthouseScanner.ts)
  - Handles core scanning logic
  - Manages change detection
  - Enforces subscription limits
  - Simulates Lighthouse scans
- ScanScheduler (src/services/scanScheduler.ts)
  - Manages scheduled scans
  - Processes change detection
  - Orchestrates scan execution
- Cron Handler (src/app/api/cron/scan/route.ts)
  - Main entry point for automated scans
  - Supports different scan modes
  - Provides scan statistics
- Webhook Handler (src/app/api/webhooks/website-change/route.ts)
  - Receives external change notifications
  - Triggers high-priority scans
  - Validates subscription limits
- ScanScheduleManager (src/components/dashboard/ScanScheduleManager.tsx)
  - User interface for managing scan schedules
  - Displays usage statistics
  - Allows manual scan triggers
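To illustrate how the Cron Handler's scan modes could map onto the ScanScheduler's two jobs, here is a minimal TypeScript sketch. The names ScanMode, parseScanMode and jobsForRequest are hypothetical and are not taken from route.ts; falling back to "all" when mode is missing is also an assumption.

```typescript
// Hypothetical sketch: mapping the cron endpoint's `mode` query parameter
// to the jobs the scheduler should run. Not taken from route.ts.
type ScanMode = "all" | "scheduled" | "change_detection";
type ScanJob = "scheduled" | "change_detection";

const JOBS_BY_MODE: Record<ScanMode, ScanJob[]> = {
  all: ["scheduled", "change_detection"],
  scheduled: ["scheduled"],
  change_detection: ["change_detection"],
};

const SCAN_MODES: readonly string[] = Object.keys(JOBS_BY_MODE);

// Assumption: a missing `mode` means "all"; unknown values are rejected.
function parseScanMode(requestUrl: string): ScanMode {
  const mode = new URL(requestUrl).searchParams.get("mode") ?? "all";
  if (SCAN_MODES.indexOf(mode) === -1) {
    throw new Error(`Unsupported scan mode: ${mode}`);
  }
  return mode as ScanMode;
}

// Resolves the list of scheduler jobs for an incoming request URL.
function jobsForRequest(requestUrl: string): ScanJob[] {
  return JOBS_BY_MODE[parseScanMode(requestUrl)];
}
```

Checking membership against an explicit list of modes (rather than with the `in` operator) keeps inherited property names such as toString from being accepted as modes.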
Features
Scheduled Scanning
- Frequency Options: Hourly, daily, weekly, monthly
- Device Types: Desktop and/or mobile
- Categories: Performance, accessibility, SEO, best practices
- Subscription Tiers: Different limits per tier
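A minimal sketch of how the frequency options might translate into a "next scan due" time. The helpers nextScanAt and isScanDue are hypothetical, and treating "monthly" as one calendar month in UTC is an assumption, not necessarily what ScanScheduler does.

```typescript
type ScanFrequency = "hourly" | "daily" | "weekly" | "monthly";

const HOUR_MS = 60 * 60 * 1000;

// Returns when the next scheduled scan becomes due.
function nextScanAt(lastScanAt: Date, frequency: ScanFrequency): Date {
  switch (frequency) {
    case "hourly":
      return new Date(lastScanAt.getTime() + HOUR_MS);
    case "daily":
      return new Date(lastScanAt.getTime() + 24 * HOUR_MS);
    case "weekly":
      return new Date(lastScanAt.getTime() + 7 * 24 * HOUR_MS);
    case "monthly": {
      // Assumption: one calendar month in UTC. Dates like Jan 31 roll over
      // into early March because February has fewer days.
      const next = new Date(lastScanAt.getTime());
      next.setUTCMonth(next.getUTCMonth() + 1);
      return next;
    }
    default: {
      const unreachable: never = frequency;
      throw new Error(`Unknown frequency: ${String(unreachable)}`);
    }
  }
}

// Assumption: a site that has never been scanned is due immediately.
function isScanDue(
  lastScanAt: Date | null,
  frequency: ScanFrequency,
  now: Date = new Date(),
): boolean {
  return lastScanAt === null || nextScanAt(lastScanAt, frequency).getTime() <= now.getTime();
}
```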
Change Detection
- Content Hashing: Detects changes in website content
- Automatic Triggers: High-priority scans when changes detected
- Subscription Validation: Only available for certain tiers
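Content hashing can be sketched with Node's built-in crypto module. This assumes SHA-256 over whitespace-normalized page content and that the previous hash comes from the stored page record in the pages table; the hashing LighthouseScanner actually uses may differ, and computeContentHash and hasContentChanged are illustrative names.

```typescript
import { createHash } from "node:crypto";

// Collapse runs of whitespace so formatting-only edits do not count as changes
// (an assumption about what should trigger a scan).
function computeContentHash(content: string): string {
  const normalized = content.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

// `previousHash` would come from the stored page record; null means the page
// has never been hashed, which is treated as a change so a baseline scan runs.
function hasContentChanged(previousHash: string | null, content: string): boolean {
  return previousHash === null || previousHash !== computeContentHash(content);
}
```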
Subscription Management
- Daily Limits: Maximum scans per day
- Monthly Limits: Maximum scans per month
- Feature Access: Different capabilities per tier
- Usage Tracking: Real-time usage monitoring
Webhook Integration
- External Triggers: Receive change notifications from external systems
- Validation: Verify subscription and limits
- Audit Logging: Track all webhook activities
Database Schema
The system uses several new tables:
Core Tables
- scans: Main scan records
- scan_results: Detailed scan results
- pages: Website pages with content hashes
- metric_values: Individual metric values
- resource_analysis: Resource usage analysis
Configuration Tables
- metric_definitions: Available metrics
- alert_configurations: Alert settings
- subscription_limits: Tier-based limits
Audit Tables
- audit_logs: System activity logging
- crawl_queue: Crawl job queue
- crawl_sessions: Crawl session tracking
API Endpoints
Cron Endpoints
POST /api/cron/scan?mode=all # Full scan (scheduled + change detection)
POST /api/cron/scan?mode=scheduled # Scheduled scans only
POST /api/cron/scan?mode=change_detection # Change detection only
Webhook Endpoints
POST /api/webhooks/website-change # External change notifications
Manual Endpoints
POST /api/cron/scan # Manual scan trigger (authenticated)
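A small client-side helper for calling the cron endpoint, assuming the endpoint responds with a JSON body. buildCronScanUrl and triggerCronScan are hypothetical names. The authenticated manual trigger also needs whatever credentials the application checks, which this document does not specify, so none are sent here.

```typescript
type CronScanMode = "all" | "scheduled" | "change_detection";

// Builds the cron endpoint URL; omitting `mode` leaves the choice to the server.
function buildCronScanUrl(baseUrl: string, mode?: CronScanMode): string {
  const url = new URL("/api/cron/scan", baseUrl);
  if (mode !== undefined) {
    url.searchParams.set("mode", mode);
  }
  return url.toString();
}

// Assumption: the endpoint answers with a JSON body (for example scan statistics).
async function triggerCronScan(baseUrl: string, mode?: CronScanMode): Promise<unknown> {
  const response = await fetch(buildCronScanUrl(baseUrl, mode), { method: "POST" });
  if (!response.ok) {
    throw new Error(`Cron scan request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

For example, triggerCronScan("https://your-domain.com", "all") issues the same request as the curl command in the troubleshooting section.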
Configuration
Environment Variables
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
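Failing fast when a variable is missing avoids confusing errors later at runtime. The sketch below (loadSupabaseEnv is a hypothetical name) checks all three variables at once. Keep in mind that Next.js inlines NEXT_PUBLIC_-prefixed variables into the browser bundle, so only the URL and anon key are public; the service role key bypasses Supabase Row Level Security and must stay in server-side code.

```typescript
interface SupabaseEnv {
  url: string;
  anonKey: string;
  serviceRoleKey: string; // server-only: never expose to the browser
}

const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
];

// Throws one error listing every missing (or empty) variable.
function loadSupabaseEnv(env: Record<string, string | undefined>): SupabaseEnv {
  const missing = REQUIRED_ENV_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    url: env["NEXT_PUBLIC_SUPABASE_URL"] as string,
    anonKey: env["NEXT_PUBLIC_SUPABASE_ANON_KEY"] as string,
    serviceRoleKey: env["SUPABASE_SERVICE_ROLE_KEY"] as string,
  };
}
```

In server-only code this would typically be called as loadSupabaseEnv(process.env).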
Subscription Tiers
- Free: 10 scans/day, 100 scans/month
- Pro: 50 scans/day, 500 scans/month
- Enterprise: 200 scans/day, 2000 scans/month
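The tier limits above can be enforced with a check like the following before a scan is queued. This is a sketch: SCAN_LIMITS mirrors the listed numbers, but canRunScan, the usage shape, and counting one run as several scans (for example desktop plus mobile as 2) are assumptions rather than the actual getSubscriptionLimits logic.

```typescript
type SubscriptionTier = "free" | "pro" | "enterprise";

interface ScanLimits {
  daily: number;
  monthly: number;
}

// Mirrors the tier list in this document.
const SCAN_LIMITS: Record<SubscriptionTier, ScanLimits> = {
  free: { daily: 10, monthly: 100 },
  pro: { daily: 50, monthly: 500 },
  enterprise: { daily: 200, monthly: 2000 },
};

interface ScanUsage {
  today: number;
  thisMonth: number;
}

type LimitCheck =
  | { allowed: true }
  | { allowed: false; reason: "daily_limit" | "monthly_limit" };

// `requested` lets one run count as several scans (assumption: one per device).
function canRunScan(tier: SubscriptionTier, usage: ScanUsage, requested = 1): LimitCheck {
  const limits = SCAN_LIMITS[tier];
  if (usage.today + requested > limits.daily) {
    return { allowed: false, reason: "daily_limit" };
  }
  if (usage.thisMonth + requested > limits.monthly) {
    return { allowed: false, reason: "monthly_limit" };
  }
  return { allowed: true };
}
```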
Usage
Setting Up Automated Scans
- Deploy the Application
  # Deploy to Vercel (recommended)
  vercel --prod
  # Or deploy to your preferred platform
- Set Up Cron Jobs
  # Run the setup script
  ./scripts/setup-cron.sh
  # Or follow the manual setup guide in docs/cron-setup-guide.md
- Configure Database
  -- Run the setup script from psql
  \i setup-database.sql
Managing Scan Schedules
- Access the Dashboard
  - Navigate to /dashboard/websites
  - Click on a website to view its details
  - Find the "Scan Schedule Management" section
- Configure Settings
  - Toggle automatic scanning on or off
  - Set the scan frequency (hourly, daily, weekly, monthly)
  - Choose device types (desktop, mobile)
  - Select scan categories
- Monitor Usage
  - View daily and monthly scan usage
  - Check usage against subscription limits
  - Trigger manual scans when needed
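Before saving, the dashboard settings can be validated and their quota cost estimated against the monthly limits. This is a sketch under stated assumptions: the ScanScheduleSettings shape is hypothetical, each selected device is counted as one scan per run, and a month is approximated as 30 days (so weekly rounds up to 5 runs). The category values are Lighthouse's own category IDs.

```typescript
type ScanFrequency = "hourly" | "daily" | "weekly" | "monthly";
type DeviceType = "desktop" | "mobile";
// Lighthouse's own category IDs.
type ScanCategory = "performance" | "accessibility" | "seo" | "best-practices";

// Hypothetical shape of the settings edited in Scan Schedule Management.
interface ScanScheduleSettings {
  enabled: boolean;
  frequency: ScanFrequency;
  devices: DeviceType[];
  categories: ScanCategory[];
}

// Assumption: a 30-day month; weekly rounds up to 5 runs.
const RUNS_PER_MONTH: Record<ScanFrequency, number> = {
  hourly: 24 * 30,
  daily: 30,
  weekly: 5,
  monthly: 1,
};

// Returns human-readable errors; an empty array means the settings are valid.
function validateScheduleSettings(settings: ScanScheduleSettings): string[] {
  const errors: string[] = [];
  if (settings.enabled && settings.devices.length === 0) {
    errors.push("Select at least one device type.");
  }
  if (settings.enabled && settings.categories.length === 0) {
    errors.push("Select at least one scan category.");
  }
  return errors;
}

// Assumption: each distinct selected device counts as one scan per run.
function estimateMonthlyScans(settings: ScanScheduleSettings): number {
  if (!settings.enabled) return 0;
  return RUNS_PER_MONTH[settings.frequency] * new Set(settings.devices).size;
}
```

Under these assumptions, daily scans on both desktop and mobile come to about 60 scans a month, within the Free tier's 100, while hourly desktop scans (about 720) would exceed the Free and Pro monthly limits.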
Webhook Integration
- Set Up External Monitoring
  - Configure your external system to detect website changes
  - Send POST requests to /api/webhooks/website-change
- Webhook Payload
  {
    "websiteId": "website-uuid",
    "url": "https://example.com/changed-page",
    "changeType": "content_update",
    "contentHash": "new-content-hash",
    "metadata": {
      "source": "external-system",
      "timestamp": "2024-01-01T00:00:00Z"
    }
  }
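Since all webhook inputs are validated, a receiver might check the payload shape before looking up the website or its subscription. The sketch below assumes websiteId, url and changeType are required while contentHash and metadata are optional, which the example payload suggests but does not state; validateWebsiteChange is a hypothetical helper.

```typescript
interface WebsiteChangePayload {
  websiteId: string;
  url: string;
  changeType: string;
  contentHash?: string;
  metadata?: Record<string, unknown>;
}

type ValidationResult =
  | { ok: true; payload: WebsiteChangePayload }
  | { ok: false; errors: string[] };

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim().length > 0;
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Assumption: websiteId, url and changeType are required; the rest is optional.
function validateWebsiteChange(body: unknown): ValidationResult {
  if (!isPlainObject(body)) {
    return { ok: false, errors: ["Body must be a JSON object."] };
  }
  const errors: string[] = [];
  for (const field of ["websiteId", "url", "changeType"]) {
    if (!isNonEmptyString(body[field])) errors.push(`${field} is required.`);
  }
  const url = body["url"];
  if (isNonEmptyString(url)) {
    try {
      const { protocol } = new URL(url);
      if (protocol !== "http:" && protocol !== "https:") errors.push("url must use http or https.");
    } catch {
      errors.push("url is not a valid URL.");
    }
  }
  if (body["contentHash"] !== undefined && !isNonEmptyString(body["contentHash"])) {
    errors.push("contentHash must be a non-empty string.");
  }
  if (body["metadata"] !== undefined && !isPlainObject(body["metadata"])) {
    errors.push("metadata must be an object.");
  }
  if (errors.length > 0) {
    return { ok: false, errors };
  }
  return { ok: true, payload: body as unknown as WebsiteChangePayload };
}
```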
Monitoring and Troubleshooting
Check System Status
# Test the cron endpoint
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"
# Check recent audit log entries (run this SQL against your database)
SELECT * FROM audit_logs ORDER BY created_at DESC LIMIT 10;
Common Issues
- Scans Not Running
  - Check the cron job configuration
  - Verify the database connection
  - Review subscription limits
- Change Detection Not Working
  - Ensure the subscription tier supports change detection
  - Check that the webhook endpoint is reachable
  - Verify content hash computation
- Performance Issues
  - Monitor scan frequency
  - Check database performance
  - Review resource usage
Development
Adding New Metrics
- Update the metric_definitions table
- Modify the LighthouseScanner class
- Update UI components
Customizing Scan Logic
- Modify the performScan method in LighthouseScanner
- Update the runLighthouse simulation
- Adjust result processing
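When adjusting result processing, note that a real Lighthouse result (LHR) reports each category score as a number between 0 and 1, or null if it could not be computed. The sketch below converts such scores to the 0 to 100 scale shown in Lighthouse reports. It models only the minimal subset of the LHR it reads; toPercentScores is a hypothetical name, and the simulated runLighthouse output may use a different shape.

```typescript
// Minimal subset of a Lighthouse result's `categories` map. Lighthouse reports
// each category score as 0..1, or null when it could not be computed.
interface LhrCategory {
  id: string;
  score: number | null;
}

// Converts category scores to the 0-100 scale shown in Lighthouse reports,
// keeping null so "not measured" is not confused with a score of 0.
function toPercentScores(categories: Record<string, LhrCategory>): Record<string, number | null> {
  const scores: Record<string, number | null> = {};
  for (const key of Object.keys(categories)) {
    const score = categories[key].score;
    scores[key] = score === null ? null : Math.round(score * 100);
  }
  return scores;
}
```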
Extending Subscription Tiers
- Update the getSubscriptionLimits method
- Modify the database schema
- Update UI components
Security Considerations
- Authentication: Manual endpoints require user authentication
- Rate Limiting: Built-in subscription-based limits
- Input Validation: All webhook inputs are validated
- Audit Logging: All activities are logged for security
Performance Optimization
- Batch Processing: Multiple websites processed efficiently
- Error Recovery: Failed scans don't affect the system
- Resource Management: Controlled resource usage
- Caching: Optimized database queries
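One way to combine batch processing with error recovery is a concurrency-limited worker pool in which each failure is recorded instead of aborting the batch. This is a sketch, not the ScanScheduler's actual implementation; runScanBatch and the choice of concurrency value are assumptions.

```typescript
interface BatchOutcome<T, R> {
  succeeded: { item: T; result: R }[];
  failed: { item: T; error: unknown }[];
}

// Runs `worker` over `items` with at most `concurrency` calls in flight.
// A rejected call is recorded in `failed` instead of aborting the batch.
async function runScanBatch<T, R>(
  items: T[],
  concurrency: number,
  worker: (item: T) => Promise<R>,
): Promise<BatchOutcome<T, R>> {
  const outcome: BatchOutcome<T, R> = { succeeded: [], failed: [] };
  let nextIndex = 0;

  // Each lane claims the next unprocessed item until none are left.
  async function lane(): Promise<void> {
    while (nextIndex < items.length) {
      const item = items[nextIndex++];
      try {
        outcome.succeeded.push({ item, result: await worker(item) });
      } catch (error) {
        outcome.failed.push({ item, error });
      }
    }
  }

  const laneCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return outcome;
}
```

Because JavaScript runs these lanes on one thread, incrementing the shared nextIndex needs no locking: each lane claims an item synchronously before its next await.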
Support
For issues or questions:
- Check the troubleshooting section
- Review application logs
- Verify database setup
- Test endpoints manually
- Check subscription configuration
Future Enhancements
- Real-time Notifications: Push notifications for scan results
- Advanced Analytics: Detailed performance insights
- Custom Metrics: User-defined performance metrics
- Integration APIs: Third-party service integrations
- Machine Learning: Predictive performance analysis