Automatic Lighthouse Scanning System

This document describes the automatic Lighthouse scanning system integrated into the website monitoring application.

Overview

The automatic scanning system provides:

  • Scheduled Scans: Periodic scans based on user-configured schedules
  • Change Detection: Automatic scans triggered when website content changes
  • Subscription Limits: Respects user subscription tiers and rate limits
  • Webhook Support: External triggers for website changes
  • Comprehensive UI: User-friendly interface for managing scan schedules

System Architecture

Core Components

  1. LighthouseScanner (src/services/lighthouseScanner.ts)

    • Handles core scanning logic
    • Manages change detection
    • Enforces subscription limits
    • Simulates Lighthouse scans
  2. ScanScheduler (src/services/scanScheduler.ts)

    • Manages scheduled scans
    • Processes change detection
    • Orchestrates scan execution
  3. Cron Handler (src/app/api/cron/scan/route.ts)

    • Main entry point for automated scans
    • Supports different scan modes
    • Provides scan statistics
  4. Webhook Handler (src/app/api/webhooks/website-change/route.ts)

    • Receives external change notifications
    • Triggers high-priority scans
    • Validates subscription limits
  5. ScanScheduleManager (src/components/dashboard/ScanScheduleManager.tsx)

    • User interface for managing scan schedules
    • Displays usage statistics
    • Allows manual scan triggers

Features

Scheduled Scanning

  • Frequency Options: Hourly, daily, weekly, monthly
  • Device Types: Desktop and/or mobile
  • Categories: Performance, accessibility, SEO, best practices
  • Subscription Tiers: Different limits per tier
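
As a rough illustration of how the frequency options could translate into due times, here is a minimal sketch. The names `ScanFrequency`, `nextScanTime`, and `isScanDue` are hypothetical and are not taken from scanScheduler.ts; the actual scheduler may compute due times differently.

```typescript
// Hypothetical helper types and functions, for illustration only.
type ScanFrequency = "hourly" | "daily" | "weekly" | "monthly";

// Returns the earliest time the next scan should run, in UTC.
// Note: "monthly" uses Date month arithmetic, so a scan on the 31st
// rolls over into the following month when the target month is shorter.
function nextScanTime(lastScan: Date, frequency: ScanFrequency): Date {
  const next = new Date(lastScan.getTime());
  switch (frequency) {
    case "hourly":
      next.setUTCHours(next.getUTCHours() + 1);
      break;
    case "daily":
      next.setUTCDate(next.getUTCDate() + 1);
      break;
    case "weekly":
      next.setUTCDate(next.getUTCDate() + 7);
      break;
    case "monthly":
      next.setUTCMonth(next.getUTCMonth() + 1);
      break;
  }
  return next;
}

// A website that has never been scanned (lastScan === null) is treated as due.
function isScanDue(lastScan: Date | null, frequency: ScanFrequency, now: Date): boolean {
  if (lastScan === null) return true;
  return nextScanTime(lastScan, frequency).getTime() <= now.getTime();
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check easy to test.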

Change Detection

  • Content Hashing: Detects changes in website content
  • Automatic Triggers: High-priority scans are triggered when changes are detected
  • Subscription Validation: Only available for certain tiers
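
Content hashing can be sketched as follows. This is an assumption about the approach, not the code in lighthouseScanner.ts: it uses SHA-256 from Node's built-in `crypto` module and collapses whitespace before hashing, and the function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hashes page content after collapsing whitespace,
// so purely cosmetic formatting differences do not register as changes.
function computeContentHash(content: string): string {
  const normalized = content.replace(/\s+/g, " ").trim();
  return createHash("sha256").update(normalized, "utf8").digest("hex");
}

// Compares the stored hash with the hash of freshly fetched content.
// A page with no stored hash (null) is reported as changed, which is
// an assumption made for this sketch.
function hasContentChanged(previousHash: string | null, content: string): boolean {
  return previousHash !== computeContentHash(content);
}
```

Whether whitespace normalization is appropriate depends on the site; hashing the raw response would flag more changes, including ones that do not affect what users see.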

Subscription Management

  • Daily Limits: Maximum scans per day
  • Monthly Limits: Maximum scans per month
  • Feature Access: Different capabilities per tier
  • Usage Tracking: Real-time usage monitoring

Webhook Integration

  • External Triggers: Receive change notifications from external systems
  • Validation: Verifies the subscription tier and scan limits before a scan is triggered
  • Audit Logging: Track all webhook activities

Database Schema

The system uses several new tables:

Core Tables

  • scans: Main scan records
  • scan_results: Detailed scan results
  • pages: Website pages with content hashes
  • metric_values: Individual metric values
  • resource_analysis: Resource usage analysis

Configuration Tables

  • metric_definitions: Available metrics
  • alert_configurations: Alert settings
  • subscription_limits: Tier-based limits

Audit Tables

  • audit_logs: System activity logging
  • crawl_queue: Crawl job queue
  • crawl_sessions: Crawl session tracking

API Endpoints

Cron Endpoints

POST /api/cron/scan?mode=all          # Full scan (scheduled + change detection)
POST /api/cron/scan?mode=scheduled    # Scheduled scans only
POST /api/cron/scan?mode=change_detection  # Change detection only
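
One way a handler might read the `mode` query parameter is sketched below. `ScanMode` and `parseScanMode` are hypothetical names, and defaulting to `all` when the parameter is absent is an assumption rather than documented behaviour.

```typescript
// The three modes listed above.
type ScanMode = "all" | "scheduled" | "change_detection";

const SCAN_MODES: readonly ScanMode[] = ["all", "scheduled", "change_detection"];

// Hypothetical helper: returns the requested mode, or null if the value
// is not one of the supported modes.
function parseScanMode(requestUrl: string): ScanMode | null {
  const mode = new URL(requestUrl).searchParams.get("mode");
  if (mode === null) return "all"; // assumption: a missing parameter means a full scan
  return SCAN_MODES.find((m) => m === mode) ?? null;
}
```

A handler could then respond with HTTP 400 when `parseScanMode` returns `null` instead of silently running a full scan.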

Webhook Endpoints

POST /api/webhooks/website-change     # External change notifications

Manual Endpoints

POST /api/cron/scan                   # Manual scan trigger (authenticated)

Configuration

Environment Variables

NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_service_role_key
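
A startup check for these variables could look like the sketch below. `findMissingEnvVars` is a hypothetical helper, not part of the application; it takes the environment as a parameter so it can be tested without touching the real process environment.

```typescript
// The three variables listed above.
const REQUIRED_ENV_VARS = [
  "NEXT_PUBLIC_SUPABASE_URL",
  "NEXT_PUBLIC_SUPABASE_ANON_KEY",
  "SUPABASE_SERVICE_ROLE_KEY",
] as const;

// Hypothetical helper: returns the names of required variables that are
// unset or contain only whitespace.
function findMissingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js runtime this could be called as `findMissingEnvVars(process.env)` at startup, failing fast when the result is non-empty.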

Subscription Tiers

  • Free: 10 scans/day, 100 scans/month
  • Pro: 50 scans/day, 500 scans/month
  • Enterprise: 200 scans/day, 2000 scans/month
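
The tier limits above can be expressed as a lookup table with a simple pre-scan check. This is a minimal sketch: the type and function names are hypothetical, and the real limits are presumably read from the subscription_limits table via getSubscriptionLimits rather than hard-coded.

```typescript
type Tier = "free" | "pro" | "enterprise";

interface ScanLimits {
  daily: number;
  monthly: number;
}

// Values copied from the tier list above.
const TIER_LIMITS: Record<Tier, ScanLimits> = {
  free: { daily: 10, monthly: 100 },
  pro: { daily: 50, monthly: 500 },
  enterprise: { daily: 200, monthly: 2000 },
};

interface ScanUsage {
  scansToday: number;
  scansThisMonth: number;
}

// Hypothetical check: a scan is allowed only while both the daily and the
// monthly counters are below the tier's limits.
function canRunScan(tier: Tier, usage: ScanUsage): { allowed: boolean; reason?: string } {
  const limits = TIER_LIMITS[tier];
  if (usage.scansToday >= limits.daily) {
    return { allowed: false, reason: "daily limit reached" };
  }
  if (usage.scansThisMonth >= limits.monthly) {
    return { allowed: false, reason: "monthly limit reached" };
  }
  return { allowed: true };
}
```

For example, a Free-tier user with 10 scans today is blocked by the daily limit even if the monthly counter is still well below 100.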

Usage

Setting Up Automated Scans

  1. Deploy the Application

    # Deploy to Vercel (recommended)
    vercel --prod
    
    # Or deploy to your preferred platform
    
  2. Set Up Cron Jobs

    # Run the setup script
    ./scripts/setup-cron.sh
    
    # Or follow the manual setup guide
    # docs/cron-setup-guide.md
    
  3. Configure Database

    -- Run the setup script (from a psql session)
    \i setup-database.sql
    

Managing Scan Schedules

  1. Access the Dashboard

    • Navigate to /dashboard/websites
    • Click on a website to view details
    • Find the "Scan Schedule Management" section
  2. Configure Settings

    • Toggle automatic scanning on/off
    • Set scan frequency (hourly, daily, weekly, monthly)
    • Choose device types (desktop, mobile)
    • Select scan categories
  3. Monitor Usage

    • View daily and monthly scan usage
    • Check against subscription limits
    • Trigger manual scans when needed

Webhook Integration

  1. Set Up External Monitoring

    • Configure your external system to detect website changes
    • Send POST requests to /api/webhooks/website-change
  2. Webhook Payload

    {
      "websiteId": "website-uuid",
      "url": "https://example.com/changed-page",
      "changeType": "content_update",
      "contentHash": "new-content-hash",
      "metadata": {
        "source": "external-system",
        "timestamp": "2024-01-01T00:00:00Z"
      }
    }
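
The payload shape above could be typed and validated along these lines. This is a sketch under assumptions: the document does not say which fields are required, so treating `websiteId`, `url`, `changeType`, and `contentHash` as required and `metadata` as optional is a guess, and the names `WebsiteChangePayload` and `parseWebsiteChangePayload` are hypothetical.

```typescript
// Shape inferred from the example payload; the required/optional split is an assumption.
interface WebsiteChangePayload {
  websiteId: string;
  url: string;
  changeType: string;
  contentHash: string;
  metadata?: { source?: string; timestamp?: string };
}

function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.length > 0;
}

// Returns a typed payload, or null when the body does not match the expected shape.
function parseWebsiteChangePayload(body: unknown): WebsiteChangePayload | null {
  if (typeof body !== "object" || body === null) return null;
  const { websiteId, url, changeType, contentHash, metadata } = body as Record<string, unknown>;
  if (
    !isNonEmptyString(websiteId) ||
    !isNonEmptyString(url) ||
    !isNonEmptyString(changeType) ||
    !isNonEmptyString(contentHash)
  ) {
    return null;
  }
  try {
    new URL(url); // throws on strings that are not absolute URLs
  } catch {
    return null;
  }
  const meta =
    typeof metadata === "object" && metadata !== null
      ? (metadata as WebsiteChangePayload["metadata"])
      : undefined;
  return { websiteId, url, changeType, contentHash, metadata: meta };
}
```

Returning `null` instead of throwing lets the caller map any malformed body to a single error response.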
    

Monitoring and Troubleshooting

Check System Status

# Test the cron endpoint
curl -X POST "https://your-domain.com/api/cron/scan?mode=all"

-- Check the most recent audit log entries (run as SQL against the database, e.g. in psql)
SELECT * FROM audit_logs ORDER BY created_at DESC LIMIT 10;

Common Issues

  1. Scans Not Running

    • Check cron job configuration
    • Verify database connection
    • Review subscription limits
  2. Change Detection Not Working

    • Ensure subscription tier supports change detection
    • Check webhook endpoint accessibility
    • Verify content hash computation
  3. Performance Issues

    • Monitor scan frequency
    • Check database performance
    • Review resource usage

Development

Adding New Metrics

  1. Update the metric_definitions table
  2. Modify the LighthouseScanner class
  3. Update the UI components

Customizing Scan Logic

  1. Modify the performScan method in LighthouseScanner
  2. Update the runLighthouse simulation
  3. Adjust the result processing

Extending Subscription Tiers

  1. Update the getSubscriptionLimits method
  2. Modify the database schema
  3. Update the UI components

Security Considerations

  • Authentication: Manual endpoints require user authentication
  • Rate Limiting: Built-in subscription-based limits
  • Input Validation: All webhook inputs are validated
  • Audit Logging: All activities are recorded in audit_logs for security review

Performance Optimization

  • Batch Processing: Multiple websites processed efficiently
  • Error Recovery: A failed scan does not affect the rest of the system
  • Resource Management: Controlled resource usage
  • Caching: Optimized database queries

Support

For issues or questions:

  1. Check the troubleshooting section
  2. Review application logs
  3. Verify database setup
  4. Test endpoints manually
  5. Check subscription configuration

Future Enhancements

  • Real-time Notifications: Push notifications for scan results
  • Advanced Analytics: Detailed performance insights
  • Custom Metrics: User-defined performance metrics
  • Integration APIs: Third-party service integrations
  • Machine Learning: Predictive performance analysis