feat: production hardening + smart subpage scanning with layout dedup

Security:
- Add CRON_SECRET auth to /api/cron/* endpoints
- Add admin role verification to /api/admin/* routes
- Add org membership check to /api/billing/usage
- Add security headers (HSTS, X-Frame-Options, CSP, etc.)
- Add env variable validation at startup
- Add rate limiting to backend API (30 req/min per IP)

Infrastructure:
- Multi-stage Dockerfiles with non-root user + healthchecks
- Updated cron workflow to pass CRON_SECRET header
- Updated .env.example with all optional vars

Smart subpage scanning:
- Crawler now computes template_hash (DOM structure without content)
- Scanner scans ALL unique-layout pages, not just main page
- Pages with same layout (e.g. product pages) scanned only once
- Deduplication by template_hash, fallback to content_hash
- Main page always scanned with high priority
- Re-checks subscription limits before each page scan

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This commit is contained in:
Dennis
2026-03-06 07:44:32 +01:00
parent d8de0a973a
commit 1c545c93b4
18 changed files with 498 additions and 59 deletions
@@ -26,7 +26,7 @@ jobs:
DEPLOYMENT_URL="${DEPLOYMENT_URL:-https://your-domain.com}"
echo "Running uptime checks at: $DEPLOYMENT_URL/api/cron/uptime"
response=$(curl -s -w "\n%{http_code}" "$DEPLOYMENT_URL/api/cron/uptime")
response=$(curl -s -w "\n%{http_code}" -H "Authorization: Bearer $CRON_SECRET" "$DEPLOYMENT_URL/api/cron/uptime")
http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | head -n -1)
@@ -41,6 +41,7 @@ jobs:
fi
env:
DEPLOYMENT_URL: ${{ secrets.DEPLOYMENT_URL }}
CRON_SECRET: ${{ secrets.CRON_SECRET }} CRON_SECRET: ${{ secrets.CRON_SECRET }}
scan:
runs-on: ubuntu-latest
@@ -51,7 +52,7 @@ jobs:
DEPLOYMENT_URL="${DEPLOYMENT_URL:-https://your-domain.com}"
echo "Triggering scan at: $DEPLOYMENT_URL/api/cron/scan?mode=all"
response=$(curl -s -w "\n%{http_code}" -X POST "$DEPLOYMENT_URL/api/cron/scan?mode=all")
response=$(curl -s -w "\n%{http_code}" -X POST -H "Authorization: Bearer $CRON_SECRET" "$DEPLOYMENT_URL/api/cron/scan?mode=all")
http_code=$(echo "$response" | tail -n1)
response_body=$(echo "$response" | head -n -1)