Website Down While Running Ads: A 2-Hour P1 Incident Response Process

webdesign · September 8, 2025 · #Web Design

Is your website "collapsing" while your ads are running? This article walks through a 2-hour P1 response process: restore minimum access, reduce budget burn, close technical holes, and build a prevention runbook.

When the website "suddenly falls ill" in the middle of spending the ad budget, the damage is not only wasted ad money but also lost orders, a damaged reputation, and internal turmoil. This satellite article lays out a practical, concise 2-hour P1 troubleshooting process with checklists, so your team can calmly restore the website as quickly as possible while limiting ad budget losses.

This is a supplementary article to the main article on website maintenance services in Ho Chi Minh City (HCM). For the full roadmap (SLA, process, tools), see Website maintenance service Ho Chi Minh – Pillar Page.

Typical situations when the site goes down while ads are running

Sudden load spikes when a TVC, influencer post, or performance campaign pushes traffic hard → CPU/RAM/DB "burning red", requests queue up waiting for a free DB connection, and the site times out.
Rushed updates (plugin/theme/core) right before the campaign → conflicts, PHP/JS errors, blank pages.
Weak infrastructure: cache/CDN not optimized, database without indexes or caching, no autoscaling.
DDoS/Layer 7 attacks hitting right at the peak, possibly from competitors or botnets.
Payment/cart problems: gateway timeouts, webhook failures → users cannot pay and revenue is lost.

P1 targets within 2 hours

Restore access to at least the homepage, landing page, and checkout at a "good enough" level.
Reduce budget burn: point the Ads channel to a temporary landing page, cut the budget, or pause in a controlled way.
Isolate the cause and prevent recurrence within the incident time frame.
Write a short post-mortem report within 24 hours to capture lessons learned.

120-minute (minute-by-minute) process

T-00 → T-15: QUICK REVIEW & NOTIFICATION

Enable a friendly maintenance page (HTTP 503 + Retry-After header) or a static failover landing page (HTML + form/CTA) to limit bounces.
Send a one-sentence internal notice (Marketing/CS/Sales/Founder): “P1 – site down, ETA 120’ – updates every 15’.”
Quick check:
- TTFB/uptime (StatusCake/UptimeRobot).
- CPU/RAM/IO/DB connections (Cloud/Hosting dashboard).
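
The maintenance page in the first step needs only three things: status 503, a Retry-After header telling browsers and crawlers that the outage is temporary, and an alternative CTA. Below is a minimal sketch using only Python's standard library. The 900-second retry value, the page text, and the backup-form URL (example.com) are placeholders, not recommendations, and in practice this page is more often served by the web server or CDN than by a Python process.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: 15 minutes, matching the 15' update cadence of the incident.
RETRY_AFTER_SECONDS = 900

# Hypothetical page content; replace the link and hotline with your own.
MAINTENANCE_HTML = """<!doctype html>
<html lang="en"><head><meta charset="utf-8"><title>Urgent maintenance</title></head>
<body>
  <h1>We are under urgent maintenance</h1>
  <p>Please order via our backup form or hotline while we restore the site.</p>
  <p><a href="https://example.com/backup-form">Backup order form</a></p>
</body></html>
""".encode("utf-8")


class MaintenanceHandler(BaseHTTPRequestHandler):
    """Answers every request with 503 + Retry-After and the static maintenance page."""

    def _send_maintenance(self, include_body: bool) -> None:
        self.send_response(503)
        self.send_header("Retry-After", str(RETRY_AFTER_SECONDS))
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Cache-Control", "no-store")  # keep CDNs from caching the outage page
        self.send_header("Content-Length", str(len(MAINTENANCE_HTML)))
        self.end_headers()
        if include_body:
            self.wfile.write(MAINTENANCE_HTML)

    def do_GET(self) -> None:
        self._send_maintenance(include_body=True)

    def do_HEAD(self) -> None:
        self._send_maintenance(include_body=False)

    def do_POST(self) -> None:
        self._send_maintenance(include_body=True)

    def log_message(self, format: str, *args) -> None:
        pass  # keep the console quiet during the incident


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    """Blocking helper: run the maintenance server until interrupted."""
    HTTPServer((host, port), MaintenanceHandler).serve_forever()
```

Calling serve() blocks and answers every path, including /checkout, with the same 503 page, so point traffic at it only while the main site is being repaired.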

T-15 → T-30: ISOLATE THE CAUSE & GIVE THE SYSTEM BREATHING ROOM

- Suspected high load: enable/tighten CDN/WAF, enable page caching, lower TTLs; temporarily turn off heavy queries (search, filters).
- Suspected code conflict: roll back to the latest good version (Git/backup), turn off the suspect plugin.
- Suspected Layer 7 DDoS: turn on Under Attack Mode (Cloudflare), block abnormal IPs/ASNs, rate-limit heavy endpoints (/search, /cart, /checkout).
- DB congestion: flush the slow-query cache, temporarily raise max_connections, and restart the service if necessary.
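
Rate limiting normally lives at the CDN/WAF layer (for example, Cloudflare rate-limiting rules). If that is not available, an application-level token bucket per client IP and heavy endpoint gives a similar effect. This is a minimal in-memory sketch: the per-endpoint limits are hypothetical, and a setup with several app servers would need shared state (e.g. Redis) instead of a Python dict.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical limits per heavy endpoint: (refill rate in req/s, burst size).
HEAVY_ENDPOINT_LIMITS: Dict[str, Tuple[float, int]] = {
    "/search": (1.0, 5),
    "/cart": (2.0, 10),
    "/checkout": (2.0, 10),
}


@dataclass
class Bucket:
    tokens: float
    updated: float


class EndpointRateLimiter:
    """Token bucket per (client IP, heavy endpoint); other paths pass through."""

    def __init__(self, limits: Dict[str, Tuple[float, int]] = HEAVY_ENDPOINT_LIMITS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = limits
        self.clock = clock
        self.buckets: Dict[Tuple[str, str], Bucket] = {}  # grows with distinct IPs

    def _match(self, path: str) -> Optional[str]:
        for prefix in self.limits:
            if path == prefix or path.startswith((prefix + "/", prefix + "?")):
                return prefix
        return None

    def allow(self, client_ip: str, path: str) -> bool:
        prefix = self._match(path)
        if prefix is None:
            return True  # not a heavy endpoint
        rate, burst = self.limits[prefix]
        now = self.clock()
        bucket = self.buckets.setdefault((client_ip, prefix), Bucket(float(burst), now))
        # Refill in proportion to elapsed time, never above the burst size.
        bucket.tokens = min(float(burst), bucket.tokens + (now - bucket.updated) * rate)
        bucket.updated = now
        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False  # caller should answer 429 Too Many Requests
```

The injectable clock keeps the limiter testable; in production the default time.monotonic is used.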

T-30 → T-60: MINIMUM RECOVERY & REDUCE ADS COSTS

- Reopen the routes to sales/landing/checkout pages first; the blog and about pages can wait.
- Marketing:
  - If the site is not stable: shift the budget to a static landing page (AMP/static HTML on a CDN) or a backup lead form (Google Form/Typeform).
  - Cut 30–70% of the budget on campaigns that are spending aggressively; temporarily turn off low-quality placements.
  - Update Customer Service: “The system is undergoing urgent maintenance, please order via hotline/inbox.”
- Quick QA: desktop/mobile, login/register/cart/checkout (sandbox), lead form, pixel/GA4/UTM.
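
Part of the quick QA can be scripted: confirm the critical pages answer 200 and that the ad final URLs still carry their UTM parameters. The sketch below uses only the standard library; the path list and the set of required UTM keys are assumptions, and pixel/GA4 firing still needs a manual check in the browser.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, List
from urllib.parse import parse_qs, urlparse

# Hypothetical critical paths: replace with your own landing/cart/checkout routes.
CRITICAL_PATHS = ("/", "/landing", "/cart", "/checkout")
REQUIRED_UTM = ("utm_source", "utm_medium", "utm_campaign")


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request (4xx/5xx included)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def smoke_check(base_url: str, fetch: Callable[[str], int] = http_status,
                paths: Iterable[str] = CRITICAL_PATHS) -> Dict[str, int]:
    """Return {path: status} for every critical path that did not answer 200.

    Network errors and timeouts are reported as status 0.
    """
    failures: Dict[str, int] = {}
    for path in paths:
        try:
            status = fetch(base_url.rstrip("/") + path)
        except OSError:  # URLError and socket timeouts are OSError subclasses
            status = 0
        if status != 200:
            failures[path] = status
    return failures


def missing_utm(ad_url: str) -> List[str]:
    """List required UTM parameters that are absent or empty in an ad's final URL."""
    params = parse_qs(urlparse(ad_url).query)
    return [key for key in REQUIRED_UTM if not params.get(key)]
```

An empty dict from smoke_check and an empty list from missing_utm for each ad URL mean this part of the QA passed.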

T-60 → T-90: STABILIZE & HARDEN

- Additional protection:
  - Full-page HTML caching (except cart/checkout).
  - Defer/async JS; disable unnecessary scripts.
  - Aggressively limit heavy features: live search, product comparison, related-item suggestions.
- Infrastructure:
  - Temporarily increase vCPU/RAM; enable autoscaling if available.
  - Move media to a CDN if that has not been done yet.
- DB:
  - Optimize (composite) indexes for frequently queried tables (orders, posts, products).
  - Turn on object caching (Redis/Memcached).
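
Whether an index actually helps shows up in the query plan. The sketch below uses SQLite from Python's standard library so it runs anywhere; on MySQL/MariaDB the equivalent check is EXPLAIN on the same query. The orders table, its columns, and the index name are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> List[str]:
    """Return the 'detail' column of EXPLAIN QUERY PLAN for a query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def index_demo() -> Tuple[List[str], List[str]]:
    conn = sqlite3.connect(":memory:")
    # Hypothetical orders table, filled with synthetic rows for the demo.
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, "
        "status TEXT, created_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, status, created_at) VALUES (?, ?, ?)",
        [(i % 500, "paid", f"2025-09-{i % 28 + 1:02d}") for i in range(5000)],
    )
    sql = "SELECT id, status FROM orders WHERE customer_id = ? ORDER BY created_at"
    before = query_plan(conn, sql, (42,))
    # Composite index: the filtered column first, then the sort column.
    conn.execute(
        "CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at)"
    )
    after = query_plan(conn, sql, (42,))
    conn.close()
    return before, after
```

Before the index, the plan shows a full SCAN of orders; afterwards it shows a SEARCH using idx_orders_customer_created, and the separate sort step for ORDER BY typically disappears as well.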

T-90 → T-120: LIGHT LOAD TEST & REALLOCATE ADS

- Light load test (main user-journey scenario at 5–10 req/s).
- Reallocate ads: start with small groups and scale up gradually according to the load threshold.
- Incident log: timeline – preliminary cause – changes applied – backlog for later.
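
The light load test and the gradual ad ramp-up can be tied together: pace requests at a fixed rate, measure p95 latency and error rate, and only raise the budget share while both stay under a threshold. This is a sketch under assumptions: the thresholds (1.5 s p95, 1% errors), the 20% step, and the "halve on breach" rule are hypothetical, and the request callable stands in for your own journey step.

```python
import statistics
import time
from typing import Callable, List, Tuple


def light_load_test(request: Callable[[], bool], rate: float, total: int) -> Tuple[List[float], int]:
    """Fire `total` sequential requests paced at `rate` req/s.

    `request` performs one step of the user journey and returns True on success.
    Returns (latencies in seconds, error count). Because requests are sequential,
    a slow site lowers the effective rate.
    """
    interval = 1.0 / rate
    latencies: List[float] = []
    errors = 0
    start = time.monotonic()
    for i in range(total):
        # Sleep until the scheduled send time of request i.
        delay = start + i * interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        t0 = time.monotonic()
        try:
            ok = request()
        except Exception:  # timeouts/connection errors count as failures
            ok = False
        latencies.append(time.monotonic() - t0)
        if not ok:
            errors += 1
    return latencies, errors


def p95(latencies: List[float]) -> float:
    """95th percentile latency (needs at least 2 samples)."""
    return statistics.quantiles(latencies, n=20)[-1]


def next_budget_share(current: float, p95_s: float, error_rate: float,
                      max_p95_s: float = 1.5, max_error_rate: float = 0.01,
                      step: float = 0.2, floor: float = 0.1) -> float:
    """Hypothetical ramp rule for reallocating ads.

    `current` is the fraction of the normal budget being spent. Raise it by
    `step` only while p95 latency and error rate stay under their thresholds;
    otherwise halve it (but not below `floor`).
    """
    if p95_s <= max_p95_s and error_rate <= max_error_rate:
        return min(1.0, current + step)
    return max(floor, current / 2)
```

At rate=5 and total=300 the test produces about one minute of traffic; feed p95(latencies) and errors / total into next_budget_share to decide the next budget step.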

Roles & communication channels (5 people are enough)

- Incident Lead (DevOps/Tech Lead): makes technical decisions, posts updates every 15’.
- Web Engineer: rollback, hotfix, turning features off/on, technical QA.
- Performance/Marketing: adjusts Ads/landing pages and customer service messages.
- CS/CRM: takes leads/orders manually, reassures customers, collects feedback.
- Stakeholder (PM/Founder): approves public messages and prioritizes resources.

Channels: 1 shared chat group (Slack/Telegram) and 1 task board (Trello/Jira) → avoids information chaos.

Quick war-room checklists

A. Technical (restore within 2 hours)
- Enable a 503 page or static landing page (with CTA).
- CDN/WAF: Under Attack Mode, rate limiting, Bot Fight Mode.
- Roll back code/plugins; disable recently updated components.
- Reopen the key pages first; turn off heavy functions.
- Redis/Memcached + full-page cache; lower DNS/CDN TTLs.
- DB: temporarily raise the connection limit, optimize slow queries, restart if locked.
- QA checkout/form/pixel/GA4/UTM.
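
The full-page cache in checklist A only helps if it never serves one shopper's cart or checkout to another. The bypass rule can be sketched as a small decision function. The paths and cookie-name prefixes are assumptions (typical of WordPress/WooCommerce sites), and in practice the same logic is usually expressed as CDN cache rules or web-server config rather than Python.

```python
from typing import Mapping

# Hypothetical no-cache rules. WordPress/WooCommerce sites commonly use paths and
# cookie names like these, but confirm what your own platform actually sets.
NO_CACHE_PATH_PREFIXES = ("/cart", "/checkout", "/my-account", "/wp-admin")
SESSION_COOKIE_PREFIXES = (
    "wordpress_logged_in",
    "woocommerce_items_in_cart",
    "wp_woocommerce_session",
)


def is_page_cacheable(method: str, path: str, cookies: Mapping[str, str]) -> bool:
    """Decide whether a request may be served from the full-page HTML cache."""
    # Only idempotent reads are cacheable; form posts and payments never are.
    if method.upper() not in ("GET", "HEAD"):
        return False
    # Cart/checkout/account pages are per-user, so always bypass the cache.
    for prefix in NO_CACHE_PATH_PREFIXES:
        if path == prefix or path.startswith(prefix + "/"):
            return False
    # Logged-in users or shoppers with items in the cart see personalized HTML.
    if any(name.startswith(SESSION_COOKIE_PREFIXES) for name in cookies):
        return False
    return True
```

Matching on the exact prefix or prefix + "/" keeps unrelated pages such as /cartography cacheable.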

B. Marketing/CS (reduce budget burn)
- Adjust campaign budgets; temporarily turn off high-risk ad groups.
- Redirect traffic to the static landing page/backup form.
- Post a brief notice on the fanpage/CS channels: urgent maintenance underway.
- Collect leads manually (hotline/inbox) and enter them into the CRM later.
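
Leads collected by hand during the incident often arrive twice (once by hotline, once by inbox) with differently formatted phone numbers. A minimal sketch for merging them before CRM import follows; the field names ("phone", "name", …) are hypothetical, and the phone normalization assumes Vietnamese numbers (+84 → leading 0).

```python
import re
from typing import Dict, Iterable, List


def normalize_phone(raw: str) -> str:
    """Keep digits only; assumes Vietnamese numbers, so +84/84 becomes a leading 0."""
    digits = re.sub(r"\D", "", raw)
    if digits.startswith("84") and len(digits) == 11:
        digits = "0" + digits[2:]
    return digits


def dedupe_leads(leads: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge manually collected leads by normalized phone before CRM import.

    The first record per phone wins; empty fields are filled from later records.
    Leads without a phone number are kept separately for manual review.
    """
    merged: Dict[str, Dict[str, str]] = {}
    no_phone: List[Dict[str, str]] = []
    for lead in leads:
        key = normalize_phone(lead.get("phone", ""))
        if not key:
            no_phone.append(dict(lead))
            continue
        if key not in merged:
            merged[key] = {**lead, "phone": key}
            continue
        for field, value in lead.items():
            if value and not merged[key].get(field):
                merged[key][field] = value
    return list(merged.values()) + no_phone
```

Keeping the first record preserves the original source channel; later duplicates only fill in missing details.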

C. After stabilization (24h)
- Post-mortem: root cause analysis (RCA) + preventive actions.
- Infrastructure optimization plan (cache/CDN/WAF/autoscale).
- Schedule restore drills and safe update procedures.

For the overall monthly maintenance picture (calendar, cadence, reports), see Monthly website maintenance process; for the strategic perspective, see the articles on maintenance & after-sales.

Common root causes & how to "plug the hole"

1. Code/plugin conflicts right before the campaign
   Prevention: staging → QA → deployment with a checklist; freeze code 48 hours before the campaign.
   Monitor: cache hit/miss, TTFB.
   Fix: turn on full-page caching, move media to a CDN, reduce scripts.
2. DB congestion – slow queries
   Prevention: add indexes, check query plans, paginate, cache.
   Monitor: slow query log, max connections.
   Fix: quickly optimize heavy queries, add resources, split off read replicas if available.
3. DDoS/Layer 7 at the worst moment
   Prevention: WAF, bot management, per-IP/ASN thresholds.
   Monitor: unusual spikes by country/ASN.
   Fix: Under Attack Mode, challenges, block IP ranges.
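
Spotting "unusual spikes by country/ASN" comes down to comparing the current window with a normal baseline. The sketch below assumes you can export one ASN (or country code) per request from your CDN or access logs; the 200-request floor and 5× ratio are hypothetical thresholds, and the AS numbers in the test come from the documentation range.

```python
from collections import Counter
from typing import Dict, Iterable, List


def spiking_sources(current_window: Iterable[str], baseline: Dict[str, float],
                    min_requests: int = 200, ratio: float = 5.0) -> List[str]:
    """Flag ASNs (or country codes) whose traffic jumped versus a normal baseline.

    `current_window` holds one ASN per request in the last N minutes;
    `baseline` maps ASN -> usual request count for a window of the same length.
    Unknown sources get a baseline of 1 request.
    """
    counts = Counter(current_window)
    flagged = [
        src for src, n in counts.items()
        if n >= min_requests and n >= ratio * max(baseline.get(src, 0.0), 1.0)
    ]
    # Biggest offenders first, so they can be challenged or blocked first.
    return sorted(flagged, key=lambda src: counts[src], reverse=True)
```

The min_requests floor keeps tiny sources from being flagged just because their baseline is near zero.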

Sample customer message (short – reassuring – with an alternative)

“Sorry for the inconvenience! A sudden surge in traffic has temporarily overloaded our system. Our technical team is restoring service within 120 minutes. You can order right away via the backup link, or message us by inbox/Hotline 09xx… for immediate support. Thank you for your understanding!”

After the incident: 7 things to do so P1 incidents stop recurring

1. Freeze code 48–72h before the campaign.
2. Mandatory staging + QA checklist, with 1-click rollback.
3. CDN + full-page cache + object cache that actually work (verified by hit/miss measurement).
4. WAF/bot management presets for peak hours.
5. An autoscaling plan (at least a temporary scale-up).
6. DR/backup: snapshot before the campaign; run restore drills periodically.
7. A “P1-Ads” runbook: who does what, on which channel, with which message – pin it in the war room right away.
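
"Hit/miss measurement" can be as simple as sampling the CDN's cache-status header. Cloudflare reports it in cf-cache-status with values such as HIT, MISS, EXPIRED, STALE, BYPASS, and DYNAMIC; the grouping below (which values count as cache hits) is this sketch's assumption, so adjust it for your CDN.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed grouping of cf-cache-status values.
HIT_STATUSES = {"HIT", "STALE", "UPDATING", "REVALIDATED"}  # served from cache
MISS_STATUSES = {"MISS", "EXPIRED"}  # cacheable, but fetched from origin


def cache_summary(statuses: Iterable[str]) -> Dict[str, float]:
    """Summarize sampled cf-cache-status values.

    hit_ratio: cache-served responses / (cache-served + misses).
    not_cached_share: share of responses the CDN did not try to cache at all
    (e.g. DYNAMIC or BYPASS), which usually points to missing cache rules.
    """
    counts = Counter(s.strip().upper() for s in statuses)
    hits = sum(counts[s] for s in HIT_STATUSES)
    misses = sum(counts[s] for s in MISS_STATUSES)
    total = sum(counts.values())
    eligible = hits + misses
    return {
        "hit_ratio": hits / eligible if eligible else 0.0,
        "not_cached_share": (total - eligible) / total if total else 0.0,
    }
```

A high not_cached_share on product and landing pages is often a stronger signal than a low hit_ratio, because it means the full-page cache is not applied at all.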

When should you outsource a maintenance team?

- You don't have 24/7 on-call coverage, or you don't have a single DevOps engineer.