Emergency website repair in Ho Chi Minh City: 500/404 errors, hacks, broken interfaces, payment failures

When your website "falls ill" in the middle of an Ads campaign or during peak sales hours, every passing minute costs money. This article is a P1-grade emergency response playbook for businesses in Ho Chi Minh City: how to identify the incident, how to "firefight" in the first 2 hours, how to control data risk, and how to recover sustainably.

1) Quick identification: what situation are you facing?

P1 group: handle immediately

HTTP 500: blank page or “Internal Server Error”; logs show PHP fatal errors or timeouts; a backend service is dead.
Bulk HTTP 404/soft 404: wrong redirects, slugs changed after a deploy or migration, sitemap errors.
Hack/malicious code: strange popups, redirects to another domain, unfamiliar .php files, Google showing a "This site may be hacked" warning.
Broken interface: CSS/JS not loading, third-party script conflicts, a failed theme/plugin update.
Payment error: webhook or payment gateway timeouts, double charges, orders not being created in the CMS.

Accompanying signs

GA4 sessions drop suddenly; the exit rate rises abnormally.
The uptime monitor reports the site down; CPU/RAM usage spikes.
Search Console shows a rise in 5xx/404 errors; Merchant Center rejects products.
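
As a rough illustration of this triage, the sketch below maps a single HTTP probe result to one of the incident types listed above. The function name `classify_probe`, the category labels, and the soft-404 marker strings are assumptions made for this example, not a standard.

```python
from typing import Optional

# Hypothetical marker strings that suggest a "soft 404": the server answers
# 200 but the body is really an error page. Adjust to your own templates.
SOFT_404_MARKERS = ("page not found", "404 not found")


def classify_probe(status: Optional[int], body: str = "") -> str:
    """Map one HTTP probe result to a rough incident type."""
    if status is None:
        return "down"          # no response at all (timeout, refused connection)
    if 500 <= status <= 599:
        return "http-5xx"
    if status == 404:
        return "http-404"
    if status == 200 and any(m in body.lower() for m in SOFT_404_MARKERS):
        return "soft-404"
    if status == 200 and not body.strip():
        return "blank-page"    # white screen served with a 200 status
    return "ok"
```

Feeding it the status and body collected for your key URLs (home page, checkout, Ads landing pages) gives a quick first picture of which playbook in section 3 applies. It does not detect hacks or payment failures, which need the log checks described later.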

2) The “2 golden hours” rule (minimize losses)

Freeze changes: pause all heavy deployments, plugin updates, and cron jobs.
Enable maintenance mode safely: only when required (e.g. data is exposed or the site is hacked).
Snapshot the current state: export the database and back up wp-content or the app files before touching anything.
Turn on logging and monitoring: server logs (Nginx/Apache), PHP-FPM, application logs (error.log), gateway logs.
Prioritize P1 before P2: restore access, payment, and Ads landing pages first, then polish.
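
The file part of the snapshot step could be sketched as follows. This is a minimal example using only the Python standard library; the helper name `snapshot_dir` and the archive naming scheme are assumptions. It does not export the database, which is normally done with the database's own dump tool.

```python
import tarfile
import time
from pathlib import Path


def snapshot_dir(source: Path, dest_dir: Path) -> Path:
    """Write a timestamped .tar.gz copy of `source` into `dest_dir`.

    Assumption: `dest_dir` lives outside `source`, so the archive never
    ends up inside the folder it is archiving.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the folder name
        tar.add(source, arcname=source.name)
    return archive
```

A call such as `snapshot_dir(Path("wp-content"), Path("/backups/incident"))` (both paths are placeholders) gives you a copy to diff against or restore from if a fix makes things worse.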

If you need a 24/7 on-call process with a clear SLA, see the pillar article at Tan Phat Digital, which fully describes the P1 rescue process, checklists, and shift assignment: Ho Chi Minh website maintenance service.

3) Incident handling process

A. HTTP 500 / white page

Enable debugging and logging: WP_DEBUG_LOG (WordPress), APP_DEBUG (Laravel); read the error trace, and turn debug output off again afterwards because it can expose sensitive details.
Roll back fast: if the 500 appeared after a deployment, return to the most recent working build or backup.
Free up resources: restart PHP-FPM, flush OPcache, check database connections (max_connections).
Temporarily disable the plugin/theme: rename the folder that causes the error so the site comes back first, then fix the root cause later.
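
To decide which folder to rename, it helps to see which plugin or theme the fatal errors point at. The sketch below counts "PHP Fatal error" lines per wp-content component. It assumes the usual PHP error log wording and WordPress paths; the function name `count_fatals_by_component` is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the plugin or theme folder inside a wp-content path.
COMPONENT_RE = re.compile(r"wp-content/(plugins|themes)/([^/\s]+)")


def count_fatals_by_component(lines: Iterable[str]) -> Counter:
    """Count 'PHP Fatal error' log lines per plugin/theme folder."""
    counts: Counter = Counter()
    for line in lines:
        if "PHP Fatal error" not in line:
            continue  # ignore warnings and notices during a P1
        match = COMPONENT_RE.search(line)
        key = f"{match.group(1)}/{match.group(2)}" if match else "unknown"
        counts[key] += 1
    return counts
```

Running it over the lines of the error log and calling `.most_common(3)` on the result usually narrows the suspect list to one or two folders. Errors that fall under "unknown" come from core or server-level code and need a different approach.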

B. Bulk 404/Soft 404

Restore permalinks (WordPress) or rebuild routes (framework).
Map URLs one-to-one with 301 redirects wherever they changed after a migration or restructure.
Clean the sitemap: include only URLs that return 200, are indexable, and are canonical; resubmit it in Google Search Console.
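
A one-to-one 301 map is easy to get subtly wrong: an old URL may point at another old URL (a chain), or two entries may point at each other (a loop). The sketch below flattens chains and rejects loops before the map is deployed. The function name `flatten_redirects` and the dictionary shape are assumptions for illustration.

```python
from typing import Dict


def flatten_redirects(mapping: Dict[str, str]) -> Dict[str, str]:
    """Resolve chains so each old path points straight at its final target.

    Raises ValueError if the map contains a redirect loop.
    """
    flat: Dict[str, str] = {}
    for source in mapping:
        seen = {source}
        target = mapping[source]
        while target in mapping:  # the target is itself redirected
            if target in seen:
                raise ValueError(f"redirect loop involving {target!r}")
            seen.add(target)
            target = mapping[target]
        flat[source] = target
    return flat
```

For example, `flatten_redirects({"/old": "/mid", "/mid": "/new"})` sends both `/old` and `/mid` directly to `/new`, so each redirected URL costs visitors and crawlers a single hop.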

For the "traffic dropped after migration" case, work through P1–P3 (robots, 301 map, canonical, sitemap). All of these are covered in the monthly maintenance process the team has standardized (an SOP for monitoring recurring errors and preventing them): Monthly website maintenance process.

C. Hack/malware

Quarantine the site and change all passwords (hosting, DB, admin, SFTP, API keys).
Scan and clean: look for unfamiliar files and obfuscation signatures (base64/gzinflate/eval); replace the core with a clean copy and keep wp-content/uploads after scanning it.
Update and patch: CMS, plugin, and theme versions; remove old plugins of unknown origin; make wp-config.php read-only.
WAF/CDN: enable the firewall (rate limiting, bot rules) and block attacking IPs; turn on 2FA for admins.
Request a review if the site carries a warning in search results.
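
The signature scan can be approximated with a few lines of Python. This is a minimal sketch, assuming the three function names mentioned above are the patterns of interest; the helper `scan_php_files` is hypothetical, and a hit is only a lead for manual review because legitimate plugins call these functions too.

```python
import re
from pathlib import Path
from typing import Dict, List

# Function calls often seen together in obfuscated PHP payloads.
SUSPICIOUS_RE = re.compile(r"\b(eval|base64_decode|gzinflate)\s*\(")


def scan_php_files(root: Path) -> Dict[str, List[str]]:
    """Return {relative path: sorted suspicious function names} under `root`."""
    findings: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*.php")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = sorted(set(SUSPICIOUS_RE.findall(text)))
        if hits:
            findings[path.relative_to(root).as_posix()] = hits
    return findings
```

Any .php file reported under an uploads directory deserves immediate attention, since that folder normally holds media only. A dedicated malware scanner remains the more thorough option; this sketch only shortens the first manual pass.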

D. Broken interface (CSS/JS)

Purge the cache/CDN; check for static assets returning 404 and bundles that conflict across versions.
Roll back the theme/plugin version; temporarily lock auto-updates.
Detach or disable conflicting scripts (chat, pixel, A/B testing), then load them conditionally.
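
To check for static 404s you first need the list of stylesheets and scripts a page actually references. The sketch below collects them from raw HTML with the standard library parser. The class and function names are assumptions, and the example stops short of fetching the URLs, which you can do with any HTTP client.

```python
from html.parser import HTMLParser
from typing import List


class AssetCollector(HTMLParser):
    """Collect stylesheet and script URLs referenced by a page."""

    def __init__(self) -> None:
        super().__init__()
        self.assets: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and attr.get("src"):
            self.assets.append(attr["src"])
        elif tag == "link" and attr.get("href"):
            rel = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel:  # skip icons, preloads, canonical links
                self.assets.append(attr["href"])


def extract_assets(html: str) -> List[str]:
    """Return asset URLs in document order."""
    collector = AssetCollector()
    collector.feed(html)
    return collector.assets
```

Requesting each returned URL and noting anything that is not a 200 quickly shows whether the broken layout comes from a missing bundle or from a script conflict.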

E. Payment error

Reconcile logs: webhook responses (200/400/500), order status in the CMS, the cron queue.
Fail-safe: if a payment succeeded but no order was created, create the order manually and notify the customer.
Increase timeouts and retries at the gateway/webhook; verify SSL/TLS and the IP allowlist.
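
The reconciliation behind the fail-safe can be sketched as a comparison between gateway events and the orders that exist in the CMS. The field names `reference` and `status`, the value `"succeeded"`, and the function name are assumptions; real gateways use their own payload shapes.

```python
from typing import Dict, Iterable, List, Set


def find_paid_without_order(
    gateway_events: Iterable[Dict[str, str]],
    order_refs: Set[str],
) -> List[str]:
    """Return payment references reported as paid that have no matching order."""
    missing: List[str] = []
    seen: Set[str] = set()
    for event in gateway_events:
        ref = event["reference"]
        if event["status"] != "succeeded" or ref in seen:
            continue  # skip failures and duplicate webhook deliveries
        seen.add(ref)
        if ref not in order_refs:
            missing.append(ref)
    return missing
```

The returned references are the customers who paid but have no order yet: these are the ones to create manually and contact first.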

4) "Oxygen mask" checklist for commerce sites (WordPress/Woo, Shopify, custom)

WordPress/WooCommerce platform

Turn off "strange" plugins; roll back the most recently updated version.
Regenerate .htaccess/permalinks.
Remove injected mu-plugins; scan wp-content/uploads for shells.
Check the Woo queue (Action Scheduler) and webhooks.

Shopify

Roll back the theme version; turn off newly installed apps, then test cart/checkout again.
Check for ScriptTags/apps injected into checkout.liquid (Shopify Plus).

Custom/Laravel/Next.js

Health-check the DB/cache/queue; roll back the build; check the connection variables in .env.
Check SSR/CSR: bundle errors, route rewrites in Nginx.

5) Communicate & minimize damage (don't be silent)

Post a status message on the site banner, FAQ, and Facebook fanpage, with an estimated fix time.
Pause Ads pointing to the broken page; shift the budget to hotline/chat channels.
Look after customers with pending orders: commit to reasonable compensation (voucher/free shipping).

6) Closing the incident book: RCA & hardening (24–72 hours later)

RCA (Root Cause Analysis): timeline, root cause, impact, and damage.
Turn it into SOPs: 500/404/hack/payment playbooks tailored to your environment.
Hardening:
- 2FA and access rights following the principle of least privilege.
- 3–2–1 backup: 3 copies, 2 different media, 1 off-site; test a restore every month.
- Uptime and Core Web Vitals monitoring.
Monthly chaos day: a 30-minute rollback/restore drill.
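
The 3–2–1 rule is simple enough to check mechanically against a backup inventory. The sketch below is one way to do that; the `BackupCopy` fields, the medium labels, and the function name are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BackupCopy:
    name: str
    medium: str    # e.g. "local-disk", "nas", "object-storage" (labels are arbitrary)
    offsite: bool


def check_3_2_1(copies: List[BackupCopy]) -> List[str]:
    """Return the 3-2-1 rules that the given copies fail (empty list = pass)."""
    problems: List[str] = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than 2 distinct media")
    if not any(c.offsite for c in copies):
        problems.append("no off-site copy")
    return problems
```

Passing the inventory check says nothing about whether a backup can actually be restored, which is why the monthly restore test stays on the list.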

Need a team to “take over” regular operations and prevent recurrence? Consider the Web Maintenance package (including speed/security audits, on-call duty, and incident reporting): Web Maintenance Services.

7) SLA suggestions for a “quick response team”

P1 – Site down/hack/payment error: acknowledge within 15 minutes, restore basic access within 120 minutes.
P2 – Broken interface, small groups of 404s: within 24 hours.
P3 – Speed optimization, technical SEO: 1–2 week sprint.
Communication channel: Slack/Zalo group, 24/7 on-call schedule, status updates every 30–60 minutes for P1.
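
The P1 and P2 targets above translate directly into deadlines once an incident has a report time. A minimal sketch, assuming the step names `acknowledge`, `restore`, and `resolve` (they are labels chosen for this example) and leaving out P3 because it is sprint-based:

```python
from datetime import datetime, timedelta
from typing import Dict

# Targets taken from the SLA suggestions above; tune them to your own contract.
SLA_TARGETS: Dict[str, Dict[str, timedelta]] = {
    "P1": {"acknowledge": timedelta(minutes=15), "restore": timedelta(minutes=120)},
    "P2": {"resolve": timedelta(hours=24)},
}


def sla_deadlines(priority: str, reported_at: datetime) -> Dict[str, datetime]:
    """Return the deadline for each SLA step of the given priority.

    Raises KeyError for priorities without a time-based target (e.g. P3).
    """
    return {
        step: reported_at + delta
        for step, delta in SLA_TARGETS[priority].items()
    }
```

Posting these timestamps in the incident channel when the ticket opens makes it obvious to everyone when the SLA is about to be missed.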

8) Frequently asked questions

How long does it take to recover from a P1 site-down incident?
Usually 30–120 minutes if a recent backup and server access are available. Deep compromises or data corruption can take more than 4 hours.

Can I keep running Ads while the site is being patched?
Pause the ads that point to the failing page; keep campaigns that lead to the hotline/chat channel or to a temporary landing page.

How do I know the malware is really gone?
Run a signature scan, diff the core against a clean copy, check cron jobs and admin entry points, monitor outbound connections, and re-scan after 24–48 hours.

Why doesn't traffic return immediately after the technical fix?
SEO recovery takes time because Google has to crawl and index the site again; for sales websites, revenue usually recovers before overall traffic does.

9) The "bare minimum" set after an incident

Daily automatic backups, plus a snapshot at every deploy.
A mandatory staging environment and a review checklist before releasing to production.
Monitoring: uptime, error logs, CWV, payment success rate.
Permissions and logs: separate owner, manager, and dev roles; log every code/configuration change.
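
Of the monitoring items, the payment success rate is the one most often left without an alert. The sketch below shows one possible rule. The 90% threshold and the 20-attempt minimum are placeholder assumptions, not recommendations from this article.

```python
from typing import Optional


def payment_success_rate(succeeded: int, attempted: int) -> Optional[float]:
    """Share of payment attempts that succeeded, or None if there were none."""
    if attempted <= 0:
        return None
    return succeeded / attempted


def should_alert(
    succeeded: int,
    attempted: int,
    threshold: float = 0.9,   # assumed alert level, tune per shop
    min_attempts: int = 20,   # assumed floor to avoid noise at low volume
) -> bool:
    """Alert when the success rate in a time window drops below the threshold."""
    rate = payment_success_rate(succeeded, attempted)
    if rate is None or attempted < min_attempts:
        return False  # too little data to judge
    return rate < threshold
```

Evaluated over a sliding window (for instance the last 15 minutes of checkout attempts), this catches gateway or webhook failures that an uptime monitor cannot see because the site itself still returns 200.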

Site failures are unavoidable, but the damage can be controlled if you have a clear P1 playbook, adequate access, backup and monitoring habits, and discipline when deploying. Work in the correct order (restore access → contain the infection → repair → harden) and you can bring the system back to a stable state within the first 2 hours and avoid a repeat.