Automation Center
Overview
Automation Center is the fleet-wide operations dashboard at /automation. It provides a single view of automation health, queue status, job history, and system controls across all sites — not just one.
This is distinct from the per-site Content Automation settings inside each Client Profile. Client Profile handles one site's automation configuration (topics, schedule, frequency). Automation Center is the control room that monitors everything at once — queue depth, worker status, failure rates, dead letter queue, and system mode — with admin-only controls for pausing, resuming, and managing the queue.
Hierarchy:
- /automation — Fleet-wide dashboard (this feature)
- /site/<id>/profile → Automations tab — Per-site automation settings
What It Does
System Status
The top of the dashboard shows the current state of the automation system at a glance:
| Indicator | What It Shows |
|---|---|
| Mode | Current system mode — LIVE (active), DRY-RUN (running but not publishing), SHADOW (monitoring only), CANARY (gradual rollout), OFF, or OFFLINE |
| Leader | Hostname of the active automation worker |
| Queue | Current queue depth and in-progress job count |
| DLQ | Dead letter queue depth (red warning if > 0) |
| Cap | Daily publication cap status |
Below the status row:
| Tile | What It Shows |
|---|---|
| Next Run Windows | The 3 daily publication slots (8am, 2pm, 8pm ET) with next occurrence in UTC |
| WordPress Backfill | Status and pending/processed counts for WordPress content backfill |
| WordPress Upsert | Status and 24-hour update count for WordPress content sync |
| Post Verification | Pass/fail counts for post verification checks (24h) |
| IndexNow | Submitted/failed counts for search engine submissions (24h) |
Automation Insights KPIs
Five primary KPIs are displayed as real-time counters:
| KPI | What It Measures |
|---|---|
| Posts (24h) | Total posts created across all sites in the last 24 hours |
| Verify (24h) | Post verification success/failure ratio |
| IndexNow (24h) | Search engine submission success/failure ratio |
| Queue | Current queue depth (jobs waiting to run) |
| DLQ | Current dead letter queue depth (jobs that failed permanently) |
Recent Jobs
A searchable, filterable table of all automation jobs across all sites.
Columns: Job ID, Site, Status, Topic, Created, Started, Finished, Actions
Filters:
- By status: All, Queued, Running, Finished, Failed
- By site: All sites or a specific site
- By search term (URL, site name, topic)

Features:
- Auto-refreshes every 10 seconds
- Paginated (up to 100 per page)
- Export to CSV
- Color-coded status badges: green (finished), red (failed), blue (running), amber (queued)
Dead Letter Queue (DLQ)
Jobs that have failed permanently — exceeded retry attempts or hit catastrophic errors — are moved to the dead letter queue. The DLQ prevents failed jobs from clogging the main queue while retaining them for manual inspection.
What the DLQ panel shows:
- Job ID, site, topic, creation date, number of attempts, error message
- Site filter and pagination
- Export to CSV

Admin actions on DLQ items:
- Retry: moves the job back to "queued" with attempts reset to 0
- Delete: permanently removes the job
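The DLQ lifecycle described above can be sketched as a small in-memory model. This is an illustration only, not the product's implementation: the `Job` and `JobStore` names, the field names, and the `MAX_ATTEMPTS` value of 3 are all assumptions, since the actual retry limit is not documented here.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 3  # assumed retry limit; the real value is not documented here


@dataclass
class Job:
    job_id: str
    site: str
    status: str = "queued"
    attempts: int = 0
    error: str = ""


class JobStore:
    """In-memory stand-in for the main queue and the dead letter queue."""

    def __init__(self) -> None:
        self.queue: dict[str, Job] = {}
        self.dlq: dict[str, Job] = {}

    def enqueue(self, job: Job) -> None:
        self.queue[job.job_id] = job

    def record_failure(self, job_id: str, error: str) -> None:
        """Count a failed attempt; move the job to the DLQ once retries run out."""
        job = self.queue[job_id]
        job.attempts += 1
        job.error = error
        if job.attempts >= MAX_ATTEMPTS:
            job.status = "failed"
            self.dlq[job_id] = self.queue.pop(job_id)

    def retry_from_dlq(self, job_id: str) -> None:
        """Admin Retry: back to "queued" with attempts reset to 0."""
        job = self.dlq.pop(job_id)
        job.status = "queued"
        job.attempts = 0
        self.queue[job_id] = job

    def delete_from_dlq(self, job_id: str) -> None:
        """Admin Delete: permanently remove the job."""
        del self.dlq[job_id]
```

Because nothing in the model removes DLQ entries except the two admin actions, a job that lands there stays until someone retries or deletes it.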
Per-Site Automation Cards
A grid of cards showing every site with automation configured. Each card displays:
- Site name and enabled/disabled status
- Automation frequency (daily, weekly, biweekly, monthly)
- Last run timestamp and next scheduled run
- Queue depth for that site
- Color-coded status (idle, queued, running, finished, failed)
- Run Now button (admin-only) to trigger an immediate manual run
Clicking a site card navigates to that site's Client Profile → Automations tab for per-site configuration.
Queue Management (Admin-Only)
Four queue control actions are available to administrators:
| Action | What It Does |
|---|---|
| Pause | Stops all job processing without clearing the queue. Sets the global_enable flag to false. |
| Resume | Restarts job processing from where it was paused. |
| Retry All Failed | Moves all failed jobs back to "queued" with attempts reset to 0. |
| Clear Pending | Permanently deletes all queued jobs. Destructive — cannot be undone. |
All admin actions are logged with the actor's email to provide an audit trail.
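To make the difference between the four actions concrete, here is a rough in-memory sketch. Only the `global_enable` flag name comes from the table above; the `QueueControls` class, the job dictionary shape, and the audit record fields are hypothetical.

```python
from datetime import datetime, timezone


class QueueControls:
    """In-memory sketch of the four admin actions; not the real backend."""

    def __init__(self, jobs: list[dict]) -> None:
        self.jobs = jobs                 # each job: {"id", "status", "attempts"}
        self.global_enable = True        # flag named in the Pause row
        self.audit_log: list[dict] = []  # hypothetical audit record shape

    def _audit(self, action: str, actor_email: str) -> None:
        self.audit_log.append({
            "action": action,
            "actor": actor_email,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def pause(self, actor_email: str) -> None:
        self.global_enable = False       # queued jobs are left untouched
        self._audit("pause", actor_email)

    def resume(self, actor_email: str) -> None:
        self.global_enable = True
        self._audit("resume", actor_email)

    def retry_all_failed(self, actor_email: str) -> int:
        """Re-queue every failed job with attempts reset; return how many."""
        retried = 0
        for job in self.jobs:
            if job["status"] == "failed":
                job["status"] = "queued"
                job["attempts"] = 0
                retried += 1
        self._audit("retry_all_failed", actor_email)
        return retried

    def clear_pending(self, actor_email: str) -> int:
        """Drop every queued job (destructive); return how many were removed."""
        before = len(self.jobs)
        self.jobs[:] = [j for j in self.jobs if j["status"] != "queued"]
        self._audit("clear_pending", actor_email)
        return before - len(self.jobs)
```

Note that in this sketch Pause only flips a flag, while Clear Pending actually discards data, which is why the latter cannot be undone.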
Why It Matters
- Fleet-wide visibility — instead of checking automation status site by site, see every site's queue health, job status, and failure rates on one page.
- Catch failures early — the DLQ and failure rate indicators surface problems before they compound. A spike in failed jobs across multiple sites often indicates an upstream issue (API rate limits, OpenAI outages, WordPress connectivity) that needs attention.
- Admin controls prevent cascading problems — pausing the queue during an API outage stops failed jobs from piling up. Retrying after the issue resolves is one click.
- Capacity planning — queue depth, daily caps, and publication rate KPIs show whether the system is keeping up with the configured publishing schedules across all sites.
- Accountability — every admin action (pause, resume, retry, delete) is logged with the actor's identity and timestamp.
How to Use It
Monitoring Day-to-Day Operations
- Navigate to Automation Center (/automation) from the sidebar.
- Check the System Status row: confirm the mode is LIVE and the leader worker is active.
- Review the KPI tiles — Posts (24h) shows production volume; DLQ > 0 means failures need attention.
- Scan the Per-Site Cards — look for sites with failed or stuck jobs (red status).
Investigating Failures
- Check the Recent Jobs panel — filter by status "Failed" to see all failures.
- Click into a failed job to see the error message and attempt count.
- If the failure is transient (API timeout, rate limit), use Retry All Failed to re-queue.
- If the failure is permanent (bad configuration, invalid topic), check the DLQ panel.
- In the DLQ, decide per-job: Retry (if the root cause is fixed) or Delete (if the job should be discarded).
Responding to System Issues
| Scenario | Action |
|---|---|
| OpenAI API outage | Pause the queue → wait for resolution → Resume |
| High failure rate spike | Check error messages for pattern → fix root cause → Retry All Failed |
| DLQ building up | Review each item → fix site-level configuration → Retry or Delete |
| Queue growing faster than processing | Check worker health → verify leader is active → check daily caps |
| Need to stop all automation | Pause the queue (preserves all jobs for later) |
Triggering Manual Runs
- Find the target site's card in the Per-Site grid.
- Click Run Now (admin-only).
- The system inserts a job with an idempotency key — duplicate rapid clicks are prevented (one run per site per minute).
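One plausible way to get "one run per site per minute" is to build the idempotency key from the site, the action, and the current minute. The sketch below assumes that approach; the key format, the `idempotency_key` function, and the `ManualRunQueue` class are hypothetical, and the real system may bucket time differently (for example, a sliding 60-second window rather than calendar minutes).

```python
from datetime import datetime, timezone


def idempotency_key(site_id: str, action: str, now: datetime) -> str:
    """Build a key that is identical for all requests in the same UTC minute."""
    minute_bucket = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M")
    return f"{action}:{site_id}:{minute_bucket}"


class ManualRunQueue:
    """Accepts a manual run only if its key has not been seen before."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def run_now(self, site_id: str, now: datetime) -> bool:
        key = idempotency_key(site_id, "run_now", now)
        if key in self._seen:
            return False  # duplicate click within the same minute
        self._seen.add(key)
        return True
```

In a database-backed queue the same effect is typically achieved with a unique constraint on the key column, so the duplicate insert is rejected by the database rather than by application code.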
Key Settings / Options
System Modes
| Mode | Behavior | Badge Color |
|---|---|---|
| LIVE | Active, processing and publishing jobs normally | Green |
| DRY-RUN | Running pipeline but not publishing to WordPress | Amber |
| SHADOW | Monitoring only, no job execution | Blue |
| CANARY | Gradual rollout — processing a subset of sites | Blue |
| OFF | Automation disabled globally | Gray |
| OFFLINE | Worker unreachable, operations halted | Gray |
Guardrails & Safety
| Guardrail | What It Prevents |
|---|---|
| Daily caps | Per-site configurable limit (1-10 posts/day, default 1). Prevents over-publishing. |
| Circuit breaker | Trips automatically if the failure rate exceeds 20% across the last 20 or more jobs, or if the p95 job duration exceeds 120 seconds. Processing stays disabled until the breaker is manually reset. |
| Idempotency keys | Prevents duplicate manual runs — one run per site per action type per minute. |
| Rate limiting | 3-second minimum between admin actions per user. |
| Duplicate prevention | Same job cannot be re-enqueued while already in queue. |
| Topic guards | Per-site topic whitelist/blacklist prevents forbidden topics from being automated. |
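The circuit breaker rule in the table can be expressed as a short check. This is a sketch under stated assumptions: the thresholds come from the table, but the nearest-rank definition of p95, the choice to require at least 20 jobs before evaluating either condition, and the names `p95` and `should_trip` are illustrative rather than confirmed behavior.

```python
FAILURE_RATE_LIMIT = 0.20   # "exceeds 20%"
MIN_SAMPLE = 20             # "last 20 or more jobs"
P95_LIMIT_SECONDS = 120.0   # "p95 job duration exceeds 120 seconds"


def p95(durations: list[float]) -> float:
    """Nearest-rank 95th percentile (one of several common definitions)."""
    ordered = sorted(durations)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def should_trip(recent: list[tuple[bool, float]]) -> bool:
    """recent holds (succeeded, duration_seconds) for the latest jobs."""
    if len(recent) < MIN_SAMPLE:
        return False  # assumption: too little data to judge either condition
    failures = sum(1 for succeeded, _ in recent if not succeeded)
    failure_rate = failures / len(recent)
    slow = p95([duration for _, duration in recent]) > P95_LIMIT_SECONDS
    return failure_rate > FAILURE_RATE_LIMIT or slow
```

With exactly 20 jobs, 4 failures give a rate of 20%, which does not exceed the limit; a fifth failure does. Once tripped, the breaker stays open until an admin resets it, so the check above only decides when to trip, not when to recover.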
Access Control
| Capability | Admin | Non-Admin |
|---|---|---|
| View dashboard, KPIs, status | Yes | Yes |
| View recent jobs and DLQ | Yes | Yes (read-only) |
| Pause / Resume queue | Yes | No |
| Retry All Failed | Yes | No |
| Clear Pending | Yes | No |
| Retry / Delete DLQ items | Yes | No |
| Run Now (per-site) | Yes | No |
| Edit runtime flags | Yes | No |
Publication Schedule
The automation worker runs on a fixed daily schedule with 3 publication windows:
| Window | Time (ET) |
|---|---|
| Morning | 8:00 AM |
| Afternoon | 2:00 PM |
| Evening | 8:00 PM |
Per-site frequency settings (daily, weekly, biweekly, monthly) determine which windows a site participates in.
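Since the windows are defined in ET but the dashboard's Next Run Windows tile shows the next occurrence in UTC, the conversion has to account for daylight saving time. The following is a minimal sketch of that calculation, assuming "ET" corresponds to the IANA zone `America/New_York`; the function name `next_run_windows` is hypothetical.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

ET = ZoneInfo("America/New_York")  # assumed IANA zone for "ET"
WINDOWS = (time(8, 0), time(14, 0), time(20, 0))  # morning, afternoon, evening


def next_run_windows(now_utc: datetime, count: int = 3) -> list[datetime]:
    """Return the next `count` window start times, expressed in UTC.

    `now_utc` must be timezone-aware.
    """
    today_et = now_utc.astimezone(ET).date()
    upcoming: list[datetime] = []
    day_offset = 0
    while len(upcoming) < count:
        day = today_et + timedelta(days=day_offset)
        for window in WINDOWS:
            start = datetime.combine(day, window, tzinfo=ET).astimezone(timezone.utc)
            if start > now_utc and len(upcoming) < count:
                upcoming.append(start)
        day_offset += 1
    return upcoming
```

For example, at 15:00 UTC on a winter day (10:00 AM ET) the next window is 2:00 PM ET, which is 19:00 UTC; in summer the same ET window falls at 18:00 UTC.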
Notes / Edge Cases
- Worker lock: only one automation worker runs at a time, enforced by a database lock (automation_locks table) with heartbeat monitoring. If the heartbeat is older than 5 minutes, the worker is considered dead.
- Circuit breaker auto-trips: if the failure rate exceeds 20% across the last 20 or more jobs, or the p95 job duration exceeds 120 seconds, the circuit breaker trips automatically. This protects against cascading failures, but it also means automation stops until an admin investigates and resets the breaker.
- DLQ is not automatic cleanup — jobs in the dead letter queue stay there indefinitely until an admin manually retries or deletes them. They don't auto-expire.
- Clear Pending is destructive — unlike Pause (which preserves jobs), Clear Pending permanently deletes all queued jobs. There is no undo.
- All admin actions are logged — pause, resume, retry, delete, and manual run actions include the actor's email in the log for accountability.
- Per-site automation is configured separately — Automation Center is for monitoring and fleet-level controls. To change a site's frequency, daily cap, topics, or schedule, go to that site's Client Profile → Automations tab.
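The worker lock note above can be illustrated with a small heartbeat check. Only the 5-minute threshold comes from this page; the lock row shape, the `try_acquire` function, and the idea that another worker takes over a stale lock are assumptions made for the sake of the example. A real database lock would perform this check and update atomically.

```python
from datetime import datetime, timedelta, timezone

HEARTBEAT_TIMEOUT = timedelta(minutes=5)


def worker_is_dead(last_heartbeat: datetime, now: datetime) -> bool:
    """A worker is considered dead once its heartbeat is older than 5 minutes."""
    return now - last_heartbeat > HEARTBEAT_TIMEOUT


def try_acquire(lock, hostname: str, now: datetime) -> dict:
    """Return the lock row after `hostname` attempts to become leader.

    `lock` is None (no holder yet) or a dict with "holder" and "heartbeat".
    Taking over a stale lock is an assumed behavior, not documented here.
    """
    if lock is None or lock["holder"] == hostname or worker_is_dead(lock["heartbeat"], now):
        return {"holder": hostname, "heartbeat": now}
    return lock  # a live leader exists; leave the lock unchanged
```

If the Leader indicator in System Status shows an unexpected hostname, a stale heartbeat followed by a takeover of this kind is one possible explanation.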