How We Audited Our Own WordPress SEO Plugin (In Public)
We build our own WordPress SEO plugin. We ship it, we eat our own dog food, we watch our own rankings. Earlier this month we did something most plugin developers don’t: we ran a deep technical audit of our own codebase, spanning security, performance, architecture, accessibility, and testing. We wrote down everything we found — good and bad. Then we fixed it. This post is that audit, in public, with the receipts.
We’re publishing this for two reasons. One: transparency about the quality of software we’re selling. Two: WordPress plugin security is a recurring industry problem and we want the playbook in the open. If you own a plugin of your own, this is a template for how to audit it.
The Audit Process
We ran parallel audit passes across ten dimensions for each plugin, covering both the free and Pro codebases:
- Security — XSS, SQL injection, CSRF, sanitization, escaping, secrets handling.
- WordPress integration — hook discipline, i18n, multisite, REST conformance.
- Code quality — PHP 8.3 feature usage, complexity, duplication, dead code.
- Architecture — DI, SOLID, extensibility, storage strategy.
- Performance — autoload footprint, N+1 queries, caching.
- Compatibility — WP/PHP version floors, SEO-plugin coexistence, WooCommerce.
- UX + accessibility — WCAG 2.1 AA, ARIA, keyboard navigation.
- Testing — coverage, CI, static analysis.
- AI + external integrations — OAuth, rate limits, prompt injection.
- File-by-file forensic read — line-level review of every service.
Each pass produced a detailed report. The combined output ran to roughly 330 findings for the free plugin and 310 for Pro.
The Three Feature-Breaking Bugs We Shipped
The most uncomfortable findings were bugs that had been sitting in production, breaking real features, that nobody had reported because the features hadn’t been exercised end-to-end against real data.
- `&gt;=` HTML entity in a SQL query. `RankingRepository::topQueries()` had `WHERE observed_on &gt;= …`, an HTML-entity-encoded operator. The query failed silently at runtime; the Search Console top-queries dashboard simply never rendered. We added a CI check that greps for HTML entities in PHP source.
- Wrong `accepted_args` on an action hook. `add_action('emnes_seo_pro_404_suggestion', …, 10, 3)` was registered with 3 args, but `do_action` fired it with 4 (the confidence score). The fourth arg was silently dropped, so auto-accept of 404 suggestions never actually fired: every confidence check saw 0.
- A Freemius helper-name check that was correct. Our own audit agent flagged it as a typo. It wasn’t. Worth mentioning because audit tooling, even good audit tooling, has false positives, and you still have to verify.
None of these would have surfaced without deep end-to-end testing or code audit. Unit tests don’t catch them. Linting doesn’t catch them. Running the feature once against real data does.
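The `accepted_args` failure is easy to reproduce in miniature. The sketch below is a toy Python model of a hook dispatcher, not WordPress or plugin code. It assumes only the documented behaviour that a callback receives at most the number of arguments it registered for. The hook names, callback signature, and the `0.0` default are hypothetical.

```python
from typing import Callable

# Toy hook registry: hook name -> list of (callback, accepted_args).
_hooks: dict = {}

def add_action(hook: str, callback: Callable, accepted_args: int = 1) -> None:
    _hooks.setdefault(hook, []).append((callback, accepted_args))

def do_action(hook: str, *args) -> None:
    for callback, accepted_args in _hooks.get(hook, []):
        # Arguments beyond accepted_args are dropped without any warning.
        callback(*args[:accepted_args])

seen = []

def on_suggestion(source_url, target_url, post_id, confidence=0.0):
    # Record the confidence value the callback actually received.
    seen.append(confidence)

add_action("suggestion_buggy", on_suggestion, accepted_args=3)
add_action("suggestion_fixed", on_suggestion, accepted_args=4)

do_action("suggestion_buggy", "/old", "/new", 42, 0.97)
do_action("suggestion_fixed", "/old", "/new", 42, 0.97)
# seen == [0.0, 0.97]: the mis-registered callback never sees the score.
```

Nothing raises and nothing logs, which is why only an end-to-end run that inspects the outcome exposes this kind of mismatch.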
The Security Wins
Nine security hardenings, in priority order:
- Encrypt API keys + OAuth tokens at rest. Previously stored plaintext in `wp_options`. Now encrypted via libsodium’s `crypto_secretbox` with a key derived from WordPress salts.
- Rate-limit every AI endpoint. Per-user transient bucket. Compromised admin cookies can no longer drain OpenAI/Anthropic credits.
- SSRF guard on AI OG image sideload. Previously downloaded any URL the AI returned. Now checks scheme, host allow-list (OpenAI CDN hosts only), rejects private IPs, sniffs MIME type, caps size at 8 MB.
- User-scope the OAuth state transient. Parallel admins running the Search Console connect flow could collide on the shared state token.
- PKCE on the Google OAuth flow. S256 challenge + verifier. Intercepted auth codes no longer redeemable without also holding the verifier.
- Revoke tokens at Google on disconnect. Previously only deleted the local option. The token stayed alive on Google’s side.
- Redirect regex DoS guard. Rejects recursion tokens (`(?R)`, `(?0)`, `(?-n)`). Pins `pcre.backtrack_limit` per call.
- Prompt-injection isolation. User content gets wrapped in `<user_content>` delimiters, control bytes are stripped, and the system prompt explicitly flags everything inside the delimiters as untrusted.
- REST-response redaction for secrets. The `/settings` GET endpoint previously returned stored API keys. Now returns a redaction sentinel; the write path recognises the sentinel as “keep existing value”.
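For readers unfamiliar with PKCE, the S256 step in the list above is a small, standard derivation defined in RFC 7636: the challenge is the base64url-encoded SHA-256 of the verifier, without padding. The Python sketch below illustrates that derivation only. It is not the plugin’s PHP code, and the function names are hypothetical.

```python
import base64
import hashlib
import secrets

def make_verifier() -> str:
    # 32 random bytes become a 43-character base64url string
    # (RFC 7636 allows verifiers of 43 to 128 characters).
    raw = secrets.token_bytes(32)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def s256_challenge(verifier: str) -> str:
    # challenge = BASE64URL(SHA256(ASCII(verifier))), padding removed.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

verifier = make_verifier()
challenge = s256_challenge(verifier)
# The authorization request carries code_challenge=<challenge> and
# code_challenge_method=S256; the later token request carries
# code_verifier=<verifier>, which only the original client holds.
```

An intercepted authorization code is useless on its own because the token endpoint recomputes the challenge from the verifier and rejects a mismatch.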
The Performance Wins
- Rankings table `url_hash` index. Previously `GROUP BY url` on a `VARCHAR(2048)` column: unindexable, forced filesort on 100k+ rows.
- Sitemap N+1 fix. `isIndexable()` triggered 11 postmeta reads per URL. One `update_meta_cache` call per batch primes everything.
- RedirectMatcher object cache. Previously ran `SELECT * FROM redirects` on every front-end pageview. Now cached via `wp_cache_*` with explicit invalidation on CRUD.
- Google Search Console pagination. Previously capped at 500 rows per sync. Now walks `startRow` to 25,000.
- Autoload=no for new options. Secret options no longer bloat the autoload payload.
The Accessibility Wins
- Settings tabs: Home/End/Arrow key support, roving tabindex, proper `role="tab"` / `tabpanel` / `aria-labelledby` pairing.
- Pro badges: `aria-label` + ★ glyph + `forced-colors` media query so the badge survives Windows High-Contrast.
- Bulk operations: progress state, cancel affordance (in progress).
- Competing-SEO-plugin detection: if Yoast/RankMath/AIOSEO is active, our front-end output silently defers so you don’t ship duplicate titles/canonicals/JSON-LD.
The Audit Gaps We Accepted
Not every finding becomes a fix. We deferred or rejected several:
- Async OG image generation. Would move the 60s AI call off the publish pipeline. Correct long-term, but requires queueing infrastructure (Action Scheduler) that we’d rather not adopt just for this.
- WPML / Polylang hreflang. High effort, small current user base. Tracked for a later release.
- Multisite Search Console. Each sub-site has its own GSC property, so per-site tokens are semantically correct. The “multisite GSC is broken” finding turned out to be wrong after review.
The Numbers
- 22 files modified in the free plugin, +760/-82 lines.
- 20 files modified in Pro, +688/-112 lines.
- 25 free unit tests passing, up from 21.
- 14 new Pro unit tests — previously zero.
- CI added to both plugins, running on PHP 8.3 + 8.4.
- 16 new documentation files written — architecture, settings, REST API, developer extension recipes.
What This Process Taught Us
- Agent-assisted audit is real. We ran multiple AI agents in parallel, each scoped to a specific aspect, each producing an independent report. Output quality was high enough that roughly 95% of the findings were genuine. The 5% false positives — like the Freemius helper flag — were easy to verify against the code.
- Unit tests don’t catch the feature-breaking bugs. All three of the most embarrassing bugs we found were in code paths that had no test coverage. They’d have been caught by any end-to-end test, but nobody had written one.
- Write a CI grep for HTML entities in PHP. Seriously.
- Documentation drifts from the truth fastest. Our own developer-recipe docs had one broken recipe (it suggested rebinding a service that isn’t container-bound). Catching this required reading every recipe against the actual code.
The Multi-Agent Audit Approach
We ran ten parallel audit passes per plugin, each scoped to a single concern and executed by a separate AI agent. The concerns were: security, WordPress integration, code quality, architecture, performance, compatibility, UX + accessibility, testing, AI/external integrations, and a line-level file-by-file forensic read.
Why parallel, separated agents instead of one big audit:
- Specialisation. A security-focused agent finds different issues than a performance-focused agent, even reading the same code.
- Independence. Separate runs surface contradictions. When the performance agent says “cache this” and the security agent says “don’t cache secrets”, both findings matter.
- Context capacity. Each agent reads only what’s relevant to its domain. A single combined audit hits context limits well before it finishes.
- Report quality. A focused agent produces a focused report. A general agent produces a general report.
The Finding Inventory
Total findings across both plugins: roughly 640. Breakdown by category:
| Category | Free findings | Pro findings | Examples |
|---|---|---|---|
| Security | 45 | 38 | Plaintext secret storage, SSRF, rate-limit gaps |
| WP Integration | 43 | 30 | i18n loader, multisite createTables |
| Code Quality | 55 | 28 | PHP 8.4 syntax on 8.3 baseline, duplicate importers |
| Architecture | 31 | 45 | Missing migration runner, provider boilerplate |
| Performance | 32 | 23 | N+1 queries, autoload bloat, rankings filesort |
| Compatibility | 23 | 18 | Competing-plugin detection, WC product schema |
| UX/A11y | 50 | 30+ | Tab ARIA, Pro badge contrast |
| Testing | 30 | 30+ | Zero REST tests, no HTTP seam |
| AI Providers | — | 36 | Max-tokens unset, hallucinated alt text |
| External Integrations | — | 36 | No PKCE, tokens plaintext, revoke missing |
Not every finding becomes a fix. We triaged:
- Critical — feature-breaking bugs + high-severity security. Fixed in Phase 1 (within 48 hours).
- High — security hardening, test foundation, performance regressions. Fixed in Phase 2 (within the next week).
- Medium — quality improvements, UX polish, coexistence. Fixed in Phase 3 (before public release).
- Deferred — architectural refactors without clear user value today. Tracked in the TODO, not shipped.
The SecretStore Pattern
One of our most impactful fixes: we were storing AI API keys and OAuth tokens in plaintext in the WordPress options table. A database dump would hand over every credential to anyone who read it.
The fix was a small library we now call SecretStore:
- Derive a 32-byte encryption key from WordPress salts (`AUTH_KEY` + `SECURE_AUTH_KEY`) via `sodium_crypto_generichash`.
- Encrypt with `sodium_crypto_secretbox`: authenticated encryption with a per-encryption random nonce.
- Prefix the ciphertext with a version marker (`eseo:v1:`) so we can identify encrypted values and migrate legacy plaintext transparently.
- On read, check the marker. If present, decrypt. If absent, return as-is (this is the migration path).
Because the key is derived from salts specific to each WordPress install, moving a database dump to another install doesn’t decrypt the tokens. The attacker has to also steal wp-config.php — which is a meaningfully higher bar.
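The marker-based read path is the part of this pattern that is easiest to get subtly wrong, so here is a minimal Python sketch of it. It is an illustration, not the plugin’s PHP. Several details are assumptions: the salts are simply concatenated, the ciphertext is base64-encoded after the prefix, and the cipher is injected as a callable because Python’s standard library has no equivalent of `sodium_crypto_secretbox`. `hashlib.blake2b` is used for key derivation since libsodium’s generic hash is BLAKE2b.

```python
import base64
import hashlib
from typing import Callable

PREFIX = "eseo:v1:"  # version marker quoted above

def derive_key(auth_key: str, secure_auth_key: str) -> bytes:
    # 32-byte BLAKE2b digest; concatenating the two salts is an assumption.
    material = (auth_key + secure_auth_key).encode("utf-8")
    return hashlib.blake2b(material, digest_size=32).digest()

def wrap(ciphertext: bytes) -> str:
    # Stored form: marker + base64(ciphertext). The encoding is an assumption.
    return PREFIX + base64.b64encode(ciphertext).decode("ascii")

def read_secret(stored: str, decrypt: Callable[[bytes], str]) -> str:
    # Marker present: hand the payload to the real authenticated cipher.
    if stored.startswith(PREFIX):
        return decrypt(base64.b64decode(stored[len(PREFIX):]))
    # Marker absent: legacy plaintext, returned as-is (the migration path).
    return stored
```

Keeping the version in the marker means a future `v2` scheme can coexist with `v1` values during a rollout, using the same check-then-dispatch read path.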
The SSRF Guard on AI OG Image Sideload
Our AI OG image feature generates an image via OpenAI, then downloads it from the URL OpenAI returns. That URL is untrusted data — a compromised OpenAI response, a DNS poisoning attack, or a deliberate test by a penetration tester could redirect us to internal network resources.
Hardening, in layers:
- Scheme check: reject anything that isn’t `https://`. No `file://`, no `javascript:`, no data URIs (we handle base64 separately).
- Host allow-list: only OpenAI’s documented CDN hosts (`oaidalleapiprodscus.blob.core.windows.net`, `files.openai.com`).
- DNS resolution check: resolve the host, reject any IP in private/loopback/link-local space even if the hostname is allow-listed (guards against DNS poisoning of an allowed host).
- MIME sniff: after download, verify the file is actually an image.
- Size cap: reject anything over 8 MB.
Five checks in sequence. The first three operate on the URL and its DNS resolution before any image bytes cross the wire; the MIME sniff and size cap run on the downloaded response. None is individually sufficient; together they make SSRF exploitation non-trivial.
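The layers above can be sketched compactly. The Python below is an illustration under stated assumptions rather than the plugin’s PHP: the resolver is injectable so the check can be exercised without network access, the accepted image types (PNG and JPEG, detected by magic bytes) are an assumption, and the function names are hypothetical.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_HOSTS = {
    "oaidalleapiprodscus.blob.core.windows.net",
    "files.openai.com",
}
MAX_BYTES = 8 * 1024 * 1024
IMAGE_MAGIC = (b"\x89PNG\r\n\x1a\n", b"\xff\xd8\xff")  # PNG, JPEG (assumed set)

def resolve(host):
    """Default resolver; pass a stub to check_url in tests."""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, 443)})

def check_url(url, resolver=resolve):
    """Pre-download checks: scheme, host allow-list, resolved addresses."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError("scheme not allowed")
    host = (parts.hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        raise ValueError("host not on allow-list")
    for addr in resolver(host):
        ip = ipaddress.ip_address(addr)
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            raise ValueError("host resolves to an internal address")

def check_body(body):
    """Post-download checks: magic-byte MIME sniff, then size cap."""
    if not body.startswith(IMAGE_MAGIC):
        raise ValueError("response is not a recognised image")
    if len(body) > MAX_BYTES:
        raise ValueError("image exceeds size cap")
```

A caller would run `check_url` before issuing the request and `check_body` on the response bytes, treating any `ValueError` as a refusal to sideload.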
Rate-Limit Primitives
Every AI endpoint in the plugin runs through a transient-backed rate limiter. The design is deliberately boring — no leaky-bucket math, no distributed state. Each endpoint specifies (max_calls_per_window, window_seconds), keyed on user_id + action.
What the limiter defends against:
- Compromised admin cookie. An attacker with session access can’t run a million generation calls in an hour and drain the AI budget.
- CSRF chain. Even if the attacker bypasses the nonce, the rate limit caps the damage.
- Accidental double-submit. An editor double-clicks Generate. On endpoints limited to one call per window, the second call is rejected, preventing duplicate DB writes.
Current limits, after real-world usage tuning: generate-meta at 10/min, generate-alt at 30/min (lightweight), generate-alt-batch at 1/min (heavy), generate-og-image at 5/min (most expensive call in the system).
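A limiter of this shape fits in a few lines. The Python sketch below is an illustration, not the plugin’s PHP: a dictionary with expiry timestamps stands in for WordPress transients, the fixed-window reset is an assumption about how the bucket behaves, and the class and method names are hypothetical. The limits are the ones quoted above.

```python
import time

# (max_calls_per_window, window_seconds) per action.
LIMITS = {
    "generate-meta": (10, 60),
    "generate-alt": (30, 60),
    "generate-alt-batch": (1, 60),
    "generate-og-image": (5, 60),
}

class RateLimiter:
    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so tests can control time
        self._buckets = {}    # (user_id, action) -> (count, expires_at)

    def allow(self, user_id, action):
        max_calls, window = LIMITS[action]
        now = self._clock()
        count, expires_at = self._buckets.get((user_id, action), (0, 0.0))
        if now >= expires_at:
            # Bucket missing or expired: start a fresh window.
            count, expires_at = 0, now + window
        if count >= max_calls:
            return False
        self._buckets[(user_id, action)] = (count + 1, expires_at)
        return True
```

Keying on the user and the action together means one editor exhausting `generate-og-image` does not block another editor, or the same editor’s `generate-meta` calls.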
What We Didn’t Fix
Transparency about deferred work:
- Async OG image generation. Moving the 60-second AI call off the publish pipeline via Action Scheduler is correct long-term. Deferred because we don’t want to adopt Action Scheduler as a dependency just for this one feature.
- True vision alt-text. Our AI alt-text endpoint currently sends filename + title to the AI, not the actual image bytes. Results are good but fundamentally a caption from metadata, not vision. Fixing this requires per-provider vision API integration work and real cost implications — tracked for a later release.
- WPML / Polylang hreflang. High effort for a small current user base.
- Full REST controller test coverage. We shipped starter tests (14 new cases) but the full controller matrix is still incomplete.
What the Process Taught Us
- Multi-agent audit works. Each agent surfaced real findings the others missed. The 5% false-positive rate was trivial to verify.
- End-to-end tests catch what unit tests can’t. The three embarrassing bugs all lived in untested code paths. Unit tests and phpstan would never have caught them. A single smoke test against real data would have.
- Public documentation is a stability gate. We found our own docs contained one broken recipe. Writing documentation forces you to re-read your own code with fresh eyes.
- CI is cheap safety. Adding GitHub Actions that run lint + phpunit + an HTML-entity-in-SQL grep is a roughly 30-minute project that prevents entire classes of regression.
The CI Changes That Prevent Regression
Before the audit, neither plugin had CI. Adding GitHub Actions was a 30-minute project with an outsized payoff. The pipeline runs on every push and pull request:
- PHP lint on 8.3 and 8.4. Catches syntax errors across both supported PHP versions.
- phpunit on 8.3 and 8.4. All unit tests must pass on both.
- HTML-entity-in-SQL guard. Grep across `src/` for `&gt;`, `&lt;`, `&amp;`. Any hit fails the build.
- phpcs (in progress). Enforces WordPress Coding Standards on new code.
- phpstan (in progress). Static analysis at level 8.
The HTML-entity guard alone would have prevented one of the three feature-breaking bugs. A lesson for any plugin team: CI checks should encode the specific failures you’ve seen before.
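For teams that want the same guard, here is one way it could look as a standalone script. This is a Python sketch, not the check we run: the `*.php` glob, the default `src` root, and the function names are assumptions, and the entity list is the three named above.

```python
import re
from pathlib import Path

# The three entities the guard looks for.
ENTITY = re.compile(r"&(?:gt|lt|amp);")

def find_entities(root: Path) -> list:
    """Return (path, line number, line) for every entity hit under root."""
    hits = []
    for path in sorted(root.rglob("*.php")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if ENTITY.search(line):
                hits.append((path, lineno, line.strip()))
    return hits

def main(argv: list) -> int:
    # Exit status 1 on any hit, so a CI step running this fails the build.
    root = Path(argv[1]) if len(argv) > 1 else Path("src")
    hits = find_entities(root)
    for path, lineno, line in hits:
        print(f"{path}:{lineno}: {line}")
    return 1 if hits else 0
```

A CI step would call `sys.exit(main(sys.argv))` from a small entry script. Templates that legitimately emit `&amp;` would need an allow-list, which this sketch leaves out.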
Documentation as a Forcing Function
We wrote 16 new documentation files alongside the audit. They cover architecture, settings, the REST API, the module system, extension recipes, AI providers, Search Console integration, and testing.
The surprise: writing docs is a forcing function for catching latent bugs. Three specific examples:
- Writing the REST API doc forced us to verify every endpoint’s `permission_callback`. One was using `__return_true`; we fixed it.
- Writing the extension recipes forced us to actually run each recipe. Recipe #5 (override the RedirectMatcher via container rebinding) didn’t work because the matcher wasn’t container-bound by class. We updated both the recipe and the container binding.
- Writing the SecretStore architecture doc forced us to document what happens when `AUTH_KEY` isn’t defined. We realised we had no graceful fallback and added one.
How the Audit Shaped Future Development Process
Beyond the one-off fixes, the audit changed how we develop. New practices we’re keeping:
- Every new feature ships with at least one unit test covering the happy path.
- Every new external HTTP call goes through a mockable seam (see `AbstractHttpAiProvider::httpPost`).
- Every new user-input surface gets a CI grep added to catch the specific failure mode we most fear.
- Quarterly re-audits against the same agent pipeline, tracking finding-count deltas as a stability metric.
- Public changelog entries for security fixes (with CVE-format descriptions when warranted).
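The “mockable seam” practice in the list above is the one that most changes how code is written, so a small illustration may help. The Python sketch below shows the general idea only: one method owns the network call, and tests override it. Apart from the `httpPost` name it echoes, everything here is hypothetical, including the class names, endpoint URL, and response shape.

```python
import json
import urllib.request

class HttpAiProvider:
    endpoint = "https://api.example.invalid/v1/generate"  # hypothetical URL

    def http_post(self, url, payload):
        """The seam: the only method that touches the network."""
        request = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)

    def generate_meta(self, title):
        # Business logic depends only on http_post, never on urllib directly.
        response = self.http_post(self.endpoint, {"title": title})
        return response["text"].strip()

class StubProvider(HttpAiProvider):
    """Test double: records calls and returns a canned response."""

    def __init__(self, canned):
        self.canned = canned
        self.calls = []

    def http_post(self, url, payload):
        self.calls.append((url, payload))
        return self.canned
```

With the stub, a unit test can assert on both the outgoing payload and the handling of the response without an API key or a network connection.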
Related Reading
- Audit reports themselves: shipped in the plugin repos under `audit/`.
- Core Web Vitals in 2026: performance findings informed real user improvements.
Public Transparency as a Signal
Publishing this audit has one subtle benefit beyond the direct educational value: it signals something about how we build. Plugin buyers evaluating their options look for confidence signals — test coverage, changelogs, response times on issues, and increasingly, public audit trails. Most plugin vendors don’t publish. Those that do earn the benefit of the doubt.
Frequently Asked Questions
Where can I read the full audit reports?
All 24 markdown reports (ten per plugin plus four synthesis docs) are shipped inside the plugin directories themselves, under audit/. They’re in the public GitHub repos.
How long did the audit take?
Initial audit across both plugins, including report generation: about 4 hours. Implementing the Phase 1 critical fixes: about 3 hours. Phase 2/3 hardening: another 4 hours. So roughly 11 hours of work, under two working days end to end.
Are you planning regular re-audits?
Yes. Once per minor release, running the same agent pipeline against the changed code. The output diff between audits is a nice stability signal.
Would you audit someone else’s plugin this way?
Yes, and we’d be interested. If you own a plugin and want it audited publicly, get in touch.