XML Sitemaps in WordPress: What to Include, What to Skip
XML sitemaps are one of the oldest, most misunderstood pieces of WordPress SEO. Every plugin generates them. Almost no one reads them. Google does, and what you put in — and more importantly, what you leave out — has meaningful consequences for crawl budget, index coverage, and how quickly your new content shows up in search.
Here’s the 2026 take: what an XML sitemap actually does, which URLs belong in yours, which ones should never appear, and how to ship a sitemap that plays well with WordPress, WooCommerce, and multisite.
What a Sitemap Does (And Doesn’t Do)
An XML sitemap is a structured list of URLs you’d like search engines to know about, with optional metadata — last-modified date, change frequency, priority. Googlebot uses it as a hint, not a command: URLs in your sitemap get crawled faster, and URLs missing from your sitemap still get crawled (from internal + external links), but less aggressively.
What sitemaps don’t do:
- Force indexing. Google decides what to index based on quality, not on sitemap presence.
- Control ranking. The <priority> field has been ignored since ~2017.
- Replace robots.txt. Sitemaps tell bots what exists; robots.txt tells them what not to visit.
What Belongs in Your Sitemap
Only URLs you actually want crawled and ranked. That list is smaller than most WordPress sites realise:
- Published posts and pages
- Category and tag archives (if you’ve optimised them into real landing pages — otherwise, leave them out)
- Custom post types that represent meaningful content (products, case studies, documentation)
- Author archives, but only if your site has more than one author and each author has a real presence
- Key image URLs (via an image sitemap for image-heavy sites)
What Does Not Belong in Your Sitemap
Removing these is often the single highest-leverage sitemap fix:
- Noindexed URLs. If you’ve told Google not to index a page, don’t also tell it to crawl the URL. Pick one story.
- Thin category/tag pages. A tag archive with one post isn’t a landing page, it’s a duplicate.
- Attachment pages. WordPress generates one per upload by default. These are near-empty and dilute your crawl budget.
- Search result pages. /?s=query is not content.
- Login, register, admin, and account URLs. Nobody should search for these.
- Canonical-elsewhere URLs. If a page points rel="canonical" at another URL, the canonical URL is what belongs in the sitemap.
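The exclusion rules above can be sketched as a single predicate. This is a minimal illustration in Python rather than plugin code: the Page record, the PRIVATE_PREFIXES list, and the min_archive_posts threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical path prefixes for login/admin/account areas; adjust per site.
PRIVATE_PREFIXES = ("/wp-admin", "/wp-login.php", "/my-account", "/register")


@dataclass
class Page:
    url: str
    noindex: bool = False
    canonical: Optional[str] = None    # None means the page is self-canonical
    is_attachment: bool = False
    post_count: Optional[int] = None   # set only for category/tag archives


def belongs_in_sitemap(page: Page, min_archive_posts: int = 2) -> bool:
    """Return True only if none of the exclusion rules applies."""
    parts = urlsplit(page.url)
    if page.noindex or page.is_attachment:
        return False
    if "s" in parse_qs(parts.query, keep_blank_values=True):
        return False                   # /?s=query search results
    if parts.path.startswith(PRIVATE_PREFIXES):
        return False
    if page.canonical is not None and page.canonical != page.url:
        return False                   # list the canonical target instead
    if page.post_count is not None and page.post_count < min_archive_posts:
        return False                   # thin category/tag archive
    return True
```

Running every candidate URL through one function like this keeps the "pick one story" rule enforceable: a page cannot be noindexed and listed at the same time.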
The Right Sitemap Structure for WordPress
Google officially recommends a sitemap index that points to type-specific sitemaps rather than one monolithic file. WordPress plugins that do this well serve:
- /sitemap.xml: an index that lists the child sitemaps below
- /post-sitemap.xml: all posts
- /page-sitemap.xml: all pages
- /taxonomy-sitemap.xml: categories + tags (when worth indexing)
- /image-sitemap.xml: media attachments with valid alt text
- /product-sitemap.xml: WooCommerce products, if applicable
Each child sitemap should cap at 50,000 URLs or 50 MB (uncompressed). For sites larger than that — publishing platforms, large e-commerce stores — paginate each type (/post-sitemap-1.xml, -2.xml, etc.).
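As a rough sketch of that pagination, the Python below slices a URL list into files of at most 50,000 entries and names them in the -1.xml, -2.xml style. The function names are made up for illustration, and the 50 MB size limit is not checked here.

```python
from typing import Iterator, List

MAX_URLS_PER_FILE = 50_000  # URL-count limit cited above


def paginate(urls: List[str], per_file: int = MAX_URLS_PER_FILE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most per_file URLs."""
    if not 1 <= per_file <= MAX_URLS_PER_FILE:
        raise ValueError("per_file must be between 1 and 50,000")
    for start in range(0, len(urls), per_file):
        yield urls[start:start + per_file]


def child_paths(post_type: str, page_count: int) -> List[str]:
    """Name the files /post-sitemap-1.xml, /post-sitemap-2.xml, and so on."""
    return [f"/{post_type}-sitemap-{n}.xml" for n in range(1, page_count + 1)]


urls = [f"https://example.com/p/{i}/" for i in range(120_000)]
pages = list(paginate(urls))
print([len(p) for p in pages])          # [50000, 50000, 20000]
print(child_paths("post", len(pages)))
```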
News + Video Sitemaps: The Specialised Formats
Two extensions matter for specific site types:
- Google News sitemap. Only for sites approved for Google News. Must list articles from the last 2 days, with <news:publication_date> and <news:title> per entry. If you’re not a news publisher, skip this.
- Video sitemap. For sites with embedded video content: courses, demos, documentation. Gives Google the duration, thumbnail, and description it needs for video-rich snippets.
Our Emnes SEO Pro plugin ships both extensions behind toggle settings — off by default, on when your site actually needs them.
WordPress’s Core Sitemap vs. Plugin Sitemaps
Since WordPress 5.5, core ships a built-in sitemap at /wp-sitemap.xml. It covers the basics (posts, pages, public custom post types, taxonomies, users) but it’s aggressively minimalist. It doesn’t emit image sitemaps, news sitemaps, or custom post type filtering beyond public/private.
If you install a dedicated SEO plugin — Yoast, Rank Math, AIOSEO, or Emnes SEO — disable the core sitemap to avoid duplicate feeds. Most SEO plugins do this automatically. Verify with:
curl -I https://yoursite.com/wp-sitemap.xml
A 404 or 301 means you’re good. A 200 means you have two sitemaps active.
Submitting Your Sitemap to Search Engines
You only need to tell each search engine about your sitemap once:
- Google Search Console: Sitemaps panel. Enter sitemap.xml and click Submit. Check back in 48 hours to confirm it’s been fetched.
- Bing Webmaster Tools: same flow.
- robots.txt: add the line Sitemap: https://yoursite.com/sitemap.xml. Every major bot reads it.
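To confirm the robots.txt line is actually picked up, one option is Python's standard-library robots parser, which exposes declared sitemaps through site_maps() (Python 3.8+). The robots.txt content below is an invented sample, not a recommended configuration.

```python
from typing import List
from urllib.robotparser import RobotFileParser

# Sample robots.txt body, made up for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://yoursite.com/sitemap.xml
"""


def declared_sitemaps(robots_txt: str) -> List[str]:
    """Return every URL declared with a Sitemap: line, or an empty list."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []   # site_maps() returns None when absent


print(declared_sitemaps(ROBOTS_TXT))
```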
DuckDuckGo and other search engines that draw on Bing’s index pick up your pages from there, so you don’t need to submit to them separately.
Debugging a Sitemap That Isn’t Working
- Search Console says “couldn’t fetch”. Verify the URL returns 200 and is valid XML. Test with curl -s https://yoursite.com/sitemap.xml | xmllint --noout - (the trailing dash tells xmllint to read from stdin).
- URLs are submitted but not indexed. Check the Index Coverage report. Most common cause is a noindex meta tag contradicting the sitemap’s inclusion.
- Sitemap is being fetched but shows 0 URLs. Your plugin is rendering an empty index. Check for HTML-entity-encoded operators in SQL or broken schema flags; we saw this in our own audit.
- Duplicate sitemaps are competing. Core sitemap + plugin sitemap both live. Disable one.
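Two of the failure modes above (invalid XML and an empty index) can be caught with a small check on the fetched body. The following is a sketch using only the Python standard library; inspect_sitemap and its return shape are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def inspect_sitemap(xml_text: str) -> dict:
    """Report whether the body parses as a sitemap and how many entries it lists."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # Typical when WordPress served an HTML error page instead of XML.
        return {"valid": False, "kind": None, "count": 0, "error": str(exc)}
    kind = root.tag.split("}")[-1]          # drop the namespace prefix
    if kind == "urlset":
        count = len(root.findall("sm:url/sm:loc", NS))
    elif kind == "sitemapindex":
        count = len(root.findall("sm:sitemap/sm:loc", NS))
    else:
        return {"valid": False, "kind": kind, "count": 0,
                "error": "unexpected root element"}
    return {"valid": True, "kind": kind, "count": count, "error": None}
```

A result with valid set to True and count equal to 0 corresponds to the "fetched but shows 0 URLs" case; a parse error usually points at the "couldn’t fetch" case.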
The Real Job of a Sitemap: Crawl Budget Efficiency
Every site has a crawl budget — an implicit cap on how much Googlebot is willing to request per day. For small sites it’s invisible. For sites with more than about 10,000 URLs it becomes decisive: if crawl budget is spent on pages you don’t care about, the pages you do care about wait longer for reindexing.
A clean sitemap directs crawl budget where you want it. Three techniques, in ascending order of importance:
- Include only canonical URLs. If ?utm_source=x adds a tracking query, the canonical is still the clean URL. Sitemaps should list the clean version only.
- Use lastmod accurately. A sitemap that ships the current timestamp on every URL every day trains Google to ignore the field. Only update lastmod when the content actually changed.
- Split into type-specific child sitemaps. Googlebot fetches each child independently and adjusts fetch frequency per child. A fast-changing blog sitemap can get fetched hourly; a slow-changing pages sitemap gets fetched weekly.
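The first two techniques can be sketched in a few lines of Python. The helper names are hypothetical, and the example assumes "tracking parameter" means any query key starting with utm_; real sites may need a longer list.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking(url: str) -> str:
    """Drop utm_* query parameters and any fragment, keeping other parameters."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.lower().startswith("utm_")]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def lastmod(content_modified: datetime) -> str:
    """Format the stored content-modified time, never the time of the request."""
    utc = content_modified.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S+00:00")


print(strip_tracking("https://example.com/post/?utm_source=x"))
# https://example.com/post/
print(lastmod(datetime(2026, 1, 15, 9, 30, tzinfo=timezone.utc)))
# 2026-01-15T09:30:00+00:00
```

The point of passing content_modified in explicitly is that the sitemap generator has no way to stamp "now" on every entry by accident.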
The Taxonomy Sitemap Trap
Category and tag archives are the biggest crawl-budget sink on most WordPress sites. WordPress generates an archive for every category, every tag, every author, every year/month/day. A site with 500 posts and 300 tags already has 800+ URLs, 300 of them tag archives, before you count category, author, or date archives.
Decision tree for whether a taxonomy belongs in the sitemap:
- Does it have more than 3 posts? If no, it’s thin content. Noindex or merge with a larger archive.
- Does it have unique intro copy / a custom description? If yes, it’s a landing page. Include.
- Is it the way users actually browse your site? If yes, include.
- Does it just recycle post previews with no added value? If yes, noindex.
Most WordPress sites should noindex tag archives by default and selectively index categories. The exception: content sites where category pages have been genuinely curated as topic hubs.
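The decision tree reads naturally as a short function. The Archive fields below are hypothetical stand-ins for whatever your plugin or audit spreadsheet records, and the questions are evaluated in the order listed above.

```python
from dataclasses import dataclass


@dataclass
class Archive:
    post_count: int
    has_custom_intro: bool        # unique intro copy or custom description
    is_primary_navigation: bool   # users actually browse the site this way


def taxonomy_decision(archive: Archive) -> str:
    """Return "include" or "noindex" for a category or tag archive."""
    if archive.post_count <= 3:
        return "noindex"          # thin: noindex it or merge into a larger archive
    if archive.has_custom_intro:
        return "include"          # a real landing page
    if archive.is_primary_navigation:
        return "include"
    return "noindex"              # only recycles post previews


print(taxonomy_decision(Archive(2, True, True)))      # noindex
print(taxonomy_decision(Archive(40, True, False)))    # include
print(taxonomy_decision(Archive(40, False, False)))   # noindex
```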
Image Sitemaps: Quietly Lucrative
Image search is a ~20% chunk of Google’s query traffic that most WordPress teams ignore. An image sitemap lists every indexable image on the site with metadata — caption, title, geolocation, license — that helps Images pick which image to show for which query.
A useful image sitemap entry looks like:
<url>
<loc>https://example.com/post/</loc>
<image:image>
<image:loc>https://example.com/app/uploads/hero.jpg</image:loc>
<image:caption>Screenshot of the WordPress admin dashboard showing the SEO settings panel</image:caption>
<image:title>WordPress SEO settings</image:title>
</image:image>
</url>
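If you emit a caption, make it a full descriptive sentence, not the filename (“IMG_4823.jpg” contributes nothing). Note that Google’s documentation has listed the caption, title, geolocation, and license tags as deprecated since 2022, so for Google Images the field that counts is <image:loc>, backed by the alt text and surrounding copy on the page itself.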
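An entry in this shape can be generated with Python's standard XML library, as in the sketch below. The function names are invented for the example; the caption argument is optional because Google's documentation lists that tag as deprecated, so it is emitted only when supplied.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMG = "http://www.google.com/schemas/sitemap-image/1.1"
ET.register_namespace("", SM)        # serialise sitemap tags without a prefix
ET.register_namespace("image", IMG)  # serialise image tags as image:...


def image_entry(page_url: str, image_url: str,
                caption: Optional[str] = None) -> ET.Element:
    """Build one <url> element carrying a single <image:image> child."""
    url = ET.Element(f"{{{SM}}}url")
    ET.SubElement(url, f"{{{SM}}}loc").text = page_url
    image = ET.SubElement(url, f"{{{IMG}}}image")
    ET.SubElement(image, f"{{{IMG}}}loc").text = image_url
    if caption:
        ET.SubElement(image, f"{{{IMG}}}caption").text = caption
    return url


def render(entries: List[ET.Element]) -> str:
    """Wrap the entries in a <urlset> and return the XML as text."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    urlset.extend(entries)
    return ET.tostring(urlset, encoding="unicode")


print(render([image_entry("https://example.com/post/",
                          "https://example.com/app/uploads/hero.jpg")]))
```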
Large-Scale Sitemap Strategy: Pagination and Sharding
Sitemaps cap at 50,000 URLs or 50 MB uncompressed. Sites that exceed these limits need a sitemap index that points to multiple child sitemaps. The canonical structure for a large WordPress site:
- /sitemap.xml: index
- /post-sitemap-1.xml, /post-sitemap-2.xml, …
- /page-sitemap.xml (usually one file)
- /product-sitemap-1.xml, /product-sitemap-2.xml, …
- /image-sitemap-1.xml, /image-sitemap-2.xml, …
Order pages within each child sitemap by recency (most recent first). Googlebot fetches the first few URLs of each sitemap more aggressively — put your most important content there.
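A minimal sketch of both steps, ordering entries by recency and emitting the index file, might look like this in Python. The function names are illustrative, and the sort assumes every lastmod value uses the same ISO 8601 format and timezone, so that string order matches time order.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SM)


def newest_first(entries: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Sort (url, lastmod) pairs so the most recently modified come first."""
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def build_index(base_url: str, child_paths: List[str]) -> str:
    """Return a <sitemapindex> document pointing at each child sitemap."""
    root = ET.Element(f"{{{SM}}}sitemapindex")
    for path in child_paths:
        sitemap = ET.SubElement(root, f"{{{SM}}}sitemap")
        ET.SubElement(sitemap, f"{{{SM}}}loc").text = base_url.rstrip("/") + path
    return ET.tostring(root, encoding="unicode")


print(build_index("https://example.com/",
                  ["/post-sitemap-1.xml", "/post-sitemap-2.xml", "/page-sitemap.xml"]))
```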
Multisite Sitemaps: Per-Site vs Network-Wide
Multisite WordPress installs can ship sitemaps two ways:
- Per-site sitemaps. Each subsite has its own /sitemap.xml. Right default for most multisite installs where sites target different audiences.
- Consolidated network sitemap. A single /sitemap.xml at the network level listing all subsites’ content. Right for content-repository installs where one brand publishes across many subdomains.
Our Emnes SEO Pro plugin ships a multisite consolidation mode behind a toggle. Default is per-site.
Sitemap Ping vs Passive Discovery
Google deprecated its sitemap ping endpoint in June 2023. Plugins that still request google.com/ping?sitemap=... are wasting a request.
The two remaining discovery methods:
- Manual submission via Search Console.
- Passive discovery via the robots.txt Sitemap: directive.
After the initial submission, Googlebot refetches the sitemap on its own schedule — typically once per 24 hours for active sites, once per 3–7 days for quieter sites.
The Search Console Sitemap Status Codes
The Sitemaps report in Search Console reports one of four states per submitted sitemap:
| Status | What it means | Action |
|---|---|---|
| Success | Fetched and parsed, all URLs valid | None |
| Couldn’t fetch | HTTP error, timeout, or invalid XML | Verify sitemap URL returns 200 with valid XML |
| Pending | Submitted but not yet processed | Wait 48 hours |
| Has errors | Parsed but contains invalid entries | Check “see details” for specific errors |
“Couldn’t fetch” is the most common error, and the most common cause is a sitemap that renders HTML (WordPress error page) instead of XML. Test with curl -I and check the Content-Type header — it should be application/xml.
Sitemap Hygiene: A Monthly Checklist
- Verify /sitemap.xml still returns 200 OK and valid XML.
- Check Search Console’s Sitemaps report for errors.
- Compare the “URLs submitted” number to the “URLs indexed” number. A growing gap indicates content issues, not sitemap issues.
- Spot-check three random URLs from the sitemap. Do they all return 200, or has one been trashed without being removed from the feed?
- Confirm no URLs marked noindex appear in the sitemap.
- Confirm canonicals align: sitemap URLs point at the URLs you want ranked.
IndexNow: The Sitemap Alternative That’s Gaining Traction
IndexNow is a push-based discovery protocol developed by Microsoft Bing and Yandex, with participation from Seznam, Naver, and Yep. Instead of waiting for crawlers to discover new content, you notify the search engine directly when a URL is added, updated, or deleted.
How it works:
- Generate an API key and host a key file at your site root (e.g., /a1b2c3d4.txt) containing the same key.
- Whenever a URL changes, send a request to https://www.bing.com/indexnow?url=...&key=a1b2c3d4 (a GET for a single URL; a JSON POST to the same endpoint covers batches).
- Bing, Yandex, and partners fetch the URL quickly.
Google hasn’t joined IndexNow, so it’s not a replacement for sitemaps. It’s a complement for Bing + other search-engine visibility. Yoast and Rank Math both ship IndexNow integration on their paid tiers.
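As an illustration of the two request shapes, the Python below builds the single-URL request and the JSON body for a batch submission without sending anything. The key a1b2c3d4 is the placeholder from the steps above, the helper names are invented, and the batch helper assumes all URLs share one host and that the key file sits at the site root.

```python
import json
from typing import List
from urllib.parse import urlencode, urlsplit

ENDPOINT = "https://www.bing.com/indexnow"


def single_url_request(changed_url: str, key: str) -> str:
    """Build the GET URL that announces one changed page."""
    return f"{ENDPOINT}?{urlencode({'url': changed_url, 'key': key})}"


def bulk_payload(urls: List[str], key: str) -> str:
    """Build the JSON body for a batch POST; assumes every URL is on one host."""
    host = urlsplit(urls[0]).netloc
    return json.dumps({
        "host": host,
        "key": key,
        "keyLocation": f"https://{host}/{key}.txt",   # assumed root-level key file
        "urlList": urls,
    })


print(single_url_request("https://example.com/post/", "a1b2c3d4"))
# https://www.bing.com/indexnow?url=https%3A%2F%2Fexample.com%2Fpost%2F&key=a1b2c3d4
```

The batch body would be sent as a POST with a JSON content type; the sketch stops short of the network call so it can be dropped into whatever HTTP client the site already uses.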
RSS Feeds vs Sitemaps: Different Purposes
WordPress ships both RSS feeds and XML sitemaps. They serve different audiences:
- RSS — reader-facing, designed for feed aggregators (Feedly, Inoreader), newsletter tools (Mailchimp), and internal republishing.
- XML sitemap — crawler-facing, designed for search engines.
Google can discover content via RSS as a secondary signal but sitemaps remain authoritative. Don’t substitute one for the other.
Canonical URLs vs Sitemap URLs: Must Align
A common inconsistency: the <link rel="canonical"> tag points at one URL, and the sitemap lists a different URL for the same content. Google doesn’t always side with the sitemap — sometimes the canonical wins, sometimes neither.
Rule: sitemap URLs must match canonicals. If a page canonicalises to a different URL, that different URL is what belongs in the sitemap.
Related Reading
- Schema Markup in WordPress — structured data that pairs with sitemap discovery.
- WordPress Redirect Manager Guide — how 301 redirects interact with sitemap entries.
- Google Search Console Handbook — where to submit and monitor sitemaps.
Discovery vs Indexing: Different Problems
A sitemap solves discovery. It does not solve indexing. Discovery is “Google knows this URL exists”. Indexing is “Google chose to include it in the search index after evaluating its quality”. The two often diverge: Search Console’s Sitemaps report might show 1,200 URLs submitted but only 800 indexed. The 400 gap isn’t a sitemap problem — it’s a content quality, internal linking, or canonical problem that needs addressing separately.
Per-Post-Type Sitemap Filters
Custom post types frequently ship default-included in sitemaps even when they shouldn’t be. Examples from real WordPress sites:
- Team member post types that are referenced on one “About” page but individually noindexed.
- Testimonial post types that appear as sliders rather than standalone pages.
- Event post types where expired events are kept for archival but shouldn’t re-enter search.
Every good SEO plugin exposes per-post-type inclusion toggles. Audit them when setting up a new site.
Sitemap Compression
XML sitemaps compress exceptionally well — typical gzip ratios are 85-90%. Google supports gzipped sitemaps (/sitemap.xml.gz) but modern HTTP compression (Content-Encoding: gzip) on an uncompressed URL accomplishes the same thing with less plumbing. If your server emits gzip or br on .xml responses (most do by default), you don’t need separate .xml.gz files.
Sitemaps for News and Publishing Sites
News sites that qualify for Google News get a dedicated News sitemap that must be refreshed whenever a new article publishes. The shape: include only articles from the last 2 days, include <news:publication_date> in ISO 8601, include <news:title>. Articles roll off after 2 days; only the most recent window is ever live. This is deliberate — News indexing rewards recency and the sitemap format enforces it.
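The rolling two-day window can be sketched as a filter over published articles. The dictionary keys and the news_entries name are assumptions for this example; only the two-day window and the two required fields come from the description above.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

NEWS_WINDOW = timedelta(days=2)


def news_entries(articles: List[dict], now: datetime) -> List[Dict[str, str]]:
    """Keep only articles published inside the window, newest first."""
    cutoff = now - NEWS_WINDOW
    fresh = [a for a in articles if a["published"] >= cutoff]
    fresh.sort(key=lambda a: a["published"], reverse=True)
    return [{
        "loc": a["url"],
        "news:title": a["title"],
        "news:publication_date": a["published"].isoformat(),  # ISO 8601
    } for a in fresh]


now = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
articles = [
    {"url": "https://example.com/old/", "title": "Old",
     "published": datetime(2026, 1, 12, 8, 0, tzinfo=timezone.utc)},
    {"url": "https://example.com/new/", "title": "New",
     "published": datetime(2026, 1, 15, 9, 30, tzinfo=timezone.utc)},
]
print(news_entries(articles, now))
```

Passing now in as an argument keeps the window testable and makes it obvious that an article drops out purely because time has passed, not because anything about it changed.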
Sitemaps and Large-Scale WooCommerce
WooCommerce stores with 10,000+ products hit sitemap scale concerns that blogs rarely do. The pattern that works at scale: paginate product sitemaps into /product-sitemap-1.xml through -N.xml, with 1,000 products per file. This keeps each sitemap under 1 MB even with verbose product fields, and lets Google process them in parallel.
Out-of-stock products are a tricky category. Recently-out-of-stock products should stay in the sitemap for a week or two (capture existing rankings). Long-term out-of-stock should be either noindexed or 410’d — an empty-stock page isn’t worth Google’s crawl budget.
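Both rules in this section reduce to small calculations. In the sketch below, the 14-day grace period is one reading of "a week or two", and the function names and the "noindex_or_410" label are invented for the example.

```python
import math
from datetime import date
from typing import Optional

PRODUCTS_PER_FILE = 1000
GRACE_DAYS = 14   # assumed reading of "a week or two"


def product_sitemap_files(product_count: int, per_file: int = PRODUCTS_PER_FILE) -> int:
    """Number of /product-sitemap-N.xml files needed."""
    return math.ceil(product_count / per_file)


def stock_policy(in_stock: bool, out_of_stock_since: Optional[date], today: date) -> str:
    """Return "include" or "noindex_or_410" for a product URL."""
    if in_stock or out_of_stock_since is None:
        return "include"
    if (today - out_of_stock_since).days <= GRACE_DAYS:
        return "include"              # recently out of stock: keep existing rankings
    return "noindex_or_410"           # long-term out of stock


print(product_sitemap_files(10_500))                                  # 11
print(stock_policy(False, date(2026, 3, 10), date(2026, 3, 20)))      # include
print(stock_policy(False, date(2026, 1, 1), date(2026, 3, 20)))       # noindex_or_410
```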
robots.txt and Sitemap Interaction
Sitemap-listed URLs that are blocked by robots.txt are a red flag in Search Console’s Index Coverage report. Fix by picking one story: if the URL is blocked, remove it from the sitemap. If it’s in the sitemap, unblock it.
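This conflict is easy to detect before Search Console reports it. The sketch below uses Python's standard robots parser; note that it follows the original robots.txt matching rules and does not implement Google's wildcard extensions, so treat the result as a first pass rather than a definitive answer. The sample rules and URLs are made up.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(robots_txt: str, sitemap_urls: List[str],
                         agent: str = "Googlebot") -> List[str]:
    """Return the sitemap URLs that robots.txt forbids the given agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(agent, url)]


robots = "User-agent: *\nDisallow: /private/\n"
listed = [
    "https://yoursite.com/blog/post/",
    "https://yoursite.com/private/report/",
]
print(blocked_sitemap_urls(robots, listed))
# ['https://yoursite.com/private/report/']
```

Every URL this returns needs one of the two fixes above: drop it from the sitemap or unblock it.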
Sitemap Rendering Performance
On large WordPress sites the sitemap itself can be a performance problem. A sitemap that queries tens of thousands of posts on every fetch will time out. Plugins that solve this well cache the generated XML in a transient, regenerate on content changes, and paginate large URL sets.
Our Emnes SEO plugin caches every sitemap fragment in a transient keyed by type and page number. The cache invalidates on save_post, deleted_post, term changes, and module toggle updates. Typical generation time drops from 400-800ms per fetch to 10-30ms on a cached hit — the difference between “Googlebot times out occasionally” and “sitemap always loads instantly”.
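The caching pattern itself is language-agnostic. The Python below is a generic illustration of a cache keyed by type and page number with per-type invalidation; it is not the plugin's code, and an in-memory dict stands in for WordPress transients.

```python
from typing import Callable, Dict, Tuple


class SitemapCache:
    """Cache rendered sitemap fragments keyed by (post_type, page)."""

    def __init__(self, render: Callable[[str, int], str]) -> None:
        self._render = render
        self._store: Dict[Tuple[str, int], str] = {}

    def get(self, post_type: str, page: int) -> str:
        key = (post_type, page)
        if key not in self._store:            # cache miss: run the slow query once
            self._store[key] = self._render(post_type, page)
        return self._store[key]

    def invalidate(self, post_type: str) -> None:
        """Call from the equivalent of save_post / deleted_post hooks."""
        for key in [k for k in self._store if k[0] == post_type]:
            del self._store[key]


calls = []


def slow_render(post_type: str, page: int) -> str:
    calls.append((post_type, page))           # record each real render
    return f"<urlset><!-- {post_type} page {page} --></urlset>"


cache = SitemapCache(slow_render)
cache.get("post", 1)
cache.get("post", 1)          # served from cache, no second render
cache.invalidate("post")
cache.get("post", 1)          # rendered again after invalidation
print(len(calls))             # 2
```

Invalidating by type rather than flushing everything means a product update does not force the post sitemaps to be rebuilt.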
Frequently Asked Questions
Do I need to resubmit my sitemap after every post?
No. Google refetches on its own schedule (typically daily for active sites). Resubmitting manually does nothing beyond the first time.
Should I add my sitemap to robots.txt?
Yes. A one-line Sitemap: directive in robots.txt is the one place every search engine looks by default.
What’s the ideal changefreq value?
Leave it out entirely. Google has publicly said it ignores changefreq and priority. Sitemap tooling that still emits them wastes bytes.
Can one URL appear in multiple sitemaps?
Technically yes, practically no. A well-structured site uses one sitemap per content type with no overlap.