
Continuing to develop our website with Next.js has led us to continually optimize it as well. One area that still needed work was the connection between this site (where you're reading this blog post) and our CMS. Next.js promises a lot, and on most fronts it delivers, but one area didn't come with all the batteries included: SEO.

When you're operating a headless CMS, a lot happens exclusively at build time for the client application. The original build of our site used next-sitemap to generate sitemaps during the postbuild step, and that worked really well; we rarely modified the config file in the first two years. Eventually, we realized that new content published in the CMS, while instantly available from Next.js via dynamic routing, would not be included in our sitemaps until the next build.

Additionally, we discovered that advanced sitemap features (e.g. video tags) needed to be added for our site's pages and media to be indexed thoroughly.

This post will explain how we implemented both in our Next.js application using the Pages Router and next-sitemap.

Adding Sitemap Extensions with next-sitemap

Our blog features many video posts, and we want those to be indexed correctly by the search engines. Google defines 3 sitemap extensions (news, video, and images), which are used to inform the crawler about additional information hosted on your website's pages. We'll focus on video since that's the extension we needed to solve for, and there's already an example in next-sitemap for how to add news extensions.

Through some trial and error, and by reading the source code, I learned that the correct shape for adding the video extension to a urlset was this:

{
  videos: [
    {
      title: post.title,
      description: post.description,
      contentLoc: {
        href: `${post.url}#VideoElementAnchor`,
      },
      publicationDate: new Date(post.date).toISOString(),
      thumbnailLoc: {href: post.video_thumbnail.url},
    },
  ],
}

According to Google's documentation, a video extension is: "required to provide either a <video:content_loc> or <video:player_loc> tag. [Google] recommend[s] that you provide the <video:content_loc> tag, if possible."
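As a rough illustration of that shape, here is a small helper that maps a post object to a single videos entry. The helper name toVideoEntry, the anchor default, and the guard on post.url are assumptions made for this sketch; the fields read from post are the ones used in the snippet above.

```javascript
// Hypothetical helper (not part of next-sitemap): builds one `videos` entry.
// Assumes `post` carries title, description, url, date, and video_thumbnail.url,
// as in the shape shown above. The anchor name is an assumption.
function toVideoEntry(post, anchor = 'FeaturedVideo') {
  // Google requires a content or player location, so fail early without a URL
  if (!post.url) {
    throw new Error('A video entry needs a URL to build its content location')
  }

  return {
    title: post.title,
    description: post.description,
    contentLoc: {href: `${post.url}#${anchor}`},
    publicationDate: new Date(post.date).toISOString(),
    thumbnailLoc: {href: post.video_thumbnail.url},
  }
}
```

Keeping the mapping in one function means the same entry can be reused anywhere a videos array is needed.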

When done correctly, a URL containing a video extension will output like this in your sitemap:

<url>
    <loc>https://cuttlesoft.com/blog/pycon-italia-keynote/</loc>
    <lastmod>2024-08-22T14:53:49.308Z</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.7</priority>
    <video:video>
        <video:title>PyCon Italia Keynote &#45; Cuttlesoft&#44; Custom Software Developers</video:title>
        <video:thumbnail_loc>https://static.cuttlesoft.com/wp-content/uploads/17205147/featured-image.png</video:thumbnail_loc>
        <video:description>Join Emily Morehouse in her inspirational keynote at PyCon Italia 2023&#44; where she shares insights on passion&#44; risk&#45;taking&#44; and reinvention&#46;</video:description>
        <video:content_loc>https://cuttlesoft.com/blog/pycon-italia-keynote/#FeaturedVideo</video:content_loc>
        <video:publication_date>2023-07-31T16:45:00.000-06:00</video:publication_date>
        <video:live>no</video:live>
    </video:video>
</url>

So how did we achieve this?

In next-sitemap, you have an optional transformation function "which runs for each relative-path in the sitemap". This is an async function that goes in your next-sitemap.config.js. It will start out looking something like this:

/** @type {import('next-sitemap').IConfig} */
module.exports = {
  siteUrl: 'https://cuttlesoft.com/',
  generateRobotsTxt: true,
  // other configuration rules

  // runs once for each relative path in the sitemap
  transform: async (config, path) => {
    return {
      loc: path,
      changefreq: config.changefreq,
      priority: config.priority,
      lastmod: new Date().toISOString(),
    }
  },
}

Disclaimer: what we're about to do is not very performant, but in this situation we don't care, because it only adds a trivial amount of time to the postbuild action of building our Next.js site. That said, if you have thousands or tens of thousands of URLs to loop through, then your mileage may vary.

Next, we'll add a GET request to fetch all of our site's posts from the API so that we can retrieve some metadata about them and determine if they contain a video or not. Here's an example JSON response for our video-formatted posts:

[
    {
        "id": 17850,
        "date": "2023-07-31T16:45:00",
        "modified": "2024-08-21T18:24:31",
        "slug": "pycon-italia-keynote-reflections-on-passion-risk-taking-and-re-invention",
        "link": "https://cuttlesoft.com/blog/2023/07/31/pycon-italia-keynote-reflections-on-passion-risk-taking-and-re-invention/",
        "title": {
            "rendered": "PyCon Italia Keynote: Reflections on Passion, Risk-Taking, and Re-Invention"
        },
        "content": {},
        "format": "video",
        "video_thumbnail": {
            "url": "https://cuttlesoft.com/uploads/2023/07/31/pycon-italia-keynote-thumbnail.jpg"
        }
    },
    {...},
]

We've fetched our video posts from the CMS's API, so let's add this to our transformation function from before and then insert video extensions to the correct URLs when we generate our sitemap.

transform: async (config, path) => {
  // fetch only the video-formatted posts from the CMS
  const response = await fetch(`${process.env.CMS_API_URL}/posts?format=video`)
  const posts = await response.json()

  // find the post whose URL path matches the path being transformed
  const videoPostIndex = posts.findIndex((post) => {
    const postUrl = new URL(post.link)
    return postUrl.pathname === path
  })

  return {
    loc: path,
    changefreq: config.changefreq,
    priority: config.priority,
    lastmod: new Date().toISOString(),
    // only paths that match a video post receive the videos extension
    ...(videoPostIndex > -1
      ? {
          videos: [
            {
              // the sample response above nests the title under `rendered`
              title: posts[videoPostIndex].title.rendered,
              description: posts[videoPostIndex].description,
              contentLoc: {
                href: `${path}#FeaturedVideo`,
              },
              publicationDate: new Date(posts[videoPostIndex].date).toISOString(),
              thumbnailLoc: {href: posts[videoPostIndex].video_thumbnail.url},
              live: false,
            },
          ],
        }
      : {}),
  }
},

Let me explain what's going on here:

  1. we're fetching the video posts every time next-sitemap iterates over a path
  2. we then check if the current path matches any of the URLs in the array of posts, returning the index of the matching post object with posts.findIndex
  3. if the path has a match, then videoPostIndex will hold the index of that post in our posts array, otherwise it will be -1
  4. when a match is found, we insert the videos extension for the current path, adding our video metadata to the sitemap's output (see Google's Video Sitemap Reference for all required and supported tags)
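If the repeated fetch in step 1 ever becomes a problem, one option is to load the posts once and reuse them for every path. The sketch below is not what we shipped; createVideoPostLookup and fetchPosts are hypothetical names, and it assumes each post has a link field holding an absolute URL, as in the sample response above.

```javascript
// Returns a lookup function that loads the posts on first use and then
// answers every later call from an in-memory Map keyed by URL pathname.
// `fetchPosts` is a hypothetical async function resolving to the posts array.
function createVideoPostLookup(fetchPosts) {
  let postsByPath = null // holds a Promise<Map> once the first lookup starts

  return async function findVideoPost(path) {
    if (!postsByPath) {
      postsByPath = fetchPosts().then(
        (posts) => new Map(posts.map((post) => [new URL(post.link).pathname, post])),
      )
    }
    // resolves to the matching post, or undefined when the path has no video
    return (await postsByPath).get(path)
  }
}
```

Inside next-sitemap.config.js you could create the lookup once above module.exports and call it from transform. Note that a failed request would stay cached as a rejected promise for the rest of the run, which is acceptable for a one-off postbuild step but worth knowing.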

Et voilà, now when we run our postbuild with next-sitemap, the paths that match our video-formatted posts will have video extensions added to their entries in our generated sitemap!

Now that we've covered sitemap extensions, let's move on to creating dynamic sitemaps.

Dynamic Sitemaps with the Next.js Pages Router

Next.js' official documentation already has great examples for building dynamic, server-side sitemaps. Our problem was we're not using the new App Router, and we wanted to continue using parts of the next-sitemap library. Our search for a solution led us to this Stack Overflow answer and this blog post by Guillermo de la Puente.

What we eventually achieved was a combination of what we learned from both resources, but first, let's discuss what we needed to solve.

What's a Dynamic Sitemap?

As we covered in the introduction to this post, our sitemap was static: it was generated once during the postbuild of our Next.js application. This meant that any new content published after a build would not be included in the sitemap until the next build, which could keep the search engines from indexing our pages promptly. We needed our sitemaps to be dynamic.

A dynamic sitemap automatically reflects real-time changes on your site, ensuring the search engines index new and updated pages. It's a key tool for large or frequently changing websites to maintain accurate search engine visibility with minimal manual input.

Adding Dynamic Sitemaps to the Pages Router in Next.js

The first step we'll take is creating a new directory in pages. We chose to call ours sitemaps.

pages/
├── [...slug].js
├── _app.js
├── _document.js
├── blog
│   └── [[...slug]].js
// ✨ new folder
└── sitemaps

To that new directory, add a file -- ours is gonna be pages.xml.js, and this should be the skeleton for your logic:

// pages/sitemaps/pages.xml.js

import {getServerSideSitemapLegacy} from 'next-sitemap'

// takes the getServerSideProps context, which next-sitemap uses to write the XML response
const generateSiteMap = (ctx, pages) => {
  const fields = []
  return getServerSideSitemapLegacy(ctx, fields)
}

export const getServerSideProps = async (ctx) => {
  // API call to gather URLs
  const pages = []

  // generate sitemap with pages data
  generateSiteMap(ctx, pages)

  return {
    props: {},
  }
}

// silence Next.js errors
export default function Sitemap() {}

There are two important things to call out about the code in this file:

  1. We're importing getServerSideSitemapLegacy from next-sitemap and we'll use this to build the sitemap's structure.
  2. Adding getServerSideProps tells Next.js to run this code on every request. Each time, it will fetch the latest URLs, call the generateSiteMap function, and ensure the sitemap is always up-to-date with the site's current content.

With this, we can add our API request and the code to iterate over our pages to generate and then return a dynamic sitemap. Except, we're going to add one more thing...

In the pages response from our CMS, there's a key that tells us if a page should be indexed or not. The seo.index key will have a value of either index or noindex. So let's iterate over the pages array, but skip any page object whose seo.index is "noindex".

Here's our completed logic for generating a dynamic sitemap for the pages in our website:

// pages/sitemaps/pages.xml.js

import {getServerSideSitemapLegacy} from 'next-sitemap'

const generateSiteMap = (ctx, pages) => {
  // keep only the pages the CMS marks as indexable, then map them to sitemap fields
  const fields = pages
    .filter((page) => page.seo.index === 'index')
    .map((page) => ({
      loc: page.link,
      lastmod: new Date(page.modified).toISOString(),
    }))

  return getServerSideSitemapLegacy(ctx, fields)
}

export const getServerSideProps = async (ctx) => {
  // fetch the latest pages from the CMS on every request
  const request = await fetch(`${process.env.CMS_API_URL}/pages`)
  const pages = await request.json()

  // writes the sitemap XML to the response
  generateSiteMap(ctx, pages)

  return {
    props: {},
  }
}

// silence Next.js errors
export default function Sitemap() {}

The final step is to add the sitemap to next-sitemap's configuration. This can be done by adding the additionalSitemaps option, which lives under robotsTxtOptions.

robotsTxtOptions: {
  additionalSitemaps: [
    'https://cuttlesoft.com/sitemaps/pages.xml',
  ],
},

Check your next-sitemap configuration carefully, and be sure to exclude the URLs likely to be in this dynamic sitemap. Otherwise they'll end up recorded twice: once in the dynamic sitemap, and once in the sitemap generated by next-sitemap at build time. These duplicate records can lead to indexing issues for your site.
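For reference, next-sitemap has an exclude option that takes an array of relative paths and supports wildcard patterns. The fragment below is only a sketch: the two patterns are hypothetical and depend on which routes your dynamic sitemaps cover.

```javascript
// next-sitemap.config.js (fragment)
/** @type {import('next-sitemap').IConfig} */
module.exports = {
  siteUrl: 'https://cuttlesoft.com/',
  generateRobotsTxt: true,
  // Hypothetical patterns: keep the build-time sitemap from listing the
  // sitemap routes themselves and the URLs the dynamic sitemaps already cover.
  exclude: ['/sitemaps/*', '/blog/*'],
}
```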

Splitting Dynamic Sitemaps with the Pages Router in Next.js

The previous section covered creating individual, dynamic sitemaps, but what about splitting sitemaps when you have too many URLs? Google limits a single sitemap to 50MB (uncompressed) or 50,000 URLs. That'll fit most websites, but read on if that doesn't describe your site, or if you prefer to split up your sitemaps for better organization.

The previous section also introduced us to getServerSideSitemapLegacy, which helped us generate sitemaps easily, but there's also a getServerSideSitemapIndexLegacy function that builds index sitemaps, which will be essential for this next step. As the name suggests, this function will return a sitemap index:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>https://cuttlesoft.com/sitemaps/posts/sitemap-1.xml</loc>
    </sitemap>
</sitemapindex>

Create a new directory in pages/sitemaps (ours is called posts) and add an index.xml.js file to it with the following:

// pages/sitemaps/posts/index.xml.js

import {getServerSideSitemapIndexLegacy} from 'next-sitemap'

const URLS_PER_SITEMAP = 1000

export const getServerSideProps = async (ctx) => {
  const request = await fetch(`${process.env.CMS_API_URL}/posts`)

  // our CMS provides the total in a header (header values are strings)
  const count = Number(request.headers.get('x-wp-total'))

  // sitemap numbers start at 1 to match the CMS's page query parameter,
  // assuming it paginates from 1 the way the WordPress REST API does
  const sitemaps = Array.from(
    {length: Math.ceil(count / URLS_PER_SITEMAP)},
    (_, index) => `https://cuttlesoft.com/sitemaps/posts/sitemap-${index + 1}.xml`,
  )

  return getServerSideSitemapIndexLegacy(ctx, sitemaps)
}

// silence Next.js errors
export default function Sitemap() {}

This file retrieves the number of posts from the CMS's API and then determines how many sitemaps to create based on the value of URLS_PER_SITEMAP. For example, with a value of 1000 and 8200 posts, there would be 9 sitemaps (sitemap-1.xml... sitemap-9.xml), with the first 8 containing 1000 URLs each and the 9th containing the remaining 200.
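To make the arithmetic concrete, here is the same calculation pulled out into a standalone function. The name buildSitemapUrls and the firstPage parameter are assumptions for this sketch; set firstPage to whatever number your CMS treats as its first page.

```javascript
// Builds the list of sitemap URLs for a sitemap index.
// `firstPage` is an assumption: it should match the first page number
// that the CMS's pagination accepts.
function buildSitemapUrls(totalPosts, urlsPerSitemap, baseUrl, firstPage = 1) {
  // header values arrive as strings, so coerce before dividing
  const sitemapCount = Math.ceil(Number(totalPosts) / urlsPerSitemap)

  return Array.from(
    {length: sitemapCount},
    (_, index) => `${baseUrl}/sitemap-${index + firstPage}.xml`,
  )
}

// 8200 posts at 1000 URLs per sitemap yields 9 sitemap URLs
const urls = buildSitemapUrls('8200', 1000, 'https://cuttlesoft.com/sitemaps/posts')
```

Because the function is pure, it is easy to check the edge cases (zero posts, an exact multiple of the page size) without hitting the CMS.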

Now for the sitemap pages: we'll add another file to this new directory, but we're going to call it [page].js, without the additional xml extension. Grab the code below, which looks a lot like our dynamic sitemap from above but with an important change.

Note: In our testing with Next.js 13, we found that *.xml.js would not work with Next's dynamic routing.

// pages/sitemaps/posts/[page].js

import {getServerSideSitemapLegacy} from 'next-sitemap'

const URLS_PER_SITEMAP = 1000

const generateSiteMap = (ctx, posts) => {
  const fields = posts.map((post) => ({
    loc: post.link,
    lastmod: new Date(post.modified).toISOString(),
    // video extensions covered in the previous section
    ...(post.format === 'video' ? {videos: [{...}]} : {}),
  }))

  return getServerSideSitemapLegacy(ctx, fields)
}

export const getServerSideProps = async (ctx) => {
  // reject missing or non-numeric page params before calling the CMS
  const page = Number(ctx.params?.page)
  if (!ctx.params?.page || Number.isNaN(page)) {
    return {notFound: true}
  }

  // the page from the URL selects which slice of posts the CMS returns
  const params = new URLSearchParams({page, per_page: URLS_PER_SITEMAP})
  const request = await fetch(`${process.env.CMS_API_URL}/posts?${params}`)
  const posts = await request.json()

  // an out-of-range page may come back as an empty array or an error object
  if (!Array.isArray(posts) || posts.length === 0) {
    return {notFound: true}
  }

  generateSiteMap(ctx, posts)

  return {
    props: {},
  }
}

// silence Next.js errors
export default function Sitemap() {}

To make this work with URLs that resemble what we've added to our sitemap index, we'll need to add rewrite rules to next.config.js.

// Rewrites
// https://nextjs.org/docs/api-reference/next.config.js/rewrites
rewrites: async () => {
  return [
    {
      source: '/sitemaps/posts/sitemap.xml',
      destination: '/sitemaps/posts/index.xml',
    },
    {
      source: '/sitemaps/posts/sitemap-:page.xml',
      destination: '/sitemaps/posts/:page',
    },
  ]
},

Caching Dynamic Sitemaps with Next.js

The last step, and one we advise you to consider, is caching your dynamic sitemaps. This is really easy, and achievable by adding one line of code -- but before you grab it, consider how long to cache your sitemap. For us, we looked at the frequency with which we post and took the ceiling for a given type. That meant we cached pages for a day (we've never published more than one new page in less than 24 hours), and posts for half a day.

To cache your sitemaps, add the following just before the generateSiteMap call in your code:

// the rest of your dynamic sitemap logic

ctx.res.setHeader(
    'Cache-Control', 'public, s-maxage=86400, stale-while-revalidate=120'
)

generateSiteMap(ctx, posts)

This header adds the following:

  • s-maxage=86400: Shared caches, such as a CDN in front of your Next.js app, treat the response as fresh for 86,400 seconds (24 hours) and serve it without contacting the server.
  • stale-while-revalidate=120: For up to 120 seconds after the response goes stale, the cache can still serve the stale copy immediately while it fetches a fresh one in the background.
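Since we cache pages and posts for different lengths of time, it can help to build the header value in one place. This is a minimal sketch; sitemapCacheControl and the DAY constant are names made up for the example.

```javascript
// Builds a Cache-Control value for a sitemap response.
// maxAgeSeconds: how long shared caches treat the response as fresh.
// staleSeconds: how long a stale copy may be served while revalidating.
function sitemapCacheControl(maxAgeSeconds, staleSeconds = 120) {
  return `public, s-maxage=${maxAgeSeconds}, stale-while-revalidate=${staleSeconds}`
}

const DAY = 60 * 60 * 24 // 86,400 seconds

const pagesHeader = sitemapCacheControl(DAY) // pages: cached for a day
const postsHeader = sitemapCacheControl(DAY / 2) // posts: cached for half a day
```

Each sitemap file would then pass its own value to ctx.res.setHeader('Cache-Control', ...).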

Using Multiple Sitemap Indices

Be careful with nesting sitemap indices within other indices. As Google clearly states, "Sitemap index files can't contain other index files". If you're following this tutorial, you should not add the dynamic sitemap index created in the steps above to the additionalSitemaps configuration of next-sitemap. Instead, submit your dynamic sitemap index using Google Search Console:

[Image: the Sitemaps section of Google Search Console, showing the "Add a new sitemap" field where you enter the URL of your dynamic sitemap index (e.g., 'https://yoursite.com/sitemaps/posts/sitemap.xml') and the table of previously submitted sitemaps.]

Crawling Towards Excellence

Implementing these advanced SEO techniques for Next.js can significantly improve your website's visibility and performance. We specialize in Next.js development, design, and optimization so your Next.js applications reach their full potential. If you're looking to take your Next.js project to the next level or need assistance with complex implementations like dynamic sitemaps, we'd love to help. Contact our team of expert Next.js Developers.
