Creating Large Dynamic Sitemaps with Next.js, Drupal, and next-sitemap

A sitemap is an XML file that lists the indexable URLs of a domain. A single sitemap file may contain at most 50,000 URLs, so large sitemaps are split into one sitemap index file that points to multiple sitemap files. Learn more about splitting sitemaps in Google’s documentation. next-sitemap is a library that generates the sitemap XML documents either by reading the Next.js build manifests or from a list of URLs you provide. Check out some real examples, like the Google sitemap index or the sitemap of this website.
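
To make the shape of a sitemap index concrete, here is a minimal sketch that serializes one by hand. It is for illustration only: next-sitemap ships its own serializer, and the helper names escapeXml and buildSitemapIndexXml, as well as the example.com URLs, are assumptions that do not appear in the original setup.

```typescript
// Hypothetical helper: escape the five characters that are special in XML.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: build a sitemap index that points to each sitemap file.
function buildSitemapIndexXml(sitemapUrls: string[]): string {
  const entries = sitemapUrls
    .map((url) => `  <sitemap><loc>${escapeXml(url)}</loc></sitemap>`)
    .join("\n");

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    entries,
    "</sitemapindex>",
  ].join("\n");
}

// Placeholder URLs, used only to show the output format.
console.log(
  buildSitemapIndexXml([
    "https://example.com/server-sitemap-0.xml",
    "https://example.com/server-sitemap-1.xml",
  ])
);
```

Each sitemap file referenced by the index has the same overall shape, with urlset and url elements in place of sitemapindex and sitemap.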

Static sitemaps generated at build time with next-sitemap

Static routes generated at build time are automatically picked up by next-sitemap. That is the case for both static pages and paths generated by getStaticPaths. It works out of the box!

                                        
// next-sitemap.config.js

/** @type {import('next-sitemap').IConfig} */
const config = {
  siteUrl: process.env.SITE_URL, // or whatever your domain is
};

module.exports = config;
                                        
                                    

You may add other options, such as exclude, additionalPaths or generateRobotsTxt. To regenerate the sitemap automatically after every build, add next-sitemap as the postbuild step in package.json:

                                        
// package.json
{
  ...
  "scripts": {
    "build": "next build",
    "postbuild": "next-sitemap",
    ...
  },
  ...
}
                                        
                                    

How to build large dynamic sitemaps at runtime

Let’s say you have user-generated URLs. You could pull all of them at build time in the next-sitemap config file, but then your sitemaps would only be updated on each deployment. So let’s switch approach and generate them on demand.

You’ll need new routes to render the sitemap index and each of the sitemap pages. Sitemaps should be at the root level, with clean URLs like /server-sitemap.xml and /server-sitemap-0.xml, /server-sitemap-1.xml, etc. Since Next.js doesn’t support page file names that mix static text with a dynamic segment, like server-sitemap-[page].ts, we can leverage rewrites instead.

Create the following pages:

                                        
/pages
  /server-sitemap
    /index.ts <-- this corresponds to the sitemap index
    /[page].ts <-- this corresponds to an individual sitemap
                                        
                                    

Then, add the rewrites in the Next.js config:

                                        
// next.config.js

/** @type {import('next').NextConfig} */
const config = {
  ...
  rewrites: async () => [
    {
      // Sitemap index
      source: '/server-sitemap.xml',
      destination: '/server-sitemap',
    },
    {
      // Individual sitemaps: /server-sitemap-0.xml, /server-sitemap-1.xml, ...
      source: '/server-sitemap-:page.xml',
      destination: '/server-sitemap/:page',
    },
  ],
};

module.exports = config;
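
To illustrate what the two rewrites above do to an incoming path, here is a rough sketch using a plain regular expression. It only approximates the behavior: Next.js uses its own path matcher, and the function name resolveSitemapRewrite is hypothetical.

```typescript
// Approximation of the two rewrites, for illustration only.
// Returns the internal page path, or null when neither rewrite applies.
function resolveSitemapRewrite(pathname: string): string | null {
  // First rewrite: the sitemap index.
  if (pathname === "/server-sitemap.xml") {
    return "/server-sitemap";
  }

  // Second rewrite: ":page" captures everything between the dash and ".xml".
  const match = /^\/server-sitemap-([^/]+)\.xml$/.exec(pathname);
  return match ? `/server-sitemap/${match[1]}` : null;
}

console.log(resolveSitemapRewrite("/server-sitemap.xml"));
console.log(resolveSitemapRewrite("/server-sitemap-3.xml"));
console.log(resolveSitemapRewrite("/about"));
```

The captured value then arrives in the page as ctx.params.page, always as a string, which is why the individual sitemap route has to validate and convert it.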
                                        
                                    

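next-sitemap provides two APIs to generate server-side sitemaps. The code below uses their pages-directory variants, which next-sitemap v4 exports with a Legacy suffix: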

  • getServerSideSitemapIndex to generate the sitemap index file.

  • getServerSideSitemap to generate a single sitemap file.

For the index file, we just need to compute the number of sitemap pages that will exist and pass their URLs to getServerSideSitemapIndexLegacy.

                                        
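// server-sitemap/index.ts
// route rewritten from /server-sitemap.xml

import type { GetServerSideProps, GetServerSidePropsContext } from "next"
import { getServerSideSitemapIndexLegacy } from "next-sitemap"
// Assumption: your project exports a configured next-drupal client from this path.
import { drupal } from "lib/drupal"

const URLS_PER_SITEMAP = 45000;

type SitemapPath = { params: { slug: string } }

async function getPaths(context: GetServerSidePropsContext): Promise<SitemapPath[]> {
    // Build paths for all published `node--story` resources, newest first.
    const nodes = await drupal.getResourceCollectionFromContext(
        "node--story",
        context,
        {
          params: {
            "filter[status]": 1,
            "fields[node--story]": "path,created",
            sort: "-created",
          },
        }
      )
    return nodes.map((node) => {
        return {
            params: {
                // JSON:API returns `path` as an object; the URL alias is in `path.alias`.
                slug: node.path.alias,
            },
        }
    })
}

export const getServerSideProps: GetServerSideProps = async (ctx) => {
    // Source the URLs from the CMS.
    const urls = await getPaths(ctx)
    // Note: `+` binds tighter than `??`, so '/news' is only appended to the localhost fallback.
    const SITE_URL = process.env.NEXT_PUBLIC_BASE_URL ?? 'https://localhost:3000' + '/news'
    const totalSitemaps = Math.ceil(urls.length / URLS_PER_SITEMAP);

    // One absolute URL per sitemap page: server-sitemap-0.xml, server-sitemap-1.xml, ...
    const sitemaps = Array.from(
        { length: totalSitemaps },
        (_, index) => `${SITE_URL}/server-sitemap-${index}.xml`
    );

    // The index helper expects a plain list of sitemap URLs.
    return getServerSideSitemapIndexLegacy(ctx, sitemaps)
}

// Default export to prevent next.js errors
export default function Sitemap() {}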
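
As a worked example of the page-count arithmetic used for the index, the sketch below isolates it in a pure function. The name sitemapIndexUrls and the example.com domain are hypothetical and only serve the illustration.

```typescript
// Hypothetical helper: list the sitemap URLs the index should point to.
function sitemapIndexUrls(
  siteUrl: string,
  totalUrls: number,
  urlsPerSitemap: number
): string[] {
  // Round up so that a partially filled last sitemap still gets its own file.
  const totalSitemaps = Math.ceil(totalUrls / urlsPerSitemap);

  // Sitemap pages are numbered from zero, matching /server-sitemap-:page.xml.
  return Array.from(
    { length: totalSitemaps },
    (_, index) => `${siteUrl}/server-sitemap-${index}.xml`
  );
}

// 100,000 URLs at 45,000 per file need three sitemaps: pages 0, 1 and 2.
console.log(sitemapIndexUrls("https://example.com", 100000, 45000));
```

With zero URLs the list is empty, and exactly 45,000 URLs still fit in a single sitemap.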
                                        
                                    

For the individual sitemaps, we need to fetch the items for the requested page and pass their URLs to getServerSideSitemapLegacy.

                                        
// server-sitemap/[page].ts
// route rewritten from /server-sitemap-[page].xml

import type { GetServerSideProps, GetServerSidePropsContext } from "next"
import { getServerSideSitemapLegacy } from "next-sitemap"
import type { ISitemapField } from "next-sitemap"
// Assumption: your project exports a configured next-drupal client from this path.
import { drupal } from "lib/drupal"

const URLS_PER_SITEMAP = 45000;

type SitemapPath = { params: { slug: string } }

async function getPaths(context: GetServerSidePropsContext): Promise<SitemapPath[]> {
    // Build paths for all published `node--story` resources, newest first.
    const nodes = await drupal.getResourceCollectionFromContext(
        "node--story",
        context,
        {
          params: {
            "filter[status]": 1,
            "fields[node--story]": "path,created",
            sort: "-created",
          },
        }
      )
    return nodes.map((node) => {
        return {
            params: {
                // JSON:API returns `path` as an object; the URL alias is in `path.alias`.
                slug: node.path.alias,
            },
        }
    })
}

// `page` is 1-based here; the route below converts from the 0-based URL number.
const paginate = <T,>(items: T[], page = 1, perPage = 10) => {
    const offset = perPage * (page - 1);
    const totalPages = Math.ceil(items.length / perPage);
    const paginatedItems = items.slice(offset, perPage * page);

    return {
        previousPage: page > 1 ? page - 1 : null,
        nextPage: (totalPages > page) ? page + 1 : null,
        total: items.length,
        totalPages: totalPages,
        items: paginatedItems
    };
};

export const getServerSideProps: GetServerSideProps<
  any,
  { page: string }
> = async ctx => {
  // Reject missing, non-numeric, fractional or negative page numbers.
  const page = Number(ctx.params?.page);
  if (!ctx.params?.page || !Number.isInteger(page) || page < 0) {
    return { notFound: true };
  }

  // this would load the items that make dynamic pages
  const urls = await getPaths(ctx)
  // Note: `+` binds tighter than `??`, so '/news' is only appended to the localhost fallback.
  const SITE_URL = process.env.NEXT_PUBLIC_BASE_URL ?? 'https://localhost:3000' + '/news'
  const urlPaginated = paginate(urls, page + 1, URLS_PER_SITEMAP);

  if (urlPaginated.items.length === 0) {
    return { notFound: true };
  }

  const fields: ISitemapField[] = urlPaginated.items.map(item => ({
    loc: `${SITE_URL}${item.params.slug}`,
    lastmod: new Date().toISOString(),
  }));

  return getServerSideSitemapLegacy(ctx, fields);
};

// Default export to prevent next.js errors
export default function SitemapPage() {}
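
The route above receives a zero-based page number from the URL and passes page + 1 to the one-based paginate helper. The sketch below shows the equivalent slice directly, which may make the off-by-one easier to verify. The name sliceForSitemapPage is hypothetical.

```typescript
// Hypothetical helper: return the items that belong to a zero-based sitemap page.
function sliceForSitemapPage<T>(items: T[], page: number, perPage: number): T[] {
  const start = page * perPage;
  // slice() clamps to the array bounds, so an out-of-range page yields [].
  return items.slice(start, start + perPage);
}

const sampleSlugs = ["/a", "/b", "/c", "/d", "/e", "/f", "/g"];

// With 3 items per sitemap: page 0 holds /a../c, page 1 /d../f, page 2 only /g.
console.log(sliceForSitemapPage(sampleSlugs, 0, 3));
console.log(sliceForSitemapPage(sampleSlugs, 2, 3));
// Page 3 is past the end, which is the case the route answers with notFound.
console.log(sliceForSitemapPage(sampleSlugs, 3, 3));
```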
                                        
                                    

Caching the dynamic sitemaps

Since the sitemaps are hitting our API or DB to load many items, we don’t want to execute those queries too often.

By setting the Cache-Control header on the response, you can cache the result of server-side functions, including getServerSideProps. This works automatically when deployed to Vercel. Otherwise, you’ll need to set up caching yourself, with Redis or similar.

                                        
...

const cacheMaxAgeUntilStaleSeconds = 60; // 1 minute
const cacheMaxAgeStaleDataReturnSeconds = 15 * 60; // 15 minutes

ctx.res.setHeader(
  'Cache-Control',
  `public, s-maxage=${cacheMaxAgeUntilStaleSeconds}, stale-while-revalidate=${cacheMaxAgeStaleDataReturnSeconds}`
);

return ...
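
To make the two directives easier to reason about, here is a small sketch that builds the same header value and classifies a cached response by its age. It models the general semantics of s-maxage and stale-while-revalidate, not Vercel's internals, and the names buildCacheControl and cacheState are hypothetical.

```typescript
type CacheState = "fresh" | "stale-while-revalidate" | "expired";

// Build the header value used in the snippet above.
function buildCacheControl(freshSeconds: number, staleSeconds: number): string {
  return `public, s-maxage=${freshSeconds}, stale-while-revalidate=${staleSeconds}`;
}

// Classify a cached response by its age in seconds.
function cacheState(
  ageSeconds: number,
  freshSeconds: number,
  staleSeconds: number
): CacheState {
  // Within s-maxage: served from the shared cache as is.
  if (ageSeconds < freshSeconds) return "fresh";
  // Within the following window: the stale copy is served while a
  // background request refreshes it.
  if (ageSeconds < freshSeconds + staleSeconds) return "stale-while-revalidate";
  // Past both windows: the next request has to wait for a new response.
  return "expired";
}

console.log(buildCacheControl(60, 15 * 60));
console.log(cacheState(30, 60, 15 * 60));
console.log(cacheState(5 * 60, 60, 15 * 60));
console.log(cacheState(20 * 60, 60, 15 * 60));
```

With the values above, visitors and crawlers get a cached sitemap for up to 16 minutes after it was generated, and the Drupal queries run at most about once per minute per sitemap URL.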
                                        
                                    

Learn more about Vercel caching here. Note that the response size can’t exceed 10 MB!
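
If you are not deploying to Vercel and have no Redis at hand, an in-process cache can be a starting point. The following is a minimal sketch under stated assumptions: createTtlCache is a hypothetical helper, the cache lives in a single Node.js process (so it is not shared across instances and is lost on restart), and the injectable clock exists only to make it testable.

```typescript
// Hypothetical in-memory cache with a time-to-live per entry.
function createTtlCache<T>(ttlMs: number, now: () => number = Date.now) {
  const entries = new Map<string, { value: T; expiresAt: number }>();

  return {
    // Return the cached value for `key`, or call `load` and cache its result.
    get(key: string, load: () => T): T {
      const hit = entries.get(key);
      if (hit && hit.expiresAt > now()) {
        return hit.value;
      }
      const value = load();
      entries.set(key, { value, expiresAt: now() + ttlMs });
      return value;
    },
  };
}

// Usage sketch: T can be a Promise, so concurrent requests share one fetch.
// Note that a rejected promise would stay cached until its TTL passes.
const pathsCache = createTtlCache<Promise<string[]>>(60 * 1000);
const cachedPaths = pathsCache.get("node--story", async () => ["/story-1", "/story-2"]);
cachedPaths.then((paths) => console.log(paths.length));
```

In getServerSideProps, the loader passed to get would be the call that fetches the paths from Drupal.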