sitemap-stream-parser v1.7.0

Get a list of URLs from one or more sitemaps

node-sitemap-stream-parser

A streaming parser for sitemap files. It can handle gigabytes of deeply nested sitemaps with hundreds of URLs in them, and its maximum memory usage stays just over 100 MB at any time.

Usage

The main way to extract the URLs for a site is the parseSitemaps(urls, url_cb, done) method. You can call it with either a single URL or an Array of URLs. The url_cb callback is called for every URL that is found. The done callback is passed an error and/or a list of all the sitemaps that were checked.

Examples:

var sitemaps = require('sitemap-stream-parser');

sitemaps.parseSitemaps('http://example.com/sitemap.xml', console.log, function(err, sitemaps) {
    console.log('All done!');
});

or

var sitemaps = require('sitemap-stream-parser');

var urls = ['http://example.com/sitemap-posts.xml', 'http://example.com/sitemap-pages.xml'];

var all_urls = [];
sitemaps.parseSitemaps(urls, function(url) { all_urls.push(url); }, function(err, sitemaps) {
    console.log(all_urls);
    console.log('All done!');
});
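
The collect-then-report pattern above can also be wrapped in a Promise. The following is a minimal sketch, not part of the package: collectUrls is a hypothetical helper name, and it assumes only the parseSitemaps(urls, url_cb, done) signature described above. It also assumes that any error passed to done should be treated as a failure, which discards the list of checked sitemaps in that case.

```javascript
// Hypothetical helper, not provided by sitemap-stream-parser.
// `parser` is any object exposing parseSitemaps(urls, url_cb, done),
// for example the module returned by require('sitemap-stream-parser').
function collectUrls(parser, urls) {
    return new Promise(function(resolve, reject) {
        var all_urls = [];
        parser.parseSitemaps(urls, function(url) {
            // Called once for every URL found in the sitemaps.
            all_urls.push(url);
        }, function(err, sitemaps) {
            // Assumption: treat any reported error as a failure.
            if (err) {
                return reject(err);
            }
            resolve({ urls: all_urls, sitemaps: sitemaps });
        });
    });
}
```

Passing the parser in as an argument keeps the helper independent of how the module is loaded. With the real package it would be called as collectUrls(require('sitemap-stream-parser'), urls), and the resolved object holds the collected URLs alongside the sitemaps that were checked.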

Some sites advertise their sitemaps in their robots.txt file. To check whether that is the case, use the sitemapsInRobots(url, cb) method. The two methods are easy to combine:

var sitemaps = require('sitemap-stream-parser');

sitemaps.sitemapsInRobots('http://example.com/robots.txt', function(err, urls) {
    // Stop if the request failed or robots.txt lists no sitemaps.
    if (err || !urls || urls.length === 0)
        return;
    sitemaps.parseSitemaps(urls, console.log, function(err, sitemaps) {
        console.log(sitemaps);
    });
});
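
For intuition, a robots.txt file advertises sitemaps with lines of the form "Sitemap: <url>". The sketch below shows one way such lines could be pulled out of the file's text. It is illustrative only and is not the package's implementation; extractSitemapUrls is a hypothetical name, and the function works on a string that has already been downloaded.

```javascript
// Illustrative only: not how sitemap-stream-parser is implemented.
// Returns the URLs from every "Sitemap: <url>" line in robots.txt text.
function extractSitemapUrls(robotsTxt) {
    var urls = [];
    robotsTxt.split(/\r?\n/).forEach(function(line) {
        // The field name is matched case-insensitively; lines that start
        // with "#" (comments) do not match and are skipped.
        var match = /^\s*sitemap\s*:\s*(\S+)/i.exec(line);
        if (match) {
            urls.push(match[1]);
        }
    });
    return urls;
}
```

An array produced this way has the same shape as the urls argument that parseSitemaps accepts.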
