
Try any Node.js package right in your browser

This is a playground to test code. It runs a full Node.js environment and already has all of npm’s 400,000 packages pre-installed, including pdfdata. Try it out:

var pdfdata = require("pdfdata")

This service is provided by RunKit and is not affiliated with npm, Inc or the package authors.

pdfdata v0.2.7

node.js client library for PDF data extraction as-a-service


A node.js client library for an API for PDF data extraction. The service is designed to be incredibly easy to use while providing impeccable PDF data extraction quality over a range of configurable extraction targets (text, forms, metadata, images, tables, with more added all the time). While the API is itself an approachable HTTP+JSON affair, pdfdata-node provides an idiomatic, promise-based JavaScript library that any node.js developer can have up and running in less than a minute.

For detailed documentation and extensive examples, head over to our API docs.

Quick Start


$ npm install pdfdata


You will need an API key to use this library. (If you don't have one already, you can get one free by registering.)


First, you'll need to plug in your API key; there are two ways you can do this. Either provide it as a constructor argument to the result of requiring the pdfdata module:

var pdfdata = require("pdfdata")("test_YOUR_API_KEY_HERE");

Or, you can set the PDFDATA_APIKEY environment variable appropriately for your operating system, e.g.:

export PDFDATA_APIKEY=test_YOUR_API_KEY_HERE

and then omit the extra argument when requiring pdfdata:

var pdfdata = require("pdfdata")();
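The two configuration paths above can be sketched as a small lookup. This is only an illustration: `resolveApiKey` is a hypothetical helper, not part of pdfdata, and the precedence it uses (an explicit argument wins over the environment variable) is an assumption rather than documented library behaviour.

```javascript
// Hypothetical helper, not part of the pdfdata library.
// Illustrates the two ways of supplying an API key described above.
function resolveApiKey(explicitKey, env) {
  // Default to the real process environment; a plain object can be
  // passed instead, which makes the function easy to test.
  env = env || process.env;

  // Assumption: an explicitly passed key takes precedence over
  // the PDFDATA_APIKEY environment variable.
  var key = explicitKey || env.PDFDATA_APIKEY;

  if (!key) {
    throw new Error(
      "No API key found: pass one explicitly or set PDFDATA_APIKEY"
    );
  }
  return key;
}
```

Calling something like this once at startup, e.g. `require("pdfdata")(resolveApiKey())`, would make a missing key fail immediately with a clear message instead of surfacing later as a failed request.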

Running a proc (data extraction process)

Assuming you have a PDF document test.pdf in your current directory which contains text you'd like to extract:


This will yield something like this:

{ type: 'proc',
  id: 'proc_156870e759a',
  created: '2016-08-14T03:18:07Z',
  source_tags: [],
  operations: [ { op: 'text' } ],
   [ { type: 'doc',
       id: 'doc_8e96ec0533ac3e1e988b7d1ca27bfdc096b82ddc',
       filename: 'document.pdf',
       tags: [ 'acquired:2016-08-08', 'acquired:2016-08-14' ],
       created: '2016-08-08T19:35:16Z',
       expires: '2016-09-13T03:18:07Z',
        [ { op: 'text',
            [ { text: '\n                              Center        for    Bioinformatics                &\n                                     Molecular           Biostatistics\n                                   (University   of California, San  Francisco)\n\n                            Year 2005                                                     Paper dlbcl\n\n\n\n\n\n                          Microarray        Gene     Expression       Data     with\n                            Linked      Survival     Phenotypes:...' } ]
          } ] } ] }
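To get at the extracted text in a result shaped like the one above, you would walk from the proc down through its documents and their per-operation results. The sketch below assumes the three nested arrays are stored under keys named `documents`, `results`, and `data`; those key names are missing from the output shown here, so treat them as guesses and check them against the API docs. `collectText` is a hypothetical helper, not part of pdfdata.

```javascript
// Hypothetical helper, not part of the pdfdata library.
// Assumption: the nested arrays live under `documents`, `results`
// and `data`; these key names do not appear in the sample output.
function collectText(proc) {
  var texts = [];
  (proc.documents || []).forEach(function (doc) {
    (doc.results || []).forEach(function (result) {
      // Only look at results produced by the 'text' operation.
      if (result.op !== "text") {
        return;
      }
      (result.data || []).forEach(function (item) {
        if (typeof item.text === "string") {
          texts.push(item.text);
        }
      });
    });
  });
  // One string per text entry found, in document order.
  return texts;
}
```

Missing or empty arrays are treated as "no text" rather than as errors, so the function returns an empty array for a proc with no text results.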

There are many different data extraction operations available: unstructured text as shown above, as well as access to bitmap image data, metadata, and structured data options like forms and custom named-region page template extractions.

Learn more

Seriously, please check out our API documentation, which includes a tonne of examples, descriptions of all of the data extraction operations the service offers, and details about important things like data retention policies, usage limits, and so on.

Questions? We're on Twitter @pdfdataio, or you can contact us otherwise.


(This is only relevant if you are modifying / contributing to pdfdata-node.)

Set your environment, e.g.:

export PDFDATA_ENDPOINT=https://localhost:8081/v1


Run the tests via npm test, or node_modules/mocha/bin/mocha --watch if you want to watch for changes while developing.


