Introducing Dookie, A Better Way To Import/Export MongoDB Data

by Valeri Karpov@code_barbarianFebruary 12, 2016

Getting data into and out of MongoDB is a pain. Although MongoDB 3.0 and 3.2 introduced a lot of great new features, changes in MongoDB's authentication have introduced a lot of nasty quirks to the shell's copyDatabase() function. Similarly, the mongodump/mongorestore and mongoimport/mongoexport binaries that come with MongoDB are only useful for a small handful of use cases. Mongoimport and mongoexport only let you import/export a single collection, and mongodump/mongorestore produce binary data that isn't human readable. Neither is an adequate solution to "I want to put some sample data into my GitHub repo so I don't have to generate data by pointing and clicking in my web app."

One common pattern I see for this task is "seed scripts," complex scripts that generate a data set and put it into MongoDB. This pattern seems to be particularly popular among Fullstack Academy grads. However, seed scripts are limited: they build up data imperatively rather than declaratively, and building multiple seed scripts for different data sets requires a lot of discipline. Seed scripts are only marginally better than writing shell script wrappers around mongoimport/export, which is why I wrote dookie, a tool that allows you to pre-process JSON or YAML data before inserting it into MongoDB.

How Dookie Works

Dookie has two fundamental operations, 'push' and 'pull'. In other words, you can push data into MongoDB and pull data out of MongoDB. Let's say you have the following JSON object:

{
  "people": [
    {
      "_id": {
        "$oid": "561d87b8b260cf35147998ca"
      },
      "name": "Axl Rose"
    },
    {
      "_id": {
        "$oid": "561d88f5b260cf35147998cb"
      },
      "name": "Slash"
    }
  ],
  "bands": [
    {
      "_id": "Guns N' Roses",
      "members": [
        "Axl Rose",
        "Slash"
      ]
    }
  ]
}

This object represents the state of a MongoDB database. The database has 2 collections: "people" and "bands". The "people" collection has two documents, and the "bands" collection has one. In the MongoDB shell, this database looks like what you see below. Note that the objects with $oid keys in the JSON representation become MongoDB ObjectIds - dookie supports MongoDB extended JSON syntax.

> show collections
bands
people
> db.bands.find().pretty()
{ "_id" : "Guns N' Roses", "members" : [ "Axl Rose", "Slash" ] }
> db.people.find().pretty()
{ "_id" : ObjectId("561d87b8b260cf35147998ca"), "name" : "Axl Rose" }
{ "_id" : ObjectId("561d88f5b260cf35147998cb"), "name" : "Slash" }
>

Dookie's "pull" operation lets you export this data from MongoDB into a JSON file. The below command exports the database named "test" into the file export.json.

$ ./node_modules/.bin/dookie pull --db test --file ./export.json
Writing data from mongodb://localhost:27017/test to ./export.json
$ head export.json
{
  "people": [
    {
      "_id": {
        "$oid": "561d87b8b260cf35147998ca"
      },
      "name": "Axl Rose"
    },
    {
      "_id": {

You can also use dookie from Node.js. Dookie has a pull() function that takes in a MongoDB connection string and returns a promise. The below script writes the contents of the "test" database to export.json.

const dookie = require('dookie');
const fs = require('fs');
dookie.pull('mongodb://localhost:27017/test').then(function(res) {
  fs.writeFileSync('./export.json', res);
});

The "push" operation does the opposite. Once you have export.json, you can then replace the test database with the contents of export.json.

$ mongo
MongoDB shell version: 3.2.0
connecting to: test
> db.dropDatabase();
{ "dropped" : "test", "ok" : 1 }
> ^C
bye
$ dookie push --db test --file ./export.json
Writing data from ./export.json to test
Success!
$ mongo
MongoDB shell version: 3.2.0
connecting to: test
Server has startup warnings:
> show collections
bands
people
> db.bands.find().pretty()
{ "_id" : "Guns N' Roses", "members" : [ "Axl Rose", "Slash" ] }
>

You can also do the same thing from Node.js. This is very handy for mocha tests. These days my API integration tests almost always have dookie.push() in a beforeEach() hook.

const dookie = require('dookie');
const fs = require('fs');
const data = JSON.parse(fs.readFileSync('./export.json', 'utf8'));
dookie.push('mongodb://localhost:27017/test', data).then(function() {
  console.log('done!');
});

The "push" operation also supports YAML, so if you have a file named export.yml like you see below:

people:
  - _id:
      # MongoDB extended JSON syntax
      $oid: 561d87b8b260cf35147998ca
    name: Axl Rose
  - _id:
      $oid: 561d88f5b260cf35147998cb
    name: Slash

bands:
  - _id: Guns N' Roses
    members:
      - Axl Rose
      - Slash

You can import it using dookie push. In my experience YAML a better language for dookie data sets than JSON, because it's easier to read and supports comments.

Pre-processors For Dookie Push

So far you've used dookie to import and export data as-is. However, dookie's push operation has some powerful syntactic sugar inspired by the CSS pre-processor stylus. I occasionally describe dookie as "stylus for MongoDB data sets."

The first helper you'll learn about is $extend. Let's say your web app allows you to search through Arnold Schwarzenegger movies, and you want to create a data set to test your search API. Every movie has some things in common, and you don't want to copy/paste these details everywhere. The $extend syntax lets your documents inherit properties from "variables." In dookie, a "variable" is a top-level field that starts with "$" (because MongoDB collection names can't start with "$").

$movie:
  deleted: false
  leadActor: Arnold Schwarzenegger

movies:
  - $extend: $movie
    name: Jingle All The Way
    supportingActors:
      - Jake Lloyd
  - $extend: $movie
    name: "Terminator 2: Judgment Day"
    supportingActors:
      - Linda Hamilton
      - Robert Patrick

If you run dookie push on the above YAML file, you'll get a database with 1 collection that contains 2 documents, both with the correct deleted and leadActor fields.

$ dookie push --db test --file ./test.yml
Writing data from ./test.yml to test
Success!
$ mongo
MongoDB shell version: 3.2.0
connecting to: test
> show collections
movies
> db.movies.find().pretty()
{
    "_id" : ObjectId("56bcbeb496339ea4340a71b3"),
    "name" : "Jingle All The Way",
    "supportingActors" : [
        "Jake Lloyd"
    ],
    "deleted" : false,
    "leadActor" : "Arnold Schwarzenegger"
}
{
    "_id" : ObjectId("56bcbeb496339ea4340a71b4"),
    "name" : "Terminator 2: Judgment Day",
    "supportingActors" : [
        "Linda Hamilton",
        "Robert Patrick"
    ],
    "deleted" : false,
    "leadActor" : "Arnold Schwarzenegger"
}

The $extend syntax is even more powerful when you combine it with $require. Let's say you want to write multiple test files that leverage the above data set. For instance, suppose you wanted to test what happens when a movie has deleted set to true, so you wrote the below file deleted.yml.

$require: "./test.yml"

movies:
  - $extend: $movie
    deleted: true
    name: Hercules in New York

The $require syntax above pulls in the test.yml file from the $extend example, including all of its variables. The $require syntax also allows you to attach an additional movie. Once you push the above file, you get a 'movies' collection with 3 documents.

$ dookie push --db test --file ./deleted.yml
Writing data from ./deleted.yml to test
Success!
$ mongo
MongoDB shell version: 3.2.0
connecting to: test
> db.movies.find().pretty()
{
    "_id" : ObjectId("56bcc074b57de15135356a0f"),
    "name" : "Jingle All The Way",
    "supportingActors" : [
        "Jake Lloyd"
    ],
    "deleted" : false,
    "leadActor" : "Arnold Schwarzenegger"
}
{
    "_id" : ObjectId("56bcc074b57de15135356a10"),
    "name" : "Terminator 2: Judgement Day",
    "supportingActors" : [
        "Linda Hamilton",
        "Robert Patrick"
    ],
    "deleted" : false,
    "leadActor" : "Arnold Schwarzenegger"
}
{
    "_id" : ObjectId("56bcc074b57de15135356a11"),
    "deleted" : true,
    "name" : "Hercules in New York",
    "leadActor" : "Arnold Schwarzenegger"
}

Moving On

Stop using seed scripts and mongoimport/mongoexport scripts for your MongoDB and Node.js test data! Dookie lets you build and compose data sets in a declarative manner, and stand up the data sets from mocha tests or from the command line. Dookie has been the single biggest time-saver in my MongoDB REST API workflow over the last year. Check dookie out on npm and save yourself a lot of headache with MongoDB sample data.

Found a typo or error? Open up a pull request! This post is available as markdown on Github