The newly released MongoDB 3.4 is filled with powerful new features. In particular, there are 27 new aggregation concepts, the most exciting of which are $graphLookup and $facet. In this article, I'll show how $graphLookup works and how you can use it in Node.js.

What Does $graphLookup Do?

$graphLookup is a new aggregation framework stage that recursively searches through a collection. For a concrete example, let's insert 5 documents into the customers collection:

db.customers.insertMany([
  { _id: 1, name: 'Luke Skywalker', friends: [2, 3] },
  { _id: 2, name: 'Han Solo', friends: [1, 3, 4] },
  { _id: 3, name: 'Leia Organa', friends: [1, 3] },
  { _id: 4, name: 'Lando Calrissian', friends: [2, 5] },
  { _id: 5, name: 'Tendra Risant', friends: [4] }
]);

Luke Skywalker has 2 friends, Han Solo and Leia Organa. But how do you determine who Luke's 2nd degree connections (friends of friends) are? That's where $graphLookup comes in:

db.customers.aggregate([
  { $match: { _id: 1 } }, // Only look at Luke Skywalker
  {
    $graphLookup: {
      from: 'customers', // Use the customers collection
      startWith: '$friends', // Start looking at the document's `friends` property
      connectFromField: 'friends', // A link in the graph is represented by the friends property...
      connectToField: '_id', // ... pointing to another customer's _id property
      maxDepth: 1, // Only recurse one level deep
      as: 'connections' // Store this in the `connections` property
    }
  }
]);

The result of this aggregation looks like this:

{
    "_id" : 1,
    "name" : "Luke Skywalker",
    "friends" : [
        2,
        3
    ],
    "connections" : [
        {
            "_id" : 4,
            "name" : "Lando Calrissian",
            "friends" : [
                2,
                5
            ]
        },
        {
            "_id" : 1,
            "name" : "Luke Skywalker",
            "friends" : [
                2,
                3
            ]
        },
        {
            "_id" : 3,
            "name" : "Leia Organa",
            "friends" : [
                1,
                3
            ]
        },
        {
            "_id" : 2,
            "name" : "Han Solo",
            "friends" : [
                1,
                3,
                4
            ]
        }
    ]
}

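Because maxDepth is 1, $graphLookup runs the initial lookup (depth 0, Luke's friends) plus one recursive step (depth 1, friends of friends), then stops. Luke shows up in his own connections because Han and Leia both list him as a friend. If you remove maxDepth, $graphLookup will recursively go through the whole graph: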

db.customers.aggregate([
  { $match: { _id: 1 } }, // Only look at Luke Skywalker
  {
    $graphLookup: {
      from: 'customers',
      startWith: '$friends',
      connectFromField: 'friends', // <-- skip `maxDepth`
      connectToField: '_id',
      as: 'connections'
    }
  },
  { $project: { 'connections.name': 1 } }
]);

// Output

{
    "_id" : 1,
    "connections" : [
        {
            "name" : "Tendra Risant"
        },
        {
            "name" : "Han Solo"
        },
        {
            "name" : "Leia Organa"
        },
        {
            "name" : "Luke Skywalker"
        },
        {
            "name" : "Lando Calrissian"
        }
    ]
}
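
If it helps to see the traversal spelled out, here's a minimal in-memory sketch in plain JavaScript of roughly what the two aggregations above do. The graphLookup() function is a hypothetical helper written for illustration, not a MongoDB or driver API, and it hard-codes connectFromField as friends and connectToField as _id. The real stage runs on the server and may differ in details such as result order.

```javascript
// Same documents as the `customers` collection above
const customers = [
  { _id: 1, name: 'Luke Skywalker', friends: [2, 3] },
  { _id: 2, name: 'Han Solo', friends: [1, 3, 4] },
  { _id: 3, name: 'Leia Organa', friends: [1, 3] },
  { _id: 4, name: 'Lando Calrissian', friends: [2, 5] },
  { _id: 5, name: 'Tendra Risant', friends: [4] }
];

// Hypothetical helper: breadth-first search that follows `friends` -> `_id`
function graphLookup(docs, startWith, maxDepth = Infinity) {
  const visited = new Map(); // _id -> matched document
  let frontier = startWith; // values to match against `_id` at this depth
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; ++depth) {
    const next = [];
    for (const doc of docs) {
      if (frontier.includes(doc._id) && !visited.has(doc._id)) {
        visited.set(doc._id, doc);
        next.push(...doc.friends); // follow the `friends` links one more step
      }
    }
    frontier = next;
  }
  return Array.from(visited.values());
}

const luke = customers[0];
const names = conns => conns.map(c => c.name).sort();
console.log(names(graphLookup(customers, luke.friends, 1)));
console.log(names(graphLookup(customers, luke.friends)));
```

With a maxDepth of 1 the sketch finds the same four customers as the first aggregation, and without maxDepth it reaches all five, including Tendra Risant.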

$graphLookup and Bacon Numbers

$graphLookup can also search a separate collection. Let's use this to compute the Arnold number (a Bacon number, but counted from Arnold Schwarzenegger instead of Kevin Bacon) for a small data set of actors and movies.

db.actors.insertMany([
  { _id: 1, name: 'Arnold Schwarzenegger' },
  { _id: 2, name: 'James Earl Jones' },
  { _id: 3, name: 'Harrison Ford' },
  { _id: 4, name: 'Tommy Lee Jones' }
]);
db.movies.insertMany([
  { _id: 1, name: 'Conan the Barbarian', actors: [1, 2] },
  { _id: 2, name: 'The Empire Strikes Back', actors: [2, 3] },
  { _id: 3, name: 'The Fugitive', actors: [3, 4] }
]);

Getting the Arnold number for actors is somewhat tricky, but you can compute the Arnold number for movies using the depthField option for $graphLookup.

db.actors.aggregate([
  { $match: { _id: 1 } }, // Only look at Arnold Schwarzenegger
  {
    $graphLookup: {
      from: 'movies', // Explore the movies collection
      startWith: '$_id', // Start with movies that contain Arnold's _id
      connectFromField: 'actors', // Match actors in one movie...
      connectToField: 'actors', // to actors in another movie
      as: 'connections',
      // Add a 'steps' property to each connection subdoc that contains
      // the number of steps needed to get to this movie
      depthField: 'steps'
    }
  }
]);

// Output

{
    "_id" : 1,
    "name" : "Arnold Schwarzenegger",
    "connections" : [
        {
            "_id" : 3,
            "name" : "The Fugitive",
            "actors" : [
                3,
                4
            ],
            "steps" : NumberLong(2) // 2 steps to get here, Conan -> Empire -> Fugitive
        },
        {
            "_id" : 2,
            "name" : "The Empire Strikes Back",
            "actors" : [
                2,
                3
            ],
            "steps" : NumberLong(1) // Needed 1 step to get here, Conan -> Empire
        },
        {
            "_id" : 1,
            "name" : "Conan the Barbarian",
            "actors" : [
                1,
                2
            ],
            "steps" : NumberLong(0) // Started here
        }
    ]
}
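
To see where those steps values come from, here's a rough in-memory sketch of the same search in plain JavaScript. The moviesWithSteps() function is a hypothetical helper for illustration only, and it assumes both connectFromField and connectToField are the actors array, as in the aggregation above. A movie's steps is just the depth at which the search first reaches it.

```javascript
// Same documents as the `movies` collection above
const movies = [
  { _id: 1, name: 'Conan the Barbarian', actors: [1, 2] },
  { _id: 2, name: 'The Empire Strikes Back', actors: [2, 3] },
  { _id: 3, name: 'The Fugitive', actors: [3, 4] }
];

// Hypothetical helper: breadth-first search over movies that share an actor
function moviesWithSteps(allMovies, actorId) {
  const steps = new Map(); // movie _id -> depth at which it was first reached
  let frontier = [actorId]; // actor ids to look for at this depth
  for (let depth = 0; frontier.length > 0; ++depth) {
    const next = [];
    for (const movie of allMovies) {
      const reached = movie.actors.some(actor => frontier.includes(actor));
      if (reached && !steps.has(movie._id)) {
        steps.set(movie._id, depth);
        next.push(...movie.actors); // this movie's cast is the next frontier
      }
    }
    frontier = next;
  }
  // Copy each reached movie and attach its depth, like `depthField: 'steps'`
  return allMovies.
    filter(movie => steps.has(movie._id)).
    map(movie => Object.assign({}, movie, { steps: steps.get(movie._id) }));
}

console.log(moviesWithSteps(movies, 1));
```

Starting from Arnold's _id, the sketch reaches Conan at depth 0, Empire at depth 1, and The Fugitive at depth 2, matching the steps values in the output above.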

With a little $unwind and $group magic, you can sort actors by the Arnold number for the movies they're in and take the closest one:

db.actors.aggregate([
  { $match: { _id: 1 } }, // Only look at Arnold Schwarzenegger
  {
    $graphLookup: {
      from: 'movies', // Explore the movies collection
      startWith: '$_id', // Start with movies that contain Arnold's _id
      connectFromField: 'actors', // Match actors in one movie...
      connectToField: 'actors', // to actors in another movie
      as: 'connections',
      // Add a 'steps' property to each connections subdoc that contains
      // the number of steps needed to get to this movie
      depthField: 'steps'
    }
  },
  // Generate 1 doc for each element in the connections array
  { $unwind: '$connections' },
  // Generate 1 doc for each element in an actors array
  { $unwind: '$connections.actors' },
  // Now we have 1 doc for each movie/actor pair, so order by Arnold number
  { $sort: { 'connections.steps': 1 } },
  // Skip Arnold
  { $match: { 'connections.actors': { $ne: 1 } } },
  // And pick the doc with the smallest Arnold number for each Actor
  {
    $group: {
      _id: '$connections.actors',
      arnoldNumber: { $first: { $add: ['$connections.steps', 1] } }
    }
  }
]);

// Output

// Tommy Lee Jones -> Harrison Ford (The Fugitive) ->
// James Earl Jones (The Empire Strikes Back) -> Arnold (Conan the Barbarian)
{ "_id" : 4, "arnoldNumber" : 3 } // Tommy Lee Jones
{ "_id" : 3, "arnoldNumber" : 2 } // Harrison Ford
{ "_id" : 2, "arnoldNumber" : 1 } // James Earl Jones

So $graphLookup doesn't get you the right Arnold number on its own. But, thanks to the fact that MongoDB aggregations are pipelines, you can massage the data with a few extra stages to get the correct Arnold number.
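
If the $unwind and $group stages feel opaque, the same massaging can be sketched in plain JavaScript. The arnoldNumbers() function below is a hypothetical helper, and the connections array is copied by hand from the $graphLookup output above. Note that it keeps a running minimum per actor rather than sorting and taking the first doc, which should give the same numbers for this data.

```javascript
// `connections` as returned by the $graphLookup stage above
const connections = [
  { _id: 3, name: 'The Fugitive', actors: [3, 4], steps: 2 },
  { _id: 2, name: 'The Empire Strikes Back', actors: [2, 3], steps: 1 },
  { _id: 1, name: 'Conan the Barbarian', actors: [1, 2], steps: 0 }
];

// Hypothetical helper: smallest `steps + 1` over every movie an actor is in
function arnoldNumbers(conns, arnoldId) {
  const result = new Map(); // actor _id -> Arnold number
  for (const movie of conns) { // like { $unwind: '$connections' }
    for (const actor of movie.actors) { // like { $unwind: '$connections.actors' }
      if (actor === arnoldId) {
        continue; // skip Arnold himself
      }
      const candidate = movie.steps + 1;
      if (!result.has(actor) || candidate < result.get(actor)) {
        result.set(actor, candidate); // keep the closest movie for this actor
      }
    }
  }
  return result;
}

console.log(Array.from(arnoldNumbers(connections, 1)));
```

For this data the helper gives James Earl Jones 1, Harrison Ford 2, and Tommy Lee Jones 3, the same numbers as the aggregation.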

Using $graphLookup in Node.js

There are no version restrictions for using $graphLookup with the MongoDB Node.js driver or mongoose. The scripts below, the first using the driver directly and the second using mongoose, work fine back to MongoDB driver 2.1.0 and mongoose 4.5.0.

const mongodb = require('mongodb');

mongodb.MongoClient.connect('mongodb://localhost:27017/test').
  then(db => db.collection('actors').aggregate([
    { $match: { _id: 1 } }, // Only look at Arnold Schwarzenegger
    {
      $graphLookup: {
        from: 'movies', // Explore the movies collection
        startWith: '$_id', // Start with movies that contain Arnold's _id
        connectFromField: 'actors', // Match actors in one movie...
        connectToField: 'actors', // to actors in another movie
        as: 'connections',
        // Add a 'steps' property to each connections subdoc that contains
        // the number of steps needed to get to this movie
        depthField: 'steps'
      }
    },
    // Generate 1 doc for each element in the connections array
    { $unwind: '$connections' },
    // Generate 1 doc for each element in an actors array
    { $unwind: '$connections.actors' },
    // Now we have 1 doc for each movie/actor pair, so order by Arnold number
    { $sort: { 'connections.steps': 1 } },
    // Skip Arnold
    { $match: { 'connections.actors': { $ne: 1 } } },
    // And pick the doc with the smallest Arnold number for each Actor
    {
      $group: {
        _id: '$connections.actors',
        arnoldNumber: { $first: { $add: ['$connections.steps', 1] } }
      }
    }
  ]).toArray()).
  then(res => console.log(res)).
  catch(error => console.error('error', error));

const mongoose = require('mongoose');

mongoose.connect('mongodb://localhost:27017/test');

const Actor = mongoose.model('Actor', new mongoose.Schema());

Actor.aggregate([
  { $match: { _id: 1 } }, // Only look at Arnold Schwarzenegger
  {
    $graphLookup: {
      from: 'movies', // Explore the movies collection
      startWith: '$_id', // Start with movies that contain Arnold's _id
      connectFromField: 'actors', // Match actors in one movie...
      connectToField: 'actors', // to actors in another movie
      as: 'connections',
      // Add a 'steps' property to each connections subdoc that contains
      // the number of steps needed to get to this movie
      depthField: 'steps'
    }
  },
  // Generate 1 doc for each element in the connections array
  { $unwind: '$connections' },
  // Generate 1 doc for each element in an actors array
  { $unwind: '$connections.actors' },
  // Now we have 1 doc for each movie/actor pair, so order by Arnold number
  { $sort: { 'connections.steps': 1 } },
  // Skip Arnold
  { $match: { 'connections.actors': { $ne: 1 } } },
  // And pick the doc with the smallest Arnold number for each Actor
  {
    $group: {
      _id: '$connections.actors',
      arnoldNumber: { $first: { $add: ['$connections.steps', 1] } }
    }
  }
]).
then(res => console.log(res)).
catch(error => console.error('error', error));

However, I'd recommend using MongoDB driver >= 2.2.19 and Mongoose >= 4.7.6 for full support for MongoDB 3.4. Mongoose 4.8.0 will also include a graphLookup() helper function for its chainable aggregation pipeline builder:

const promise = Actor.aggregate().
  match({ _id: 1 }).
  graphLookup({
    from: 'movies', // Explore the movies collection
    startWith: '$_id', // Start with movies that contain Arnold's _id
    connectFromField: 'actors', // Match actors in one movie...
    connectToField: 'actors', // to actors in another movie
    as: 'connections',
    // Add a 'steps' property to each connections subdoc that contains
    // the number of steps needed to get to this movie
    depthField: 'steps'
  }).
  exec();
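
Since the driver and mongoose scripts above pass the exact same array of stages, one option is to build that array once and reuse it. Here's a small sketch; arnoldPipeline() is a hypothetical helper name, not part of the driver or mongoose, and it only returns plain objects, so it doesn't need a database connection.

```javascript
// Hypothetical helper: returns the Arnold number pipeline for a given actor _id
function arnoldPipeline(actorId) {
  return [
    { $match: { _id: actorId } }, // Only look at the starting actor
    {
      $graphLookup: {
        from: 'movies',
        startWith: '$_id',
        connectFromField: 'actors',
        connectToField: 'actors',
        as: 'connections',
        depthField: 'steps'
      }
    },
    { $unwind: '$connections' },
    { $unwind: '$connections.actors' },
    { $sort: { 'connections.steps': 1 } },
    { $match: { 'connections.actors': { $ne: actorId } } }, // Skip the starting actor
    {
      $group: {
        _id: '$connections.actors',
        arnoldNumber: { $first: { $add: ['$connections.steps', 1] } }
      }
    }
  ];
}

const pipeline = arnoldPipeline(1);
// Print just the stage names, in order
console.log(pipeline.map(stage => Object.keys(stage)[0]));
```

You would then pass the result to aggregate() the same way as before, for example db.collection('actors').aggregate(arnoldPipeline(1)).toArray() with the driver or Actor.aggregate(arnoldPipeline(1)) with mongoose.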

Moving On

The $graphLookup stage is just one of the many new aggregation features in MongoDB 3.4. It lets you recursively search MongoDB collections to find friends of friends and Bacon numbers. Make sure you upgrade and check it out!

In a future article I'll discuss another exciting new feature from MongoDB 3.4, the $facet stage, AKA the new favorite pipeline stage for every MongoDB-based ecommerce platform.
