A Node.js Perspective on MongoDB 3.4: $facet

by Valeri Karpov@code_barbarianJanuary 19, 2017

MongoDB 3.4 has 27 new aggregation concepts. Last week I covered the new $graphLookup stage, which recursively searches collections. This week I'll cover the new $facet stage and the related $bucket and $bucketAuto operators. These new features are tailor-made for building powerful product search pages, and will quickly become your favorite aggregation features if you're working on an ecommerce product.

What is Faceted Search?

Here's the view you see when searching for products on Amazon:

The "Refine By" menu on the left is a classic example of faceted search. When you only want to see products that are eligible for Amazon prime, the counts in the "Refine By" menu update to only reflect products that are eligible for Prime. For example, there are 26 products in the $10-$25 range, but only 7 of those are eligible for Prime.

Updating these counts in a sane and performant way was a huge pain before MongoDB 3.4. Your options were to either re-structure your data specifically for faceted search or execute a separate aggregation for every single individual "facet" you want to filter by.

Let's take a look at a more concrete example. Here's a collection of 4 books:

db.books.insertMany([
  {
    title: 'Professional AngularJS',
    author: 'Valeri Karpov',
    price: 37,
    year: 2015,
    tags: ['JavaScript', 'AngularJS']
  },
  {
    title: 'The 80/20 Guide To ES2015 Generators',
    author: 'Valeri Karpov',
    price: 10,
    year: 2016,
    tags: ['JavaScript', 'ES6']
  },
  {
    title: 'Total Recall: My Unbelievably True Life Story',
    author: 'Arnold Schwarzenegger',
    price: 32,
    year: 2013,
    tags: ['Biography']
  },
  {
    title: "Arnold's Bodybuilding for Men",
    author: 'Arnold Schwarzenegger',
    price: 14,
    year: 1984,
    tags: ['Fitness']
  }
]);

In order to enable faceted search by year, you need to run an aggregation query that buckets books by the year they were published.

db.books.aggregate([
  // Count the number of books published in a given year
  {
    $group: {
      _id: '$year',
      count: { $sum: 1 }
    }
  },
  // Sort by year descending
  { $sort: { count: -1, _id: -1 } }
]);

// Output
{ "_id" : 2016, "count" : 1 }
{ "_id" : 2015, "count" : 1 }
{ "_id" : 2013, "count" : 1 }
{ "_id" : 1984, "count" : 1 }

To drill down into just books by Arnold Schwarzenegger, you'd need to add a $match query.

db.books.aggregate([
  // Just Arnold's books
  { $match: { author: 'Arnold Schwarzenegger' } },
  // Count the number of books published in a given year
  {
    $group: {
      _id: '$year',
      count: { $sum: 1 }
    }
  },
  // Sort by year descending
  { $sort: { count: -1, _id: -1 } }
]);

// Output
{ "_id" : 2013, "count" : 1 }
{ "_id" : 1984, "count" : 1 }

However, you need to execute a separate aggregation for every other facet. For example, in MongoDB 3.2 there would be no good way to get both the author counts and the year counts in a single aggregation. You'd have to execute the below aggregation separately to get the author counts.

db.books.aggregate([
  // Count the number of books published in a given year
  {
    $group: {
      _id: '$author',
      count: { $sum: 1 }
    }
  },
  // Sort by author name ascending
  { $sort: { count: -1, _id: 1 } }
]);

// Output
{ "_id" : "Arnold Schwarzenegger", "count" : 2 }
{ "_id" : "Valeri Karpov", "count" : 2 }

Furthermore, there was no way to bucket books by price. The $group stage lets you count up the number of books that cost $10 exactly, but there's no way to group books by which ones are in the $10-$25 price range versus the $25-$50 range. At BookaLokal and SixPlus we gave up on using aggregation for this and just used a Node.js stream to count up the different facets. However, $facet and $bucket make it possible to count up facets in a single aggregation.

Introducing `$facet` and `$bucket`

$facet is fundamentally an aggregation stage that executes multiple aggregation pipelines in parallel and spits out the combined result when the pipelines are all done. You can think of it as async.parallel() or Promise.all() for aggregation pipelines.

Sounds complicated, so here's a concrete example. Remember the 2 separate aggregations for counting books by year and books by author? Here's the same 2 aggregations combined into a single aggregation with $facet. The $facet stage lets you execute both pipelines in parallel and store the output from both pipelines in a single document.

db.books.aggregate([
  {
    $facet: {
      // The `years` property will be the output of the 'count by year' pipeline
      years: [
        // Count the number of books published in a given year
        {
          $group: {
            _id: '$year',
            count: { $sum: 1 }
          }
        },
        // Sort by year descending
        { $sort: { count: -1, _id: -1 } }
      ],
      // The `authors` property will be the output of the 'count by authors' pipeline
      authors: [
        // Count the number of books published in a given year
        {
          $group: {
            _id: '$author',
            count: { $sum: 1 }
          }
        },
        // Sort by author name ascending
        { $sort: { count: -1, _id: 1 } }
      ]
    }
  }
]);

// Output
{
    "years" : [
        {
            "_id" : 2016,
            "count" : 1
        },
        {
            "_id" : 2015,
            "count" : 1
        },
        {
            "_id" : 2013,
            "count" : 1
        },
        {
            "_id" : 1984,
            "count" : 1
        }
    ],
    "authors" : [
        {
            "_id" : "Arnold Schwarzenegger",
            "count" : 2
        },
        {
            "_id" : "Valeri Karpov",
            "count" : 2
        }
    ]
}

One big advantage of using $facet is that you can combine $match with $facet. If you execute a separate aggregation for every facet, you have to execute the same $match query for every facet, which can be a lot of wasted work. With $facet, you can execute $match once and use then $facet.

db.books.aggregate([
  // Just Arnold's books
  { $match: { author: 'Arnold Schwarzenegger' } },
  {
    $facet: { /** same $facet stage as before */ }
  }
]);

What about grouping by price? That's what the $bucket operator does for you. Here's $bucket without $facet.

db.books.aggregate([
  {
    $bucket: {
      // Bucket by price
      groupBy: '$price',
      // With 3 price ranges: [0, 10), [10, 25), [25, 50)
      boundaries: [0, 10, 25, 50]
    }
  }
]);

// Output. The `_id` is the lower bound of the price range
{ "_id" : 10, "count" : 2 }
{ "_id" : 25, "count" : 2 }

If you don't care about setting up ranges manually, $bucketAuto will automatically break your documents up into a specified number of buckets.

db.books.aggregate([
  {
    $bucketAuto: {
      // Bucket by price
      groupBy: '$price',
      // And break it up into 2 buckets
      buckets: 2
    }
  }
]);

// Output. `_id` represents the range of each bucket
{ "_id" : { "min" : 10, "max" : 32 }, "count" : 2 }
{ "_id" : { "min" : 32, "max" : 37 }, "count" : 2 }

You can combine $bucketAuto with the established $facet query to get counts for every facet:

db.books.aggregate([
  {
    $facet: {
      // The `years` property will be the output of the 'count by year' pipeline
      years: [/* ... */],
      // The `authors` property will be the output of the 'count by authors' pipeline
      authors: [/* ... */],
      price: [
        {
          $bucketAuto: {
            // Bucket by price
            groupBy: '$price',
            // And break it up into 2 buckets
            buckets: 2
          }
        }
      ]
    }
  }
]);

// Output
{
    "years" : [/* ... */],
    "authors" : [/* ... */],
    "price" : [
        {
            "_id" : {
                "min" : 10,
                "max" : 32
            },
            "count" : 2
        },
        {
            "_id" : {
                "min" : 32,
                "max" : 37
            },
            "count" : 2
        }
    ]
}

`$facet` and `$bucket` in Node.js

There are no version restrictions that I'm aware of on using $facet, $bucket, and $bucketAuto in mongodb and mongoose. The below scripts work at least as far back as mongodb@2.1.0 and mongoose@4.5.0:

const mongodb = require('mongodb');

mongodb.MongoClient.connect('mongodb://localhost:27017/test').
  then(db => db.collection('books').aggregate([
    {
      $facet: {
        // The `years` property will be the output of the 'count by year' pipeline
        years: [
          // Count the number of books published in a given year
          {
            $group: {
              _id: '$year',
              count: { $sum: 1 }
            }
          },
          // Sort by year descending
          { $sort: { count: -1, _id: -1 } }
        ],
        // The `authors` property will be the output of the 'count by authors' pipeline
        authors: [
          // Count the number of books published in a given year
          {
            $group: {
              _id: '$author',
              count: { $sum: 1 }
            }
          },
          // Sort by author name ascending
          { $sort: { count: -1, _id: 1 } }
        ],
        price: [
          {
            $bucketAuto: {
              // Bucket by price
              groupBy: '$price',
              // And break it up into 2 buckets
              buckets: 2
            }
          }
        ]
      }
    }
  ]).toArray()).
  then(res => console.log(JSON.stringify(res, null, '  '))).
  catch(error => console.error('error', error));

const mongoose = require('mongoose');

mongoose.connect('mongodb://localhost:27017/test');

var Book = mongoose.model('Book', new mongoose.Schema());

Book.aggregate([
  {
    $facet: {
      // The `years` property will be the output of the 'count by year' pipeline
      years: [
        // Count the number of books published in a given year
        {
          $group: {
            _id: '$year',
            count: { $sum: 1 }
          }
        },
        // Sort by year descending
        { $sort: { count: -1, _id: -1 } }
      ],
      // The `authors` property will be the output of the 'count by authors' pipeline
      authors: [
        // Count the number of books published in a given year
        {
          $group: {
            _id: '$author',
            count: { $sum: 1 }
          }
        },
        // Sort by author name ascending
        { $sort: { count: -1, _id: 1 } }
      ],
      price: [
        {
          $bucketAuto: {
            // Bucket by price
            groupBy: '$price',
            // And break it up into 2 buckets
            buckets: 2
          }
        }
      ]
    }
  }
]).
then(res => console.log(res)).
catch(error => console.error('error', error));

Mongoose 4.8.0 will include a helper for $facet for its chainable aggregation pipeline builder:

MyModel.aggregate().facet({
  field1: [/* pipeline 1 */],
  field2: [/* pipeline 2 */]
}).exec();

Moving On

MongoDB 3.4 has an incredible variety of new aggregation features, including $facet and $graphLookup. If you have graph data or an ecommerce search page, make sure to upgrade to take advantage of these new features so you can delete hundreds of lines of unnecessary code. Next week, I'll cover the new decimal type. The decimal type enables accurate base 10 arithmetic and avoids the myriad fundamental flaws of floating point arithmetic, so you can finally stop copy/pasting _.round(price, 2).

Found a typo or error? Open up a pull request! This post is available as markdown on Github

A Node.js Perspective on MongoDB 3.4: $facet

What is Faceted Search?

Introducing $facet and $bucket

$facet and $bucket in Node.js

Moving On

Introducing `$facet` and `$bucket`

`$facet` and `$bucket` in Node.js