MongoDB 3.2 introduced the $lookup
aggregation framework pipeline stage, which let you pull documents from a separate collection into your aggregation framework pipeline. Before MongoDB 3.6, $lookup
could only do left outer joins with equality matching. In other words, suppose you had a collection of users, a collection of stocks, and a collection that mapped users to the stocks they hold. The $lookup
stage can give you an array of stocks a user holds. But in MongoDB 3.2 and 3.4, $lookup
could not give you just the stocks that had gone up in price since the customer bought them.
const { MongoClient } = require('mongodb');
run().catch(error => console.error(error.stack));
async function run() {
const client = await MongoClient.connect('mongodb://localhost:27017/test');
const db = client.db('test');
await db.dropDatabase();
// Create 2 users
const users = [
{ name: 'Benjamin Graham' },
{ name: 'Warren Buffett' }
];
await db.collection('User').insertMany(users);
// Create 4 stocks with their approximate `currentPrice`
const stocks = [
{ ticker: 'AAPL', currentPrice: 172.5 },
{ ticker: 'ORCL', currentPrice: 51 },
{ ticker: 'BRK.B', currentPrice: 202 },
{ ticker: 'LMT', currentPrice: 360 }
];
await db.collection('Stock').insertMany(stocks);
// Create a many-to-many mapping of users to the stocks they hold, with
// the `basePrice` that they originally bought the stock at
const stockHoldings = [
{ userId: users[0]._id, stock: 'AAPL', shares: 5, basePrice: 170 },
{ userId: users[0]._id, stock: 'ORCL', shares: 10, basePrice: 50 },
{ userId: users[1]._id, stock: 'BRK.B', shares: 5, basePrice: 200 },
{ userId: users[1]._id, stock: 'LMT', shares: 5, basePrice: 370 }
];
await db.collection('StockHolding').insertMany(stockHoldings);
// The only way to use `$lookup` in MongoDB 3.2 and 3.4
const docs = await db.collection('StockHolding').aggregate([
{
$lookup: {
from: 'Stock',
localField: 'stock',
foreignField: 'ticker',
as: 'stock'
}
},
{
$unwind: '$stock'
},
{
$project: {
_id: 0,
ticker: '$stock.ticker',
currentPrice: '$stock.currentPrice',
basePrice: 1,
shares: 1
}
}
]).toArray();
// [ { shares: 5, basePrice: 170, ticker: 'AAPL', currentPrice: 172.5 },
// { shares: 10, basePrice: 50, ticker: 'ORCL', currentPrice: 51 },
// { shares: 5, basePrice: 200, ticker: 'BRK.B', currentPrice: 202 },
// { shares: 5, basePrice: 370, ticker: 'LMT', currentPrice: 360 } ]
console.log(docs);
}
MongoDB 3.6 introduces support for much more sophisticated lookups with the new $expr
operator. In particular, $expr
allows you to do a $lookup
that only pulls stock holdings that have appreciated in value. In this article, I'll show you how to use the $expr
operator with queries, as well as with $lookup
.
I'll use Node.js and the MongoDB Node.js driver directly. You should use MongoDB driver >= 3.0.0
or Mongoose >= 5.0.0
because those are the the versions that support MongoDB 3.6.
Using $expr
With Queries
The $expr
operator allows you to query based on computed properties. This is especially powerful with $lookup
, but is also useful for queries. For example, let's say you wanted to find all stock holdings where the total cost of the stock was more than $1000. In other words, find all documents in the 'StockHolding' collection where shares * basePrice > 1000
. In older versions of MongoDB you could do this with the $where
operator, but the $where
operator suffers from numerous restrictions, performance limitations, and security issues. MongoDB recommends using $expr
as a replacement for $where
.
Here's how you structure this query using $expr
. The $expr
operator gives you access to aggregation operators, even in your queries. For this query, you can use the $gt
aggregation operator and the $multiply
operator. Note that the $gt
aggregation operator syntax differs slightly from that of the $gt
query operator, which is a distinct operator.
const docs = await db.collection('StockHolding').find({
// Equivalent to `$where: 'this.shares * this.basePrice > 1000'`
$expr: {
$gt: [
{ $multiply: ['$shares', '$basePrice'] },
1000
]
}
}).toArray();
// [ { _id: 5a8754447aa7d81e1082bcb2,
// userId: 5a8754447aa7d81e1082bcaa,
// stock: 'LMT',
// shares: 5,
// basePrice: 370 } ]
console.log(docs);
$expr
and $lookup
With a single query, you can find the stock holdings where the total cost of the holding was at least $1000. What about stock holdings where the current value of the holding is at least $1000? With the data as structured, you need to use $lookup
because the stock's currentPrice
is not tracked in the individual holding document. You need to do a $lookup
that matches stocks by ticker and by the product of shares
and currentPrice
.
The new $lookup
syntax uses the let
and pipeline
properties as a replacement for localField
and foreignField
. Instead of doing an exact equality match looking for documents in the from
collection where the value of foreignField
is equal to the value of localField
, let
and pipeline
let you define more sophisticated lookups. Specifically, let
defines which properties from the local collection (in this case 'StockHolding') that you want to use, and pipeline
defines a nested aggregation pipeline that computes the output.
For example, below is a basic $lookup
that looks up the corresponding stock for each holding using localField
and foreignField
.
const docs = await db.collection('StockHolding').aggregate([
{
$lookup: {
from: 'Stock',
localField: 'stock',
foreignField: 'ticker',
as: 'stock'
}
}
]).toArray();
Below is an equivalent aggregation using let
and pipeline
. Note that variables declared in let
may only be used in a $expr
operator.
const docs = await db.collection('StockHolding').aggregate([
{
$lookup: {
from: 'Stock',
// Escape 'let' because its a reserved word in JS
// `let` is where you pull in variables from the 'StockHolding' table
// to use in your `$expr`
'let': { stock: '$stock' },
pipeline: [
{
$match: {
$expr: {
// You can only use variables from the `let` property in
// a `$expr` operator
$eq: ['$ticker', '$$stock']
}
}
}
],
as: 'stock'
}
}
]).toArray();
In this simple case, the let
and pipeline
approach is more complex, but the new approach is also more flexible. For example, the below pipeline finds all stock holdings whose current value is greater than $1000 based on the shares
property from the holding document and the currentPrice
property from the stock document.
const docs = await db.collection('StockHolding').aggregate([
{
$lookup: {
from: 'Stock',
// Escape 'let' because its a reserved word in JS
// `let` is where you pull in variables from the 'StockHolding' table
// to use in your `$expr`
'let': { ticker: '$stock', shares: '$shares' },
pipeline: [
{
$match: {
$expr: {
// Weird but `$expr` must have exactly one key, so you need to
// use `$and`, otherwise you get an error 'MongoError: An
// object representing an expression must have exactly one field'
$and: [
// Fields prefixed with one '$' are in the 'Stock' collection,
// that is, the `from` collection. Fields prefixed with '$$'
// are from the `let` above
{
$gt: [{ $multiply: ['$$shares', '$currentPrice'] }, 1000]
},
{ $eq: ['$ticker', '$$ticker'] }
]
}
}
}
],
as: 'stock'
}
},
{
$unwind: '$stock'
}
]).toArray();
// [ { _id: 5a875c8714749222508d3a24,
// userId: 5a875c8714749222508d3a1d,
// stock:
// { _id: 5a875c8714749222508d3a20,
// ticker: 'BRK.B',
// currentPrice: 202 },
// shares: 5,
// basePrice: 200 },
// { _id: 5a875c8714749222508d3a25,
// userId: 5a875c8714749222508d3a1d,
// stock:
// { _id: 5a875c8714749222508d3a21,
// ticker: 'LMT',
// currentPrice: 360 },
// shares: 5,
// basePrice: 370 } ]
console.log(docs);
Moving On
The new $expr
operator and the new let
and pipeline
syntax for $lookup
go a long way towards replicating SQL joins in MongoDB. The pipeline
syntax goes beyond the simple left-outer joins that you could do with localField
and foreignField
, enabling deeply nested lookups and matching based on date and number comparisons. MongoDB 3.6 has several more exciting new aggregation features in addition to the new $lookup
syntax, so make sure you upgrade to take advantage of all of the new features. Just make sure you upgrade to MongoDB driver >= 3.0.0
or mongoose >= 5.0.0
first.