Collations are another great new feature in MongoDB 3.4. You can think of collations as a way to configure how MongoDB orders and compares strings. In this article, I'll demonstrate some basic uses of collations and show how to use them in Node.js with the MongoDB driver and mongoose.
Ignoring Diacritics
At a previous company I was tasked with implementing a city search bar much like Airbnb's:
The problem is how to make "San Jose" match "San José" with the acute accent over the 'e'. Before collations, your best bet would be to use a module like diacritics to remove all diacritics from the city. In practice you would have a displayName
that would include diacritics for display, and a searchName
with diacritics removed for searching.
With collations, searching with diacritics is easy. Let's say you insert 2 documents, one with "San Jose" as the California city is commonly spelled, and another with "San José".
> db.cities.insertMany([{ name: 'San Jose' }, { name: 'San José' }])
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("58af53b1dd6258670ac02a5b"),
ObjectId("58af53b1dd6258670ac02a5c")
]
}
If you use the new collation()
function, you can make MongoDB ignore the differences in diacritics using the strength
option. The collation arguments take experience to become comfortable with. For now, remember that strength: 1
means MongoDB will ignore case and diacritics.
> db.cities.find({ name: 'San Jose' }).collation({ locale: 'en_US', strength: 1 }).pretty()
{ "_id" : ObjectId("58af560ce96c6b1ca7e5b922"), "name" : "San Jose" }
{ "_id" : ObjectId("58af560ce96c6b1ca7e5b923"), "name" : "San José" }
>
Keep in mind that collations do not currently work with regular expression search, so db.cities.find({ name: /^San Jose/ })
will not match "San José".
Case Insensitive Sorting
Collations aren't just useful for matching, they also help with sorting. By default MongoDB sorts strings by their characters' ASCII order (modulo non-ASCII characters), so 'Alpha' comes before 'Zeta' comes before '_' comes before 'alpha'.
> db.words.insertMany([{ v: 'Alpha', }, { v: 'Zeta' }, { v: '_' }, { v: 'alpha' }, { v: 'zeta' }])
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("58af6376b6f40a81313d78db"),
ObjectId("58af6376b6f40a81313d78dc"),
ObjectId("58af6376b6f40a81313d78dd"),
ObjectId("58af6376b6f40a81313d78de"),
ObjectId("58af6376b6f40a81313d78df")
]
}
> db.words.find({}).sort({ v: 1 })
{ "_id" : ObjectId("58af6376b6f40a81313d78db"), "v" : "Alpha" }
{ "_id" : ObjectId("58af6376b6f40a81313d78dc"), "v" : "Zeta" }
{ "_id" : ObjectId("58af6376b6f40a81313d78dd"), "v" : "_" }
{ "_id" : ObjectId("58af6376b6f40a81313d78de"), "v" : "alpha" }
{ "_id" : ObjectId("58af6376b6f40a81313d78df"), "v" : "zeta" }
>
The caseLevel
option, if set, sorts so that 'alpha' and 'Alpha' come before 'zeta' and 'Zeta'.
> db.words.find({}).sort({ v: 1 }).collation({ locale: 'en_US', caseLevel: true })
{ "_id" : ObjectId("58af6376b6f40a81313d78dd"), "v" : "_" }
{ "_id" : ObjectId("58af6376b6f40a81313d78de"), "v" : "alpha" }
{ "_id" : ObjectId("58af6376b6f40a81313d78db"), "v" : "Alpha" }
{ "_id" : ObjectId("58af6376b6f40a81313d78df"), "v" : "zeta" }
{ "_id" : ObjectId("58af6376b6f40a81313d78dc"), "v" : "Zeta" }
>
Ordering Numeric Strings
Another annoying issue with sorting strings is handling numbers. For example, let's say you insert a bunch of files named 'invoice_1', 'invoice_2', 'invoice_10', and 'invoice_100'. In conventional sort order, 'invoice_2' will come after 'invoice_10' and 'invoice_100'.
> db.files.insertMany([{ name: 'invoice_1' }, { name: 'invoice_2' }, { name: 'invoice_10' }, { name: 'invoice_100' }])
> db.files.find().sort({ name: 1 })
{ "_id" : ObjectId("58af6568b6f40a81313d78e0"), "name" : "invoice_1" }
{ "_id" : ObjectId("58af6568b6f40a81313d78e2"), "name" : "invoice_10" }
{ "_id" : ObjectId("58af6568b6f40a81313d78e3"), "name" : "invoice_100" }
{ "_id" : ObjectId("58af6568b6f40a81313d78e1"), "name" : "invoice_2" }
If you turn on the numericOrdering
flag, MongoDB will sort numeric substrings based on their numeric value rather than by ASCII characters. In other words, the order will be 'invoice_1', 'invoice_2', 'invoice_10', 'invoice_100', which makes more sense in this case.
> db.files.find().sort({ name: 1 }).collation({ locale: 'en_US', numericOrdering: true })
{ "_id" : ObjectId("58af6568b6f40a81313d78e0"), "name" : "invoice_1" }
{ "_id" : ObjectId("58af6568b6f40a81313d78e1"), "name" : "invoice_2" }
{ "_id" : ObjectId("58af6568b6f40a81313d78e2"), "name" : "invoice_10" }
{ "_id" : ObjectId("58af6568b6f40a81313d78e3"), "name" : "invoice_100" }
>
Collations in Node.js
Version 2.2.10 of the MongoDB driver and Mongoose 4.8.0 include helpers for collations. Here's an example of using a collation with find()
using the MongoDB driver:
const mongodb = require('mongodb');
let db
mongodb.MongoClient.connect('mongodb://localhost:27017/test').
then(_db => { db = _db }).
then(() => db.dropDatabase()).
then(() => db.collection('files').insertMany([
{ name: 'invoice_1' },
{ name: 'invoice_2' },
{ name: 'invoice_10' },
{ name: 'invoice_100' }
])).
then(() => db.collection('files').
find({}, { collation: { locale: 'en_US', numericOrdering: true } }).
sort({ name: 1 }).
toArray()
).
then(docs => console.log(docs));
// Output
[ { _id: 58af7b819ed98b28c1f2bb52, name: 'invoice_1' },
{ _id: 58af7b819ed98b28c1f2bb53, name: 'invoice_2' },
{ _id: 58af7b819ed98b28c1f2bb54, name: 'invoice_10' },
{ _id: 58af7b819ed98b28c1f2bb55, name: 'invoice_100' } ]
And using the mongoose query builder's collation()
helper function:
const mongoose = require('mongoose');
mongoose.connect('mongodb://localhost:27017/test');
var File = mongoose.model('File', new mongoose.Schema({ name: String }));
File.find().sort({ name: 1 }).collation({ locale: 'en_US', numericOrdering: true }).
then(docs => console.log(docs));
// Output
[ { _id: 58af7b819ed98b28c1f2bb52, name: 'invoice_1' },
{ _id: 58af7b819ed98b28c1f2bb53, name: 'invoice_2' },
{ _id: 58af7b819ed98b28c1f2bb54, name: 'invoice_10' },
{ _id: 58af7b819ed98b28c1f2bb55, name: 'invoice_100' } ]
Moving On
Collations are powerful, but far from the only great new feature in MongoDB 3.4. I previously wrote about the Decimal type, the $facet
aggregation operator, and the $graphLookup
aggregation operator. Check out those articles and learn how to take advantage of MongoDB 3.4 in Node.js!