How to count number of occurrences of each key in a MongoDB collection efficiently (one pass?)?



  • I'm a MongoDB noob and I have a MongoDB collection with millions of documents, each with tens of keys (average ~ 60, large variance) from a set of around 100. I'd like to get a count of each key across the entire collection.

    For example, if the documents in the collection were:

    {"_id":0, "foo": 0, "bar":1, "baz":2}
    {"_id":1, "foo": 0, "baz":7, "qux":11, "quux":13}
    {"_id":2, "foo": 1, "bar":1, "quux":3}
    

    then the desired output would be:

    {"_id":3, "foo":3, "bar":2, "baz":2, "qux":1, "quux": 2}
    

    I can "explode" the collection with $objectToArray, $unwind, $group, and then $count, but it's slow.

    Is there something that could do this efficiently in one pass through the collection? Something like,

    [notional psuedocode]
    output={}
    foreach document:
      foreach key:
        if output.key exists:
          output.key+=1
        else:
          output.key=1
    

    => output: {key1: key1_count, ...}



  • One of the shortest one could be this one:

    const counts = {}
    db.collection.aggregate([
       { $project: { data: { $objectToArray: "$$ROOT" } } }
    ]).forEach(doc => {
       doc.data.forEach(val => {
          counts[val.k] = typeof counts[val.k] == "undefined" ? val.v : counts[val.k] + val.v;
       })
    })
    

    I tried to find solution of $objectToArray, $group but most likely it will not work, because the size of document is limited to 16 MiBytes. Having "millions of documents" this limit will certainly exceeded.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2