Mongoid batch inserts

Mar 27, 2013 · 4 mins

In SQL land, most databases support batch inserts, an efficient mechanism for inserting a lot of similar data. That is, instead of issuing x insert statements, you execute 1 insert with x records. This is more efficient because the insert statement is parsed once rather than x times, there is only 1 network round trip as opposed to x, and in the case of transactions, there is only 1 transaction instead of x. For large numbers of records, a batch insert is almost always faster than x individual inserts.
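To make "1 insert with x records" concrete, here is a minimal sketch that builds a single multi-row INSERT statement. The helper name `multi_row_insert_sql`, the `tags` table, and the account id 42 are made up for illustration, and the hand-rolled quoting is only there to keep the example self-contained; real code should rely on the database driver's parameter binding. Many SQL databases accept this multi-row VALUES form, but the exact syntax can vary.

```ruby
# Hypothetical helper: builds one INSERT statement that carries every row.
# Illustration only; use your driver's parameter binding in real code.
def multi_row_insert_sql(table, columns, rows)
  # Quote strings (doubling embedded single quotes); leave other values as-is.
  quote = ->(value) { value.is_a?(String) ? "'#{value.gsub("'", "''")}'" : value.to_s }
  values = rows.map { |row| "(#{row.map(&quote).join(', ')})" }
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(', ')}"
end

rows = [['a', true, 42], ['bunch', true, 42]]
sql = multi_row_insert_sql('tags', %w[name system_tag account_id], rows)
puts sql
# prints: INSERT INTO tags (name, system_tag, account_id) VALUES ('a', true, 42), ('bunch', true, 42)
```

One statement, one parse, and one round trip, no matter how many rows are in the list.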

As it turns out, MongoDB supports batch inserts too! And just like in SQL land, inserting a lot of documents in one batched insert is much faster than issuing x separate inserts.

For example, the Mongo Ruby driver’s insert method accepts an array of documents as well as a single document; thus, you can insert an array of hashes quite efficiently. Even if you are using an ODM like Mongoid, you can still perform batch inserts: all you need to do is get a reference to the model’s underlying collection and then issue an insert with an array of hashes matching the collection’s intended document structure.

For instance, to insert a collection of Tag models (each having 3 fields: name, system_tag, and account_id) in one fell swoop I can do the following:

Batch inserts with Mongoid model example
```ruby
tags = ['a', 'bunch', 'of', 'tags'].collect { |tag| {name: tag, system_tag: true, account_id: id} }
Tag.collection.insert tags
```

In the code above, insert takes an array of hashes (id is assumed to be the current account’s id, available in the surrounding scope); what’s more, the insert is tied to the tags collection via the Tag.collection call.
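For very large arrays it may be worth slicing the documents into bounded batches rather than sending everything in one request. The sketch below shows one way to do that. `insert_in_batches` and `RecordingCollection` are hypothetical names, not part of Mongoid or the Mongo driver, and the batch size is an arbitrary assumption. The helper calls `insert_many` when the collection offers it (as the 2.x Mongo Ruby driver does) and falls back to `insert` otherwise. The stand-in collection exists only so the example runs without a MongoDB server.

```ruby
# Hypothetical helper, not part of Mongoid or the Mongo driver.
# `collection` is anything that responds to `insert_many` (Mongo Ruby driver 2.x)
# or to `insert` (the older driver API used in this article).
def insert_in_batches(collection, documents, batch_size: 1000)
  inserted = 0
  documents.each_slice(batch_size) do |batch|
    if collection.respond_to?(:insert_many)
      collection.insert_many(batch)
    else
      collection.insert(batch)
    end
    inserted += batch.size
  end
  inserted
end

# Stand-in collection so the sketch runs without a MongoDB server;
# it only records each batch it receives.
class RecordingCollection
  attr_reader :batches

  def initialize
    @batches = []
  end

  def insert(docs)
    @batches << docs
  end
end

account_id = 42 # assumed account id, for illustration only
docs = %w[a bunch of tags].map do |name|
  { name: name, system_tag: true, account_id: account_id }
end

collection = RecordingCollection.new
count = insert_in_batches(collection, docs, batch_size: 3)
puts "#{count} documents in #{collection.batches.size} insert calls"
```

With a real model, the call would look something like `insert_in_batches(Tag.collection, docs)`. Keep in mind that writing through the underlying collection bypasses Mongoid's validations and callbacks, so the hashes need to be complete and valid before they are sent.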

Batch inserts are generally faster when you have a lot of similar documents; in our case, we saw a tremendous performance increase when employing batching.
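To check the effect on your own data, a small timing harness along these lines may help. `SlowCollection` is a made-up stand-in that sleeps on every call to mimic one network round trip, and the latency value is an arbitrary assumption, so the numbers it prints say nothing about real MongoDB performance. Swap in a real collection to get a meaningful measurement.

```ruby
require 'benchmark'

# Stand-in for a collection: each insert call sleeps briefly to mimic
# one network round trip. The latency value is an arbitrary assumption.
class SlowCollection
  attr_reader :calls

  def initialize(latency = 0.002)
    @latency = latency
    @calls = 0
  end

  def insert(_doc_or_docs)
    @calls += 1
    sleep(@latency)
  end
end

docs = Array.new(100) { |i| { name: "tag-#{i}", system_tag: true, account_id: 42 } }

# x inserts: one call (and one simulated round trip) per document.
one_by_one = SlowCollection.new
single_time = Benchmark.realtime { docs.each { |doc| one_by_one.insert(doc) } }

# 1 insert: the whole array goes out in a single call.
batched = SlowCollection.new
batch_time = Benchmark.realtime { batched.insert(docs) }

puts format('one at a time: %d calls, %.3fs', one_by_one.calls, single_time)
puts format('batched:       %d call,  %.3fs', batched.calls, batch_time)
```

The simulation only models per-call overhead, which is the part of the cost that batching removes; server-side work per document is not represented.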

Andrew Glover

When Andrew Glover isn't listening to “Funkytown” or “Le Freak” he enjoys speaking on the No Fluff Just Stuff Tour. He also writes articles for multiple online publications including IBM's developerWorks and O'Reilly’s ONJava and ONLamp portals. Andrew is also the co-author of Java Testing Patterns, which was published by Wiley in September 2004; Addison-Wesley’s Continuous Integration; and Manning’s Groovy in Action.
