On Tuesday of this week, I had the pleasure of attending MongoNYC. I’ve dabbled a bit with MongoDB and am a big fan of the project. The conference had 4 tracks split across different rooms so I only managed to go to a quarter of the talks. FYI: all the talks are going to be put out on video soon if anyone cares to go through them.
Anyway, this post is about 10 things I learned at the MongoDB conference that surprised me. While some of them are limitations, I remain very impressed with the project and encourage you to check it out.
1) MongoDB uses 1 BFGL (big f***ing global lock) for separating reads and writes. That is, when one thread is writing, no one else can do a read or write. However, you can have multiple reads going on at the same time (when no writes are going on). This is fairly standard. What is surprising is that there is a single global lock across collections! That means, you currently don’t get concurrency across write operations on different collections. That said, the 10gen team said that improving concurrency is their main goal for the rest of the year. The situation should improve with the 2.0 release (which is supposed to come out soon).
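To make the locking rule in item 1 concrete, here is a minimal sketch of a readers-writer lock shared by every collection. It is a toy model in plain Python, not MongoDB's implementation; the class name `GlobalReadWriteLock` and the "users"/"orders" collections are made up for illustration.

```python
import threading


class GlobalReadWriteLock:
    """Toy readers-writer lock: many readers OR one writer, never both."""

    def __init__(self):
        self._mutex = threading.Lock()  # protects the two fields below
        self._readers = 0
        self._writer_active = False

    def try_acquire_read(self):
        """Succeeds unless a writer currently holds the lock."""
        with self._mutex:
            if self._writer_active:
                return False
            self._readers += 1
            return True

    def release_read(self):
        with self._mutex:
            self._readers -= 1

    def try_acquire_write(self):
        """Succeeds only when there are no readers and no other writer."""
        with self._mutex:
            if self._writer_active or self._readers > 0:
                return False
            self._writer_active = True
            return True

    def release_write(self):
        with self._mutex:
            self._writer_active = False


# One lock shared by every collection: that is what makes it "global".
db_lock = GlobalReadWriteLock()

# Two reads can overlap...
print(db_lock.try_acquire_read())   # True
print(db_lock.try_acquire_read())   # True
db_lock.release_read()
db_lock.release_read()

# ...but a write to "users" shuts out a write to "orders",
# because both collections go through the same lock.
print(db_lock.try_acquire_write())  # True  (writer on "users")
print(db_lock.try_acquire_write())  # False (writer on "orders" must wait)
db_lock.release_write()
```

A per-collection design would simply keep one such lock per collection, which is the kind of concurrency improvement the talk hinted at.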
2) MongoDB does not have a statistical query plan optimizer! Rather, the first time a query is issued, the database runs 3 alternative plans concurrently. The first plan that finishes is used for subsequent queries. Every little while, this process is redone to make sure that changes in the data haven’t changed the ideal query plan.
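The "race the plans, remember the winner" idea in item 2 can be sketched roughly as follows. This is a simulation, not MongoDB's optimizer: plans are modelled as iterators that take a fixed number of steps, and the rule "re-race after N uses" is my own stand-in for the "every little while" from the talk. `race_plans`, `PlanCache`, and the plan names are all hypothetical.

```python
def race_plans(plans):
    """Advance every candidate plan one step at a time (round-robin)
    and return the name of the first plan that runs out of work."""
    running = {name: make_iter() for name, make_iter in plans.items()}
    while True:
        for name, steps in running.items():
            try:
                next(steps)
            except StopIteration:
                return name


class PlanCache:
    """Remembers the winning plan per query shape and re-races periodically."""

    def __init__(self, recheck_every=100):
        self.recheck_every = recheck_every  # assumption: re-race after N uses
        self._winner = {}
        self._uses = {}

    def choose(self, query_shape, plans):
        uses = self._uses.get(query_shape, 0)
        if query_shape not in self._winner or uses >= self.recheck_every:
            self._winner[query_shape] = race_plans(plans)
            uses = 0
        self._uses[query_shape] = uses + 1
        return self._winner[query_shape]


# Each plan is a zero-argument callable returning an iterator;
# the length of the iterator stands in for the plan's cost.
plans = {
    "index_on_age": lambda: iter(range(5)),
    "index_on_city": lambda: iter(range(2)),
    "full_scan": lambda: iter(range(50)),
}

cache = PlanCache(recheck_every=2)
print(cache.choose("age+city", plans))  # index_on_city
```

The appeal of this design is that it needs no statistics about the data: the plans are measured directly, and the periodic re-race picks up changes in the data.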
3) The Mongos (pronounced “mongo”-”ess”) process is only needed when you do sharding. If you only have unsharded collections and are doing replication, you actually don’t need to have a mongos. So what if you want to read from secondaries in an unsharded but replicated setup? Well … you specify “slaveok” and the driver can talk to a secondary. I’m not completely sure how this setup would behave with respect to load-balancing. As I understood it, mongos provides routing services for shards as well as load-balancing features.
4) MongoDB supports Master-Slave replication as well as Replica-Sets. Master-Slave is the “old” way of doing things in MongoDB and, while it works, it is somewhat “deprecated”. That’s fine because Replica-Sets are a superset of Master-Slave replication anyway. A lot of terminology in tools and commands follows the old scheme. This doesn’t bother me (for some reason, I actually like it).
5) MongoDB supports “slave-delay” in secondary replicas. That means you can have a secondary that trails the primary by some period of time. It is important that the oplog on the primary have enough space to hold all of the operations that occurred in the meantime. This is standard stuff, but I was surprised that one of the standard commands (rs.isMaster()) that shows you the status of a replica set does not even include the delayed secondaries. The logic I was given was that these secondaries are different … they can never become primaries. Nor should they be used for reads. The main purpose of this feature is disaster recovery and reducing the pain of “butter fingers” (for instance, deleting a whole bunch of documents by mistake).
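The oplog sizing point in item 5 comes down to simple arithmetic: the oplog must hold at least as many bytes as the primary writes during the delay. A back-of-the-envelope sketch, where the write rate, the delay, and the `headroom` safety factor are all illustrative assumptions of mine rather than figures from the talk:

```python
def min_oplog_bytes(oplog_bytes_per_second, slave_delay_seconds, headroom=2.0):
    """Smallest oplog that still contains everything a delayed secondary
    has yet to apply, padded by a safety factor (headroom is an assumption)."""
    return int(oplog_bytes_per_second * slave_delay_seconds * headroom)


def oplog_covers_delay(oplog_size_bytes, oplog_bytes_per_second, slave_delay_seconds):
    """True if the time window the oplog spans is longer than the delay."""
    window_seconds = oplog_size_bytes / oplog_bytes_per_second
    return window_seconds > slave_delay_seconds


# Example: 50 KB/s of oplog traffic and a one-hour delay.
needed = min_oplog_bytes(50_000, 3600)
print(needed)                                        # 360000000 (about 360 MB)
print(oplog_covers_delay(100_000_000, 50_000, 3600)) # False: 100 MB spans only ~33 minutes
```

If the oplog window is shorter than the delay, the delayed secondary falls off the end of the oplog and can no longer catch up from it.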
6) MongoDB should likely not be run in a 32-bit environment. This is probably obvious to a lot of people but I am ashamed to say that I missed it. When I checked out the logs for one of my test apps (running on a 32-bit Ubuntu machine), lo and behold, I saw warning messages about not running MongoDB on a 64-bit machine.
7) MongoDB automatically tells you in the log about any query that takes longer than 100ms to run. This is NEAT!
8) MongoDB can be set up to do profiling if a query starts taking longer than a certain amount of time. I was told that there exist production environments where the admins leave profiling turned on with high thresholds to guard against unexpected situations where performance is degraded.
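The threshold logic behind items 7 and 8 is easy to model. MongoDB's profiler has three levels (0 = off, 1 = slow operations only, 2 = everything) and a slow-operation threshold in milliseconds; the sketch below only mimics that decision in plain Python and is not MongoDB code. The function name and the 2000 ms "high threshold" are my own illustrative choices.

```python
PROFILE_OFF, PROFILE_SLOW_ONLY, PROFILE_ALL = 0, 1, 2


def should_profile(level, duration_ms, slow_ms=100):
    """Decide whether an operation would be recorded by the profiler."""
    if level == PROFILE_ALL:
        return True
    if level == PROFILE_SLOW_ONLY:
        return duration_ms > slow_ms
    return False


# A production-style setup: profiling on, but with a high threshold
# so that only badly degraded queries are captured.
print(should_profile(PROFILE_SLOW_ONLY, 150, slow_ms=2000))   # False
print(should_profile(PROFILE_SLOW_ONLY, 3500, slow_ms=2000))  # True
```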
9) MongoDB doesn’t support multi-master replication. They think this is good because it keeps things logically simple. Also, the system is far simpler as it doesn’t have to worry about write-collisions across multiple masters. I like this.
10) In a replicated setup, MongoDB supports something called an Arbiter whose job is to break ties. The Arbiter doesn’t actually have any data on it. This is related to the consensus algorithm used in MongoDB, which is thankfully not as complex as Paxos.
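Why a data-less tie-breaker helps in item 10 is easiest to see with a small majority check. This is a deliberately simplified model that assumes a primary needs a strict majority of all voting members; real elections weigh more than this, and the `Member` type and member names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    holds_data: bool = True  # an arbiter votes but stores no data


def can_elect_primary(members, reachable):
    """A primary can be elected only if a strict majority of all members
    is reachable and at least one reachable member actually holds data."""
    votes = sum(1 for m in members if m.name in reachable)
    has_majority = votes > len(members) // 2
    has_candidate = any(m.holds_data and m.name in reachable for m in members)
    return has_majority and has_candidate


two_nodes = [Member("db1"), Member("db2")]
with_arbiter = two_nodes + [Member("arb", holds_data=False)]

# db2 becomes unreachable.
print(can_elect_primary(two_nodes, {"db1"}))            # False: 1 of 2 is a tie
print(can_elect_primary(with_arbiter, {"db1", "arb"}))  # True: 2 of 3 is a majority
```

With only two data-bearing nodes, losing either one leaves the survivor without a majority; the arbiter supplies the third vote without the cost of a third copy of the data.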
Overall, I had a great time at MongoNYC. It was a gathering of highly skilled systems people. I learned a lot and am even more excited about the future of MongoDB!
This is the first real post on my tech blog. If you have thoughts, comments, or suggestions for improvement, please send them my way!