Things on my mind in 2015

I’m attempting to restart my blogging efforts in 2015. For this first post, I wanted to capture the major technologies that are on my mind:

Golang - after developing some decent sized projects in Python, I really appreciate Go’s static typing. I love the channel concept and easy ability to duck type. Here’s hoping we get a Go debugger in 2015!

Docker - I’ve been excited about Docker for almost a year now. The latest tools around Docker clustering (Docker Swarm) and Docker orchestration (Fig) further my enthusiasm.

Microservices - I’m finally getting an opportunity to build a 24/7 service at work, and am beginning to appreciate the power of microservice architectures like those at Netflix.

Deep Belief Networks – I’ve been a fan of Neural Networks since my undergrad days more than a decade ago. The advancements in training deep neural networks have revolutionized domains such as speech and object recognition. I’m excited to learn more about this and other Machine Learning algorithms during 2015.

I wish everyone a fantastic 2015!

Hosting my WordPress blog entirely on Amazon S3

Recently, I started going over my monthly bills to see if there was an easy way to cut down on expenses. One entry that stuck out like a sore thumb was the $60 a month I spend (or should I say, spent) on a VM on EC2. I’ve used this VM for educational projects, hackathons, and testing things in a cloud environment. But none of these uses justified having a continuously running VM. With one exception: my blog is also hosted on this VM, and just shutting down my EC2 instance would have meant losing the blog. So I had to migrate the blog somewhere. Ever since I saw Werner Vogels’s post on how he moved his blog over to Amazon’s S3 service, I’ve been meaning to try that out. However, I wanted to keep using WordPress to edit my blog. I had an idea on how to do this and yesterday I finally had time to do it!

S3 stands for Simple Storage Service. It is a cheap, durable web service offered as part of Amazon AWS. Since Feb 2011, it has been able to host static web sites from an S3 bucket. There are two caveats that come with this: the site is static, and you can’t officially point your naked domain at the S3 service. Dealing with the second problem isn’t that bad and has several solutions. I use GoDaddy for my nameservers, and I simply forwarded my naked domain (slowping.com) to www.slowping.com. Of course, I modified my www CNAME record to point at my S3 provided “website endpoint”.

All that is left is to resolve the fact that S3 can only host static files but I want to continue to use WordPress to edit my blog. The idea I had centered around having a local VM that mirrored my existing WordPress installation. I then proceeded to edit the hosts file on my home machine(s) whenever I wanted to access the WordPress blog on my local VM. The edit is very simple – I just add my domain name and the IP of the local VM. This makes editing my blog via a web browser very simple. Whenever I want to see the “actual” static blog site on S3, I just undo the hosts file modification. Simple as that!
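The hosts-file switch described above is easy to script. Here is a minimal sketch that works on the file's text rather than on the file itself; the IP address `192.168.1.50` and the marker comment are made up for illustration, and the location of the real hosts file depends on the operating system.

```python
MARKER = "# local-wordpress"  # hypothetical tag so we only touch our own line


def enable_local(hosts_text, domain, vm_ip):
    """Return hosts_text with `domain` pointed at the local VM."""
    cleaned = disable_local(hosts_text).rstrip("\n")
    entry = f"{vm_ip}\t{domain}\t{MARKER}"
    return (cleaned + "\n" if cleaned else "") + entry + "\n"


def disable_local(hosts_text):
    """Return hosts_text without the line added by enable_local."""
    kept = [line for line in hosts_text.splitlines() if MARKER not in line]
    return "\n".join(kept) + ("\n" if kept else "")


original = "127.0.0.1\tlocalhost\n"
edited = enable_local(original, "www.slowping.com", "192.168.1.50")  # edit WordPress
restored = disable_local(edited)  # back to the static site on S3
```

Tagging the added line with a marker means the "undo" step never has to guess which entry to remove.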

Did I forget anything? Oh yes … the very important part of converting the dynamic WordPress site into a static one. I performed all these steps on the WordPress installation on my local VM. First, I disabled all the existing plugins (I didn’t have many and this probably made my life easier). I then installed and set up Disqus for commenting and Statcounter for tracking hits. Both were fairly easy to do. I tried exporting the comments in my old WordPress system to Disqus, but that didn’t work out for me. I didn’t get any error messages – rather the Disqus system says it takes them 24 hours to process exports. If I get time, I’ll try to debug this further. Otherwise, goodbye old comments :’( Once I had the plugins sorted out, I ran Ammon Shepherd’s awesome “wpstatic” script to create a folder with a static version of my blog. In fact, I wrote a small script that called Ammon’s script first, and then called the “s3cmd” tool to upload the static files into S3.
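A wrapper like the one described above might look roughly like this. It is a sketch, not the original script: the `./wpstatic` invocation, the `static` folder name, and the bucket name are all assumptions for illustration (check the script's own usage notes for its real arguments). The `s3cmd sync` options shown are standard ones.

```python
import subprocess


def build_commands(static_dir, bucket):
    """Return the two commands to run, in order, as argument lists."""
    export_cmd = ["./wpstatic"]  # hypothetical invocation of the export script
    upload_cmd = [
        "s3cmd", "sync",
        "--acl-public",      # make uploaded files world-readable
        "--delete-removed",  # drop remote files that no longer exist locally
        static_dir.rstrip("/") + "/",
        f"s3://{bucket}/",
    ]
    return [export_cmd, upload_cmd]


def publish(static_dir, bucket, dry_run=True):
    """Run (or just print) each command, stopping at the first failure."""
    for cmd in build_commands(static_dir, bucket):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


publish("static", "www.slowping.com")  # dry run: prints the commands only
```

Keeping a dry-run default makes it easy to eyeball the upload command before anything touches the bucket.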

The overall experience wasn’t that painful. Two big advantages I will get from hosting the blog on S3 are lowered cost and good scalability. If any of my future posts get on HN, it’ll be a good test to see if the new setup holds up under load. In terms of cons, I need to back up my local VM periodically. Also, the hosts file technique I mentioned works well when I want to edit my blog at home. Right now, I have no way to edit the WordPress blog when I am away from home. A few things don’t work correctly yet (e.g. the search box in the blog header no longer works, and my old permalink structure is broken). I’ll continue to iterate and improve.

References:

1) No Server Required – Jekyll & Amazon S3

2) Host Your Static Website on Amazon S3

3) Converting WordPress to static HTML

Self-driving Lego Mindstorms Robot

I was inspired by David Singleton’s blog post earlier this year on his weekend project – a self-driving RC car. I decided to independently replicate this project using a Lego Mindstorms robot instead of an RC car. I had almost no experience with Neural Networks, so I wasn’t sure I’d be able to make it work at all … but lo and behold … it worked.

Lego Mindstorms Robot

Setup

All my code (written in Python) runs on a Windows 7 machine. This box talks to the Lego NXT brick via Bluetooth. I use an Android phone for grabbing camera images. The phone is connected to my home network via wifi.

Driving the Lego Mindstorms robot

I used the fantastic nxt-python project to interface my computer to the Lego Mindstorms robot. The physical robot was a slight modification of the “Quick Start” robot in the Mindstorms instruction manual. This is a tracked robot, with independent motors controlling the left and right tracks. I added a “holder” where I could securely place my camera phone. I implemented a “keyboard teleop” mode, wherein I type commands into a Python CLI and get my robot to make the appropriate movement.
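For a tracked robot with one motor per track, keyboard teleop boils down to mapping each command to a pair of track powers. The sketch below shows that mapping only; the power values, the `quit` command, and the `set_powers` callback are illustrative assumptions, and the actual motor calls (via nxt-python) are deliberately left out.

```python
# Power levels are illustrative guesses, not values from the original robot.
TRACK_POWERS = {
    "forward": (75, 75),
    "left": (-75, 75),   # left track backward, right track forward: pivot left
    "right": (75, -75),  # mirror image: pivot right
}


def track_powers(cmd):
    """Map a teleop command to (left_power, right_power) for the two tracks."""
    try:
        return TRACK_POWERS[cmd]
    except KeyError:
        raise ValueError(f"unknown command: {cmd!r}") from None


def teleop(read_cmd, set_powers):
    """Read commands until 'quit', forwarding each one to the motors."""
    while True:
        cmd = read_cmd().strip().lower()
        if cmd == "quit":
            break
        set_powers(*track_powers(cmd))
```

In real use, `read_cmd` could simply be `input` and `set_powers` would wrap whatever motor interface the robot library provides.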

Simple Holder Mechanism

Getting images from the camera phone

I initially thought about writing my own app to capture images from my phone (an Android Nexus S). However, I found a free app called IP Webcam that allowed me to take snapshots from the phone via a URL. The lowest resolution at which I could get images was 176×144; I processed these images on the desktop before sending them to the neural network.
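Grabbing a frame then becomes a plain HTTP GET. A minimal sketch using only the standard library follows; the port `8080` and the `/shot.jpg` path are assumptions about the app's defaults, so check the URL the app displays on the phone.

```python
import urllib.request


def snapshot_url(host, port=8080):
    """Build the snapshot URL; port and path are assumed app defaults."""
    return f"http://{host}:{port}/shot.jpg"


def fetch_snapshot(host, port=8080, timeout=2.0):
    """Return the raw JPEG bytes of one camera frame."""
    with urllib.request.urlopen(snapshot_url(host, port), timeout=timeout) as resp:
        return resp.read()
```

A short timeout matters here: if the phone drops off the Wi-Fi, the drive loop should fail fast rather than hang.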

Processing the images on desktop

I used the Python Imaging Library to first convert the images to greyscale and then lower their resolution to 100×100.
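With PIL these two steps are `convert("L")` followed by `resize((100, 100))`. To show what they compute, here is a standard-library-only sketch that works on nested lists of RGB tuples. The nearest-pixel resampling and the final scaling to the 0..1 range are assumptions, since the post doesn't say which filter or scaling was used.

```python
def to_greyscale(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luma values."""
    return [
        [(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
        for row in rgb_rows
    ]


def resize_nearest(rows, new_w, new_h):
    """Resample a 2-D grid to new_w x new_h by picking the nearest source pixel."""
    old_h, old_w = len(rows), len(rows[0])
    return [
        [rows[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def preprocess(rgb_rows, size=100):
    """Greyscale, shrink to size x size, and flatten to one list scaled to 0..1."""
    grey = resize_nearest(to_greyscale(rgb_rows), size, size)
    return [pixel / 255.0 for row in grey for pixel in row]


# A synthetic all-white 176x144 frame stands in for a real camera snapshot.
frame = [[(255, 255, 255)] * 176 for _ in range(144)]
inputs = preprocess(frame)  # 10,000 values, one per input node
```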

Enter the Neural Network

This was the key part of the project. I went through Andrew Ng’s lectures on Neural Networks, and played around with the assignments on the topic (recognizing hand-written digits using Neural Networks). Luckily, I found the pybrain project, which provides a very easy interface for using Neural Nets in Python. Similar to David, I used a three-layer network. The input layer had 100×100 nodes (one per pixel). The hidden layer had 64 units (I used the same number that worked for David). Unlike David, I only had three output units – forward, left and right.
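To make the shape of that network concrete, here is a small pure-Python sketch of a 10,000-64-3 feed-forward pass. It is not pybrain and not the trained network from this project: the weights are random, and the sigmoid activation and weight range are assumptions chosen for illustration.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """One fully connected layer: n_out rows of n_in weights plus a bias each."""
    return [[rng.uniform(-0.05, 0.05) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer, inputs):
    """Sigmoid activation of each unit; the last weight in a row is the bias."""
    outputs = []
    for weights in layer:
        total = weights[-1] + sum(w * x for w, x in zip(weights, inputs))
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs


def make_network(n_in=100 * 100, n_hidden=64, n_out=3, seed=0):
    """Build input->hidden and hidden->output layers with random weights."""
    rng = random.Random(seed)
    return [make_layer(n_in, n_hidden, rng), make_layer(n_hidden, n_out, rng)]


def activate(network, inputs):
    """Feed inputs through every layer and return the output activations."""
    for layer in network:
        inputs = forward(layer, inputs)
    return inputs


net = make_network()
scores = activate(net, [0.5] * (100 * 100))  # three values, one per command
```

Even at this size the first layer holds 64 × 10,001 weights, which is why shrinking the images to 100×100 before training is worthwhile.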

Training the “brain”

I built a “driving course” in my living room. I drove around the course only 10 times and trained the network for about an hour (I was very excited to know if this was going to work and couldn’t wait :) ).

Auto-drive mode

The code for auto-drive mode was pretty similar to training mode. I took an image from the camera phone, processed it (greyscale and lowered the res to 100×100) and activated it against the neural net I had trained. The output is one of three commands (forward, left or right), which I send to the same “drive(cmd)” function I used in training mode. I put a 250 ms sleep between each command to ensure the robot had enough time to complete its motion.
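The loop described above can be sketched as follows. The three callables are placeholders for the camera grab plus preprocessing, the trained network, and the `drive(cmd)` function; the order of the output units is an assumption.

```python
import time

COMMANDS = ("forward", "left", "right")  # order of the three output units (assumed)


def pick_command(scores):
    """Return the command whose output unit has the highest activation."""
    best = max(range(len(COMMANDS)), key=lambda i: scores[i])
    return COMMANDS[best]


def auto_drive(grab_inputs, activate, drive, steps, pause_s=0.25):
    """Run `steps` iterations of: grab frame -> network -> drive -> pause."""
    for _ in range(steps):
        scores = activate(grab_inputs())
        drive(pick_command(scores))
        time.sleep(pause_s)  # give the robot time to finish its motion
```

Passing the three stages in as functions means the same loop can be exercised with fake inputs before the robot is ever switched on.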

The Self-Driving Lego Mindstorms Robot comes to life!

It worked! Mostly. About 2/3 of the time, the robot could go through the entire course without any “accidents”. About 1/3 of the time, the robot’s motion takes it to a point where it can only see the track (sheets of white paper). When it gets to that state, it keeps going forward instead of making a turn. I think if I implement a “spin-90-degrees” command, that would help the robot get out of that situation. But all-in-all, I’m pretty happy with the results.

Next steps

I’ll do a more detailed write up and post my code once I get some more time. I might also try to modify the physical robot to improve the position of the camera. I suspect a better camera view would improve the driving performance of my robot even more.

Credits

Big thanks go to my fiancée who puts up with me and my geeky habits. She also took the video I posted. I’d like to reiterate that this project was inspired by David Singleton’s self-driving RC car so many thanks go to him. A big big thank you to Prof. Andrew Ng for the amazing Stanford Machine Learning class that is freely provided online. And a thanks to the following projects that made mine possible: nxt-python, pybrain, python-imaging-library, and the free IP Webcam Android App.

Ten things I didn’t know about MongoDB

On Tuesday of this week, I had the pleasure of attending MongoNYC. I’ve dabbled a bit with MongoDB and am a big fan of the project. The conference had 4 tracks split across different rooms so I only managed to go to a quarter of the talks. FYI: all the talks are going to be put out on video soon if anyone cares to go through them.

Anyways, this post is about 10 things I learned at the MongoDB conference that surprised me. While some of them are limitations, I remain very impressed with the project and encourage you to check it out.

1) MongoDB uses 1 BFGL (big f***ing global lock) for separating reads and writes. That is, when one thread is writing, no one else can do a read or write. However, you can have multiple reads going on at the same time (when no writes are going on). This is fairly standard. What is surprising is that there is a single global lock across collections! That means, you currently don’t get concurrency across write operations on different collections. That said, the 10gen team said that improving concurrency is their main goal for the rest of the year. The situation should improve with the 2.0 release (which is supposed to come out soon).

2) MongoDB does not have a statistical query plan optimizer! Rather, the first time a query is issued, the database runs 3 alternative plans concurrently. The first plan that finishes is used for subsequent queries. Every little while, this process is redone to make sure that the characteristics of the data haven’t changed the ideal query plan.

3) The Mongos (pronounced “mongo”-“ess”) process is only needed when you do sharding. If you only have unsharded collections and are doing replication, you actually don’t need to have a mongos. So what if you want to read from secondaries in an unsharded but replicated setup? Well … you specify “slaveok” and the driver can talk to a secondary. I’m not completely sure how this setup would behave with respect to load-balancing. As I understood it, mongos provides routing services for shards as well as load-balancing features.

4) MongoDB supports Master-Slave replication as well as Replica-Sets. Master-Slave is the “old” way of doing things in MongoDB and, while it works, it is somewhat “deprecated”. That’s fine because Replica-Sets are a superset of Master-Slave replication anyways. A lot of terminology in tools and commands follows the old scheme. This doesn’t bother me (for some reason, I actually like it).

5) MongoDB supports “slave-delay” in secondary replicas. That means, you can have a secondary that trails the primary by some period of time. It is important that the oplog in the primary have enough space to hold all of the operations that occurred in the meantime. This is standard stuff but I was surprised that one of the standard commands (rs.isMaster()) that shows you the status of a replica set does not even include the delayed secondaries. The logic I was given was that these secondaries are different … they can never become primaries. Nor should they be used for reads. The main purpose of this feature is for disaster recovery and reducing the pain of “butter fingers” (for instance, deleting a whole bunch of documents by mistake).

6) MongoDB should likely not be run in a 32-bit environment. This is probably obvious to a lot of people but I am ashamed to say that I missed it. When I checked out the logs for one of my test apps (running on a 32-bit Ubuntu machine), lo and behold, I saw warning messages related to not running MongoDB on a 64-bit machine.

7) MongoDB automatically tells you in the log about any query that takes longer than 100ms to run. This is NEAT!

8) MongoDB can be set up to do profiling if a query starts taking longer than a certain amount of time. I was told that there exist production environments where the admins leave profiling turned on with high thresholds to guard against unexpected situations where performance is degraded.

9) MongoDB doesn’t support multi-master replication. They think this is good because it keeps things logically simple. Also, the system is far simpler as it doesn’t have to worry about write-collisions across multiple masters. I like this.

10) In a replicated setup, MongoDB supports something called an Arbiter whose job is to break ties. The Arbiter doesn’t actually have any data on it. This is related to the consensus algorithm used in MongoDB, which is thankfully not as complex as Paxos.
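To see why an arbiter helps with point 10: as I understand it, a replica set can only elect a primary when a strict majority of its voting members can reach each other. The toy function below illustrates that arithmetic only; it is not MongoDB's actual election code, and the function name is made up.

```python
def can_elect_primary(reachable_voters, total_voters):
    """A partition can elect a primary only with a strict majority of all voters."""
    return reachable_voters > total_voters // 2


# Two data-bearing members split by a network partition: each side sees 1 of 2.
print(can_elect_primary(1, 2))  # False
# Add an arbiter: the side that can still reach it sees 2 of 3.
print(can_elect_primary(2, 3))  # True
# The isolated member on the other side sees only 1 of 3.
print(can_elect_primary(1, 3))  # False
```

With an even number of voters a clean split leaves nobody with a majority, so a data-less third vote is a cheap way to make the count odd.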

Overall, I had a great time at MongoNYC. It was a gathering of highly skilled systems people. I learned a lot and am even more excited about the future of MongoDB!

This is the first real post on my tech blog. If you have thoughts, comments, or suggestions for improvement, please send them my way!

Hello World!

My plan is to use this blog to capture my latest thoughts on distributed systems and mobile computing. I’m also planning to post related papers and links to relevant articles. Specifically, the topics I intend to cover include:

  1. Traditional techniques for building scalable and reliable web applications (e.g. sharding, replication)
  2. SQL databases in general, and MySQL in particular
  3. NoSQL databases (I’m a big fan of MongoDB)
  4. Large scale data processing (via MapReduce or Hadoop)
  5. Data Storage topics (e.g. HDFS, BigTable, GFS)
  6. Cloud Computing (e.g. EC2, Google App Engine)
  7. Asynchronous Programming (à la Node.js)
  8. Programming Languages
  9. HTML5, CSS, JSON and Javascript
  10. Tools to measure network and application performance (including stress testing)
  11. Dealing with spatial data (storage and retrieval, visualizations, etc.)

On the mobile computing front, I plan to post on Android and iOS development. Apart from general application development, I’m most excited about augmented reality applications, location-based apps, 3D and TV integration.

I hope the information on this blog will be useful to others. For myself, I hope to get better at my craft and master the art of systems building.