The idea of building and storing user timelines (think Twitter) in Riak confused me at first. It sounds like such a spot-on case for time series databases. Yet Yammer managed to make the idea pretty popular. The whole thing lacked implementation though, because they kept their to themselves, which I don’t blame them for at all.
So let’s have a look at how simple it is to build one. You can see a timeline of all Apple Macintosh products above, but that’s not what we want to build.
Instead we want to build something like the Twitter timeline. A user follows many other users, and wants to look at a feed built from their activities, so something like the timeline shown below.
How do you model a timeline in Riak?
For every user we store one object in Riak. Every timeline contains a list of tweet ids, or whatever activity you’re referencing, or it can contain the whole tweets. Something like this should work:
If you want to store more data, turn the list into an array of hashes containing whatever information is necessary to rebuild the timeline later.
Adding entries to the timeline
To add new entries to a timeline, prepend them to the existing list of entries, here’s some Ruby code to show how it’s done.
The code assumes you take care of followership somewhere else. You can store that data in Riak too, but the code is oblivious to its whereabouts.
Conflicts, siblings, oh my!
The fun starts when two clients update the same timeline, you get a conflict and siblings. The strength of a simple data structure like the example above is that they’re easy to merge together while still keeping ordering based on the ids. The ids are ordered only in this example, Twitter somewhat makes sure they are.
When you get a conflict, a smart Riak library like
Ripple helps you find out about it. To
add on the earlier example, here’s a version of add
that detects conflicts.
Suddenly you have two or more objects instead of one, each containing a different timeline. To turn them into a single list, you merge all of them together, discard the duplicates, and restore order based on the id. Here’s some Ruby to do that.
You iterate over all timeline objects and keep adding unique activities to a new list, returning that when done.
Sort, and done!
All that’s left to do is sort the result.
There’s the whole code. To spare you the pain of having to write your own library, all this is bundled into a gem called riaktivity. If you’re doing Python, the Brett Hoerner has got you covered with timak. Be sure to watch the original talk by Yammer, it’s well worth it.
There’s ups and downs to this approach, and things that need to be taken care of. More on that and modeling data for Riak in general is covered in the Riak Handbook, the definitive guide on Riak.