Next week is OUCE in Southampton UK, where I'll be presenting on my recent work in time-series storage. I've never been to an OUCE, or Southampton, and I'm really looking forward to it.
Since you get to Southampton via London (from here), and since I haven't been back there in more than a year, I arranged to have a day or two there, and scored an opportunity to talk to the Cassandra London Meetup on Monday.
There is still time to sign up for OUCE, so if you're interested in OpenNMS talks and/or training, and can make your way to Southampton next week, consider doing so.
Unfortunately the meetup has a waiting list, but if you're in the London area and would like to chat Cassandra and/or OpenNMS over a pint, I'm your man.
I had tentatively planned on attending OpenNMS Dev Jam this year, but then I got a new job, and it became mandatory.
The eighth annual Dev Jam will be held June 23-28 at the University of Minnesota.
NP: In Absentia, The Mars Volta
Datastax's yearly Cassandra Summit has grown into a two-day event this year, June 11-12 in San Francisco. If you are a user of Cassandra (or are considering it), then you probably want to attend this conference (use code SFSummit25 for 25% off).
I'll be there to present on virtual nodes; find me if you want to chat about databases, network management, beer, or the Can't Hug Every Cat phenomenon.
NP: The Malkin Jewel, The Mars Volta
I'm fortunate enough to be speaking at Berlin Buzzwords again this year. As usual, chats over beer or currywurst (or both) are always welcome. Hope to see you there!
As usual, if you think we'll cross paths and want to meet for beer, coffee, or currywurst, let me know!
See you in Banja Luka!
This year, the organizers are arranging for a number of hackathons and workshops to precede and follow the main conference. One of those will be a Cassandra event hosted by Acunu and Datastax (date to be announced).
If you're in the Berlin area (or can be), and are interested in search, data analysis and NoSQL (and especially if you're interested in Cassandra), I'd recommend you plan to attend.
I was recently reminded of a quote from a German General, Kurt von Hammerstein-Equord, in 1933:
"I divide my officers into four classes; the clever, the lazy, the industrious, and the stupid. Each officer possesses at least two of these qualities. Those who are clever and industrious are fitted for the highest staff appointments. Use can be made of those who are stupid and lazy. The man who is clever and lazy however is for the very highest command; he has the temperament and nerves to deal with all situations. But whoever is stupid and industrious is a menace and must be removed immediately!"
So I give you, Kurt's "Magic" Quadrant. Apply this to your community/project/workplace and see if it doesn't ring true!
Last year, if you'll remember, I did a half-assed job of putting together a musically coordinated Christmas light rig, and promised to Do Better this year. Lucky for me I was vague, because under-promising made over-delivering a lot easier. :)
Lumen has two modes, record and playback. When launched in record mode, you use the
; keys to "play" the 8 outputs to the music. You can think of it as a sort of reverse Guitar Hero. It's not as easy as it sounds (not for me anyway). It takes practice, at which point you're so sick of the song that if you never hear it again it'll be too soon.
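To make record mode a little more concrete, here is a minimal sketch of how key presses could be folded into a timeline of output states. This is not Lumen's actual code: the KEYS layout, the record() function, and the (seconds, key, pressed) event format are all assumptions made up for illustration, and a set bit here simply means "this output is lit".

```python
# Hypothetical sketch, not Lumen itself. KEYS is a placeholder layout:
# one key per output, with the first key mapped to bit 0.
KEYS = "asdfjkl;"
BIT_FOR_KEY = {key: bit for bit, key in enumerate(KEYS)}


def record(key_events):
    """Fold (seconds, key, pressed) tuples into (seconds, state) pairs.

    state is a byte in which bit n is set while output n's key is held.
    """
    state = 0
    timeline = []
    for when, key, pressed in key_events:
        bit = BIT_FOR_KEY.get(key)
        if bit is None:
            continue  # not one of the eight output keys, ignore it
        if pressed:
            state |= 1 << bit
        else:
            state &= ~(1 << bit)
        timeline.append((when, state))
    return timeline
```

In a PyGame application the events would come from KEYDOWN and KEYUP, with the timestamp taken relative to the start of the song.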
Playback mode can be used to either drive the lights, or to run a simulation to check your work. Here is what it looks like in simulation:
And here is what it looks like For Real:
I've also managed to work in a couple of days after the conference to poke around Berlin, (this will be my first time in Germany). If you're in the area and want to meet up for beers and/or keysignings, or if you have suggestions for sights to see, drop me a line.
April is shaping up to be a busy month; I have several trips lined up.
The hackathon should also be a real treat. It's always great to spend a little face-time with the people you normally only interact with online.
As always, if my travel plans intersect with yours (or if you live in Austin, Columbia, or San Francisco), and you want to chat over a beer (or coffee, tea, etc), drop me a line.
If you've been around for more than a few years, you've probably borne witness to how susceptible the tech industry is to hype. Some new-shiny comes along, people lose their minds, and seemingly overnight The Next Big Thing has spread like wildfire. Like it or not you find yourself bombarded by blog posts, tweets, articles, and water cooler chat from wild-eyed co-workers. Clearly, Ted Dziuba knows what I'm talking about.
Ironically though, what Ted is missing is the corollary, the equally annoying contrarian who takes it upon himself to set the world right by refuting The Next Big Thing, usually with straw-men and a lot of hand-waving. Seriously dude, don't feed the trolls.
Because I can be a contrarian too, let's have a closer look at some of Ted's points.
The idea is that object relational databases like MySQL and PostgreSQL have lapsed their useful lifetimes, and that document-based or schemaless databases are the wave of the future. Never mind of course that MySQL was the perfect solution to everything a few years ago when Ruby on Rails was flashing in the pan. Never mind that real businesses track all of their data in SQL databases that scale just fine. (For Silicon Valley readers, Walmart is a real business, Twitter is not.)
No, the idea is not that relational databases have lapsed their useful lifetimes, and at least in the case you later cite (Cassandra), the data-model is not considered a feature when compared to relational databases. And Twitter has indicated that they aren't using Cassandra as a replacement for everything, they are still using MySQL, so maybe there's still hope that they can be a Real Business someday.
Also, just because a "real business" like Walmart can cope using a relational database doesn't disqualify any business that can't from being "real", that's just dumb. Even where it is technically possible, there are cases when the economics of running the business preclude the costs of using say Oracle.
So you've magically changed your backend from MySQL to Cassandra. Stuff will just work now, right? Well, no. Did you know that Cassandra requires a restart when you change the column family definition? Yeah, the MySQL developers actually had to think out how ALTER TABLE works, but according to Cassandra, that's a hard problem that has very little business value. Right.
I had considered trying to explain here how the differing use-cases and data-model made this less of a problem than Ted perceived it to be, but it's probably easier to just point out that it's basically fixed (see CASSANDRA-44).
I'm not just singling out Cassandra - by replacing MySQL or Postgres with a different, new data store, you have traded a well-enumerated list of limitations and warts for a newer, poorly understood list of limitations and warts, and that is a huge business risk.
I can't speak for other NoSQL projects, but I can assure you that if you have a work-load that can be reasonably accommodated by MySQL or Postgres, then that is what we will recommend you use. For those that can't, they're just going to have to live with a newer, less understood list of limitations and warts, because otherwise there is no business.
The sooner your company admits [that you are not Google], the sooner you can get down to some real work. Developing the app for Google-sized scale is a waste of your time, plus, there is no way you will get it right. Absolutely none. It's not that you're not smart enough, it's that you do not have the experience to know what problems you will see at scale.
The takeaways here I believe are, don't prematurely optimize, Google alone has the scale to justify distributed systems, and you are dumb. One out of three, not bad.
NoSQL will never die, but it will eventually get marginalized, like how Rails was marginalized by NoSQL. In the meantime, DBAs should not be worried, because any company that has the resources to hire a DBA likely has decision makers who understand business reality.
Any company that has decision makers who understand reality will know to use the right tool for the right job.
As I mentioned earlier, I was fortunate enough to be able to attend FOSDEM this year. The sheer scale of FOSDEM is amazing, with literally thousands of people in attendance, dozens of projects represented, and hundreds(?) of talks. It's doubly impressive when you consider that it is entirely volunteer driven and 100% sponsored (it's no cost to attend).
The NoSQL track organized by Steven Noels on Sunday turned out quite well too I thought, and it seemed to generate a lot of interest (the room was continually filled to capacity and the doors barred). There were talks from some of the usual players (MongoDB, HBase, and of course Cassandra), along with some less heard of projects (GT.M). Mine was the last talk of the morning and seemed to be pretty well received. I got a lot of great questions both during and after the session, and ended up talking shop with several attendees until the next session was starting.
Finally, here is the video of my talk, or you can view it here with the slides.
We put up lights each year for the holidays, and while I don't mind having the house decorated, I do not like having to put them up. Despite this, I feel mounting pressure each year to Do Better, which by default means more lights and decorations, which in turn mean even more work.
The year before last I had the idea that if I worked smarter I might avoid working harder, and that one of those musically synchronized setups would be pretty sweet. Problem is I came to this conclusion in October of 2008, and that didn't leave enough time to properly procrastinate before throwing something together at the last minute, so I was forced to postpone. This past year though I was able to spend a solid 11 months procrastinating, which still left a couple short weeks to hurriedly throw something together.
So long story short, I did it, I put together a controller that sets Christmas lights to music. And, providing that you weren't privy to all of the nasty hacks and ugly short-cuts, it was actually kind of neat to watch. Obviously though, I'm not entirely happy with the results, so I'm considering this year a practice run, and hope that with this as a basis to build upon, next year it will be pretty sweet. So treat the rest of this post as more of a rough brain-dump than a recipe or step-by-step, and hopefully it will prove interesting for comparison purposes next year.
Normally this is where I'd expound on some of my research, the options I investigated, costs, ease of use, etc. That's not going to happen because with all of the procrastination this project required, I simply didn't have the time. Instead I went straight for a pre-assembled parallel port relay board, and I took a page out of this guy's book and mounted it in a plastic tool box.
I used some cheap extension cords, fixed to the ends of the toolbox with cable connectors, and wired it all together inside. The relay board needs a power supply, so there is an outlet inside for that.
A cord that exits the rear of the toolbox gets plugged into the mains to supply power to the whole thing, but since we're combining electricity and the great outdoors, a GFI is a must.
I scored this parallel cable in an old pile of hardware. It worked great once I got the zip drive that was attached to it off and in the trash.
Finally, I made use of an old PIII notebook.
I know, it's not much to look at. Sue me.
There are 8 pins on a parallel port that are (were) used to send character data to printers, and it's these 8 pins that are used for outputs. That means controlling the outputs is as simple as writing to that byte. The pyparallel library makes this even easier, so for example, I was able to use something like the following, run from a cron job, to start and stop the lights each day.
python -c 'import parallel; parallel.Parallel().setData(0)'
I wired everything up to the normally closed contacts of the relay board so the lights would fail-safe. In other words, you have to turn the relay on in order to turn the corresponding lights off. The setData(0) above switches on all of the lights by turning the relays off; killing the lights is as easy as changing that to
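Since the normally closed wiring inverts everything, a small helper can keep the bit-flipping out of your head. The sketch below is only an illustration: relay_byte() is a made-up name, and it assumes that data bit n drives relay n and that a set bit energizes the relay. Only Parallel() and setData() come from pyparallel, and they are left in comments because they need a real port.

```python
def relay_byte(lights_on):
    """Return the data byte that leaves exactly the given outputs lit.

    lights_on is an iterable of output numbers (0-7). With normally closed
    contacts, a set bit energizes a relay and switches that light OFF, so
    the byte written is the inverse of the "lit" mask.
    Assumption: data bit n drives relay n.
    """
    lit_mask = 0
    for output in lights_on:
        if not 0 <= output <= 7:
            raise ValueError("output must be in 0..7, got %r" % (output,))
        lit_mask |= 1 << output
    return ~lit_mask & 0xFF


# With a real port attached (pyparallel installed), usage would look like:
#   import parallel
#   port = parallel.Parallel()
#   port.setData(relay_byte(range(8)))  # byte 0: no relay energized, all lit
#   port.setData(relay_byte([]))        # every relay energized, all dark
```

Under those assumptions the all-lights-on case comes out as 0, matching the cron one-liner above, and the all-dark case is simply every bit set.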
Initially I had the idea that I'd whip up something to analyze an audio track; that the light show would essentially be a visualization of the waveform. That, as it turns out isn't the panacea that it would seem. Sure, the lights will flash in a way that seems vaguely in response to the music, but the results are just not as coordinated, or ordered, as the samples you see on the Internet.
So I then moved on to the idea of creating a time-series of output states that could be "played" along with the audio, but I was naive to believe that I could hand-craft this data file, so before all was said and done, I'd also written a PyGame application for keying in the outputs as the music played, and visualizing it during playback.
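For the curious, the time-series idea can be sketched in a few lines. This is not the data format or code I actually used; the (seconds, state) tuples and the state_at() and play() names are assumptions chosen for illustration, and write stands in for whatever pushes a byte to the outputs (a simulator, or a parallel port).

```python
import bisect
import time


def state_at(events, t):
    """Return the output byte in effect at time t (seconds into the song).

    events is a list of (seconds, state) tuples sorted by time. Before the
    first event, every output is treated as off (0).
    """
    times = [when for when, _ in events]
    index = bisect.bisect_right(times, t)
    if index == 0:
        return 0
    return events[index - 1][1]


def play(events, write, clock=time.monotonic, sleep=time.sleep):
    """Call write(state) for each event at its offset from when play() starts."""
    started = clock()
    for when, state in events:
        delay = when - (clock() - started)
        if delay > 0:
            sleep(delay)
        write(state)
```

state_at() is handy for a simulation that redraws on every frame, while play() suits driving hardware; starting play() at the same moment as the audio is what keeps the two in sync, and passing in clock and sleep makes it possible to test without waiting on real time.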
Finally, only after getting everything working I found out that the way "professionals" do this is actually pretty similar to what I came up with, only using MIDI, so I will definitely be looking into that before next year.
I've always wanted to go to a FOSDEM, and getting to see Brussels will be a real treat as well. I can't wait!
NP: Black & White, In Flames
From the git-svn manpage:
For the sake of simplicity and interoperating with a less-capable system (SVN), it is recommended that all git svn users clone, fetch and dcommit directly from the SVN server, and avoid all git clone/pull/merge/push operations between git repositories and branches. The recommended method of exchanging code between git branches and users is git format-patch and git am, or just 'dcommit’ing to the SVN repository.
Running git merge or git pull is NOT recommended on a branch you plan to dcommit from. Subversion does not represent merges in any reasonable or useful fashion; so users using Subversion cannot see any merges you’ve made. Furthermore, if you merge or pull from a git branch that is a mirror of an SVN branch, dcommit may commit to the wrong branch.
Or put another way. Because Subversion can't merge for shit, neither can Git if you expect to integrate the two.
NP: Blood Milk and Sky, White Zombie
Depending on the circles you travel in, you might be aware of the whole NoSQL "movement". If not, I'm not going to try and explain it at this time (explaining it is sort of the problem), but you can get the general idea from Wikipedia.
I've spent the last couple of days at nosqleast and one of the hot topics here is the name "nosql". Understandably, there are a lot of people who worry that the name is Bad, that it sends an inappropriate or inaccurate message. While I make no claims to the idea, I do have to accept some blame for what it is now being called. How's that? Johan Oskarsson was organizing the first meetup and asked the question "What's a good name?" on IRC; it was one of 3 or 4 suggestions that I spouted off in the span of like 45 seconds, without thinking.
My regret however isn't about what the name says, it's about what it doesn't. When Johan originally had the idea for the first meetup, he seemed to be thinking Big Data and linearly scalable distributed systems, but the name is so vague that it opened the door to talk submissions for literally anything that stored data, and wasn't an RDBMS.
I don't have a problem with projects like Neo4J, Redis, CouchDB, MongoDB, etc, but the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for. MongoDB and Voldemort for example set out to solve two very different problems and lumping them together under a single moniker isn't very meaningful. This is why people are continually interpreting nosql to be anti-RDBMS, it's the only rational conclusion when the only thing some of these projects share in common is that they are not relational databases.
The cat is out of the bag though, and the "movement" has enough momentum that I don't think it's going anywhere. And I'm not really advocating that; it's had the effect of bringing a lot of attention to some very interesting projects, and that's a Good Thing. Maybe Emil Eifrem has the right idea by encouraging people to overload the term with Not Only SQL.
I have several trips lined up for the next few weeks:
There is also a NoSQL meetup on November 2 as a part of ApacheCon; I've offered to present on Cassandra there. I'm also thinking of giving a session at BarcampApache, and I'm scheduled to sit on a "SQL vs. NoSQL" panel at OpenSQL, though I'll probably submit a session idea or two there as well.
There are a lot of Cassandra people in the Bay Area; it'd be great if we could set up a hack-a-thon/bug squashing party/meetup/whatever during ApacheCon. Ping me or post something to the list if you are interested! :)
NP: Calling Dr Love, Kiss
I rewrote my blog software again (actually, it was done months ago but I just now got around to deploying it). The last one used Turbogears, but the 1.x branch is getting long in the tooth, and 2.0 came a little too late. Besides, Django is the new hotness these days.
Somehow the rewrite resulted in about half as much code, which is always cool, and I finally got to make use of mod_wsgi, (it is everything that I had ever dreamed it would be, and more :)).
All of the old permalinks should still be valid, and with any luck I managed to avoid DoS'ing everyone's feed reader.
NP: Take It Out On Me, Bullet For My Valentine
Toledo is another UNESCO World Heritage Site, a city dating back to the Bronze Age with Christian, Jewish, and Moorish influences. It's a beautiful place and the six or so hours I spent there were woefully inadequate.
There are a few pictures up on flickr, but I took quite a few more that will have to wait until after I'm home.
Debconf is over. Boo. :(
Like those I've attended in the past, Debconf9 was well organized with plenty of interesting talks, in a great venue. I had loads of fun, learned a ton, and even managed to get a bit done. Many thanks to the organization team, the local team, the speakers, and the sponsors.
This year I managed to sneak an extra couple of days post-conference which will be spent in the general vicinity of Madrid. I'm going to continue dumping my camera daily so tune into my flickr stream if you're interested.
As announced here, I put a Debian package together for Cassandra 0.3.0.
I don't have any (immediate) plans to upload a Cassandra package to the Debian archive (this package isn't even policy compliant), so consider this unofficial and report any packaging bugs directly to me.
deb http://people.apache.org/~eevans/debian cassandra/
deb-src http://people.apache.org/~eevans/debian cassandra/
Yesterday was the Day Trip at Debconf, an opportunity for folks to step away from their computers (usually), and leave the venue (always) for some sort of group activity or tourism.
When the organizers first started talking about this year's Day Trip there were two candidates, Valle del Jerte and Teatro romano de Merida, or "Roman theater of Merida". I'm kind of a history junkie and generally get pretty excited at the idea of touring ruins so I was heartbroken when Merida lost out. The closer it got to the scheduled day the lower my enthusiasm sank, until eventually I just opted out entirely. The obvious by-product of skipping the Day Trip though was a need to find something else to do, and the obvious choice seemed like a trip to Merida. Michael was pretty keen on the idea too.
The entire thing was thrown together very last minute (we barely made it to the bus station), but luckily there was enough time to spot a generous offer via IRC from itais (a Merida local) for transportation from the bus stop to the theater.
As promised, itais was waiting for us at the bus station and took us on a quick tour to see, among other things, a Roman aqueduct and The Arch of Trajan. He then dropped us off in front of the theater with a recommendation for a place to eat.
After an awesome meal we walked the ruins for a couple of hours, tracked down The Temple of Diana, and then settled into the main square for a couple of beers before catching a cab back to the bus station.
A few of the pictures I took can be seen here, and I'll get the rest up eventually.
The Apache Cassandra team has managed the release of 0.3.0, its very first.
It took a lot longer than I had hoped to get a release in the can (almost a month from the approval of the last release candidate). Part of this was a lack of familiarity with ASF's processes, but part of it was poor or incomplete documentation, or lack of consensus about what is required. In the end, it boiled down to a combination of carefully studying what other podlings had done, and a few iterations of trial-and-error.
I'm confident that this will all go much smoother for 0.4.0, (which is progressing nicely, and should be ready Real Soon Now).