Datastax's yearly Cassandra Summit has grown into a two-day event this year, June 11-12 in San Francisco. If you are a user of Cassandra (or are considering it), then you probably want to attend this conference (use code SFSummit25 for 25% off).
I'll be there to present on virtual nodes; find me if you want to chat about databases, network management, beer, or the Can't Hug Every Cat phenomenon.
NP: The Malkin Jewel, The Mars Volta
I'm fortunate enough to be speaking at Berlin Buzzwords again this year. As usual, chats over beer or currywurst (or both) are always welcome. Hope to see you there!
As usual, if you think we'll cross paths and want to meet for beer, coffee, or currywurst, let me know!
This year, the organizers are arranging for a number of hackathons and workshops to precede and follow the main conference. One of those will be a Cassandra event hosted by Acunu and Datastax (date to be announced).
If you're in the Berlin area (or can be), and are interested in search, data analysis and NoSQL (and especially if you're interested in Cassandra), I'd recommend you plan to attend.
I've also managed to work in a couple of days after the conference to poke around Berlin (this will be my first time in Germany). If you're in the area and want to meet up for beers and/or keysignings, or if you have suggestions for sights to see, drop me a line.
April is shaping up to be a busy month; I have several trips lined up.
The hackathon should also be a real treat. It's always great to spend a little face-time with the people you normally only interact with online.
As always, if my travel plans intersect with yours (or if you live in Austin, Columbia, or San Francisco), and you want to chat over a beer (or coffee, tea, etc), drop me a line.
If you've been around for more than a few years, you've probably borne witness to how susceptible the tech industry is to hype. Some new-shiny comes along, people lose their minds, and seemingly overnight The Next Big Thing has spread like wildfire. Like it or not, you find yourself bombarded by blog posts, tweets, articles, and water cooler chat from wild-eyed co-workers. Clearly, Ted Dziuba knows what I'm talking about.
Ironically though, what Ted is missing is the corollary, the equally annoying contrarian who takes it upon himself to set the world right by refuting The Next Big Thing, usually with straw-men and a lot of hand-waving. Seriously dude, don't feed the trolls.
Because I can be a contrarian too, let's have a closer look at some of Ted's points.
The idea is that object relational databases like MySQL and PostgreSQL have lapsed their useful lifetimes, and that document-based or schemaless databases are the wave of the future. Never mind of course that MySQL was the perfect solution to everything a few years ago when Ruby on Rails was flashing in the pan. Never mind that real businesses track all of their data in SQL databases that scale just fine. (For Silicon Valley readers, Walmart is a real business, Twitter is not.)
No, the idea is not that relational databases have lapsed their useful lifetimes, and at least in the case you later cite (Cassandra), the data-model is not considered a feature when compared to relational databases. And Twitter has indicated that they aren't using Cassandra as a replacement for everything; they are still using MySQL, so maybe there's still hope that they can be a Real Business someday.
Also, just because a "real business" like Walmart can cope using a relational database doesn't disqualify any business that can't from being "real"; that's just dumb. Even where it is technically possible, there are cases when the economics of running the business preclude the costs of using, say, Oracle.
So you've magically changed your backend from MySQL to Cassandra. Stuff will just work now, right? Well, no. Did you know that Cassandra requires a restart when you change the column family definition? Yeah, the MySQL developers actually had to think out how ALTER TABLE works, but according to Cassandra, that's a hard problem that has very little business value. Right.
I had considered trying to explain here how the differing use-cases and data-model made this less of a problem than Ted perceived it to be, but it's probably easier to just point out that it's basically fixed (see CASSANDRA-44).
I'm not just singling out Cassandra - by replacing MySQL or Postgres with a different, new data store, you have traded a well-enumerated list of limitations and warts for a newer, poorly understood list of limitations and warts, and that is a huge business risk.
I can't speak for other NoSQL projects, but I can assure you that if you have a work-load that can be reasonably accommodated by MySQL or Postgres, then that is what we will recommend you use. For those that can't, they're just going to have to live with a newer, less understood list of limitations and warts, because otherwise there is no business.
The sooner your company admits [that you are not Google], the sooner you can get down to some real work. Developing the app for Google-sized scale is a waste of your time, plus, there is no way you will get it right. Absolutely none. It's not that you're not smart enough, it's that you do not have the experience to know what problems you will see at scale.
The takeaways here, I believe, are: don't prematurely optimize, Google alone has the scale to justify distributed systems, and you are dumb. One out of three, not bad.
NoSQL will never die, but it will eventually get marginalized, like how Rails was marginalized by NoSQL. In the meantime, DBAs should not be worried, because any company that has the resources to hire a DBA likely has decision makers who understand business reality.
Any company that has decision makers who understand reality will know to use the right tool for the right job.
As I mentioned earlier, I was fortunate enough to be able to attend FOSDEM this year. The sheer scale of FOSDEM is amazing, with literally thousands of people in attendance, dozens of projects represented, and hundreds(?) of talks. It's doubly impressive when you consider that it is entirely volunteer driven and 100% sponsored (it's no cost to attend).
The NoSQL track organized by Steven Noels on Sunday turned out quite well too I thought, and it seemed to generate a lot of interest (the room was continually filled to capacity and the doors barred). There were talks from some of the usual players (MongoDB, HBase, and of course Cassandra), along with some less heard of projects (GT.M). Mine was the last talk of the morning and seemed to be pretty well received. I got a lot of great questions both during and after the session, and ended up talking shop with several attendees until the next session was starting.
Finally, here is the video of my talk, or you can view it here with the slides.
I've always wanted to go to a FOSDEM, and getting to see Brussels will be a real treat as well. I can't wait!
NP: Black & White, In Flames
Depending on the circles you travel in, you might be aware of the whole NoSQL "movement". If not, I'm not going to try and explain it at this time (explaining it is sort of the problem), but you can get the general idea from Wikipedia.
I've spent the last couple of days at nosqleast and one of the hot topics here is the name "nosql". Understandably, there are a lot of people who worry that the name is Bad, that it sends an inappropriate or inaccurate message. While I make no claims to the idea, I do have to accept some blame for what it is now being called. How's that? Johan Oskarsson was organizing the first meetup and asked the question "What's a good name?" on IRC; it was one of 3 or 4 suggestions that I spouted off in the span of like 45 seconds, without thinking.
My regret however isn't about what the name says, it's about what it doesn't. When Johan originally had the idea for the first meetup, he seemed to be thinking Big Data and linearly scalable distributed systems, but the name is so vague that it opened the door to talk submissions for literally anything that stored data, and wasn't an RDBMS.
I don't have a problem with projects like Neo4J, Redis, CouchDB, MongoDB, etc, but the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for. MongoDB and Voldemort for example set out to solve two very different problems, and lumping them together under a single moniker isn't very meaningful. This is why people are continually interpreting nosql to be anti-RDBMS; it's the only rational conclusion when the only thing some of these projects have in common is that they are not relational databases.
The cat is out of the bag though, and the "movement" has enough momentum that I don't think it's going anywhere. And I'm not really advocating that; it's had the effect of bringing a lot of attention to some very interesting projects, and that's a Good Thing. Maybe Emil Eifrem has the right idea by encouraging people to overload the term with Not Only SQL.
I have several trips lined up for the next few weeks:
There is also a NoSQL meetup on November 2 as a part of ApacheCon; I've offered to present on Cassandra there. I'm also thinking of giving a session at BarcampApache, and I'm scheduled to sit on a "SQL vs. NoSQL" panel at OpenSQL, though I'll probably submit a session idea or two there as well.
There are a lot of Cassandra people in the Bay Area; it'd be great if we could set up a hack-a-thon/bug squashing party/meetup/whatever during ApacheCon. Ping me or post something to the list if you are interested! :)
NP: Calling Dr Love, Kiss
NP: Take It Out On Me, Bullet For My Valentine
As announced here, I put a Debian package together for Cassandra 0.3.0.
I don't have any (immediate) plans to upload a Cassandra package to the Debian archive (this package isn't even policy compliant), so consider this unofficial and report any packaging bugs directly to me.
deb http://people.apache.org/~eevans/debian cassandra/
deb-src http://people.apache.org/~eevans/debian cassandra/
The Apache Cassandra team has managed the release of 0.3.0, its very first.
It took a lot longer than I had hoped to get a release in the can (almost a month from the approval of the last release candidate). Part of this was a lack of familiarity with ASF's processes, but part of it was poor or incomplete documentation, or lack of consensus about what is required. In the end, it boiled down to a combination of carefully studying what other podlings had done, and a few iterations of trial-and-error.
I'm confident that this will all go much smoother for 0.4.0 (which is progressing nicely, and should be ready Real Soon Now).
Johan Oskarsson has organized a meetup for folks interested in distributed structured data storage and is calling it NOSQL. The event, being held June 11th in San Francisco, will have subject matter experts presenting on Hypertable, HBase, Voldemort, Dynomite, and Cassandra.
There were 100 slots available to attend and they all went in a matter of hours, so if this is the first you've heard of it, it's probably too late. Fortunately I got mine and thanks to the support of my employer I'll be there. I'm looking forward to it.