What's the latest and greatest in Hadoop? Ask this question, and many people will say "real-time" and point to Spark. Look at the two-day seminar that Berkeley's AMPLab is running right now, for example.
But what is Spark, really? And what are those RDDs? The name stands for Resilient Distributed Datasets, but does that make it any clearer? We asked our illustrator to help, and we hope the result explains it.