Thursday, November 20, 2014

Big Data Cartoon - What's the latest and greatest in Hadoop?

What's the latest and greatest in Hadoop? Ask this question, and many people will say "Real-time" and point to Spark. Look at the two-day seminar at Berkeley's AMPLab going on right now, for example.

But what is Spark, really? And what are those RDDs? The letters stand for Resilient Distributed Datasets, but does that make it any clearer? We asked our illustrator to clarify this, and hopefully the cartoon explains it.
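For readers who prefer code to cartoons, here is a toy sketch of the idea behind an RDD in plain Python. It makes some assumptions: `ToyRDD` and its methods are hypothetical names, not Spark's API, and nothing here is actually distributed. The sketch models three things. A dataset is split into partitions. Transformations such as `map` and `filter` are only recorded (the "lineage") and do not run until an action like `collect` asks for results. A lost partition is rebuilt by replaying that lineage rather than restored from a copy, which is what the "resilient" part of the name refers to.

```python
class ToyRDD:
    """Toy model of a Resilient Distributed Dataset (hypothetical; not Spark's API)."""

    def __init__(self, compute, num_partitions):
        # compute(i) rebuilds partition i from its parent: this function is the lineage
        self._compute = compute
        self.num_partitions = num_partitions
        self._materialized = {}  # stands in for partitions held in worker memory

    @staticmethod
    def parallelize(data, num_partitions):
        data = list(data)
        # Round-robin split of the source data into partitions
        return ToyRDD(lambda i: data[i::num_partitions], num_partitions)

    def map(self, f):
        # Lazy transformation: records how to derive each partition, computes nothing yet
        return ToyRDD(lambda i: [f(x) for x in self.partition(i)], self.num_partitions)

    def filter(self, pred):
        # Also lazy: the parent (self) is captured as part of the lineage
        return ToyRDD(lambda i: [x for x in self.partition(i) if pred(x)], self.num_partitions)

    def partition(self, i):
        # Materialize partition i on first use, then reuse it
        if i not in self._materialized:
            self._materialized[i] = self._compute(i)
        return self._materialized[i]

    def lose_partition(self, i):
        # Simulate losing the node that held partition i
        self._materialized.pop(i, None)

    def collect(self):
        # Action: forces computation of every partition, in partition order
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


rdd = ToyRDD.parallelize(range(10), 3)
even_squares = rdd.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(even_squares.collect())   # [0, 36, 16, 4, 64]
even_squares.lose_partition(1)  # a simulated "node failure"
print(even_squares.collect())   # rebuilt from lineage: [0, 36, 16, 4, 64]
```

Real Spark runs these partitions on a cluster, but the recovery idea is the same. Rather than replicate every intermediate result, it remembers how each partition was derived and recomputes only what was lost.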

Wednesday, November 12, 2014

Announcing HBase Design Patterns Book

Happy to announce the "HBase Design Patterns" book, by Mark Kerzner and Sujee Maniyam. The book just went into production and can be pre-ordered using this link:

The book offers HBase and NoSQL developers practical guidance on designing and implementing real-world applications. Subjects covered include:

  • Various HBase install options
  • Single entity tables
  • Key generation
  • Storing large files
  • Dealing with time series data
  • Advanced modeling
  • Performance optimization
  • A number of labs and exercises

It is based on the authors' own work and research, and on the experience gained in writing the open source book "Hadoop Illuminated." Oh, and did we forget to mention cartoons by RK? Each chapter has at least one.


Mark & Sujee

Tuesday, November 11, 2014

An excellent presentation by Rohit Jain about the exciting new open source product Trafodion

Rohit Jain drove from Austin and presented Trafodion (Welsh for "Transaction"), pronounced "Travodion" - for those in the know. Rohit is an HP Database Distinguished and Chief Technologist. The breadth and depth of his knowledge is amazing.

The audience, in turn, lived up to expectations. Houston is getting its Big Data people by importing them, and people from Cloudera, Hortonworks, and DataStax were all represented.

Pizza was sponsored by HP (thank you!), and Rohit has already uploaded the slides to the Meetup. Here are the main slides and the architecture slides, with this note from Rohit: "There was interest in the Trafodion Distributed Transaction Management (DTM) architecture. However, it is a bit dated. Since this presentation, DTM has now been implemented as HBase co-processor code & THLOG has been integrated with the HBase HLOG."

My comment: I started the Houston Hadoop Meetup in 2010, expecting an imminent Big Data boom in Houston. I am still waiting. This was the first meetup, though, where we had active Big Data professionals. But, as I said, they were all imported from Big Data companies. We have yet to see native Houstonians and Houston companies doing Big Data. Again, it's coming, and our meetup is one of the focal points.

Monday, October 27, 2014

Big Data Cartoon: Big Data needs big muscle

Possibly inspired by this cartoon in The New Yorker, our illustrator set out to tell us that when you are in Big Data, you travel a lot, and of course you avail yourself of the exercise facilities found in each and every hotel. My latest was a packed gym in downtown San Francisco.

Lately, I've been noticing that trainers at Elephant Scale have been gaining muscle weight.

Tuesday, September 30, 2014

Got an Ubuntu laptop!

It is quite powerful and good-looking, from System76 (it is the one in the middle). Now I can be productive while traveling or working at a friend's place.

I am planning to add Windows in a VM, stay tuned...

Sunday, September 7, 2014

Big Data Cartoon: NY is new Silicon Valley

Silicon Valley may be the leader in Big Data, but compared to New York, its lead is underwhelming: a job search gives 994 Hadoop jobs in NY and 1719 in Silicon Valley.

What's more, if you are a financial startup, you simply must be in New York. You might have an office in TechSpaces in SF, but that's about it. This is fully supported by our illustrator and cartoon author, whose new residence is, appropriately, in Manhattan.

Silicon Valley, pay attention!

Thursday, July 24, 2014

FreeEed does Concordance (R)

The latest release of FreeEed (V4.4) allows import into Concordance (R) eDiscovery management software. Here are the instructions.

It also contains a number of fixes. You can use FreeEed in so many ways:
  • Start a FreeEed server on Amazon, no hardware needed;
  • Download a virtual machine to your workstation;
  • Install on Windows, Linux, or Mac.
Download page: here. And all of the popcorn advantages still apply.

P.S. Sneak preview: we are working on a document processing engine for today's three Vs: volume, velocity, and variety. It is 10-100 times faster and allows dynamic data sources.