Wednesday, November 26, 2014

Big Data Cartoon - What is text analytics?

Analytics may be the next big thing in Big Data, but it is very hard to define what it really is. Firstly, the word itself shows up as misspelled in the browser and in Word or OpenOffice. Secondly, it is too vague and nebulous. As always, when in doubt, we turn to our illustrator, RK, who illuminates the subject with a simple-to-understand cartoon that even data scientists can get.

Thursday, November 20, 2014

Big Data Cartoon - What's the latest and greatest in Hadoop?

What's the latest and greatest in Hadoop? Ask this question, and many people will say "Real-time" and point to Spark. Look at the two-day seminar from Berkeley's AMPLab going on right now, for example.

But what is Spark, really? What are those RDDs? The name stands for Resilient Distributed Datasets, but does that make it any clearer? We asked our illustrator to clarify this, and hopefully we got it explained.

Wednesday, November 12, 2014

Announcing HBase Design Patterns Book

We are happy to announce the "HBase Design Patterns" book, by Mark Kerzner and Sujee Maniyam. The book just went into production and can be pre-ordered using this link:

The book offers HBase and NoSQL developers practical guidance on designing and implementing real-world applications. Subjects covered include:

  • Various HBase install options
  • Single entity tables
  • Key generation
  • Storing large files
  • Dealing with time series data
  • Advanced modeling
  • Performance optimization
  • A number of labs and exercises

The book is based on the authors' own work, research, and experience gained in writing the open source book "Hadoop Illuminated." Oh, and did we forget to mention cartoons by RK? Each chapter has at least one.


Mark & Sujee

Tuesday, November 11, 2014

An excellent presentation by Rohit Jain about Trafodion, an exciting new open source product

Rohit Jain drove from Austin and presented Trafodion (Welsh for "transactions"), pronounced "Travodion" by those in the know. Rohit is an HP Database Distinguished and Chief Technologist. The breadth and depth of his knowledge are amazing.

In turn, the audience did not disappoint. Houston is getting its Big Data people by importing them: Cloudera, Hortonworks, and DataStax were all represented.

Pizza was sponsored by HP - thank you - and Rohit has already uploaded the slides to the Meetup. Here are the main slides and the architecture, with this note from Rohit: "There was interest in the Trafodion Distributed Transaction Management (DTM) architecture. However, it is a bit dated. Since this presentation, DTM has now been implemented as HBase co-processor code & THLOG has been integrated with the HBase HLOG."

My comment: I started the Houston Hadoop Meetup in 2010, expecting an imminent Big Data boom in Houston. I am still waiting. This was the first meetup, though, where we had active Big Data professionals, but they were all imported, as I said, from Big Data companies. We have yet to see native Houstonians and Houston companies doing Big Data. Again, it's coming, and our meetup is one of the focal points.