Friday, May 8, 2015

Big Data Cartoon: Spark workshop

So what is an Apache Spark workshop like? You would imagine that sparks are flying off the participants, and, at least in the eyes of our illustrator, this is absolutely true. By the way, Elephant Scale will be teaching a one-day workshop soon in Dallas, TX, so stay tuned.

(What is the inspiration behind this drawing? It is Rembrandt's - a genius painting, but a gruesome subject - you have been warned - here.)

Wednesday, April 29, 2015

I am a reviewer on "Real-time Analytics with Storm and Cassandra"

  • Create your own data processing topology and implement it in various real-time scenarios using Storm and Cassandra
  • Build highly available and linearly scalable applications using Storm and Cassandra that will process voluminous data at lightning speed
  • A pragmatic and example-oriented guide to implement various applications built with Storm and Cassandra

Tuesday, April 28, 2015

Big Data Cartoon - In the Big Small world

Working with Big Data, you are also working across the globe. As Sujee Maniyam puts it in his presentation on launching your career in Big Data: "If you think of a nine-to-five job - forget it!" You are working with multiple teams at all hours. The reward, in the words of Shakespeare:

Why, then the world's mine oyster.

Many people quote this line, but in the play it is even better. Falstaff fires Pistol, his servant, and refuses to lend him money. Pistol decides to fend for himself. Falstaff is being witty at his own expense: Pistol will betray him.

I will not lend thee a penny.
Why, then the world's mine oyster.
Which I with sword will open.
Not a penny. I have been content, sir, you should
lay my countenance to pawn...

Wednesday, April 8, 2015

I am a reviewer on "Learning Apache Cassandra"

I am a reviewer on the new Packt Cassandra book.

What You Will Learn

  • Install Cassandra and create your first keyspace
  • Choose the right table structure for the task at hand in a variety of scenarios
  • Use range slice queries for efficient data access
  • Effortlessly handle concurrent updates with collection columns
  • Ensure data integrity with lightweight transactions and logged batches
  • Understand eventual consistency and use the right consistency level for your situation
  • Implement best practices for data modeling and access

Wednesday, March 25, 2015

Jim Scott on Zeta architecture at Houston Hadoop Meetup

Jim Scott dropped in on his trip from Chicago to Houston and presented his Zeta architecture. Here are the major points:
  • Zeta architecture is nothing new; Jim has just created pretty diagrams and popularized it
  • It is really the Google architecture, except that Google will neither confirm nor deny it
  • It is the last generic Big Data architecture that you will need, and it solves the following problems:
    • Provides high server utilization. Here's why this is important
    • Allows you to scale from test to staging to production environments without re-configuring the system and without re-importing the data
Not convinced yet? Here are the slides of the presentation (provided promptly on the day of the meetup)

Thanks to Microsoft for the awesome (and very spacious) meeting room, and our task now is to fill it - so please, Meetup members in Houston, invite your friends. Remember, pizza from Saba's is always provided. 

Wednesday, March 11, 2015

Ravi Mutyala of Hortonworks talks about Stinger.next

The Stinger Initiative enabled Hive to support an even broader range of use cases at truly Big Data scale, bringing it beyond its batch roots to support interactive queries, all with a common SQL access layer. Stinger.next is a continuation of this initiative, focused on further enhancing the speed, scale, and breadth of SQL support to enable truly real-time access in Hive, while also bringing support for transactional capabilities. And just as the original Stinger Initiative did, this will be addressed through a familiar three-phase delivery schedule and developed completely in the open Apache Hive community.

Ravi talked about some of the changes that came in Stinger and ones that are in progress in Stinger.next.

This was our first meeting at the MS Office on Beltway 8, and it rocks! Thank you, Jason, for arranging this, and also for showing some of the Azure stuff at the end :)

Sunday, March 8, 2015

How to add a hard drive to HDFS on AWS

Imagine you need to add more space to your HDFS cluster that is running on Amazon EC2. Here are the simple steps you need to take:

1. Add a volume in the AWS EC2 console. Make sure that the volume is in the same availability zone as your instance, such as us-east-1c.

2. Attach the volume to the instance: right click on the volume and choose "Attach Volume".

3. Make the volume available for use by formatting and mounting the drive; the commands are here. Now you see the new volume (in my case, I mounted 1 TB of space as /disk2).
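The formatting and mounting commands look roughly like this - a sketch only, assuming the new volume shows up as /dev/xvdf (device names vary by instance type, so check with lsblk first):

```shell
# List block devices to find the newly attached volume
lsblk

# Create a filesystem on the new volume (assuming /dev/xvdf; this erases it!)
sudo mkfs -t ext4 /dev/xvdf

# Create a mount point and mount the volume there
sudo mkdir -p /disk2
sudo mount /dev/xvdf /disk2

# Persist the mount across reboots
echo '/dev/xvdf /disk2 ext4 defaults,nofail 0 2' | sudo tee -a /etc/fstab
```

The `nofail` option keeps the instance bootable even if the volume is later detached.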

4. Add this drive to the list of directories that HDFS should use. I have added the directory for the datanode's use as below.
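For reference, a minimal sketch of the hdfs-site.xml change: the new mount point is appended to the datanode's comma-separated list of data directories (the property name is the standard dfs.datanode.data.dir of Hadoop 2.x; the existing /hadoop/hdfs/data path is an assumption for illustration, and /disk2 follows the example above):

```xml
<!-- hdfs-site.xml: append the new mount point to the datanode's data directories -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/hadoop/hdfs/data,/disk2/hdfs/data</value>
</property>
```

Restart the datanode after the change so it picks up the new directory.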

5. Presto! You get much more space. Repeat to taste :)