Wednesday, March 25, 2015

Jim Scott on Zeta architecture at Houston Hadoop Meetup

Jim Scott dropped in on his trip from Chicago to Houston and presented his Zeta architecture. Here are the major points:
  • Zeta architecture is nothing new; Jim has just created pretty diagrams and is popularizing it
  • It is really the Google architecture, except that Google will neither confirm nor deny it
  • It is the last generic Big Data architecture that you will need, and it solves the following problems:
    • Provide high server utilization. Here's why this is important
    • Allow scaling from test to staging to production environments without reconfiguring the system and without re-importing the data
Not convinced yet? Here are the slides of the presentation (provided promptly on the day of the meetup).

Thanks to Microsoft for the awesome (and very spacious) meeting room. Our task now is to fill it, so please, Houston Meetup members, invite your friends. Remember, pizza from Saba's is always provided.

Wednesday, March 11, 2015

Ravi Mutyala of Hortonworks talks about

The Stinger Initiative enabled Hive to support an even broader range of use cases at truly Big Data scale: bringing it beyond its Batch roots to support interactive queries – all with a common SQL access layer. is a continuation of this initiative focused on even further enhancing the speed, scale and breadth of SQL support to enable truly real-time access in Hive while also bringing support for transactional capabilities.  And just as the original Stinger initiative did, this will be addressed through a familiar three-phase delivery schedule and developed completely in the open Apache Hive community.

Ravi talked about some of the changes that came in Stinger and ones that are in progress in

This was our first meeting at the Microsoft office on Beltway 8, and it rocks! Thank you, Jason, for arranging this, and also for showing some of the Azure stuff at the end :)

Sunday, March 8, 2015

How to add a hard drive to HDFS on AWS

Imagine you need to add more space to your HDFS cluster running on Amazon EC2. Here are the simple steps you need to take:

1. Add a volume in the AWS EC2 console. Make sure that the volume is in the same Availability Zone as your instance, such as us-east-1c.

2. Attach the volume to the instance: right-click the volume and choose "Attach Volume".

3. Make the volume available for use by formatting and mounting it; the commands are here. Now you can see the new volume (in my case, I mounted 1 TB of space as /disk2).

4. Add this drive to the list of directories that HDFS should use. I added a directory for the DataNode's use, as below.

5. Presto! You get much more space. Repeat to taste :)
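If you add drives often, steps 3 and 4 can be scripted. Below is a minimal Python sketch, not the exact commands used in this post. The device name /dev/xvdf, the data directory /disk2/hdfs/data and the hdfs:hadoop owner are hypothetical placeholders, and it assumes a Hadoop 2.x hdfs-site.xml in which the DataNode directories are the comma-separated dfs.datanode.data.dir property (older releases call it dfs.data.dir). By default it only prints the shell commands instead of running them.

```python
"""Sketch: prepare a newly attached EBS volume and register it with HDFS.

Assumptions (hypothetical, adjust to your cluster):
  - the attached volume shows up as /dev/xvdf,
  - it is mounted at /disk2 and HDFS blocks go in /disk2/hdfs/data,
  - hdfs-site.xml lists DataNode directories in dfs.datanode.data.dir.
"""
import subprocess
import xml.etree.ElementTree as ET

DATA_DIR_PROP = "dfs.datanode.data.dir"


def mount_commands(device, mount_point, data_dir, owner="hdfs:hadoop"):
    """Return the shell commands for formatting/mounting, as argv lists."""
    return [
        ["mkfs.ext4", device],             # format the new volume (erases it!)
        ["mkdir", "-p", mount_point],      # create the mount point
        ["mount", device, mount_point],    # mount the volume
        ["mkdir", "-p", data_dir],         # directory for the DataNode's blocks
        ["chown", "-R", owner, data_dir],  # the DataNode user must own it
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False (needs root)."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


def add_data_dir(xml_text, new_dir):
    """Append new_dir to the comma-separated dfs.datanode.data.dir value."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == DATA_DIR_PROP:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            dirs = [d.strip() for d in (value.text or "").split(",") if d.strip()]
            if new_dir not in dirs:  # keep the existing directories, avoid duplicates
                dirs.append(new_dir)
            value.text = ",".join(dirs)
            break
    else:
        # Property not present yet: create it with just the new directory.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = DATA_DIR_PROP
        ET.SubElement(prop, "value").text = new_dir
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Dry run: only prints the commands.
    run(mount_commands("/dev/xvdf", "/disk2", "/disk2/hdfs/data"))
    sample = (
        "<configuration><property><name>dfs.datanode.data.dir</name>"
        "<value>/hadoop/hdfs/data</value></property></configuration>"
    )
    print(add_data_dir(sample, "/disk2/hdfs/data"))
```

The existing directories are kept and the new one is appended, because HDFS spreads blocks across every directory listed in that property. After writing the updated hdfs-site.xml, restart the DataNode so it picks up the new directory, and add the mount to /etc/fstab, otherwise /disk2 will not be remounted after a reboot. `hdfs dfsadmin -report` shows whether the configured capacity grew.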