Wednesday, April 8, 2015

I am a reviewer on the new Packt Cassandra book.

What You Will Learn

  • Install Cassandra and create your first keyspace
  • Choose the right table structure for the task at hand in a variety of scenarios
  • Use range slice queries for efficient data access
  • Effortlessly handle concurrent updates with collection columns
  • Ensure data integrity with lightweight transactions and logged batches
  • Understand eventual consistency and use the right consistency level for your situation
  • Implement best practices for data modeling and access

Wednesday, March 25, 2015

Jim Scott on Zeta architecture at Houston Hadoop Meetup

Jim Scott dropped in on his trip from Chicago to Houston, and presented his Zeta architecture. Here are the major points:
  • Zeta architecture is nothing new; Jim has just created pretty diagrams and is popularizing it
  • It is really the Google architecture, except that Google will not confirm or deny it
  • It is the last generic Big Data architecture that you will need, and it solves the following problems:
    • Provide high server utilization. Here's why this is important
    • Allow scaling from test to stage to production environments without reconfiguring the system or re-importing the data
Not convinced yet? Here are the slides of the presentation (provided promptly on the day of the meetup)

Thanks to Microsoft for the awesome (and very spacious) meeting room, and our task now is to fill it - so please, Meetup members in Houston, invite your friends. Remember, pizza from Saba's is always provided. 

Wednesday, March 11, 2015

Ravi Mutyala of Hortonworks talks about

The Stinger Initiative enabled Hive to support an even broader range of use cases at truly Big Data scale: bringing it beyond its Batch roots to support interactive queries – all with a common SQL access layer. is a continuation of this initiative focused on even further enhancing the speed, scale and breadth of SQL support to enable truly real-time access in Hive while also bringing support for transactional capabilities.  And just as the original Stinger initiative did, this will be addressed through a familiar three-phase delivery schedule and developed completely in the open Apache Hive community.

Ravi talked about some of the changes that came in Stinger and ones that are in progress in

This was our first meeting at the MS Office on Beltway 8, and it rocks! Thank you, Jason, for arranging this, and also for showing some of the Azure stuff at the end :)

Sunday, March 8, 2015

How to add a hard drive to HDFS on AWS

Imagine you need to add more space to your HDFS cluster that is running on Amazon EC2. Here are the simple steps you need to take:

1. Add a volume in the AWS EC2 console. Make sure that the volume is in the same zone as your instance, such as us-east-1c.

2. Attach the volume to the instance: right click on the volume and choose "Attach Volume".

3. Make the volume available for use by formatting the hard drive; the commands are here. Now you see the new volume (in my case, I mounted 1 TB of space as /disk2).

4. Add this drive as one of those that HDFS should use. I added a directory on it to the datanode's list of data directories.

5. Presto! You get much more space. Repeat to taste :)
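
For step 4, here is a minimal sketch of what the configuration change can look like. It assumes a Hadoop 2 style setup, where the datanode reads a comma-separated list of directories from the dfs.datanode.data.dir property in hdfs-site.xml. The existing directory /hadoop/hdfs/data and the new directory /disk2/hdfs/data are hypothetical names for illustration, not the ones from my cluster, and the helper functions are made up for this example.

```python
import xml.etree.ElementTree as ET


def add_data_dir(current_value: str, new_dir: str) -> str:
    """Append new_dir to a comma-separated directory list, skipping duplicates."""
    dirs = [d.strip() for d in current_value.split(",") if d.strip()]
    if new_dir not in dirs:
        dirs.append(new_dir)
    return ",".join(dirs)


def render_property(name: str, value: str) -> str:
    """Render one <property> element in the hdfs-site.xml layout."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return ET.tostring(prop, encoding="unicode")


# Hypothetical current value of the property (assumption, not from the post).
existing = "/hadoop/hdfs/data"

# Hypothetical directory created on the volume mounted as /disk2.
updated = add_data_dir(existing, "/disk2/hdfs/data")

snippet = render_property("dfs.datanode.data.dir", updated)
print(snippet)
```

The printed element goes into hdfs-site.xml on the datanode in place of the old property. The datanode typically needs a restart to pick up the new directory, and the directory has to be writable by the user that runs HDFS.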

Tuesday, February 24, 2015

Big Data Cartoon - What's with Pivotal?

Last week, Pivotal joined forces with its former rival, Hortonworks, announcing that they will form a joint Hadoop Core platform. In my understanding, Pivotal is giving up its own distribution of Hadoop in favor of Hortonworks Data Platform.

However, Yevgeniy Sverdlik of Data Center Knowledge quotes Cloudera and MapR as saying that there is no need for another Hadoop Core, and that this is about marketing with self-serving interests.

Who is right? Our illustrator ponders.

Monday, February 23, 2015

FreeEed technologies led to DARPA project

FreeEed technologies impressed the DARPA team and led to a contract to fight human trafficking. The full press release by the main contractor, Hyperion Gray, is quoted below. While FreeEed and Elephant Scale can't have their own press release, their involvement is fully explained.

Wednesday, February 18, 2015

My lab

And all of these computers are needed, and no, you cannot replace them with one Mac :) - I can tell you why.