Tuesday, July 12, 2016

Houston Hadoop Meetup - Hacking and SQLing, July 2016

Mr. Y, the hacker, presented a report on Toorcamp 2016, an unbelievable do-it, hack-it, laugh-at-it compendium. Here are the slides: http://www.slideshare.net/markkerzner/toorcamp-2016

Jim Scott, the fiery orator, talked about SQL and NoSQL.

Here is a link to the blog for this talk: https://www.mapr.com/blog/how-evolve-rdbms-nosql-sql
There is a link to a video embedded in there as well.

Here is the original presentation: http://conferences.oreilly.com/strata/big-data-conference-sg-2015/public/schedule/detail/45066

Thanks to you, see you next time.

Wednesday, May 18, 2016

What I like about NetBeans

Developers live and die by their IDE, and so they have religious wars about them. But thrice-blessed* are those who use multiple IDEs.

Here is what I like about IntelliJ
  • It knows the variables you are going to type and often guesses them right; it also has type-ahead support for them and for methods;
  • It runs Scala out of the box.
But here is what I like about NetBeans
  • The UI editor (formerly Matisse, now just editor). It is unsurpassed. For example, the reason I don't write desktop apps in Scala is the absence of such an editor.
  • It has a team with a few special guys. The name of one of them starts with Geer or Cheer, I am not sure, but he writes an excellent blog about NetBeans. 😁 (Knowledgeable people say it is a hint to this).
  • In debugging, it shows values of variables and even functions or code fragments.

Note: thrice-blessed is a hint to this quote from Shakespeare

Tuesday, April 26, 2016

Big Data architecture for O&G

Houston Hadoop Meetup has grown to over 800 members by now. It is lavishly hosted by the Slalom consultants in the Galleria area, and beer, wine and food are provided by Slalom.

The presenter, Dmitry Kniazev, gave an overview of the Proof-Of-Concept solution created for a major Oil & Gas company. He briefly described the WITSML standard that the industry uses to share sensor data among different operators, and explained how they tapped into it to build a near real-time alerting application that streams data into a Kafka queue and processes it using Spark Streaming.

Dmitry Kniazev is a Solutions Architect, Data Analytics at EPAM Systems (NYSE: EPAM). EPAM is a solutions integrator that outsources solutions implementation to various locations, primarily Eastern Europe. Dmitry has been working with one of the major Oil & Gas companies here in Houston for almost 4 years and has participated in various Data Analytics related projects.

The slides are found here. Again, thank you for hosting, presenting, and coming to the meeting.

Sunday, March 6, 2016

Hadoop as a service at Houston Hadoop Meetup

Hadoop as a service was presented by Ajay Jha, of Altiscale. Here are the slides.

As has become customary, our host, Slalom, provided parking ticket validation, pizza, beer and wine.

The location is in the fashionable Galleria area, and the geeks can continue downstairs at the Caracol restaurant (Mexican coastal cuisine).

Thursday, February 25, 2016

Big Data Cartoons - Paris, Jerusalem, Istanbul, Singapore, where next?

In teaching Big Data, we often travel. Lately, in our view, Big Data is picking up the world over, not only in the US. Israeli Spark meetups are just as advanced as the ones in California. So we asked our artist to show all the places where we have been. That was too hard though, so we just used travel pointers. But the elephant is real.

(In fact, this post is written on a Turkish Airlines plane - thanks to very good WiFi).

Big Data Cartoon - Elephant Scale enters Google doodle competition

The Doodle 4 Google competition will be announced in March (https://www.google.com/doodle4google/), but we are already thinking: what's next? So we asked our artist for an official entry. Our artist is so great that a win is virtually guaranteed, unless the entry is disqualified because of our subtle brand promotion. So here is our spoiler.

Monday, February 15, 2016

Stoppable Hadoop cluster

The title of this post was inspired by the following lines

They dined on mince, and slices of quince,
   Which they ate with a runcible spoon

from the poem by Edward Lear, "The Owl and the Pussy-Cat", which the reader is invited to ponder at leisure.

Meanwhile, as a Big Data trainer, I often need to create what I would call a "stoppable cluster" on AWS, one that I can "pause," or put to sleep for a while. The most obvious use of it is saving money while the students are away, so that instead of $100/day, I would pay $33 per day. That would be reason enough. However, at times, as a developer, I want to stop the cluster that I am running.
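The savings above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `daily_cost` and the 8-hour class day are my assumptions, and it pretends that cost scales linearly with running hours, ignoring whatever AWS keeps charging while instances are stopped (EBS storage, for example).

```python
def daily_cost(full_day_cost: float, hours_running: float) -> float:
    """Daily cost when the cluster runs only part of the day.

    Assumption: cost is proportional to running hours; charges that
    continue while the instances are stopped are ignored.
    """
    if not 0 <= hours_running <= 24:
        raise ValueError("hours_running must be between 0 and 24")
    return full_day_cost * hours_running / 24.0


if __name__ == "__main__":
    # A $100/day cluster that runs only for an assumed 8-hour class day.
    print(round(daily_cost(100.0, 8)))  # prints 33
```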

If you look in the literature, it will cite two obstacles:
  1. Ephemeral (instance store) drives lose their data on stop/start on AWS; and
  2. IP assignments change.
You can fix both by (1) using the root drive and EBS drives; and (2) assigning Elastic IPs. Amazon will not let you use more than 5 Elastic IPs, but you can call them and ask nicely, and they will give you 10.
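Here is a minimal sketch of what pausing and resuming might look like in Python. It is not the script I actually use. The helper names `pause_cluster`, `resume_cluster` and `pin_elastic_ip` are made up for illustration, and the `ec2` argument is assumed to be a boto3 EC2 client (`boto3.client("ec2")`) with instances living in a VPC. The methods called on it (`stop_instances`, `start_instances`, `get_waiter`, `allocate_address`, `associate_address`) are real boto3 client methods.

```python
def pause_cluster(ec2, instance_ids):
    """Stop every instance and block until all of them are stopped."""
    ec2.stop_instances(InstanceIds=instance_ids)
    ec2.get_waiter("instance_stopped").wait(InstanceIds=instance_ids)


def resume_cluster(ec2, instance_ids):
    """Start every instance and block until all of them are running."""
    ec2.start_instances(InstanceIds=instance_ids)
    ec2.get_waiter("instance_running").wait(InstanceIds=instance_ids)


def pin_elastic_ip(ec2, instance_id):
    """Allocate a new Elastic IP, attach it to one instance, return the IP.

    Assumption: the instance is in a VPC, hence Domain="vpc".
    """
    address = ec2.allocate_address(Domain="vpc")
    ec2.associate_address(
        InstanceId=instance_id,
        AllocationId=address["AllocationId"],
    )
    return address["PublicIp"]
```

Because the client is passed in, the same functions can be tried against a stub object before pointing them at a real account. Remember that each node you pin counts against the Elastic IP limit mentioned above.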

Next, Hortonworks Ambari will check the cluster IP assignment and refuse to use the external IPs, even though Amazon promises you the right resolution:

So I use Cloudera Manager, stop the cluster, stop the instance, and restart it.

Now, I try to start the Managing services again and....

alas, CM has resolved the IPs to the old internal ones, and used them in the configuration!
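One way to catch this kind of surprise early is to compare the addresses written into the configuration with what the hostnames resolve to right now. The sketch below is purely illustrative: `find_stale_hosts` is a hypothetical helper, the hostnames and IPs are invented, and it says nothing about how Cloudera Manager actually stores its configuration. You would have to extract the host-to-IP mapping yourself.

```python
import socket


def find_stale_hosts(configured, resolve=socket.gethostbyname):
    """Return {hostname: (configured_ip, current_ip)} for every mismatch.

    `configured` maps hostnames to the IPs found in the configuration.
    `resolve` turns a hostname into its current IP; it defaults to a real
    DNS lookup but can be swapped out for testing.
    """
    stale = {}
    for hostname, configured_ip in configured.items():
        try:
            current_ip = resolve(hostname)
        except OSError:
            # The name no longer resolves at all; report it as stale too.
            current_ip = None
        if current_ip != configured_ip:
            stale[hostname] = (configured_ip, current_ip)
    return stale


if __name__ == "__main__":
    # Invented example data: node2 got a new internal IP after the restart.
    in_config = {"node1.example": "10.0.0.11", "node2.example": "10.0.0.12"}
    after_restart = {"node1.example": "10.0.0.11", "node2.example": "10.0.0.57"}
    print(find_stale_hosts(in_config, resolve=after_restart.__getitem__))
```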

Next installment - constructing proper clusters in the VPC and controlling the internal IP assignments.