Wednesday, August 31, 2016

Eventual consistency explained with Starbucks coffee

How do you explain eventual consistency to a novice?  You tell them, "Have you been to Starbucks? Yes? - Well, it's like this, only for databases."

That is a favorite example. I thought that an illustration would help, so here it is.

Orders do not go through all the phases in strict sequence, but eventually you get your coffee. There may be false starts, drinks handed out in the wrong order, and so on, and this is how many eventually consistent NoSQL databases work as well.
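
For readers who prefer code to coffee, here is a minimal sketch of the same idea in Python. It is not how any particular NoSQL database is implemented. The `Replica` class, the `replicate` helper, the version numbers, and the "last write wins" rule are all assumptions made for illustration. Three replicas receive the same updates in different orders, yet all end up in the same final state.

```python
import random


class Replica:
    """One copy of the data; keeps the newest version it has seen per key."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (version, value)

    def apply(self, key, version, value):
        # Last write wins: ignore updates older than what we already hold.
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)


def replicate(replicas, updates, seed=0):
    """Deliver every update to every replica, each in its own shuffled order."""
    rng = random.Random(seed)
    for replica in replicas:
        shuffled = list(updates)
        rng.shuffle(shuffled)  # messages may arrive out of order
        for key, version, value in shuffled:
            replica.apply(key, version, value)


# Hypothetical order history: (key, version, value)
updates = [
    ("order-17", 1, "paid"),
    ("order-17", 2, "espresso pulled"),
    ("order-17", 3, "handed to customer"),
]

replicas = [Replica("register"), Replica("bar"), Replica("counter")]
replicate(replicas, updates)

# Every replica converges on the highest version, whatever the arrival order.
final_states = [replica.store["order-17"] for replica in replicas]
```

While the updates are still in flight, two replicas may disagree about the state of the order. Once every update has been delivered, they agree, and that is the "eventual" part.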

One more architectural principle that Starbucks illustrates is decoupling. The workers at Starbucks communicate with each other through messages, encoded on a cup. Moreover, this message is hardware (cup) based, so it does not get lost. Decoupling is important for scaling: because the cashier only writes on cups and never talks to a particular barista, you can add a second barista without changing anything else.
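
The cup-as-message idea can be sketched with a queue from the Python standard library. This is only an illustration under assumptions: the names `cashier`, `barista`, and `run_shop` are made up for this example, and a real system would typically use a message broker rather than an in-process queue.

```python
import queue
import threading


def cashier(cups, orders):
    """Write each order on a cup and put it on the counter (the queue)."""
    for order in orders:
        cups.put(order)


def barista(name, cups, served, lock):
    """Pick up cups until the end-of-shift marker (None) arrives."""
    while True:
        cup = cups.get()
        if cup is None:
            break
        with lock:
            served.append((name, cup))


def run_shop(orders, barista_count=2):
    cups = queue.Queue()
    served = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=barista, args=(f"barista-{i}", cups, served, lock))
        for i in range(barista_count)
    ]
    for worker in workers:
        worker.start()
    cashier(cups, orders)
    for _ in workers:
        cups.put(None)  # one end-of-shift marker per barista
    for worker in workers:
        worker.join()
    return served


served = run_shop(["latte", "mocha", "espresso", "tea"])
```

The cashier never refers to a barista by name, so changing `barista_count` from two to three requires no change to the cashier code. The drinks may also come out in a different order than they were rung up, which ties back to the eventual consistency example.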

Saturday, August 27, 2016

In Search of Database Nirvana - Houston Hadoop&Spark Meetup

Database expert Rohit Jain presented "In Search of Database Nirvana".
Below is the description, and here are the slides. Note that the slides have animation. To enjoy the slides to the fullest, download and view them outside SlideShare.

See y'all at the next meetup.

In Search of Database Nirvana – one SQL engine for transactional to analytical workloads
Companies are looking for a single database engine that can address all their varied needs—from transactional to analytical workloads, against structured, semi-structured, and unstructured data, leveraging graph, document, text search, column, key value, wide column, and relational data stores; on a single platform without the latency of data transformation and replication.  They are looking for the ultimate database nirvana.
The term hybrid transactional/analytical processing (HTAP), coined by Gartner, perhaps comes closest to describing this concept. 451 Research uses the terms convergence or converged data platform. The terms multi-model or unified are also used. But can such a nirvana be achieved?  Some database vendors claim to have already achieved this nirvana.  In this talk we will discuss the following challenges on the path to this nirvana, for you to assess how accurate these claims are:
  • What is needed for a single query engine to support all workloads?
  • What does it take for that single query engine to support multiple storage engines, each serving a different need?
  • Can a single query engine support all data models?
  • Can it provide enterprise-caliber capabilities?
Attendees looking to assess query and storage engines would benefit from understanding what the key considerations are when picking an engine to run their targeted workloads. Also, developers working on such engines can better understand capabilities they need to provide in order to run workloads that span the HTAP spectrum.
Rohit Jain is the CTO at Esgyn working on Apache Trafodion™, currently in incubation. Trafodion is a transactional to analytics SQL-on-Hadoop RDBMS. Rohit worked for Tandem, Compaq, and Hewlett-Packard for the last 28 of his 40 years in application and database development. He has worked as an application developer, solutions architect, consultant, software engineer, database architect, development and QA manager, Product Manager, and CTO. His experience spans Online Transaction Processing, Operational Data Stores, Data Marts, Enterprise Data Warehouses, Business Intelligence, and Advanced Analytics, on distributed massively parallel systems.

Tuesday, July 12, 2016

Houston Hadoop Meetup - Hacking and SQLing, July 2016

Mr. Y, the hacker, presented a report on Toorcamp 2016, an unbelievable do-it, hack-it, laugh-at-it compendium. Here are the slides:

Jim Scott, the fiery orator, talked about SQL and NoSQL.

Here is a link to the blog for this talk:
There is a link to a video embedded in there as well.

Here is the original presentation:

Thank you all, and see you next time.

Wednesday, May 18, 2016

What I like about NetBeans

Developers live and die by their IDEs, and so they have religious wars about them. But thrice-blessed* are those who use multiple IDEs.

Here is what I like about IntelliJ:
  • It knows the variables you are going to type and often guesses them right; it also has type-ahead support for variables and methods;
  • It runs Scala out of the box.
But here is what I like about NetBeans:
  • The UI editor (formerly Matisse, now just the editor). It is unsurpassed. For example, the reason I don't write desktop apps in Scala is the absence of such an editor.
  • It has a team with a few special guys. The name of one of them starts with Geer or Cheer, I am not sure, but he writes an excellent blog about NetBeans. 😁 (Knowledgeable people say it is a hint to this).
  • In debugging, it shows values of variables and even functions or code fragments.

Note: thrice-blessed is a hint to this quote from Shakespeare

Tuesday, April 26, 2016

Big Data architecture for O&G

Houston Hadoop Meetup has grown to over 800 members by now. It is lavishly hosted by the Slalom consultants in the Galleria area, and beer, wine and food are provided by Slalom.

The presenter, Dmitry Kniazev, gave an overview of the proof-of-concept solution created for a major Oil & Gas company. He briefly introduced WITSML, the industry standard for sharing sensor data among different operators, and described how the team tapped into it to build a near real-time alerting application that streams data into a Kafka queue and processes it using Spark Streaming.

Dmitry Kniazev is a Solutions Architect, Data Analytics at EPAM Systems (NYSE: EPAM). EPAM is a solutions integrator that outsources solutions implementation to various locations, primarily Eastern Europe. Dmitry has been working with one of the major Oil & Gas companies here in Houston for almost 4 years and has participated in various Data Analytics related projects.

The slides are found here. Again, thank you for hosting, presenting, and coming to the meeting.

Sunday, March 6, 2016

Hadoop as a service at Houston Hadoop Meetup

Hadoop as a service was presented by Ajay Jha, of Altiscale. Here are the slides.

As has become customary, our host, Slalom, provided parking ticket validation, pizza, beer and wine.

This location is in the fashionable Galleria area, and downstairs the geeks can continue at the Caracol restaurant, which serves Mexican coastal cuisine.

Thursday, February 25, 2016

Big Data Cartoons - Paris, Jerusalem, Istanbul, Singapore, where next?

In teaching Big Data, we often travel. Lately, in our view, Big Data is picking up the world over, not only in the US. Israeli Spark meetups are just as advanced as the ones in California. So we asked our artist to show all the places where we have been. That was too hard though, so we just used travel pointers. But the elephant is real.

(In fact, this post is written on a Turkish Airlines plane - thanks to very good WiFi).