Sunday, June 22, 2014

I am a reviewer on the Apache Solr High Performance book


As always, I acknowledge my colleagues, my friend and partner Sujee, and my multi-talented family.

Next time, I will thank more of my friends who always help.

Friday, June 13, 2014

Houston Hadoop Meetup - Marco Vasquez presents Apache Spark

Invited speaker Marco Vasquez told the group about his work as a Data Scientist at MapR, and his use of Spark for this purpose. Thanks to YARN in Hadoop 2, Spark has become a part of every major distribution, either as a release or as an early preview.

The group was quite technical and asked a lot of detailed questions. Thanks to everyone, and to MapR for sponsoring the pizza and drinks.

And here are the slides: http://www.slideshare.net/MapRTechnologies/spark-v1

Thursday, June 12, 2014

SHMsoft, Inc. Offers FreeEed as Pre-packaged Open Source Software for eDiscovery

For immediate release

SHMsoft, a leader in open source software for eDiscovery, is pleased to announce its latest offering: a complete eDiscovery application, with all components pre-installed and integrated. This is often explained using a popcorn metaphor: the corn is a lawsuit, FreeEed is the popcorn maker, and processing is adding the lawsuit (corn) to the popcorn maker (FreeEed).

Read on...

Monday, June 9, 2014

Cartoon: Hadoop-based search

Hadoop is a perfect platform for search: it is big and strong, and attentive to details. However, don't just take my word for it. Here is the "follow the money" hint: ElasticSearch, the company behind the ElasticSearch, Logstash, and Kibana products, announced $70 million in Series C financing.

Our Hadoop-based legal search, FreeEed, is also witnessing increased adoption, by law firms looking for eDiscovery alternatives, and by IT departments and government agencies who "search for open source legal search" :)

Friday, June 6, 2014

I win another bet

As my friends and students know, I like to make a bet with them, at any time, that there will be some new Big Data development within 30 days of the bet.

I think this one qualifies quite well: ElasticSearch just announced that they got funded to the tune of 70 million US dollars: http://www.elasticsearch.com/blog/press/elasticsearch-raises-70-million-series-c-financing/

So why is this big? It shows that more vertically oriented startups are just as important as Big Data infrastructure companies like Cloudera, which got about 1 billion dollars a month ago.

Another bet can start today - anyone?

How to build a Hadoop cluster on AWS

Below are some excerpts from a book I am writing. Since this seems to be a matter of general interest, I decided to put this in a blog.

Very often people need to build a Hadoop cluster for work or for fun. There is nothing better than borrowed powerful hardware for this (provided that you don't forget to shut the cluster down when you are done), so head directly to the Amazon AWS console:

Thursday, June 5, 2014

Real time Hadoop

Real time Hadoop is all the rage, with Storm, Spark, Shark, and a plethora of other products, initiatives and events. However, it may be hard to visualize, for example, when you do a "bring your child to work" day. That is where our cartoonist comes into the picture and explains it very clearly: it is an elephant surfing in the clouds. See for yourself. And don't forget our complete Hadoop coloring book for kids.