Wednesday, June 15, 2011

An open-source Hadoop alternative from LexisNexis and FreeEed

As if by coincidence, the day after the LTN article by Evan Koblentz that mentioned FreeEed, LexisNexis announced that it will open-source its Hadoop alternative for handling Big Data:

"LexisNexis announced today that it will open-source its High Performance Computing Cluster (HPCC) technology, as well as offer an enterprise version with commercial support. The company is positioning HPCC Systems, developed internally by its Risk Solutions unit, as an alternative to Apache Hadoop. A virtual machine for testing purposes will be available soon, and code will be available in a few weeks." For the fuller announcement, go here.

As a first impression, what are the major comparison points?

  • LexisNexis has been using its technology for a while and has the marketing clout to match, but so far it has announced only plans to make the VM "available soon" and the code "in a few weeks." One wonders if this is a reaction to the momentum that FreeEed has been gaining. FreeEed, on the other hand, is already out on GitHub.
  • LexisNexis is essentially a closed-source company, so one wonders how open the offering will really be. But they may be successful: look at Microsoft's open-source contributions. In LexisNexis's own words, "Only the core technology is being released, LexisNexis' own data linking techniques aren't being released, nor are its data sources." In contrast, FreeEed is pure open source (with commercial support options), and people are already investigating using it in ways beyond eDiscovery, which illustrates the flexibility of an open-source offering.
  • LexisNexis has Roxie, a system for query and data warehousing, but FreeEed will offer the same capability, built on Cassandra.
  • LexisNexis sports ECL (Enterprise Control Language), but Cassandra has CQL (Cassandra Query Language).
  • LexisNexis's "HPCC team has been working with Amazon Web Services to make sure the product work well on AWS servers," but the FreeEed team has planned to use EC2 from the start and is actively working on it now.
The two are not exactly competitors at this point: LexisNexis is releasing technology for high-performance cluster computing and its risk-handling applications, while FreeEed started out in eDiscovery. Still, they are close in their approach to open source and to handling Big Data, so the announcement is worth watching.


jennydenver said...

Yes, I have been watching or wondering about their attempts as well :).

PS: I just discovered FreeEed and eDiscovery, the shmsoft business, and all your wonderful source code and notes, which greatly add to my excitement and confidence in this area of work and development. (I am sure I have seen or read your blog long ago, I recall.) Thanks.

- Matt Kaufman,

Mark Kerzner said...

Well, I am also very excited, but lots of work is still ahead.