Wednesday, June 13, 2012

Big clusters for eDiscovery

Every programmer knows the special pleasure and satisfaction of seeing his or her code work right, and keep working through more and more testing and more and more data. The special joy of clusters is seeing it work with a cluster of any size.

The SHMcloud (TM) player can now start and configure all the machines in a Hadoop cluster at once. This means that a cluster of one machine takes five minutes, a cluster of 20 machines takes five minutes, and a cluster of 50 or 100 machines also takes five minutes, or will, once Amazon approves my request for more instances :)

Update: got my limit raised to 50!
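
Why doesn't the cluster size matter? Presumably because each machine's setup is independent of the others, so doing them all at the same time costs about as much wall-clock time as doing one. Here is a minimal Python sketch of that idea, assuming independent per-node setup. It is not the SHMcloud player's code: `configure_node`, `start_cluster` and `SETUP_SECONDS` are hypothetical stand-ins, and threads stand in for whatever mechanism the player actually uses.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Hypothetical stand-in for the ~5 minutes one machine needs to come up.
SETUP_SECONDS = 0.2


def configure_node(node_id: int) -> str:
    """Hypothetical per-node setup (boot, install, write config), simulated by a sleep."""
    time.sleep(SETUP_SECONDS)
    return f"node-{node_id} ready"


def start_cluster(size: int) -> Tuple[List[str], float]:
    """Configure all nodes concurrently; return their statuses and the elapsed wall time."""
    start = time.monotonic()
    # One worker thread per node, so no node waits for another to finish.
    with ThreadPoolExecutor(max_workers=size) as pool:
        statuses = list(pool.map(configure_node, range(size)))
    return statuses, time.monotonic() - start


if __name__ == "__main__":
    for size in (1, 20, 50):
        statuses, elapsed = start_cluster(size)
        # Elapsed time should stay close to SETUP_SECONDS whatever the cluster size.
        print(f"{size:>3} nodes: {len(statuses)} ready in {elapsed:.2f}s")
```

Done one after another, 50 nodes would take 50 times the per-node setup time; done concurrently, they take about as long as the slowest node. In practice the limit is elsewhere, such as the EC2 instance limit.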


Then you can verify that the instances are running in the AWS console.

And don't forget to shut them down!

Update 2: the nice folks at Amazon gave me 50 machines the next day. Now the cluster looks like this:


-rw-r--r--   1 ubuntu supergroup          0 2012-06-14 21:33 /test-output/_SUCCESS
drwxr-xr-x   - ubuntu supergroup          0 2012-06-14 21:32 /test-output/_logs
-rw-r--r--   1 ubuntu supergroup        172 2012-06-14 21:33 /test-output/part-00000

12-06-14 16:33:30   Cluster testing and verification is complete
setInitializedState for cluster of 49
12-06-14 16:33:33   Running instances: 49
12-06-14 16:33:33   Completely initialized: 49
setInitializedState for cluster of 49
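
The HDFS listing above doubles as a quick health check for the test job: by default, Hadoop's output committer writes an empty `_SUCCESS` marker into the output directory when a job completes, next to the `part-` files that hold the results. A small Python sketch of that check, assuming the eight-column `hadoop fs -ls` layout shown in the listing and no spaces in paths; `parse_ls` and `job_succeeded` are hypothetical helpers, not part of SHMcloud.

```python
from typing import Dict


def parse_ls(listing: str) -> Dict[str, int]:
    """Map each path in `hadoop fs -ls` output to its size in bytes.

    Assumes eight whitespace-separated columns (permissions, replication,
    owner, group, size, date, time, path) and no spaces in paths.
    """
    sizes: Dict[str, int] = {}
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank lines and headers such as "Found 3 items"
        sizes[fields[7]] = int(fields[4])
    return sizes


def job_succeeded(listing: str, output_dir: str) -> bool:
    """True if output_dir holds a _SUCCESS marker and at least one non-empty part file."""
    sizes = parse_ls(listing)
    has_marker = f"{output_dir}/_SUCCESS" in sizes
    has_output = any(
        path.startswith(f"{output_dir}/part-") and size > 0
        for path, size in sizes.items()
    )
    return has_marker and has_output


LISTING = """\
-rw-r--r--   1 ubuntu supergroup          0 2012-06-14 21:33 /test-output/_SUCCESS
drwxr-xr-x   - ubuntu supergroup          0 2012-06-14 21:32 /test-output/_logs
-rw-r--r--   1 ubuntu supergroup        172 2012-06-14 21:33 /test-output/part-00000
"""

print(job_succeeded(LISTING, "/test-output"))  # True
```

Checking for a non-empty part file as well as the marker guards against a job that "succeeded" without producing any output.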

Joy, joy, for a thousand years!

47 worker nodes (49 total, minus the memory master and the work master), all working together!