Version 2.7.5 has been uploaded to the FreeEed site, and it is ready to take on the Enron set (on an Amazon machine, of course). Why is this so nice? Having FreeEed process a large set serves as regression testing: on every new release, we can re-process the set (which will grow as other data sources are added) and verify that the quality of processing did not go down, and in fact improved.
The list of improvements can always be found here, but here it is for your reading pleasure:
- Smaller FreeEed download
- Capability to read remote resources as a data source using URI notation. The URI syntax is documented, and the program takes you to the right web page for help. You can use ftp with a user name and password, and a lot of other things: anything that is a valid URI and that the hosting site will actually allow you to download. For example, if you want to process an Enron file from the EDRM site, you just give its URI as http://duaj3yp6waei2.cloudfront.net/edrm-enron-v2_bailey-s_pst.zip and FreeEed downloads the file.
- Processing of dozens of archive formats: http://truezip.java.net/kick-start/no-maven.html
- Processing of archives inside of archives recursively
- Command-line running is restored and can be used for scripting large jobs
- Option -enron to process the Enron data set (specific test script)
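
To give a feel for what "archives inside of archives" means in practice, here is a minimal sketch in Python. It is not FreeEed's code (FreeEed is a Java program and relies on TrueZIP for this); it only handles ZIP, and the names `list_documents` and `make_zip` are made up for the illustration. The idea is simply to open each entry, check whether it is itself an archive, and descend if so.

```python
import io
import zipfile


def list_documents(data, prefix=""):
    """Return the paths of all non-archive files in a ZIP, descending into nested ZIPs."""
    found = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            path = prefix + info.filename
            payload = archive.read(info)
            # Detect nested archives by content rather than by file extension.
            if zipfile.is_zipfile(io.BytesIO(payload)):
                found.extend(list_documents(payload, path + "!/"))
            else:
                found.append(path)
    return found


def make_zip(members):
    """Build an in-memory ZIP from a {name: bytes} mapping (helper for the demo)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


if __name__ == "__main__":
    inner = make_zip({"mail/1.eml": b"hello"})
    outer = make_zip({"readme.txt": b"top", "inner.zip": inner})
    for document in list_documents(outer):
        print(document)
```

The sketch reads every entry fully into memory, which is fine for a demonstration but would not suit a set the size of Enron; a real implementation would stream entries or spill them to disk.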