Q. How to cull out junk mail?
A.
- Whatever solution you pick, you should ensure that it works as you intend against your collection and that your overall process is legally defensible. If your opponents question your process, pointing to a list you pulled off the internet, or to software you used but did not test, will not withstand much scrutiny;
- One approach is to assume that a number of these e-mails come from junk domains: if you can find the junk domains, you may be able to eliminate a large subset. There are about 3,500 domain names (see here, here, and here);
- Other means of filtering email are also available, such as Bayesian spam filtering;
- Commercial tools include Nuix and Clearwell;
- You can obtain spam blocking lists of verified domains considered to be spam sources. These lists, although not foolproof, are typically used by email admins to filter out spam at the mail gateway. Here is one compiled list available at no charge (they accept donations), but keep in mind that no list is 100% accurate and up to date;
- Purchase more up-to-date lists directly from more recognized anti-spam organizations;
- Compile an in-house list of domains by pulling the sender domains (the .com part of each address) from the junk folders of a subset of custodians, if that's an option.
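The domain-list approach above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tested culling tool: the domain names and sample messages are hypothetical, the helper names `sender_domain` and `split_by_blocklist` are made up for this example, and only the From: header is examined. Messages that match are set aside rather than deleted, so the culled set can be sampled and the process defended if it is questioned.

```python
from email import message_from_string
from email.utils import parseaddr


def sender_domain(raw_message):
    """Return the lower-cased domain of the From: address, or '' if none."""
    msg = message_from_string(raw_message)
    _, address = parseaddr(msg.get("From", ""))
    if "@" not in address:
        return ""
    return address.rpartition("@")[2].lower()


def split_by_blocklist(raw_messages, junk_domains):
    """Partition messages into (kept, culled) using a set of junk domains."""
    junk = {domain.lower() for domain in junk_domains}
    kept, culled = [], []
    for raw in raw_messages:
        if sender_domain(raw) in junk:
            culled.append(raw)   # set aside for spot checks, not deleted
        else:
            kept.append(raw)
    return kept, culled


# Hypothetical sample data, for illustration only.
JUNK_DOMAINS = {"deals.example", "promo.example"}
MESSAGES = [
    "From: Alice <alice@client.example>\nSubject: Contract draft\n\nSee attached.",
    "From: Offers <news@deals.example>\nSubject: 50% off\n\nBuy now.",
    "From: bob@PROMO.example\nSubject: Winner\n\nClaim your prize.",
]

kept, culled = split_by_blocklist(MESSAGES, JUNK_DOMAINS)
print(len(kept), "kept;", len(culled), "culled")
```

Before relying on anything like this, run it against a sample of your own collection and review what lands in the culled set, since a blocklist match on the sender domain says nothing about whether a particular message is responsive.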
A.
- Commercial tools include PinPoint Labs, EnCase, MKS Toolkit;
- Linux utilities such as md5deep and md5sum.
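The utilities listed in this answer compute MD5 digests of files. As a rough sketch of the same idea, the Python below hashes every file under a directory with the standard `hashlib` module and groups files that share a digest. The helper names `md5_of_file` and `group_by_hash` are hypothetical, and grouping by digest is an assumption about how the hashes would be used.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_hash(root):
    """Map each MD5 digest to the list of files under root that share it."""
    groups = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups.setdefault(md5_of_file(path), []).append(path)
    return groups
```

Any digest that maps to more than one path marks files with identical content, regardless of file name or location.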
A.
- Research done while creating Symantec's Discovery Accelerator led to choosing display in HTML, since performance is crucial, and nobody wants to pay for lawyers sitting and waiting for a page refresh;
- The three limitations are a myth: (1) Bates numbers can be replaced with file renaming and using the database and the load file to identify the files; (2) Hash fingerprints can be used to check that the files have not been modified; (3) Redactions are OK if the original and the redaction logs are kept;
- Redacting native files is not practical. Besides, native files are inconvenient to litigate with: for example, printing individual files is a lot of work; paging may be different; it is hard to "hook" your work product to the native files;
- It all depends on whether you are the producing or the receiving side. On the receiving side, native files are easier and faster to review. On the producing side, you run into problems with page numbers differing between you and the other side;
- There is potential for unseen text, such as tracked changes, comments, hidden columns, etc., to escape the reviewer's notice. This issue and the increased complexity of native redaction are among the few arguments against native review. Still, tiffing is dying a slow death for good reasons;
- Metadata! For many cases in which certain types of issues are present (e.g. questions of contract formation, to name but one), producing natively will lead to producing potentially important metadata which may never have been reviewed by the attorneys. If an attorney insists on producing native files, make sure to at least mention a clawback agreement.
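Two of the points above, renaming files in place of Bates numbers and using hash fingerprints to show that files have not been modified, can be illustrated together. The sketch below is an assumption-laden example, not a description of any particular review platform: the `ABC` prefix, the seven-digit numbering, the CSV column names, and the `loadfile.csv` name are all hypothetical, and real load-file formats differ.

```python
import csv
import hashlib
import shutil
from pathlib import Path

FIELDS = ["control_number", "original_name", "produced_name", "sha256"]


def sha256_of_file(path):
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def produce_natives(source_dir, out_dir, prefix="ABC", start=1):
    """Copy native files under control numbers and write a CSV load file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in Path(source_dir).iterdir() if p.is_file())
    rows = []
    for number, src in enumerate(files, start=start):
        control = f"{prefix}{number:07d}"
        dest = out / f"{control}{src.suffix}"
        shutil.copy2(src, dest)  # the original file is left untouched
        rows.append({
            "control_number": control,
            "original_name": src.name,
            "produced_name": dest.name,
            "sha256": sha256_of_file(dest),
        })
    with open(out / "loadfile.csv", "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return rows


def verify_production(out_dir):
    """Return produced file names whose hash no longer matches the load file."""
    out = Path(out_dir)
    with open(out / "loadfile.csv", newline="", encoding="utf-8") as handle:
        return [
            row["produced_name"]
            for row in csv.DictReader(handle)
            if sha256_of_file(out / row["produced_name"]) != row["sha256"]
        ]
```

Calling `verify_production` on an unaltered production folder returns an empty list; any file edited after production shows up by its control-numbered name. Note that this sketch addresses only identification and integrity, not the hidden-text and metadata concerns raised in the last two points.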