Sunday, November 14, 2010

Speaking in tongues

Giuseppe Caspar Mezzofanti (1774-1849) was keeper of the Vatican library and later a cardinal, but he’s best remembered for being a hyperpolyglot, a speaker of many languages.
How many? Estimates range from 24 (in 1805) to 114 (judged after his death). The true number probably lies somewhere in between, but it’s considerable–Byron called Mezzofanti “a monster of languages, the Briareus of parts of speech.”
A Russian traveler once asked Mezzofanti for a list of the dialects he had mastered, and the cardinal sent him the name of God in 56 languages. And Gregory XVI once arranged to have a polyglot group of students waylay him in the Vatican gardens: “[O]n a sudden, at a given signal, these youths grouped themselves for a moment on their knees before his Holiness, and then, quickly rising, addressed themselves to Mezzofanti, each in his own tongue, with such an abundance of words and such a volubility of tone, that, in the jargon of dialects, it was almost impossible to hear, much less to understand, them. But Mezzofanti did not shrink from the conflict. With the promptness and address which were peculiar to him, he took them up singly, and replied to each in his own language, with such spirit and elegance as to amaze them all.”
For another prodigious librarian, see Book Lover.

Thursday, September 2, 2010

An idea for a legal service

This service would create a social portrait of the defendant (or plaintiff, for that matter) and present it to the lawyer.

Why would this service be better than a paralegal, whom the lawyer can task with doing the research? With paralegals billing at $100/hour, a computerized service can compete on price, and the research may be too sensitive to outsource. In addition, the service could
  • Provide nicely-formatted, standard reports;
  • Search thoroughly, and not omit any of the hundreds of social sites;
  • Keep a history of changes to the profile, which may be telling.
I remember a lawyer who found what he needed about the opposing side's defendant by discovering his profile on a Greek singles site. This was a unique achievement by that individual, but a service would make such feats dependable and reproducible. It can also collect defensible evidence in the process.
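To make the "history of changes" idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each captured profile is stored as a flat dictionary of field names to values; the function name diff_profile and the sample fields are hypothetical and not part of any existing service.

```python
def diff_profile(old, new):
    """Compare two profile snapshots (plain dicts, an assumed format).

    Returns a sorted list of (field, before, after) tuples for every
    field that was added, removed, or changed between the snapshots.
    A missing field is reported as None.
    """
    changes = []
    for field in sorted(old.keys() | new.keys()):
        before, after = old.get(field), new.get(field)
        if before != after:
            changes.append((field, before, after))
    return changes


# Hypothetical snapshots taken a month apart.
january = {"status": "single", "city": "Houston"}
february = {"status": "married", "city": "Houston", "employer": "Acme"}

for field, before, after in diff_profile(january, february):
    print(f"{field}: {before!r} -> {after!r}")
```

Storing each snapshot with its capture date, rather than only the latest profile, is what would let the report show when a detail changed.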

The suggested name for the service is "Watson."

Why is it OK to discuss this idea in the open? Because anyone can come up with this or a similar idea, and it is the implementation of it that would be a competitive secret. It needs to include
  • Scalable, cloud-based architecture, that can expand on demand and provide pay-only-for-resources-used charges;
  • Rules-based engine with dozens or hundreds of rules of research, based on the case and individual's profile;
  • Specialized text and image analytics, such as image search, blog and article understanding, etc.;
  • Of course, crawling that avoids being invasive and being blocked by search engines and the social sites.
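One way to picture the rules-based engine is sketched below in Python. Everything in it is an assumption made for illustration: the Rule class, the rule names, the subject fields (case_type, employer), and the research tasks are hypothetical, and a real engine would carry far more rules and richer conditions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A research rule: if `applies` is true for the subject, run `task`."""
    name: str
    applies: Callable[[dict], bool]
    task: str


# Hypothetical rules; the real set would be dozens or hundreds.
RULES = [
    Rule("has_employer", lambda s: bool(s.get("employer")),
         "search professional networking sites"),
    Rule("family_case", lambda s: s.get("case_type") == "family",
         "search dating and singles sites"),
    Rule("always", lambda s: True,
         "search general social sites"),
]


def plan_research(subject, rules=RULES):
    """Return the research tasks whose rules match the case and subject."""
    return [rule.task for rule in rules if rule.applies(subject)]


print(plan_research({"case_type": "family"}))
```

Keeping the rules as data, separate from the crawling code, would let the rule set grow without changing the rest of the system.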
Art: Sidney Paget, "Nothing could be better," said Holmes, illustration from "The Stockbroker's Clerk" by Arthur Conan Doyle (1859-1930), published in Strand Magazine, March 1893

Friday, January 8, 2010

litsupport summary for the week ending on 01/10/10

A lot of important and useful information is posted to litsupport each week. The following is a distilled summary, in the form of questions and answers.

Q. How to cull out junk mail?

  • Whatever solution you pick, you should ensure that it works as you expect and intend against your collection, and that your overall process is legally defensible. If your opponents question your process, pointing to a list you pulled off the internet, or to software you used but did not test, will not withstand much scrutiny;
  • One approach assumes that many of these e-mails come from junk domains: if you can identify the junk domains, you may be able to eliminate a large subset. There are about 3,500 such domain names, see here, here, and here;
  • Other means of filtering email, such as Bayesian spam filtering;
  • Commercial tools include Nuix and Clearwell;
  • You can obtain spam blocking lists that contain verified domains considered spam. These lists, although not foolproof, are typically used by email admins to filter out spam at the mail gateway. Here is one compiled list at no charge (they accept donations), but keep in mind that no list is 100% accurate and up to date;
  • Purchase more up-to-date lists directly from more recognized anti-spam organizations;
  • Compile an in-house list of domains by pulling the sender domains out of the junk folders of a subset of custodians, if that's an option.
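As a rough illustration of the domain-list approach above, the following Python sketch splits messages into kept and culled sets by sender domain. It is a sketch under assumptions, not a tested culling tool: the function names and the sample domains are hypothetical, and it looks only at the From header. Returning the culled set, rather than discarding it, is meant to support the defensibility point made in the first answer.

```python
from email.utils import parseaddr


def sender_domain(from_header):
    """Return the lower-cased domain of a From header, or '' if none."""
    _, address = parseaddr(from_header)
    return address.rpartition("@")[2].lower() if "@" in address else ""


def split_by_junk_domain(from_headers, junk_domains):
    """Split From headers into (kept, culled) using a junk-domain list.

    A sender is culled when its domain, or any parent domain, is on the
    list, so 'mail.deals.example' matches a 'deals.example' entry.
    """
    junk = {d.lower() for d in junk_domains}
    kept, culled = [], []
    for header in from_headers:
        parts = sender_domain(header).split(".")
        candidates = {".".join(parts[i:]) for i in range(len(parts))}
        (culled if candidates & junk else kept).append(header)
    return kept, culled


# Hypothetical data; a real list would come from a vetted source.
kept, culled = split_by_junk_domain(
    ["Alice <alice@lawfirm.example>", "Deals <promo@mail.deals.example>"],
    ["deals.example"],
)
print(len(kept), "kept;", len(culled), "culled")
```

Before relying on any such filter, sample the culled set and confirm that nothing responsive was removed.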
Q. Stand-alone software application that I can use to De-NIST?
Q. Potential issues with native production: (1) No way to Bates stamp a native file; (2) No way to lock from editing; (3) Redaction; (4) More?
  • Research done while creating Symantec's Discovery Accelerator led to choosing display in HTML, since performance is crucial, and nobody wants to pay for lawyers sitting and waiting for a page refresh;
  • The three limitations are a myth: (1) Bates numbers can be replaced with file renaming and using the database and the load file to identify the files; (2) Hash fingerprints can be used to check that the files have not been modified; (3) Redactions are OK if the original and the redaction logs are kept;
  • Redacting native files is not practical. Besides, native files are inconvenient to litigate with: for example, printing individual files is a lot of work; paging may be different; it is hard to "hook" your work product to the native files;
  • It all depends on whether you are the producing or the receiving side. On the receiving side, native files are easier and faster to review. On the producing side, you run into the problem of page numbers differing between you and the other side;
  • Potential of unseen text, such as track changes, comments, hidden columns, etc., that might escape the reviewer's notice. This issue and the increased complexity of native redaction are among the few arguments against native review. Still, tiffing is dying a slow death for good reasons;
  • Metadata! For many cases in which certain types of issues are present (e.g. questions of contract formation, to name but one), producing natively will lead to producing potentially important metadata which may never have been reviewed by the attorneys. If an attorney insists on producing native files, make sure to at least mention a clawback agreement.
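The suggestion above, replacing Bates stamps with control numbers and using hash fingerprints to detect modification, can be sketched in Python as follows. This is an illustration under assumptions: the control-number format, the manifest fields, and the function names are hypothetical, SHA-256 is just one reasonable choice of hash, and a real load file would follow whatever format the review platform expects.

```python
import hashlib
from pathlib import Path


def fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths, prefix="ABC", start=1):
    """Assign a control number to each native file and record its hash.

    The returned rows play the role of a load file: they tie the
    control number to the original file name and its fingerprint.
    """
    manifest = []
    for number, path in enumerate(sorted(paths, key=str), start=start):
        manifest.append({
            "control_number": f"{prefix}{number:07d}",
            "original_name": Path(path).name,
            "sha256": fingerprint(path),
        })
    return manifest


def was_modified(entry, path):
    """True if the file no longer matches the hash recorded at production."""
    return fingerprint(path) != entry["sha256"]
```

Either side can later recompute the hash of a produced file and compare it with the manifest entry; a mismatch shows that the file has changed since production.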
This summary of Litsupport Group postings, created by the wonderful and talented members of the group, has been culled by Mark Kerzner and edited by Aline Bernstein.