- Predictive coding can be up to fifty times more efficient than manual review in cases involving large quantities of electronically stored information.
- The Victorian decision of McConnell Dowell represents the first judicial approval of predictive coding in Australia and creates persuasive authority for New South Wales.
- Predictive coding better adheres to the principle of proportionality than the alternatives in many cases.
The decision in McConnell Dowell Constructions (Aust) Pty Ltd v Santam Ltd (No 1)  VSC 734 (‘McConnell Dowell’) signals the emerging judicial acceptance of predictive coding and should encourage practitioners in New South Wales to embrace the technology. Predictive coding provides solicitors with a rare opportunity to dramatically improve the time efficiency and cost effectiveness of court proceedings while also enhancing justice.
Machine learning, predictive coding and technology assisted review
Predictive coding is the application of machine learning to the process of discovery. It uses statistical modelling to predict the relevance of documents in discovery in lieu of human review. The legal community frequently refers to predictive coding as ‘technology assisted review’ or ‘TAR’. Unfortunately, this term obscures the range of ways in which technology assists in the review process. TAR is used indiscriminately to refer to keyword searches, concept searching, predictive coding, or all of the above. In practice, each of these technologies works differently, produces different results, and achieves different levels of accuracy. This article champions predictive coding, the most groundbreaking and disruptive of those technologies, which is changing the way in which large scale electronic discovery takes place.
Predictive coding addresses the challenges posed by electronically stored information (‘ESI’) during the process of discovery. ESI is the cause of significant cost and delay in the discovery process due to its volume, duplicability, and dispersion. Rapid and continuing developments in information technology have increased the number of potentially relevant documents that need to be analysed. Estimates suggest that 90 per cent of business records are already held in electronic format only and individuals within businesses will send and receive an average of 140 emails a day by 2018.
Predictive coding is concerned with the collection phase of discovery in which documents are gathered and reviewed for relevance. Frequently, the desired documents only make up a small proportion of the documents initially obtained. This process has been traditionally undertaken by junior associates and paralegals who manually review documents to identify those which may be relevant. Predictive coding, on the other hand, uses machine learning to identify relevant documents. The process sees a senior lawyer or small team review and code a ‘seed set’ of documents. A computer then identifies similarities and patterns within those documents and attempts to predict the coding for additional samples. When the coding of the human reviewers and the computer sufficiently coincides, the computer is able to make confident predictions for the balance of the documents.
The total number of documents reviewed by the senior lawyer is typically only a few thousand. Studies have determined that predictive coding can yield more accurate results than exhaustive manual review and, in some cases, offer a fifty-fold saving in terms of documents reviewed manually (see Maura Grossman and Gordon Cormack, ‘Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review’, (2011) 17 Richmond Journal of Law and Technology 11).
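For readers who want to see the mechanics, the seed-set workflow described above can be sketched in a few lines of Python. This is an illustration only, not the method used by any commercial predictive coding product: it assumes a simple naive Bayes word-frequency model, and the documents, the function names and the 90 per cent agreement threshold are all invented for the example.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lower-cased, whitespace-separated words are the only features.
    return text.lower().split()

def train(seed_set):
    """Learn word frequencies from (text, is_relevant) pairs coded by a lawyer."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in seed_set:
        doc_counts[is_relevant] += 1
        word_counts[is_relevant].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return True if the model scores the document as more likely relevant."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        # Log prior and log likelihoods, both with add-one smoothing.
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in the seed set are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return scores[True] > scores[False]

def agreement(model, validation_set):
    """Fraction of human-coded validation documents the model codes the same way."""
    hits = sum(predict(model, text) == label for text, label in validation_set)
    return hits / len(validation_set)

# Hypothetical seed set coded by the senior lawyer (True = relevant).
seed_set = [
    ("insurance claim for damaged pipeline", True),
    ("policy coverage for pipeline damage claim", True),
    ("claim under the insurance policy denied", True),
    ("staff lunch menu for friday", False),
    ("office party invitation friday", False),
    ("car park allocation for staff", False),
]
# Hypothetical further sample, also coded by hand, used to check the model.
validation_set = [
    ("pipeline insurance claim update", True),
    ("friday staff lunch", False),
]
# The balance of the documents, which no human has reviewed.
unreviewed = ["damage claim under policy", "party menu for office"]

AGREEMENT_THRESHOLD = 0.9  # assumed figure, chosen only for illustration

model = train(seed_set)
predictions = []
if agreement(model, validation_set) >= AGREEMENT_THRESHOLD:
    # Human and computer coding coincide closely enough: code the remainder.
    predictions = [predict(model, text) for text in unreviewed]
print(predictions)
```

In a real matter the seed set would run to hundreds or thousands of documents, and training and validation would typically be repeated over several rounds before the model was trusted with the remainder.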