A look under the bonnet of Control Risks’ eDiscovery engine

17 Sep 2012; Joanna Goodman

Control Risks’ presentation on ‘The future of eDiscovery technology’ sounded like a bold attempt to predict the predictive. It was actually a practical walk through two key eDiscovery tools – concept clustering and predictive coding. Although it focused on Control Risks’ toolkit, the principles are applicable to other eDiscovery products.

While sophisticated technology impresses client organisations sufficiently to make them invest in it, users often fail to make use of its features. The main challenge is user confidence and understanding how the technology works.

Tom McCotter, principal architect, legal technology, likens document analysis in eDiscovery to condensing ‘War and Peace’ into revision notes that contain the essence of the story, characters and underlying message. Contextual clustering is a key element in the redaction process. His presentation took us through some of the tools and techniques.

How contextual clustering works
All technology produces better results from data with rich context. Preserving groups of data – with closely related contexts – allows you to retain its richness. McCotter’s demonstration applied Control Risks’ software to the 20 Newsgroups data set, which developers and others commonly use to test their software and findings.

Each of the 20 Newsgroups has 1,000 articles. The software divided the articles into six categories, three layers deep. The categories and layers are defined by two heuristics: words and terms that appear in relatively few articles; and words and terms that appear frequently in the same document (e.g. ‘hockey’ will indicate that an article should be categorised as sport). Next steps include weighting documents by significance, language and potentially by a basic level of sentiment analysis – for example, positive terms tend to be separated by ‘and’ while negative ones are separated by ‘but’.
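
The two heuristics resemble a classic TF-IDF weighting, and a minimal Python sketch under that assumption is shown here. This is not Control Risks’ implementation: the function names and the toy corpus are invented for illustration. Terms found in few documents get a high weight, and a term that is also frequent within one document rises to the top for that document.

```python
import math
from collections import Counter

def rare_term_weights(docs):
    """Weight each term by how few documents contain it (inverse document frequency)."""
    n = len(docs)
    doc_freq = Counter()
    for text in docs:
        # Count each term once per document, however often it repeats.
        doc_freq.update(set(text.lower().split()))
    return {term: math.log(n / df) for term, df in doc_freq.items()}

def top_terms(text, weights, k=1):
    """Rank a document's terms by (frequency in this document) x (rarity across documents)."""
    counts = Counter(text.lower().split())
    scored = {term: count * weights.get(term, 0.0) for term, count in counts.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]

# Toy corpus, invented for illustration only.
docs = [
    "the hockey game and the hockey crowd",
    "the shuttle launch and the shuttle crew",
    "the graphics card and the driver",
]
weights = rare_term_weights(docs)
labels = [top_terms(text, weights) for text in docs]
```

Words such as ‘the’ and ‘and’ appear in every document, so their weight is zero and they never drive a category, while ‘hockey’ is both rare across the corpus and frequent within its own document.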

User reluctance around conceptual clustering often occurs because although people understand the principle, they are not clear about how the technology works. This is not helped by the fact that the term is a misnomer: rather than being about clusters per se, it is about high-dimensional similarity – identifying the multitude of factors that define a group.

When conceptual clustering is used for document analysis in eDiscovery, the software identifies some 10,000 to 20,000 dimensions for each document. Each dimension is a factor that can be plotted as an axis on a graph. For example, if you were to plot restaurants on the two dimensions of quality and price, McDonald’s and Burger King would appear close to each other. Each document is visualised as a point on a sphere, and cosine similarity, which measures the angle between two points, is used to calculate how similar they are.
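
Cosine similarity itself is a standard formula: the dot product of two vectors divided by the product of their lengths. The sketch below applies it to three invented documents described by just four dimensions (term counts for an assumed vocabulary), rather than the thousands a real tool would use.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Assumed vocabulary (one dimension per term): hockey, goal, shuttle, orbit.
doc_a = [3, 1, 0, 0]  # sport article
doc_b = [2, 2, 0, 0]  # another sport article
doc_c = [0, 0, 4, 1]  # space article

sport_vs_sport = cosine_similarity(doc_a, doc_b)
sport_vs_space = cosine_similarity(doc_a, doc_c)
```

The two sport articles point in nearly the same direction, so their score is close to 1, while the sport and space articles share no terms and score 0. Because only the angle matters, a long document and a short one on the same topic still score as similar.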

We were then given a look ‘under the bonnet’ of Control Risks’ software development model, which is designed to bring the technology closer to the users, and includes self-service technology for smaller-scale eDiscovery projects and parts of larger projects. ‘Agile Scrum Methodology’ enables developers to deliver tailored functionality in response to clients’ specific requirements within a very short timescale.

Avoiding common pitfalls
Questions from the floor raised some interesting issues. When it comes to predictive coding, which is another key feature of the Control Risks eDiscovery toolkit, the differentiator is the ability to identify the right keywords. When choosing a software solution, McCotter suggests asking different vendors to analyse the same dataset to find the most appropriate predictive modelling algorithm for your business.

Common pitfalls included document sets with multiple languages, because algorithms are language agnostic. One solution is to apply language determination software first to filter out documents in different languages for separate analysis. Spreadsheets are also worth filtering out because they tend to contain similar vocabulary, whatever their context.
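
As a rough sketch of that pre-filtering step, the Python below splits a document set into per-language batches and sets spreadsheets aside. Everything in it is an assumption for illustration: the file-extension list, the function names, and above all `toy_detect_language`, which is a crude stand-in for real language determination software.

```python
from collections import defaultdict

# Assumed list of spreadsheet extensions; adjust to the data set in hand.
SPREADSHEET_EXTENSIONS = (".xls", ".xlsx", ".csv")

def partition_for_analysis(documents, detect_language):
    """Split (name, text) pairs into per-language batches plus a spreadsheet pile."""
    batches = defaultdict(list)
    spreadsheets = []
    for name, text in documents:
        if name.lower().endswith(SPREADSHEET_EXTENSIONS):
            spreadsheets.append(name)  # analysed separately, if at all
        else:
            batches[detect_language(text)].append(name)
    return dict(batches), spreadsheets

def toy_detect_language(text):
    """Toy stand-in: guesses French from a few common words, otherwise English."""
    words = set(text.lower().split())
    return "fr" if words & {"le", "la", "les", "et"} else "en"

# Invented sample documents.
documents = [
    ("memo.txt", "the contract and the invoice"),
    ("note.txt", "le contrat et la facture"),
    ("budget.xlsx", "total total total"),
]
batches, spreadsheets = partition_for_analysis(documents, toy_detect_language)
```

Each language batch can then be clustered on its own, so that documents group by subject rather than by the language they happen to be written in.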

Another question related to ranking results – the software gives each document a relevance score, and these scores can be used to manipulate contextual clustering and predictive coding and determine which documents should be included in the next stage of the review.
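
In its simplest form, using scores this way means ranking the documents and applying a cut-off. The sketch below assumes scores between 0 and 1 and an arbitrary threshold of 0.5; the document IDs, scores and function name are invented, and a real review would set the cut-off by sampling and validation rather than by fiat.

```python
def select_for_review(scores, threshold=0.5):
    """Return document IDs at or above the threshold, most relevant first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, score in ranked if score >= threshold]

# Invented relevance scores for four documents.
scores = {"DOC-001": 0.91, "DOC-002": 0.12, "DOC-003": 0.67, "DOC-004": 0.50}
next_stage = select_for_review(scores)
```

Lowering the threshold sends more documents to the next stage at greater review cost; raising it saves effort but risks missing relevant material.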

Education and engagement
Like all statistical forecasting, eDiscovery combines art and science – choosing the appropriate variables and applying the right algorithms. Senior consultant Adam Page added that investing in a comprehensive suite of tools – as provided by Control Risks – allows clients to predict eDiscovery costs. This is a critical consideration, particularly as UK practice direction 31B includes provisions related to the technology, timing and cost of eDiscovery.

The chasm between understanding the underlying principles of eDiscovery and applying the technology effectively can be bridged by educating people within user organisations so that investing in sophisticated technology is no longer a leap of faith.

For vendors, it is becoming increasingly important to ensure that their products are also sold internally, within the client organisation. Continuous training and development help users make the most of the increasingly complex software that significantly reduces the time and manpower required for eDiscovery.

As with many complex IT systems, the key message is around user engagement and embedding tools and techniques into working practices. Events like this, which explain how the tools are developed and how they work on a practical level, are surely an important step in the right direction.
