Universal document searchability: the case for shifting the OCR goalposts

Dean Sappey

The dictionary defines the term Achilles heel as “a seemingly small but actually crucial weakness,” and that seems an appropriate term to use when writing about indexing and searching in document and enterprise content management systems. This article looks at why it would be unwise to assume that all documents in a content repository are fully searchable, despite the OCR'ing hardware and software and the advanced search technology that many law firms have at their disposal.

The risks are great

Law firms have invested significantly in document management systems and search technologies over the years to store and manage all their client documents. The idea was that this would deliver efficiency and complete access to each and every document related to a specific case or matter. It was simply a matter of typing a series of keywords into a search query field, and all the documents that met the search criteria would be displayed. Sounds great, in theory.

However, the reality is very different. Research indicates that as much as 30% of documents in a content repository are actually "invisible" to search. This means that close to a third of the documents relating to your case could be missing from your search results. The culprit usually turns out to be image-based documents.

Image-based documents are JPGs, TIFs, PNGs and image PDFs. These documents get profiled into your document management system in a variety of ways. While many of these documents get OCR'ed, many do not, and since they are image files with no text, they do not get indexed. Instead they become invisible to your search technology. This has enormous implications for law firms and law departments.
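To make the idea of "invisible" documents concrete, here is a minimal sketch in Python of how one might flag them. It assumes a simplified model in which each document carries a file name and whatever text layer an indexer could extract. The names `Document`, `text_layer`, `is_invisible_to_search` and `invisible_share` are hypothetical and do not correspond to any particular document management system.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Formats the article lists as image-based (PDFs only when they have no text).
IMAGE_BASED_FORMATS = {".jpg", ".jpeg", ".tif", ".tiff", ".png", ".pdf"}


@dataclass
class Document:
    """Hypothetical stand-in for a profiled document."""
    name: str
    text_layer: str = ""  # text an indexer could extract; empty for pure images


def is_invisible_to_search(doc: Document) -> bool:
    """True when the file is in an image-based format and carries no text."""
    ext = PurePath(doc.name).suffix.lower()
    return ext in IMAGE_BASED_FORMATS and not doc.text_layer.strip()


def invisible_share(docs: list[Document]) -> float:
    """Fraction of the repository that a keyword search cannot reach."""
    if not docs:
        return 0.0
    return sum(is_invisible_to_search(d) for d in docs) / len(docs)


if __name__ == "__main__":
    sample = [
        Document("contract.docx", "This agreement ..."),
        Document("scan_0001.tif"),                      # image, never OCR'ed
        Document("exhibit_a.pdf"),                      # image PDF, no text
        Document("exhibit_b.pdf", "Exhibit B text"),    # already searchable
    ]
    print(f"{invisible_share(sample):.0%} of sample documents are invisible")
```

A check like this only reports the gap. It does not close it, which is where OCR'ing comes in.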

Failure to produce documents on demand hurts the bottom line, workplace efficiency, regulatory compliance and productivity. It also exposes a firm to unnecessary risks that can lead to sanctions, dismissal of claims and ultimately loss of the case, as well as damaging the firm's reputation.

OCR - wrong time, wrong place

So what is the answer? Perhaps the answer is not so much what, but when. 

Clearly the solution is OCR'ing, ie converting image-based documents to text-searchable documents so that they can be indexed when profiled into the document management system. The bigger question is when and where the OCR'ing process should start.

Multifunction devices, scanners and OCR'ing software are commonplace in a modern law firm. The common practice in most firms is to OCR at the point of entry, ie paper and electronic documents get OCR'ed as soon as they are received by the firm. This, however, is inefficient, costly and unreliable. Consider how much time staff spend either OCR'ing documents at their desks or feeding documents into the scanner or multifunction device. Consider, too, how many documents bypass the OCR'ing process altogether: documents ingested from acquisitions and imported litigation files, documents saved into the document management system using mobile technology, and legacy documents bulk imported into these systems.

Now the 30% of non-searchable content starts to make sense.

OCR - right time, right place

So if OCR'ing documents at the point of entry is not the answer, what is? OCR'ing documents at the “end point,” ie once the documents have been saved into the document management system, makes the most sense.

Shifting the goalposts to a backend rather than a frontend process will deliver huge benefits to law firms in terms of efficiency, productivity, searchability as well as cost savings. More importantly, a backend approach to OCR'ing will ensure that all documents are made searchable once they are saved into the content repository, irrespective of the entry point.

OCR'ing will work in two modes: one will monitor newly profiled documents so that they are OCR'ed and made available for indexing immediately; the other will OCR all the legacy documents in the system. This approach provides law firms with significant benefits:

100% searchability - all image-based documents in the document management system are OCR'ed, adding an invisible layer of text to documents. This will ensure that the document is indexed by the system. Law firms can be certain that all documents are completely searchable. 

Increased organizational productivity - staff members do not need to OCR documents. Instead, they can concentrate on more important tasks. By ensuring that every document is text-searchable, firms will be able to eliminate the productivity losses and downtime spent looking for lost or misfiled documents.

Increased efficiency through automation - firms will be able to automate the entire process so that processing can take place 24/7.

Simplified management of image-based documents - firms will be able to do away with multiple OCR'ing processes and workflows, replacing them with a single, centralized approach.

Reduced costs - firms will be able to reduce OCR'ing hardware and software requirements.
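The two backend modes described above, one reacting to newly saved documents and one sweeping the legacy backlog, can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not a description of how any real product works: the repository is a plain in-memory dictionary, the OCR engine is a placeholder function, and `BackendOcrService`, `on_document_saved` and `backfill_legacy` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for a document held in the repository."""
    doc_id: int
    is_image_based: bool
    text_layer: str = ""


def placeholder_ocr(doc: Document) -> str:
    """Stands in for a real OCR engine; returns dummy recognized text."""
    return f"recognized text for document {doc.doc_id}"


class BackendOcrService:
    """OCRs documents after they are saved, whatever their entry point."""

    def __init__(self, repository: Dict[int, Document],
                 ocr: Callable[[Document], str] = placeholder_ocr) -> None:
        self.repository = repository
        self.ocr = ocr
        self.index_queue: List[int] = []  # ids handed to the indexer

    def _process(self, doc: Document) -> bool:
        """Add a text layer if one is missing; report whether work was done."""
        if not doc.is_image_based or doc.text_layer.strip():
            return False
        doc.text_layer = self.ocr(doc)
        self.index_queue.append(doc.doc_id)
        return True

    def on_document_saved(self, doc: Document) -> bool:
        """Mode 1: handle a newly profiled document as soon as it is saved."""
        self.repository[doc.doc_id] = doc
        return self._process(doc)

    def backfill_legacy(self) -> int:
        """Mode 2: sweep everything already in the repository."""
        return sum(self._process(d) for d in list(self.repository.values()))


if __name__ == "__main__":
    repo = {
        1: Document(1, is_image_based=True),
        2: Document(2, is_image_based=False, text_layer="native text"),
    }
    service = BackendOcrService(repo)
    print("legacy documents OCR'ed:", service.backfill_legacy())
    service.on_document_saved(Document(3, is_image_based=True))
    print("queued for indexing:", service.index_queue)
```

Because both modes share the same check for a missing text layer, running the legacy sweep again does no further work on documents that are already searchable.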

Summary

IT Administrators have been lulled into a false sense of security about document indexing and searching in content repositories. Mobile technology, document ingestion and staff workarounds have punched huge holes in OCR'ing processes and workflows. This has enormous implications for law firms and law departments.

A backend rather than a frontend approach to OCR'ing will ensure that ALL documents are made searchable once they are saved into the content repository, irrespective of the entry point. An automated system with complete visibility and control over image-based documents will provide IT Administrators with a renewed sense of security.

Dean Sappey is President and co-founder of DocsCorp, developers of contentCrawler, which ensures full document searchability across new and legacy documents in content repositories. DocsCorp has offices in Portland, OR; Sydney, Australia; and London, UK.
 

Copyright © 2023 Legal IT Professionals. All Rights Reserved.
