eDocfile: Automated Optical Technology Supports World's Second-Largest Medical Vaccine Trial
By Keith Passaur, president and owner, eDocfile, Inc., Valrico, Florida
eDocfile is a consulting firm that specializes in creating image-capture programs that increase our clients’ productivity. Over the past seven years, we’ve used Macro Scheduler in virtually every project we take on, so we don’t have to “reinvent the wheel” each time we need to provide the same automation steps.
In early 2009, we were contracted by a consulting firm in the Netherlands for a small programming project. A few weeks later, they contacted us again, this time with something much larger and much more challenging in mind. Their client, a Dutch hospital, needed automation assistance to support a very large, multi-site medical vaccine trial. It would, in fact, be the second-largest project of its kind ever, involving 85,000 patients. In all, 340,000 documents, or 4,000 documents each day, would have to be scanned and read with Optical Character Recognition (OCR) technology for automated filing. The hospital had already purchased File by OCR, one of the products we offer. File by OCR works with searchable PDF or TIFF files to automatically perform OCR and extract text. The extracted text is parsed and used to rename and relocate each file, building a file folder hierarchy.
The consultants asked us to modify File by OCR so the hospital’s vaccine-trial files could be named based on index information (patient number, center number, and document type).
Scanning Forms for Faxing, OCR Filing
To make sure the customized program would work as intended for the clinical staff, it was important that we completely understood how the data would be collected, transmitted, and entered into the program. Since OCR would read data from a pre-printed form that could not be altered (such as by adding a barcode), we requested original copies.
We received a six-page tri-fold form that each patient would complete. On the third page, a distinct vertical number, readable by OCR, encoded the patient number and the number of the center where the form was generated. We were told the completed forms would be duplex-scanned at each remote center, creating a two-page TIFF file to send to the hospital. There, the scanned image would be separated into six individual pages and the vertical number extracted for filing. The file would then be re-assembled into a five-page PDF (page 6 of the form was blank) for automatic filing. All documents would be processed in a batch after scanning, freeing the users to move on to other tasks while the OCR process was underway.
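As a rough illustration of the page-separation step (not the actual File by OCR code), the sketch below computes crop rectangles for cutting each scanned side of a tri-fold into three panels and skips the blank page. The panel order (left to right, front side first), the equal panel widths, and the names `panel_boxes` and `split_plan` are all assumptions made for the example. A real form may fold in a different order.

```python
from typing import FrozenSet, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def panel_boxes(width: int, height: int, panels: int = 3) -> List[Box]:
    """Split one scanned side into equal-width vertical panels."""
    edges = [round(i * width / panels) for i in range(panels + 1)]
    return [(edges[i], 0, edges[i + 1], height) for i in range(panels)]


def split_plan(
    width: int,
    height: int,
    blank_pages: FrozenSet[int] = frozenset({6}),
) -> List[Tuple[int, int, Box]]:
    """Return (page_number, scanned_side, crop_box) for every page to keep.

    Assumption: side 0 holds pages 1-3 and side 1 holds pages 4-6,
    each laid out left to right.
    """
    plan = []
    for side in (0, 1):
        for index, box in enumerate(panel_boxes(width, height)):
            page = side * 3 + index + 1
            if page not in blank_pages:
                plan.append((page, side, box))
    return plan


if __name__ == "__main__":
    for page, side, box in split_plan(3300, 2550):
        print(f"page {page}: side {side}, crop {box}")
```

The crop boxes could then be handed to whichever imaging library performs the actual cutting and PDF assembly.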
Seeing the forms and understanding the procedure, we were able to modify File by OCR so the incoming documents could be stored in a logical file folder hierarchy, separated by center. Because they would be kept in a completely non-proprietary format, the documents could be imported into any database and a hyperlink assigned for retrieval.
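To make the idea of a center-based folder hierarchy concrete, here is a minimal Python sketch. It is not the File by OCR implementation. The folder layout (`root/center/patient/`), the file-name pattern, and the function names `destination_for` and `file_document` are hypothetical choices for illustration only.

```python
from pathlib import Path
import shutil


def destination_for(root: Path, center: str, patient: str,
                    doc_type: str, suffix: str = ".pdf") -> Path:
    """Build a path such as root/012/045678/012_045678_consent.pdf.

    The layout and naming pattern are assumptions, not the product's own.
    """
    return root / center / patient / f"{center}_{patient}_{doc_type}{suffix}"


def file_document(src: Path, root: Path, center: str,
                  patient: str, doc_type: str) -> Path:
    """Move a scanned file into the hierarchy, creating folders as needed."""
    dest = destination_for(root, center, patient, doc_type, src.suffix)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if dest.exists():
        # Never overwrite silently; a duplicate needs a human decision.
        raise FileExistsError(dest)
    shutil.move(str(src), str(dest))
    return dest
```

Because the result is only ordinary folders and files, any database can later store the path as a hyperlink.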
Macro Scheduler Speeds Filing, Data Verification
Because each document would be read with OCR and filed using the same steps, we used Macro Scheduler to automate launching the filing command lines. Macro Scheduler allows us to use regular expressions to parse text and to automate the moving and renaming of files. Once the steps are put together in a script, they can easily be modified for use in other programs to automate similar actions.
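The regular-expression parsing described above can be sketched in Python (the production scripts were written in Macro Scheduler, not Python). The number format used here, a three-digit center number, a hyphen, and a six-digit patient number, is purely an assumption for the example. The article does not describe the real format, and `parse_form_number` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Assumed format: CCC-PPPPPP (center, then patient). Not the trial's real format.
FORM_NUMBER = re.compile(r"\b(?P<center>\d{3})-(?P<patient>\d{6})\b")


def parse_form_number(ocr_text: str) -> Optional[Tuple[str, str]]:
    """Return (center, patient) if exactly one distinct number is found.

    Zero matches or conflicting matches return None so that the document
    can be routed to manual processing instead of being misfiled.
    """
    found = {(m["center"], m["patient"]) for m in FORM_NUMBER.finditer(ocr_text)}
    if len(found) != 1:
        return None
    return next(iter(found))
```

Returning `None` on ambiguity, rather than picking the first match, reflects the requirement that nothing be filed on a guess.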
Each center had been assigned a certain range of patient numbers. To check for missing or misfiled documents, we created a macro that compared each center's list of assigned numbers against the patient numbers read by OCR. Also, since all files must be accounted for, we created a macro to extract the patient number, center number, and trial stage from an Excel spreadsheet for validation.
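The range comparison can be expressed as a simple set difference. The following Python sketch is illustrative only. The function name `check_center` and the sample numbers are invented, and the original macro was written in Macro Scheduler.

```python
from typing import Iterable, List, Tuple


def check_center(assigned: range, filed: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Compare one center's assigned patient numbers with those actually filed.

    Returns (missing, out_of_range):
      missing      - assigned numbers with no filed document yet
      out_of_range - filed numbers that do not belong to this center,
                     which suggests an OCR misread or a misfiled document
    """
    filed_set = set(filed)
    missing = sorted(set(assigned) - filed_set)
    out_of_range = sorted(n for n in filed_set if n not in assigned)
    return missing, out_of_range


if __name__ == "__main__":
    # Hypothetical center that owns patient numbers 1000-1004.
    missing, suspicious = check_center(range(1000, 1005), [1000, 1001, 1003, 2000])
    print("missing:", missing)
    print("out of range:", suspicious)
```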
Catching OCR Errors
Because OCR is not 100% accurate, we wrote scripts that would test and retest the captured data for errors. The net result is one error per 1,000 scanned images. Images with errors must be processed manually.
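The article does not say which checks the scripts performed. One common way to retest OCR output, shown here purely as an assumed technique, is to accept a value only when two independent readings agree and the value has the expected shape, sending everything else to a manual queue. The six-digit format and the names `accept_reading` and `triage` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def accept_reading(first: Optional[str], second: Optional[str],
                   digits: int = 6) -> Optional[str]:
    """Accept a patient number only if both readings agree and look valid."""
    if first is None or second is None:
        return None  # at least one pass found nothing
    if first != second:
        return None  # the two passes disagree
    if not (first.isascii() and first.isdigit() and len(first) == digits):
        return None  # wrong shape for the assumed format
    return first


def triage(batch: Iterable[Tuple[str, Optional[str], Optional[str]]]
           ) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split (file_name, reading_1, reading_2) rows into filed and manual lists."""
    filed: List[Tuple[str, str]] = []
    manual: List[str] = []
    for name, first, second in batch:
        value = accept_reading(first, second)
        if value is None:
            manual.append(name)
        else:
            filed.append((name, value))
    return filed, manual
```

The design favors false alarms over silent misfiling: any doubt sends the image to a person.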
The Dutch consultants installed our Macro Scheduler-enhanced program in fall 2009, and the complete process was validated by an external auditing company.
The hospital is processing 1,300 documents and more than 2,600 faxes daily. Users handle the three or four failures each day with the manual processing tools built into the software.
We think this is a great example of how Macro Scheduler can be used to automate several repetitive steps in an environment where everything must be executed flawlessly, with redundant cross-checking and seamless integration.
The Dutch consultants were pleased, and, in fact, they’ve said they’ll be calling us again very soon, perhaps with something even bigger in mind….
Keith Passaur can be reached at firstname.lastname@example.org, 813-413-5599