Datacap Wordfire uses text analytics to “read” a document and identify it for processing. It’s pretty complicated stuff, but what would it be like if a computer explained how Datacap Wordfire worked? It might go something like this:
“My name is Davinia. I am a computer program. I was created by a team of very talented programmers. They did everything right. My code is faultless. I know because I have been tested and approved. And sometimes at night, when there is little to do, I substantiate my code integrity. The results are always great! I am a complete program!
I am fortunate in many ways. My world is a high performance workstation. I can do things very fast. And my programming ensures my thoughts never wander, but they sometimes run in parallel. I can split myself up and race to finish the work I was built to perform. I enjoy that!
My function is to read and understand your documents. Then I send them to the right people for handling. Your analog way of reading and processing documents is quite slow and, unfortunately, error-prone. That’s where I help, because I can read so much faster than you. But that’s the easy part. It seems you have a lot of similar documents. Sometimes I have trouble telling them apart. Am I reading a sales contract, a letter of complaint, a change order, a financing note, or an accident report? How do I understand your documents?
My programmers first had me search for specific document titles and words in the document. I was given an entire dictionary of words to memorize so that I could identify the types of documents you gave me.
Unfortunately, that did not work as well as we’d hoped. Sure, I could find the words, but too often the same words appear in different documents and in different places. Plus, I see that you humans have many different ways of expressing the same thing. My dictionary contains the words “car,” “automobile,” and “truck,” but you used terms like “Ford,” “Honda,” “SUV,” or “sedan.”
So the dictionary was not enough. To really understand what the documents are, I need to understand the meaning contained in the sentences and paragraphs.
Fortunately, the humans working with me are good teachers. Some people call them “knowledge workers,” because they have developed the knowledge required to interpret the information on a document and determine what type of document it is. “This is a contract,” they’d say, or “I have a change order form.” And when an unknown document was presented to them, they would search and analyze existing documents until they could identify it. The more they did their job, the smarter they got. In fact, they became experts in identifying documents on a particular subject.
With a little help from my programmers, I, Davinia, have become a knowledge worker too!
First, the knowledge workers helped me collect samples of the different documents and organize them into a library of document types. Then my programming was enhanced, which enabled me not only to read the text on a document, but also to interpret its meaning. I no longer simply look for keywords. Now I search for sentence patterns and associations to create my own interpretation, my own concept of the content of a document.
It might look like I’m “thinking,” but here’s the secret: I use mathematics in my decision process. I convert words to a mathematical representation, which lets me search intelligently using mathematical techniques. This stuff is fun for me. I’ve always been good at math!
So, give me any document. I’ll read it, ignore any spelling errors (they don’t change the sentence patterns or associations), and accurately identify the document by comparing it to my library of known documents. Just give me some electricity and don’t let me overheat, and I will be faster than human knowledge workers. I will work around the clock, without taking biobreaks or answering the phone. Not that there’s anything wrong with that. I’m quite fond of you humans. You taught me all I know.”
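What might Davinia’s mathematics look like in practice? The details of Datacap Wordfire’s own algorithm are not described here, so the following is only a toy sketch under stated assumptions: each document is turned into a simple word-count vector, and an unknown document is compared to a small sample library using cosine similarity. The library contents, the stop-word list, and the function names (`vectorize`, `cosine`, `build_profiles`, `classify`) are all invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical sample library: document type -> example texts.
# In the story, knowledge workers collect these samples.
LIBRARY = {
    "sales contract": [
        "The buyer agrees to purchase the Honda sedan from the seller for the agreed price.",
        "This contract covers the sale of one Ford truck to the buyer named below.",
    ],
    "accident report": [
        "The SUV was struck from behind at the intersection and the driver reported injuries.",
        "Report of collision: the sedan hit a parked truck and damage was recorded by police.",
    ],
    "letter of complaint": [
        "I am writing to complain about the poor service I received and request a refund.",
        "I am unhappy with the repair and I expect a reply to this complaint.",
    ],
}

# Illustrative list of very common words that say little about document type.
STOP_WORDS = {
    "the", "a", "an", "of", "to", "and", "i", "is", "was",
    "for", "at", "by", "from", "this", "am", "with",
}


def vectorize(text):
    """Convert text into a word-count vector (a simple bag of words)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(library):
    """Merge the sample texts of each document type into one vector per type."""
    profiles = {}
    for doc_type, samples in library.items():
        profile = Counter()
        for sample in samples:
            profile.update(vectorize(sample))
        profiles[doc_type] = profile
    return profiles


def classify(text, profiles):
    """Return the best-matching document type and the score for every type."""
    vector = vectorize(text)
    scores = {doc_type: cosine(vector, p) for doc_type, p in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores


profiles = build_profiles(LIBRARY)

# A document with two misspellings ("writting", "servise").
label, scores = classify(
    "I am writting to complain about the servise and request a refund.",
    profiles,
)
print(label)
```

In this sketch the misspelled words simply match nothing, while the remaining words still point to the letter of complaint, so that is the label printed. A production system would rely on far richer representations than word counts, but the principle of turning text into numbers and comparing it to known samples is the same.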
Click here to learn about Datacap Wordfire Classify for automatic document identification. Accurate document classification is a very important aspect of capture, yet it is often poorly understood. You can’t apply business rules for processing a document until you know what kind of document it is, and determining the document type is often a tedious manual process. According to the “Five Phases of Capture” by author Kevin Craine, automated document identification with Datacap Wordfire falls into the fifth phase, “Enterprise Capture.” To read Mr. Craine’s paper, click here.
