Optical character recognition (OCR) extracts printed or handwritten text from images, such as posters, street signs, and product labels, as well as from documents like articles, reports, forms, and invoices. At its core, OCR enables computers to read documents in the same way that a human can: by recognizing letter patterns and picking out text from an image.

This eliminates the need for manual data entry, saving Accounts Payable (AP) teams thousands of hours a year.

Need for change
45% of invoices are still received manually (Ardent Partners, 2022). Today, the average cost of processing an invoice is $9.25 (Ardent Partners, 2022). With transaction volumes expected to quadruple by 2035 (Billentis, 2019), this poses a significant and costly challenge for overwhelmed AP departments that are already short on resources.

Many best-in-class organizations have begun moving toward fully automating their AP processes. In fact, 80% of companies are expected to use electronic invoicing (e-invoicing) exclusively by 2025. Several factors are driving this seismic shift.

New government mandates around the world have introduced B2B, B2G, and B2C e-invoicing, e-reporting, e-filing, e-auditing, and compliance requirements to combat the VAT gap.
IT systems and business models have evolved beyond traditional paper-based processes, requiring organizations to be more agile.
Digitized invoice data alone is no longer sufficient. Demand for additional automated processes across AP, AR, and procurement has increased and will continue to increase over the next two to five years.
There has been a significant shift in how suppliers and buyers communicate and collaborate through emerging and established technologies.
How does OCR work?
Let’s dive right into how the OCR engine or OCR software works.

Image acquisition
A scanner reads documents and converts them to binary data. The OCR software analyzes the scanned image and classifies the light areas as background and the dark areas as text.
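
The light/dark classification is a thresholding step. A minimal sketch, assuming an 8-bit grayscale scan stored as nested Python lists, picks the cutoff automatically with Otsu's method. Otsu's method is a standard global-thresholding technique used here for illustration; it is not necessarily what any given OCR engine does:

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Pick the gray level t that best separates dark (<= t) from light (> t)
    pixels by maximizing between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue  # no dark pixels yet, so no split to evaluate
        if w_light == 0:
            break  # every pixel is now "dark"; no split left to test
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        between_var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(image: list[list[int]]) -> list[list[int]]:
    """Map each pixel to 1 (dark, treated as text) or 0 (light, background)."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p <= t else 0 for p in row] for row in image]


scan = [
    [230, 228, 25, 231],
    [229, 30, 22, 227],
    [232, 226, 28, 230],
]
print(binarize(scan))  # [[0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 1, 0]]
```

A single global threshold works well for evenly lit scans. Photos with shadows or uneven lighting usually need a local (adaptive) threshold instead.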

Preprocessing
The OCR software first cleans the image and removes errors to prepare it for reading. There are four techniques involved here:

Deskewing: tilting the scanned document slightly to fix alignment issues introduced during the scan.
Despeckling: removing digital image spots and smoothing the edges of text images.
Cleaning up boxes and lines in the image.
Script recognition: identifying the writing system, which multi-language OCR technology needs.
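
Of the cleaning steps listed above, removing stray image spots (despeckling) is the simplest to illustrate. This sketch works on an already binarized image (1 = ink, 0 = background) and clears any ink pixel that has no ink neighbors. The neighbor-count rule is an illustrative assumption; real engines may use median filters or connected-component size filters instead:

```python
def despeckle(binary: list[list[int]], min_neighbors: int = 1) -> list[list[int]]:
    """Clear ink pixels (1) that have fewer than `min_neighbors` ink pixels
    among their 8 surrounding neighbors. Returns a new image."""
    height, width = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]  # leave the input untouched
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1:
                continue
            # Count ink pixels in the 3x3 window around (y, x), clipped at the edges.
            neighbors = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbors < min_neighbors:
                cleaned[y][x] = 0
    return cleaned


noisy = [
    [1, 0, 0, 0, 0],  # isolated speck in the corner
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],  # a small solid stroke that should survive
    [0, 0, 0, 0, 0],
]
print(despeckle(noisy))
```

Raising `min_neighbors` removes larger specks, but it also starts eroding thin strokes such as the dot on an "i".
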
Text recognition
The two main types of processes that OCR software uses for text recognition are pattern matching and feature extraction.

Pattern matching
Pattern matching works by isolating a character image and comparing it with a similar stored pattern. It works only if the stored pattern has a similar font and scale to the input character. This method works well with scanned images of documents that have been typed in a known font.
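
A minimal sketch of the matching step, assuming each character has already been isolated as a small binary bitmap written with `#` for ink and `.` for background. The three 5x3 templates and the pixel-agreement score are illustrative assumptions, not any particular engine's database:

```python
# Hypothetical stored templates: 5x3 bitmaps of three capital letters in one known font.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def match_score(a, b):
    """Fraction of pixels that agree between two glyphs of the same size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("pattern matching needs input and template at the same scale")
    pairs = [(pa, pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    return sum(pa == pb for pa, pb in pairs) / len(pairs)


def recognize(glyph):
    """Return (character, score) for the stored template that best matches `glyph`."""
    best = max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))
    return best, match_score(glyph, TEMPLATES[best])


# A scanned "T" with one noisy pixel on the fourth row.
scanned = ("###", ".#.", ".#.", ".##", ".#.")
char, score = recognize(scanned)
print(char, round(score, 3))  # T 0.933
```

Because the score compares glyphs pixel by pixel, `match_score` rejects input of a different size. This mirrors the font-and-scale limitation described above.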

Feature extraction
Feature extraction decomposes the pattern into features such as lines, closed loops, line direction, and line intersections. It then uses these features to find the best match, or nearest neighbor, among its stored glyphs.
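
The steps above can be sketched with a few simple features. This sketch assumes glyphs are bitmaps of `#` and `.`. It uses the number of closed loops plus the share of ink in the top and left halves as stand-ins for the richer features (lines, line direction, intersections) a real engine would extract. The stored glyphs are hypothetical:

```python
import math
from collections import deque


def count_loops(glyph):
    """Count closed loops: background regions that never reach the glyph's border."""
    height, width = len(glyph), len(glyph[0])
    seen = [[False] * width for _ in range(height)]
    loops = 0
    for sy in range(height):
        for sx in range(width):
            if glyph[sy][sx] != "." or seen[sy][sx]:
                continue
            # Flood-fill one 4-connected background region and note whether it escapes.
            seen[sy][sx] = True
            queue, escapes = deque([(sy, sx)]), False
            while queue:
                y, x = queue.popleft()
                if y in (0, height - 1) or x in (0, width - 1):
                    escapes = True
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (
                        0 <= ny < height
                        and 0 <= nx < width
                        and not seen[ny][nx]
                        and glyph[ny][nx] == "."
                    ):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if not escapes:
                loops += 1
    return loops


def features(glyph):
    """Size-independent features: (closed loops, ink share in top half, ink share in left half)."""
    height, width = len(glyph), len(glyph[0])
    ink = [(y, x) for y in range(height) for x in range(width) if glyph[y][x] == "#"]
    top = sum(1 for y, _ in ink if y < height / 2) / len(ink)
    left = sum(1 for _, x in ink if x < width / 2) / len(ink)
    return (count_loops(glyph), top, left)


# Hypothetical stored glyphs; the input below is larger than any of them.
STORED = {
    "O": ("###", "#.#", "#.#", "#.#", "###"),
    "B": ("##.", "#.#", "##.", "#.#", "##."),
    "I": ("###", ".#.", ".#.", ".#.", "###"),
}


def classify(glyph):
    """Nearest neighbor: the stored glyph whose feature vector is closest (Euclidean)."""
    target = features(glyph)
    return min(STORED, key=lambda ch: math.dist(target, features(STORED[ch])))


big_o = ("#####", "#...#", "#...#", "#...#", "#...#", "#...#", "#####")
print(count_loops(STORED["B"]), classify(big_o))  # 2 O
```

The features are counts and ratios rather than raw pixels, so the 7x5 input can be compared with 5x3 stored glyphs. Pattern matching, as noted above, needs the input and stored pattern at a similar scale.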

Post-processing
After analysis, the system converts the extracted text data into a computerized file. Some OCR systems can create annotated PDF files that include both the before and after versions of the scanned document.

Types of OCR

Data scientists classify OCR technologies into different types based on their use and application. Let’s look at four examples.

Simple optical character recognition software

Simple OCR works by storing many different font and text image patterns as templates. The software uses pattern-matching algorithms to compare text images, character by character, against its internal database. If the system matches the text word by word instead, it is called Optical Word Recognition. This approach has limitations: there are virtually unlimited font and handwriting styles, and not every one of them can be captured and stored in the database.

Intelligent character recognition software

Intelligent character recognition (ICR) technology aims to read text the way humans do. It uses machine learning software to train machines to behave like humans. A type of machine learning system called a neural network analyzes the text over many levels, processing the image repeatedly. It looks for different image attributes, such as curves, lines, intersections, and loops, and combines the results of all these levels of analysis to produce the result. Even though ICR typically processes images one character at a time, the process is fast, with results obtained in seconds.

Intelligent word recognition

Intelligent word recognition systems work on the same principles as ICR, but process whole word images instead of segmenting the images into individual characters.

Optical mark recognition

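Optical mark recognition (OMR) detects marks such as filled-in bubbles and checkboxes on forms, surveys, and tests. The term is also used for identifying logos, watermarks, and other symbols in a document.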
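
In its classic form, OMR decides whether a checkbox or answer bubble has been filled in. A minimal sketch, assuming each box's interior has already been located and binarized (1 = ink), compares the share of ink pixels against a cutoff. The 0.3 threshold and the box labels are hypothetical:

```python
def is_marked(region, fill_threshold=0.3):
    """True if the share of ink pixels (1s) inside a checkbox region exceeds the threshold."""
    pixels = [p for row in region for p in row]
    return sum(pixels) / len(pixels) > fill_threshold


def read_marks(boxes, fill_threshold=0.3):
    """Return the labels of all checkbox regions that look filled in."""
    return [label for label, region in boxes.items() if is_marked(region, fill_threshold)]


# Interiors of two binarized checkboxes on a hypothetical form (1 = ink).
boxes = {
    "Yes": [[0, 0, 0], [0, 1, 0], [0, 0, 0]],  # a stray speck: 1/9 ink
    "No": [[1, 1, 0], [1, 1, 1], [0, 1, 1]],   # a filled box: 7/9 ink
}
print(read_marks(boxes))  # ['No']
```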

Limitations of template-based OCR

Traditional OCR was originally invented to help blind people by converting printed characters into speech. Later, the technology was used to read and recognize black text on a white background. Because it was designed around such clean, high-contrast input, traditional OCR comes with a few challenges.

Here are the five main limitations of traditional OCR.

Dependent on input quality

The quality of text recognition and extraction depends directly on the quality of the input image fed to the engine. For instance, accuracy drops drastically when the character height is below 20 pixels.
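
One common mitigation is to measure text size before recognition and enlarge small text. A minimal sketch, assuming character heights have already been measured from bounding boxes. Summarizing by the median height and targeting the 20-pixel figure from the text are illustrative choices:

```python
import statistics

MIN_CHAR_HEIGHT_PX = 20  # the threshold cited above


def upscale_factor(char_heights, min_height=MIN_CHAR_HEIGHT_PX):
    """Return the resize factor that brings the median character height up to
    `min_height` pixels; 1.0 means the text is already tall enough."""
    median_height = statistics.median(char_heights)
    return max(1.0, min_height / median_height)


print(upscale_factor([12, 14, 16]))  # text too small: about 1.43x
print(upscale_factor([25, 30]))      # 1.0, no resize needed
```

Resampling can only partly compensate for a low-resolution scan, because it cannot restore detail the scanner never captured. Rescanning at a higher resolution is usually the better fix.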

Templates and rules reliant

Traditional OCR requires templates and rules to work. Strict rules must be programmed into the engine so that it captures data from the correct fields and lines. Therefore, it cannot cope with the diversity of documents and struggles with unstructured ones.
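
To see why this approach struggles with variety, here is a minimal sketch of template-based extraction: one hand-written regular expression per field for one supplier's layout. The supplier, field names, and invoice texts are hypothetical:

```python
import re

# Hypothetical "template" for one supplier's layout: one hand-written rule per field.
SUPPLIER_A_RULES = {
    "invoice_number": re.compile(r"Invoice No\.:\s*(\S+)"),
    "total": re.compile(r"Total Due:\s*([\d,]+\.\d{2})"),
}


def extract(text, rules):
    """Apply each field's rule; a field whose rule does not match comes back as None."""
    fields = {}
    for name, pattern in rules.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


invoice_a = "ACME Ltd\nInvoice No.: INV-1042\nTotal Due: 1,250.00"
invoice_b = "Globex Inc\nBill #: INV-77\nAmount payable: 980.50"

print(extract(invoice_a, SUPPLIER_A_RULES))  # {'invoice_number': 'INV-1042', 'total': '1,250.00'}
print(extract(invoice_b, SUPPLIER_A_RULES))  # {'invoice_number': None, 'total': None}
```

The second invoice carries the same information. Because its labels differ, though, every field comes back empty until someone writes a new set of rules for that layout.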

Lack of automation

Because it relies on templates and rules, traditional OCR offers limited automation. For instance, invoices come in many styles and formats, and each one needs its own rules.

Adding more rules means spending more data and resources on training the OCR engine. With the conventional approach, there will always be more rules to set up, so this can become a serious bottleneck.

Expensive

Because more rules and algorithms must be developed to increase accuracy, traditional OCR can become very expensive. In addition, creating these rules and algorithms does not guarantee high-quality output, since that also depends on the quality of the input image.

Copes poorly with a high document variety

With traditional OCR, the output is often highly accurate when documents are simple and come with few variations. However, many businesses need to process various documents within their workflows.

The higher the document variety, the more challenging it becomes. Because the traditional OCR engine is trained with templates, it cannot keep up with a high document variety.

Traditional OCR is not perfect. However, as markets become more demanding each year, OCR has taken multiple leaps forward to meet that demand.

Next-gen OCR

The next generation of OCR is powered by both Machine Learning and AI. This revolutionary technology is known as Intelligent Document Processing (IDP).

IDP can interpret, categorize, organize, and convert data automatically for the user, all within seconds.

One of its major advancements is that it is not restricted to templates or rules like its conventional predecessor. This makes it more scalable and affordable for businesses.

Let’s take a closer look at this solution.

The machine learning approach

With Machine Learning (ML), OCR can be trained on example data to recognize patterns and the meaning of content, instead of relying only on hand-written rules. This can be done through supervised learning, unsupervised learning, or a combination of the two.

But what exactly are supervised learning, unsupervised learning, and a combination of both? Let’s keep it as simple as possible.

Supervised learning

Supervised learning refers to using labeled data sets to train algorithms that classify data and predict outcomes with high accuracy. The model needs to be fed with a large amount of input data to achieve this.

If you would like to predict whether an email is spam and put it in the right category, you need to feed the engine enough emails that are labeled as spam or not spam. With enough data, the model can recognize and predict the category and thus classify a new email correctly.
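
A minimal sketch of the spam example, using multinomial naive Bayes. Naive Bayes is one standard supervised technique and was chosen here for brevity. The four training emails are hypothetical and far fewer than a real model would need:

```python
import math
from collections import Counter


def train(examples):
    """Learn word counts per label from (text, label) pairs."""
    word_counts, doc_counts = {}, Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocabulary


def predict(model, text):
    """Return the label with the highest naive Bayes log-probability for `text`."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # prior P(label)
        for word in text.lower().split():
            # Laplace (add-one) smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical labeled training set ("ham" = not spam).
training = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting moved to monday", "ham"),
    ("please review the attached invoice", "ham"),
]
model = train(training)
print(predict(model, "free prize money"))    # spam
print(predict(model, "review the invoice"))  # ham
```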

Unsupervised learning

Unsupervised learning is similar to supervised learning. The difference is that unsupervised learning uses unlabeled instead of labeled data. This approach is useful when common properties within a data set are hard to identify in advance, because it gives the model more freedom to find structure on its own.

To put it simply, unsupervised learning tries to mimic the human ability to learn without being told the answers. If your business needs to process receipts, you feed the unsupervised learning model with many receipts. The model then uses this information to predict whether the next document is a receipt, based on its similarity to the ones it has seen.
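
A minimal sketch of the receipt example, in which the model only sees unlabeled receipts and is never told which documents are or are not receipts. It builds a word-frequency profile and flags new documents by cosine similarity, a simple similarity-based approach chosen for illustration. The receipts and the 0.3 threshold are hypothetical:

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: how often each word appears."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0 = no shared words)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def build_profile(documents):
    """Merge unlabeled example documents into a single word-count profile."""
    profile = Counter()
    for doc in documents:
        profile.update(vectorize(doc))
    return profile


def looks_like(profile, text, threshold=0.3):
    """Flag `text` as the same kind of document if it is similar enough to the profile."""
    return cosine(profile, vectorize(text)) >= threshold


# Hypothetical unlabeled OCR output from past receipts.
receipts = [
    "store receipt total 12.50 cash thank you",
    "receipt total 8.99 card thank you",
    "coffee shop receipt total 4.20 thank you",
]
profile = build_profile(receipts)
print(looks_like(profile, "receipt total 5.00 thank you"))  # True
print(looks_like(profile, "meeting moved to monday"))       # False
```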

Semi-supervised learning

In semi-supervised learning, the input data is partly labeled and partly unlabeled. It is used when dealing with high volumes of data.

Because semi-supervised learning combines the best of both approaches, it is ideally used in cases where a small amount of labeled training data can bring extraordinary results in terms of accuracy.
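
Self-training is one common semi-supervised technique. A model trained on the few labeled examples labels the unlabeled data it is most confident about, then learns from those new labels. A minimal sketch with a 1-nearest-neighbor model, assuming each document has already been reduced to two hypothetical numeric features:

```python
import math


def self_train(labeled, unlabeled):
    """Self-training with a 1-nearest-neighbor model.

    Each round labels the single unlabeled point that lies closest to an already
    labeled point (the most confident prediction) and adds it to the labeled set.
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        _, point, label = min(
            ((math.dist(p, q), p, lab) for p in remaining for q, lab in labeled),
            key=lambda candidate: candidate[0],
        )
        labeled.append((point, label))
        remaining.remove(point)
    return dict(labeled)


# Two labeled documents and five unlabeled ones, as hypothetical 2-D feature vectors.
seed = [((0.0, 0.0), "invoice"), ((10.0, 10.0), "receipt")]
unlabeled = [(1.0, 0.5), (2.0, 1.0), (9.0, 9.5), (8.0, 8.0), (3.0, 2.0)]
labels = self_train(seed, unlabeled)
print(labels[(3.0, 2.0)], labels[(8.0, 8.0)])  # invoice receipt
```

Labeling one point per round lets labels spread outward from the seed examples through chains of similar documents. The trade-off is that an early mistake also spreads, which is why real systems usually only accept high-confidence predictions.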

Benefits of next-gen OCR

Advanced OCR solutions can do much more. We have listed a few benefits below.

Digitize documents within seconds – With OCR software, your organization can go paperless and extract data from documents into digital formats such as PDF, JSON, CSV, and XML. This can be done in a few seconds.

Faster implementation time – Advanced OCR solutions are not entirely reliant on rules and templates. Hence, it takes less time to train the engine and implement the technology.

Scalability – Next-generation cloud OCR solutions offer scalability. While it is possible to scale with template-based OCR, it can soon become too expensive for businesses.

Higher accuracy – Advanced solutions that use Machine Learning can reach up to 98% accuracy. While manual data extraction yields higher accuracy, it is much slower and less efficient.

Reduction of manual entry mistakes – OCR can automate data entry and reduce human error. With AI and Machine Learning, the error rate can be reduced even further.

Faster turnaround time – Manually verifying and extracting data can take 10-20 minutes per document, while traditional OCR can do it in less than half that time. IDP, however, can do it within 15 seconds, saving roughly 98% of the time.

Cost reduction – As AI-powered OCR automates tedious tasks and minimizes data entry mistakes, overhead is significantly lowered. This leads to one of the main benefits for organizations: cost reduction. Traditional OCR can reduce the cost per document to €1-2, and IDP to less than €0.50.
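
The percentages above follow from simple ratios of the figures in this list: 10-20 minutes manually versus 15 seconds with IDP, and €1-2 per document with traditional OCR versus under €0.50 with IDP.

```latex
\begin{aligned}
\text{time saved} &= 1 - \frac{15\ \text{s}}{600\ \text{s}} = 97.5\% && \text{(10 min manual)} \\
\text{time saved} &= 1 - \frac{15\ \text{s}}{1200\ \text{s}} = 98.75\% && \text{(20 min manual)} \\
\text{cost saving} &> 1 - \frac{0.50}{1} = 50\% && \text{(vs. €1 per document)} \\
\text{cost saving} &> 1 - \frac{0.50}{2} = 75\% && \text{(vs. €2 per document)}
\end{aligned}
```

The roughly 98% time saving therefore sits within a range of 97.5% to 98.75%. IDP also cuts the per-document cost of traditional OCR by more than 50% (against €1) or more than 75% (against €2).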

What is OCR used for?

Let’s highlight a few use cases below to inspire you to start using an OCR solution for similar procedures within your organization.

Receipt OCR for loyalty programs
Data extraction from IDs for customer onboarding
Automated invoice processing for accounts payable
Automating document completeness checks
Important considerations when selecting an OCR provider
It’s important to consider how invoice capture fits into the overall automation strategy of your organization. These are just a few questions to consider when researching solution providers.

How many suppliers do you do business with? Do your suppliers require maintenance of multiple invoice formats?
What are the preferred outcomes of implementing an OCR solution?
Eliminate manual data entry tasks? Improve data accuracy? Increase straight-through processing? Future-proof your AP process? Ensure on-time payments?
Do you need a short-term solution, or is this part of a long-term digital transformation strategy?
Does your OCR provider have end-to-end automation capabilities that meet your future transformation goals?
Does your organization need real-time visibility with the invoice data being captured by the OCR tool?
Does your organization do business in countries that currently have or will soon have invoicing compliance mandates?
How to start integrating OCR?
There are several things to consider when integrating OCR into your business. These factors include the document type, the monthly document processing volume, your organization’s resources, your use case, and so forth.

To help you, we have listed the following options:

Integration with OCR API
Mobile scanning solution
End-to-end solution

Optical Character Recognition and Serina

Serina has everything you need to automate your Accounts Payable. Complete, verified, and accurate invoice data takes you one step closer to realizing the full potential of automating your AP processes. Serina’s AP Automation solution allows you to receive all your invoices electronically, straight into your financial system, regardless of the systems you use, the size of your business, or the digital capabilities of your suppliers.

We meet you wherever you are and wherever you may be headed in your AP Automation journey.