Apr 08, 2026 arXiv cs.LG

New Metric for OCR Quality Evaluated

Researchers have developed a new way to measure the quality of Optical Character Recognition (OCR), the technology used to extract text from images. The current standard metric, Character Error Rate (CER), has a limitation: it assumes that the document layout has been parsed perfectly, which is often not the case. To address this, the researchers propose the Character Error Vector (CEV), which can be decomposed into three components: parsing errors, OCR errors, and interaction errors between the two. This lets researchers pinpoint which stage of the pipeline is causing the most problems. The researchers tested the metric on a dataset of historical newspaper images and report that it is more accurate than traditional methods, even when the images are degraded and the text is hard to read.
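To make the distinction concrete, here is a minimal Python sketch. The `cer` function follows the standard definition of Character Error Rate: the Levenshtein edit distance between the reference and the recognized text, divided by the reference length. The three-way split in `decompose_errors` is a hypothetical two-factor decomposition assumed here purely for illustration. It is not the paper's CEV definition, which may differ. All function names are hypothetical, and the sketch assumes that a run with both perfect parsing and perfect OCR has zero error.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the characters match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Standard Character Error Rate: edit distance normalized by reference length."""
    if not reference:
        raise ValueError("CER is undefined for an empty reference")
    return levenshtein(reference, hypothesis) / len(reference)


def decompose_errors(reference: str, hyp_both: str,
                     hyp_parse_only: str, hyp_ocr_only: str) -> dict:
    """HYPOTHETICAL illustration, not the paper's CEV.

    hyp_both:       predicted layout parsing + predicted OCR
    hyp_parse_only: predicted layout parsing, perfect transcription inside each region
    hyp_ocr_only:   ground-truth layout, predicted OCR
    Assumes perfect parsing + perfect OCR gives zero error.
    """
    total = cer(reference, hyp_both)
    parsing = cer(reference, hyp_parse_only)
    ocr = cer(reference, hyp_ocr_only)
    # Whatever the two isolated error sources do not explain is attributed to their interaction.
    return {"parsing": parsing, "ocr": ocr, "interaction": total - parsing - ocr}


if __name__ == "__main__":
    # Toy case: parsing drops the "c"; OCR alone would misread "c" as "e".
    print(decompose_errors("abcd", hyp_both="abd",
                           hyp_parse_only="abd", hyp_ocr_only="abed"))
```

In the toy case, each error source alone costs 0.25 CER, but together they also cost only 0.25, because the OCR mistake falls on a character that parsing had already lost. The interaction term comes out as -0.25. A single CER number would hide this overlap, which is the kind of diagnosis a decomposed metric aims to expose.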
