Open Forem

Elena Hartmann

How to Get the Highest Quality Scanned Document Translation with OCR

Need to translate scanned documents but keep running into quality issues with an online scanned document translator? Before a scanned PDF can be translated from one language to another, your file translator first needs to be able to read it.

In this post, we'll use Pairaphrase, the web-based translation management system for enterprises, as an example. Like most scanned PDF translator tools, it uses OCR (Optical Character Recognition) to read the text in your scanned PDF file. Without OCR, the translation result will look like gibberish.

We'll focus on how resolution and text clarity affect the accuracy of scanned document translation. Continue reading for OCR troubleshooting tips that help you get the highest quality scanned PDF translation.

How to Optimize a Scanned Document for OCR & Translation

1. Check Your Scanned PDF for Text Obstructions

One of the first steps toward the highest quality scanned document translation is to remove any text obstructions in your file, because OCR can only pass along text that it is able to read.

Text obstructions can include any of the following:

  • Stamps that cover text
  • Watermarks that cover text
  • Handwriting that covers text
  • Drastic creasing that obstructs your text
  • Text showing through from the other side
  • Another document layered on top of it
  • Text within embedded images
  • Very low contrast between the background & text
  • Graphics on dark backgrounds
  • Ornate / heavily scripted text
  • Handwritten text (OCR technology cannot yet accurately recognize it)
  • Text cut off mid-sentence

2. Confirm Your File is High Resolution

If you have access to the original document, scan it again at a higher resolution (300 DPI or higher). Otherwise, the text might not be readable enough for OCR, even if the file appears sharp when you view it on your computer.

Hint: To find out if your document is high resolution, check the file size in your file manager (Finder on macOS, File Explorer on Windows). Each page should be around 1 MB or larger if it's high resolution (this isn't always the case, but most often it is).
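The file-size hint above can be turned into a quick check. The following is a minimal sketch of that rule of thumb only: the function names and the 1 MB threshold parameter are illustrative assumptions, and you supply the page count yourself (for example, from your PDF viewer). It is not how Pairaphrase evaluates files.

```python
MB = 1024 * 1024  # bytes per megabyte (binary convention, an assumption)


def avg_mb_per_page(file_size_bytes: int, page_count: int) -> float:
    """Return the average file size per page, in megabytes."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return file_size_bytes / page_count / MB


def looks_high_resolution(file_size_bytes: int, page_count: int,
                          threshold_mb: float = 1.0) -> bool:
    """Apply the rough heuristic: about 1 MB or more per page.

    This is only a hint. Compression and page content vary, so a
    small file is not proof of a low-resolution scan.
    """
    return avg_mb_per_page(file_size_bytes, page_count) >= threshold_mb
```

For a real file, you could pass `os.path.getsize("scan.pdf")` as the first argument, where `scan.pdf` is a placeholder name for your own scan.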

How to Scan Your Document at a Higher Resolution

If at all possible, get your hands on the original paper document and re-scan it. The scanner software on your computer has settings that you can change, so set it to save the scan as a PDF at a resolution high enough to be read by Pairaphrase.

Aim for a 300-600 DPI resolution for the best results, and save the file after you scan it.

The file size will be fairly large, but you will get a much better translation if you go this route.
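To make the DPI target concrete: DPI is simply pixels divided by inches, so you can estimate the resolution of an existing scan from its pixel dimensions. The sketch below assumes a US Letter page (8.5 x 11 inches) by default and uses hypothetical function names; if your document is A4 or another size, pass that size in inches instead.

```python
def effective_dpi(width_px: int, height_px: int,
                  width_in: float = 8.5, height_in: float = 11.0) -> float:
    """Estimate scan resolution as the lower of horizontal and vertical DPI.

    The default page size (US Letter) is an assumption.
    """
    if min(width_px, height_px, width_in, height_in) <= 0:
        raise ValueError("all dimensions must be positive")
    return min(width_px / width_in, height_px / height_in)


def pixels_for_dpi(dpi: int, width_in: float = 8.5,
                   height_in: float = 11.0) -> tuple[int, int]:
    """Return the (width, height) in pixels a page has at the given DPI."""
    return round(dpi * width_in), round(dpi * height_in)


def meets_target(width_px: int, height_px: int, target_dpi: int = 300) -> bool:
    """Check a scan against the 300 DPI minimum suggested in this post."""
    return effective_dpi(width_px, height_px) >= target_dpi
```

For example, a Letter page scanned at 300 DPI is 2550 x 3300 pixels, so a page image of 1275 x 1650 pixels works out to only 150 DPI and is worth re-scanning.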

Is There an Easier Way to Get High-Quality Scanned Document Translation?

Unfortunately, no. Removing text obstructions and re-scanning at a high resolution is currently the only way to prepare a scanned file for the highest quality translation possible. This is why, if at all possible, you should track down the original digital file that the paper version was printed from.

Source: This blog post was originally published at Pairaphrase.com
