PDF OCR FAQ
- Why another OCR option? Why like this?
- Why do I need an API key? How do I get it?
- How much will it cost?
- Is this safe? Can I trust you/Google?
- How big can the input be?
- Will it work with complex page formats?
- Why are some characters wrong, words misplaced, etc.?
- Why are lines returned as-is, e.g., with hyphenation?
Why another OCR option? Why like this?
This interface provides access to Google Cloud Vision OCR without requiring coding or cloud-storage setup. Cloud Vision accepts large multi-page PDFs, delivers high accuracy, and—unlike Google Drive OCR—automatically ignores any low-quality text layer already embedded in the file. More background here.
Why do I need an API key? How do I get it?
Google Cloud Vision is a paid service. Each user supplies their own API key, linked to a Google Cloud project with billing enabled. An API key is a password-like string that you can create or delete at any time. Follow the video below to:
1. Set up a Google Cloud billing account.
2. Enable the Cloud Vision API.
3. Generate an API key.
Written walkthrough
- Go to the Google Cloud Console: While logged into your Google account, visit the Google Cloud Console. If it's your first time, you may need to click through some initial setup prompts.
- Enable billing: Walk through the steps at console.cloud.google.com/billing to enable billing. You'll need a credit card. (See "How much will it cost?" below for pricing.)
- Select a project: Either create a new project (click New Project in the top navigation dropdown) or, if this is your first time, just use the default project ("My First Project").
- Enable the Cloud Vision API: In the left sidebar, go to APIs & Services → Library. Search for "Cloud Vision API" (it has a blue diamond logo). Click it, then click the blue Enable button.
- Create an API key: Go to APIs & Services → Credentials. Click + Create Credentials and select API key. A long string will appear; this is your API key. Copy it, store it securely, and treat it like a financial password: anyone who has this key can charge OCR processing (or other services, if you don't restrict the key) to your account.
- (Optional) Restrict the key: In the API key management screen, click the three dots next to your key and choose Edit. Click Restrict key, select Cloud Vision API from the dropdown, and don't forget to click Save.
- Use the key: You're now ready to return to the OCR page and paste your key into the field labeled "Google Cloud API key."
How much will it cost?
The first 1,000 pages per month are free. After that, every additional 1,000 pages costs $1.50. You pay only for what you use. New accounts come with a generous $300 credit, valid for 90 days. It's possible to track your usage (e.g., see this Reddit post), but if you're working on personal projects, you likely won't exceed 1,000 pages a month.
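The pricing arithmetic can be checked with a small sketch. It uses only the figures quoted here (1,000 free pages per month, then $1.50 per 1,000 pages), assumes usage beyond the free tier is billed pro rata, and ignores the $300 new-account credit and any pricing tiers this FAQ doesn't mention. `estimate_monthly_cost` is a hypothetical helper, not part of this tool.

```python
# Hypothetical helper; the figures below are the ones quoted in this FAQ.
# Check Google's Cloud Vision pricing page for current rates and tiers.
FREE_PAGES_PER_MONTH = 1000
PRICE_PER_1000_PAGES_USD = 1.50


def estimate_monthly_cost(pages: int) -> float:
    """Estimate one month's Cloud Vision OCR cost in USD.

    Assumes pages beyond the free allowance are billed pro rata and
    ignores the new-account credit and any other pricing tiers.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    billable = max(0, pages - FREE_PAGES_PER_MONTH)
    return billable / 1000 * PRICE_PER_1000_PAGES_USD


for n in (800, 3500, 11000):
    print(f"{n} pages -> ${estimate_monthly_cost(n):.2f}")
```

For example, 3,500 pages in one month leaves 2,500 billable pages, or $3.75; anything at or under 1,000 pages costs nothing.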
Is this safe? Can I trust you/Google?
Your API key is never stored by this application. The open-source code reads the key from the form, sends it with the OCR request to Google, and discards it. To keep the key secure on your end, store it in a password manager (e.g., 1Password). Your PDF itself is sent to Google for processing, so if you have concerns about its contents, avoid uploading sensitive material.
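To illustrate what "sends it with the OCR request to Google" can look like, here is a minimal sketch that builds (but does not send) a request to the public Cloud Vision REST endpoint `files:annotate`, passing the key as the `key` query parameter. This is an assumption about the general mechanism, not this application's actual code, and `build_ocr_request` is a hypothetical helper. Per the public API reference, that endpoint accepts the PDF inline as base64 and processes at most 5 pages per call.

```python
import base64
import json
from urllib.parse import urlencode

# Public REST endpoint for synchronous PDF/TIFF annotation (assumed here).
VISION_FILES_ENDPOINT = "https://vision.googleapis.com/v1/files:annotate"


def build_ocr_request(api_key: str, pdf_bytes: bytes, pages: list[int]) -> tuple[str, str]:
    """Hypothetical helper: return (url, json_body) for one files:annotate call.

    The key only ends up in the returned URL string; nothing is written to disk.
    """
    if not 1 <= len(pages) <= 5:
        raise ValueError("files:annotate handles 1 to 5 pages per request")
    url = VISION_FILES_ENDPOINT + "?" + urlencode({"key": api_key})
    body = {
        "requests": [
            {
                "inputConfig": {
                    # JSON requests carry raw bytes as base64 text.
                    "content": base64.b64encode(pdf_bytes).decode("ascii"),
                    "mimeType": "application/pdf",
                },
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
                "pages": pages,  # 1-based page numbers within the PDF
            }
        ]
    }
    return url, json.dumps(body)


url, body = build_ocr_request("YOUR_API_KEY", b"%PDF-1.7 ...", [1, 2, 3])
print(url)
```

Sending it would be an HTTPS POST of `body` with `Content-Type: application/json` (for example via `urllib.request`). The key lives only in local variables for the duration of the call, which mirrors the read, send, discard flow described above.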
How big can the input be?
Files up to ~128 MB are supported. Split larger files into parts. There's also a 2,000-page limit, though you'll usually hit the size cap first.
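One way to plan the split is sketched below. It assumes pages are roughly equal in size (real PDFs often aren't) and reads "~128 MB" as 128 MiB. `plan_parts` is a hypothetical helper that only computes page ranges; the actual splitting is left to whatever PDF tool you prefer.

```python
MAX_BYTES = 128 * 1024 * 1024  # assumption: "~128 MB" taken as 128 MiB
MAX_PAGES = 2000


def plan_parts(total_pages: int, size_bytes: int) -> list[tuple[int, int]]:
    """Hypothetical helper: return 1-based, inclusive page ranges that each
    fit both limits, assuming every page is about size_bytes / total_pages.
    """
    if total_pages < 1:
        raise ValueError("the PDF must have at least one page")
    # Largest page count whose estimated size stays within MAX_BYTES
    # (integer arithmetic avoids floating-point rounding).
    by_size = (MAX_BYTES * total_pages) // size_bytes if size_bytes > 0 else MAX_PAGES
    per_part = min(MAX_PAGES, by_size)
    if per_part < 1:
        raise ValueError("a single page exceeds the size limit")
    return [
        (start, min(start + per_part - 1, total_pages))
        for start in range(1, total_pages + 1, per_part)
    ]


print(plan_parts(300, 300 * 1024 * 1024))
```

A 300-page, 300 MiB scan, for instance, comes out as three parts: pages 1 to 128, 129 to 256, and 257 to 300. A 3,000-page file that is well under the size cap still needs two parts because of the page limit.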
Will it work with complex page formats?
Maybe, maybe not. Cropping your PDF pages often improves both accuracy and performance.
Why are some characters wrong, words misplaced, etc.?
Most OCR results contain some errors. Google Cloud Vision tends to produce fewer errors than other options. Please review and clean up as needed, e.g., using regular expressions.
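As a minimal sketch of regular-expression cleanup, the function below applies a few generic whitespace fixes plus a table of known misreads. The patterns are illustrative choices, not rules this tool applies, and `KNOWN_MISREADS` and `clean_ocr_text` are hypothetical names; build your own table from errors you actually see in your output.

```python
import re

# Hypothetical example entry: "h" misread as "b" is a common OCR slip.
# Replace with patterns for errors you observe in your own documents.
KNOWN_MISREADS = {r"\btbe\b": "the"}


def clean_ocr_text(text: str) -> str:
    """Apply generic whitespace fixes and known-misread substitutions."""
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces/tabs
    text = re.sub(r" +\n", "\n", text)      # drop spaces before a line break
    text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line
    for pattern, replacement in KNOWN_MISREADS.items():
        text = re.sub(pattern, replacement, text)
    return text.strip()


print(clean_ocr_text("tbe  quick \n\n\n\nfox  "))
```

Word-boundary anchors (`\b`) keep a substitution like the one above from firing inside longer words.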
Why are lines returned as-is, e.g., with hyphenation?
This OCR option is more literal, and it won't stitch hyphenated words back together. This is another thing to review and clean up as needed.
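If you want to undo the line-by-line layout yourself, a minimal regular-expression sketch follows. `dehyphenate` and `unwrap_lines` are hypothetical helpers, and the patterns assume the OCR text uses `\n` line breaks with blank lines between paragraphs.

```python
import re


def dehyphenate(text: str) -> str:
    """Join a word split by a hyphen at a line break, e.g. "exam-" + "ple"."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def unwrap_lines(text: str) -> str:
    """Turn single line breaks into spaces, keeping blank-line paragraph breaks."""
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


raw = "The exam-\nple shows\nhow it works.\n\nNext paragraph."
print(unwrap_lines(dehyphenate(raw)))
```

Run `dehyphenate` before `unwrap_lines`, since unwrapping first would turn the hyphen's line break into a space. Note that this also joins genuinely hyphenated compounds that happen to break at a line end ("well-" + "known" becomes "wellknown"), so spot-check the result.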