OCR Service FAQ

Why another OCR option? Why like this?

Most Sanskrit students and scholars lack access to top-tier OCR because the technology is still slightly out of reach. This blog post explains why expanding access to Google Cloud Vision is worthwhile: it accepts large multi-page PDFs, delivers high accuracy, and—unlike Google Drive OCR—automatically ignores any low-quality text layer already embedded in the file. Larger AI models can sometimes outperform it, but they are even harder to integrate. In recent tests, Google Gemini and Cloud Vision each achieved strong results, yet made different errors, suggesting that combining their outputs could yield the best accuracy. For these reasons, I will continue advocating for Cloud Vision, and this interface strips away nearly all the complexity—no coding, no cloud-storage wrangling.

Why do I need an API key? How do I get it?

Google Cloud Vision is a paid service, and I don’t yet have funding to cover everyone’s usage. Each user therefore supplies their own API key, linked to a Google Cloud project with billing enabled. An API key is simply a password-like string that you can create or delete at any time. Follow the video below to:
  1. Set up a Google Cloud billing account.
  2. Enable the Cloud Vision API.
  3. Generate an API key.



Written walkthrough
  1. Go to the Google Cloud Console:
    While logged into your Google account, visit the Google Cloud Console. If it's your first time, you may need to click through some initial setup prompts.
  2. Enable billing:
    Walk through the steps at console.cloud.google.com/billing to enable billing. You'll need a credit card. (See “How much will it cost?” below for more on cost.)
  3. Select a project:
    Either create a new project (click New Project in the top navigation dropdown), or, if this is your first time, just use the default (“My First Project”).
  4. Enable the Cloud Vision API:
    In the left sidebar, go to APIs & Services → Library. Search for “Cloud Vision API” (it has a blue diamond logo). Click it, then click the blue Enable button.
  5. Create an API key:
    Go to APIs & Services → Credentials. Click + Create Credentials and select API key.
    A long string will appear. This is your API key. Copy it, store it securely, and treat it like a financial password. With this key, anyone can charge OCR processing (or other services, if you don't restrict the key) to your account.
  6. (Optional) Restrict the key:
    In the API key management screen, click the three dots next to your key, and choose Edit. Click Restrict key, and from the dropdown, select Cloud Vision API. Don't forget to click Save.
  7. Use the key:
    You're now ready to return to the OCR page and paste your key into the field labeled “Google Cloud API key.”
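
For readers curious about how a key travels with a request, here is a minimal sketch. It is not this service's actual code. It only builds (and does not send) a request for Google's public `images:annotate` REST endpoint with the `DOCUMENT_TEXT_DETECTION` feature, for a single page image. The function name `build_vision_request` is hypothetical.

```python
import base64
import json
from urllib.parse import urlencode

# Public Cloud Vision REST endpoint for image annotation.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_vision_request(image_bytes: bytes, api_key: str) -> tuple[str, bytes]:
    """Return (url, json_body) for one DOCUMENT_TEXT_DETECTION request.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    # The API key is passed as the `key` query parameter.
    url = f"{VISION_ENDPOINT}?{urlencode({'key': api_key})}"
    payload = {
        "requests": [
            {
                # Image bytes are sent base64-encoded inside the JSON body.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "DOCUMENT_TEXT_DETECTION"}],
            }
        ]
    }
    return url, json.dumps(payload).encode("utf-8")
```

Posting the returned body to the returned URL with a `Content-Type: application/json` header would count as a billable request against the project that owns the key, which is why the key deserves the same care as a password.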

How much will it cost?

In most cases, virtually nothing. The first 1,000 pages per month are free. After that, every additional 1,000 pages costs $1.50. You pay only for what you use. New accounts come with a generous $300 credit, valid for 90 days. It's possible to track your usage (e.g., see this Reddit post), but if you're working on personal projects, you likely won't exceed 1,000 pages a month.
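
As a rough illustration of the pricing described above, the sketch below estimates a month's bill. It assumes the charge beyond the free tier is prorated per page, and it ignores the $300 new-account credit. Google's actual invoice may differ, and the function name is hypothetical.

```python
FREE_PAGES_PER_MONTH = 1000   # free tier quoted above
PRICE_PER_1000_PAGES = 1.50   # USD, rate quoted above


def estimate_monthly_cost(pages: int) -> float:
    """Estimate the monthly cost in USD for `pages` OCR'd pages.

    Assumption: billing beyond the free tier is prorated per page.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    billable = max(0, pages - FREE_PAGES_PER_MONTH)
    return round(billable * PRICE_PER_1000_PAGES / 1000, 2)
```

Under these assumptions, 800 pages in a month cost nothing, 1,500 pages cost $0.75, and 5,000 pages cost $6.00.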

Is this safe? Can I trust you/Google?

Regarding the use of an API key, yes, this is as safe as any other use of a password. To prevent your key from falling into the wrong hands, I recommend storing it with a password manager (e.g., 1Password). As for Google, you don’t need to worry about them doing anything with the contents of your PDF. If you're concerned, just avoid uploading sensitive material. As for me and my code, I guarantee that your API key is never stored. The code, which is entirely open source, simply reads the key from the HTML form, sends it with the OCR request to Google, and that's it.

How big can the input be?

Files up to ~128 MB are supported. Split larger files into parts. There’s also a 2,000-page limit, though you’ll usually hit the size cap first.
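
To plan a split before opening a PDF tool, a small calculation like the one below can help. It is only a sketch: it treats the size cap as exactly 128 MiB (the limit above is approximate) and assumes pages are roughly equal in size, so a part with unusually heavy pages could still exceed the cap. The function name is hypothetical.

```python
import math

MAX_BYTES = 128 * 1024 * 1024  # assumption: the "~128 MB" cap taken as 128 MiB
MAX_PAGES = 2000               # page limit quoted above


def plan_parts(total_pages: int, total_bytes: int) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first_page, last_page) ranges for each part.

    Assumption: pages are roughly uniform in size.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    parts_by_size = math.ceil(total_bytes / MAX_BYTES)
    parts_by_pages = math.ceil(total_pages / MAX_PAGES)
    n_parts = max(1, parts_by_size, parts_by_pages)
    per_part = math.ceil(total_pages / n_parts)
    return [
        (start, min(start + per_part - 1, total_pages))
        for start in range(1, total_pages + 1, per_part)
    ]
```

For example, a 900-page scan of 300 MiB would be planned as three parts of 300 pages each; the actual splitting still has to be done with a PDF tool of your choice.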

Will it work with complex page formats?

Maybe, maybe not. Cropping your PDF pages often improves both accuracy and performance.

Why are some characters wrong, words misplaced, etc.?

Most OCR results contain some errors. Google Cloud Vision tends to produce fewer errors than other options. Please review and clean up as needed, e.g., using regular expressions.
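
As one possible starting point for regex cleanup, the sketch below normalizes whitespace only. The rules are illustrative assumptions, not a description of what this service does, and the rule about punctuation suits Latin-script (e.g., IAST) text better than Devanagari. Real cleanup will need rules tailored to the errors you actually see.

```python
import re


def tidy_ocr_text(text: str) -> str:
    """Apply a few generic whitespace fixes to OCR output (illustrative only)."""
    text = text.replace("\r\n", "\n")              # normalize line endings
    text = re.sub(r"[ \t]+", " ", text)            # collapse runs of spaces/tabs
    text = re.sub(r" +\n", "\n", text)             # drop trailing spaces on lines
    text = re.sub(r"\n{3,}", "\n\n", text)         # keep at most one blank line
    text = re.sub(r" +([,;:.!?])", r"\1", text)    # no space before punctuation
    return text.strip()
```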

Why are lines returned as-is, e.g., with hyphenation?

This OCR option is more literal, and it won't stitch hyphenated words back together. This is another thing to review and clean up as needed.
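
If you want to rejoin words broken across lines yourself, a regular expression is one option. The sketch below removes a hyphen at the end of a line together with the line break that follows it. It assumes every line-final hyphen marks a broken word, so a genuine hyphen that happens to fall at a line end would be lost as well. The function name is hypothetical.

```python
import re

# A non-space character, a hyphen at line end, then the first non-space
# character of the next line. \S is used instead of \w so that Devanagari
# vowel signs (which \w does not match) are handled too.
_HYPHEN_BREAK = re.compile(r"(\S)-[ \t]*\n[ \t]*(\S)")


def join_hyphenated(text: str) -> str:
    """Rejoin words split by a hyphen at a line break (illustrative only)."""
    return _HYPHEN_BREAK.sub(r"\1\2", text)
```

Because the line break is removed along with the hyphen, the rest of the second line moves up onto the first, so line numbering in the cleaned text will no longer match the page.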