Which OCR Engine Should You Use in YomiNinja?
PaddleOCR, MangaOCR, Google Cloud Vision, Google Lens, Apple Vision — full comparison for Japanese game text recognition.
At a glance
Five engines, one table
| Engine | Platform | Offline? | GPU needed? | Best for | Accuracy* |
|---|---|---|---|---|---|
| PaddleOCR Default | Win · Linux · Mac | Yes | No (optional) | Most games, standard fonts | |
| MangaOCR | Win · Linux | Yes | Optional (CUDA) | Manga, VNs, stylized fonts | |
| Google Cloud Vision | All | No (API) | No | Difficult fonts, max accuracy | |
| Google Lens | All | No (API) | No | General fallback, no API key | |
| Apple Vision | macOS only | Yes | No | macOS users, vertical text | |
* Accuracy on standard printed Japanese game text. Results vary significantly by font type, contrast, and resolution.
PaddleOCR — The recommended starting point
PaddleOCR is the default OCR engine in YomiNinja and the right choice for the majority of Japanese games. It's built on PaddlePaddle, Baidu's deep learning framework, and includes models specifically trained on Japanese text recognition.
What PaddleOCR handles well
Standard printed Japanese — the kind found in most JRPG dialogue boxes, menus, and story text. This includes both horizontal and vertical text layouts, furigana (though the furigana threshold setting may need adjustment), and mixed kanji-kana sentences at normal game font sizes.
PaddleOCR's recognition speed is 200–400ms per capture
On a modern CPU without GPU acceleration, PaddleOCR processes a typical 600×100px dialogue region in 200–400 milliseconds. With GPU acceleration enabled (CUDA on NVIDIA), this drops to under 100ms. For most gameplay pacing, the CPU speed is fast enough that delays are imperceptible.
When PaddleOCR struggles
Games with highly stylized, calligraphic, or hand-drawn fonts are the main weakness. If the kanji in a game look decorative rather than like standard printed type — common in older games, doujin titles, or games with a distinctive art style — PaddleOCR will make more errors. In these cases, switch to MangaOCR.
Low-resolution text is another weakness. At very small font sizes or low DPI, even well-designed OCR models struggle with visually similar kanji pairs (土/士, 己/巳, 大/太). Increasing the game window resolution helps significantly.
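When raising the game window resolution isn't possible, upscaling the captured region before it reaches the OCR model is a cheap mitigation. The sketch below is a generic preprocessing idea, not a YomiNinja setting — it illustrates the point with a naive nearest-neighbor upscale on a grayscale pixel grid:

```python
def upscale_nn(pixels, factor=2):
    """Nearest-neighbor upscale of a grayscale pixel grid (list of rows).

    Real pipelines would use a proper resampling filter (e.g. Lanczos),
    but even naive upscaling gives the recognition model more pixels
    per stroke, which helps separate near-identical kanji.
    """
    out = []
    for row in pixels:
        wide = [p for p in row for _ in range(factor)]   # stretch columns
        out.extend([list(wide) for _ in range(factor)])  # stretch rows
    return out

tiny = [[0, 255],
        [255, 0]]
big = upscale_nn(tiny)  # 4x4: every source pixel becomes a 2x2 block
```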
GPU acceleration with CUDA and ROCm
If you have a compatible GPU, PaddleOCR can use it for faster inference. NVIDIA cards use CUDA (requires CUDA 11.8 or compatible driver). AMD cards use ROCm (Linux only; added in v0.9.1). GPU mode is particularly useful in Auto OCR where captures fire continuously — the lower per-capture latency keeps the overlay more responsive.
MangaOCR — For visual novels, manga, and stylized game fonts
MangaOCR is a specialized OCR model developed specifically for Japanese manga and comic text. It's trained on a massive dataset of scanned manga pages, which means it has seen thousands of examples of the hand-drawn, stylized, and non-standard fonts that appear in comics — and by extension, in many visual novels and artistically styled games.
MangaOCR recognizes fonts that PaddleOCR misreads
The core advantage over PaddleOCR is tolerance for font variation. Where PaddleOCR expects clean, printed characters, MangaOCR was trained to handle characters drawn freehand, with deliberate imperfections and stylistic flourishes. A kanji that looks like an artist drew it rather than a typographer designed it is MangaOCR's home territory.
Practical examples where MangaOCR outperforms PaddleOCR:
- Visual novels with custom hand-lettered fonts
- Older console games (SFC, PS1-era) with low-resolution pixel fonts
- Doujin RPGs with non-standard font choices
- Games with text rendered at angles or with decorative effects
On-demand vs continuous mode
MangaOCR offers two operating modes. Continuous mode keeps the model loaded in memory and processes captures as they arrive — faster per capture, but uses more RAM. On-demand mode (added in v0.9.1) loads the model only when a capture is triggered, releasing memory between recognitions. On-demand mode is recommended for machines with less than 8GB RAM.
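The trade-off between the two modes is the standard eager-vs-lazy loading pattern. The classes below are a hypothetical sketch of that pattern, not YomiNinja's real code — `load_model` stands in for whatever actually loads MangaOCR:

```python
class ContinuousEngine:
    """Continuous mode: pay the model-load cost once, keep the RAM."""
    def __init__(self, load_model):
        self._model = load_model()      # loaded up front, stays resident

    def recognize(self, image):
        return self._model(image)


class OnDemandEngine:
    """On-demand mode: the model is resident only while a capture
    is being recognized, trading latency for a smaller footprint."""
    def __init__(self, load_model):
        self._load_model = load_model

    def recognize(self, image):
        model = self._load_model()      # load just-in-time
        try:
            return model(image)
        finally:
            del model                   # release memory between captures
```

With a stub loader the difference is visible directly: `ContinuousEngine` calls the loader once no matter how many captures arrive, while `OnDemandEngine` calls it once per capture.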
Windows and Linux only
MangaOCR is not available on macOS. macOS users should use Apple Vision Framework (for standard text) or Google Cloud Vision (for difficult fonts) instead.
Google Cloud Vision — Maximum accuracy, requires API key
Google Cloud Vision is the highest-accuracy option in YomiNinja, backed by Google's production OCR infrastructure. It handles unusual fonts, low contrast, and complex layouts more robustly than any local engine.
Google Cloud Vision sends captures to Google's servers
Unlike PaddleOCR and MangaOCR, Cloud Vision is not a local model. Each capture is sent as an image to Google's Vision API, which returns text recognition results. This means it requires an internet connection and has a small per-request latency (typically 300–800ms depending on network conditions).
The free tier covers moderate use
Google Cloud Vision offers 1,000 free API calls per month. For casual gaming sessions, this is often enough. Beyond the free tier, pricing is $1.50 per 1,000 requests. For heavy Auto OCR use, the cost can accumulate — but switching to a local engine for normal gameplay and only using Cloud Vision for difficult games is a practical approach.
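Using the numbers above (1,000 free calls per month, then $1.50 per 1,000), a back-of-envelope estimate shows how quickly Auto OCR burns through the free tier. The capture interval below is an assumed example, not a YomiNinja default:

```python
def monthly_cost(requests, free_calls=1000, price_per_1000=1.50):
    """Estimated monthly Cloud Vision OCR bill in USD."""
    billable = max(0, requests - free_calls)
    return billable / 1000 * price_per_1000

# Auto OCR firing every 2 seconds for a single 2-hour session:
captures = 2 * 60 * 60 // 2    # 3600 captures
cost = monthly_cost(captures)  # about $3.90 — one session already
                               # exceeds the entire monthly free tier
```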
Setting up a Google Cloud Vision API key
- Go to Google Cloud Console and create a project.
- Enable the Cloud Vision API for your project.
- Create an API key under APIs & Services → Credentials.
- Paste the API key into YomiNinja's OCR settings under Cloud Vision.
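YomiNinja handles the API calls once the key is saved, but if you want to sanity-check a new key independently, the Vision REST endpoint accepts a simple JSON body. The helper below only builds that body — actually sending it requires network access and a valid key:

```python
import base64

VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_text_detection_request(image_bytes, language_hint="ja"):
    """JSON body for a Cloud Vision TEXT_DETECTION call.

    POST this to f"{VISION_ENDPOINT}?key=YOUR_API_KEY" to verify
    that a freshly created key works before pasting it into YomiNinja.
    """
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "TEXT_DETECTION"}],
            "imageContext": {"languageHints": [language_hint]},
        }]
    }
```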
Google Lens — Cloud accuracy without an API key
Google Lens provides OCR through Google's Lens service, which is accessible without a Cloud Platform API key. This makes it a convenient fallback for users who want cloud-quality accuracy without the Cloud Console setup.
Google Lens accuracy is comparable to Cloud Vision for most content
For standard Japanese game text, Google Lens and Google Cloud Vision give similar results. The main differences are in edge cases: Cloud Vision has better handling of very low contrast text, and Cloud Vision's bounding box coordinates are more precise — which matters for character-level hover detection.
Rate limiting is the primary concern
Google Lens doesn't provide an official API — YomiNinja accesses it through a web interface. Google may rate-limit or block requests during heavy use, which can cause captures to fail silently. If you're experiencing dropouts, switch to Cloud Vision with an API key or fall back to a local engine.
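Silent failures are easiest to tolerate with a fallback chain — try Lens first, then drop to another engine when a request errors out or comes back empty. The sketch below is a hypothetical illustration of that pattern; the stub engines are invented for the example:

```python
def recognize_with_fallback(image, engines):
    """Try each (name, engine) pair in order.

    An engine signals failure by raising (e.g. an HTTP 429 from a
    rate-limited Lens request) or by returning an empty result.
    """
    for name, engine in engines:
        try:
            text = engine(image)
        except Exception:
            continue                    # rate limit, network error, ...
        if text:
            return name, text
    return None, ""

def lens_stub(image):                   # pretend Lens is rate-limited
    raise RuntimeError("HTTP 429: too many requests")

def paddle_stub(image):                 # local engine still answers
    return "はじまりの村"

result = recognize_with_fallback(None, [("lens", lens_stub),
                                        ("paddle", paddle_stub)])
# result == ("paddle", "はじまりの村")
```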
Apple Vision Framework — The native macOS choice
Apple Vision Framework is the native OCR engine built into macOS, available to YomiNinja on Apple Silicon and Intel Macs. It runs entirely on-device using Apple's Neural Engine where available, providing fast, private text recognition without any external API or GPU setup.
Apple Vision supports vertical Japanese text
One of Apple Vision's standout features for Japanese content is explicit support for vertical text recognition. Many classic Japanese games, VNs, and some UI elements use vertical text layout (top-to-bottom, right-to-left). PaddleOCR handles this inconsistently; Apple Vision recognizes it natively.
Language support depends on macOS version
Apple Vision's language support for Japanese text varies between macOS versions. Newer macOS releases (Ventura and later) have significantly improved Japanese model quality. On older macOS versions, accuracy may be lower than PaddleOCR. Check the Apple documentation for your specific macOS version's supported languages.
Not available on Windows or Linux
Apple Vision Framework is an Apple OS API and cannot run outside macOS. Windows and Linux users should use PaddleOCR or MangaOCR.
Decision guide
Which engine for your situation
- Standard printed fonts, any platform → PaddleOCR
- Stylized, hand-drawn, or pixel fonts (Windows/Linux) → MangaOCR
- Maximum accuracy and you have an API key → Google Cloud Vision
- Cloud accuracy without API setup → Google Lens
- macOS, especially vertical text → Apple Vision
Performance
GPU acceleration: CUDA and ROCm
YomiNinja supports GPU-accelerated OCR for both NVIDIA and AMD hardware, added in v0.9.1:
- NVIDIA (CUDA): requires CUDA 11.8 or a compatible driver. Supported on Windows and Linux. Works with PaddleOCR and MangaOCR.
- AMD (ROCm): Linux only. Works with PaddleOCR. ROCm setup requires a compatible AMD GPU and the ROCm software stack.
GPU mode is most beneficial in Auto OCR, where YomiNinja continuously captures and processes frames. On CPU-only, each capture takes 200–400ms. With GPU, this drops below 100ms — the overlay feels noticeably more responsive.
For manual hotkey use (one capture at a time), the CPU speed difference is rarely perceptible during normal gameplay pacing.
Try it yourself
Start with PaddleOCR and go from there
Download YomiNinja and test OCR accuracy on your specific game. Switch engines in one click from the settings panel.