Which OCR Engine Should You Use in YomiNinja?
PaddleOCR, MangaOCR, Google Cloud Vision, Google Lens, Apple Vision — full comparison for Japanese game text recognition.
At a glance
Five engines, one table
| Engine | Platform | Offline? | GPU needed? | Best for | Accuracy* |
|---|---|---|---|---|---|
| PaddleOCR Default | Win · Linux · Mac | Yes | No (optional) | Most games, standard fonts | |
| MangaOCR | Win · Linux | Yes | Optional (CUDA) | Manga, VNs, stylized fonts | |
| Google Cloud Vision | All | No (API) | No | Difficult fonts, max accuracy | |
| Google Lens | All | No (API) | No | General fallback, no API key | |
| Apple Vision | macOS only | Yes | No | macOS users, vertical text | |
* Accuracy on standard printed Japanese game text. Results vary significantly by font type, contrast, and resolution.
PaddleOCR — The recommended starting point
PaddleOCR is the default OCR engine in YomiNinja and the right choice for the majority of Japanese games. It's built on PaddlePaddle, Baidu's deep learning framework, and includes models specifically trained on Japanese text recognition.
What PaddleOCR handles well
Standard printed Japanese — the kind found in most JRPG dialogue boxes, menus, and story text. This includes both horizontal and vertical text layouts, furigana (though the furigana threshold setting may need adjustment), and mixed kanji-kana sentences at normal game font sizes.
PaddleOCR's recognition speed is 200–400ms per capture
On a modern CPU without GPU acceleration, PaddleOCR processes a typical 600×100px dialogue region in 200–400 milliseconds. With GPU acceleration enabled (CUDA on NVIDIA), this drops to under 100ms. For most gameplay pacing, the CPU speed is fast enough that delays are imperceptible.
When PaddleOCR struggles
Games with highly stylized, calligraphic, or hand-drawn fonts are the main weakness. If the kanji in a game look decorative rather than like standard printed type — common in older games, doujin titles, or games with a distinctive art style — PaddleOCR will make more errors. In these cases, switch to MangaOCR.
Low-resolution text is another weakness. At very small font sizes or low DPI, even well-designed OCR models struggle with visually similar kanji pairs (土/士, 己/巳, 大/太). Increasing the game window resolution helps significantly.
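When raising the game window resolution isn't possible, upscaling the captured region before it reaches the OCR model is a cheap mitigation. The sketch below is a generic preprocessing idea, not a YomiNinja setting — it illustrates the point with a naive nearest-neighbor upscale on a grayscale pixel grid:

```python
def upscale_nn(pixels, factor=2):
    """Nearest-neighbor upscale of a grayscale pixel grid (list of rows).

    Real pipelines would use a proper resampling filter (e.g. Lanczos),
    but even naive upscaling gives the recognition model more pixels
    per stroke, which helps separate near-identical kanji.
    """
    out = []
    for row in pixels:
        wide = [p for p in row for _ in range(factor)]   # stretch columns
        out.extend([list(wide) for _ in range(factor)])  # stretch rows
    return out

tiny = [[0, 255],
        [255, 0]]
big = upscale_nn(tiny)  # 4x4: every source pixel becomes a 2x2 block
```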
GPU acceleration with CUDA and ROCm
If you have a compatible GPU, PaddleOCR can use it for faster inference. NVIDIA cards use CUDA (requires CUDA 11.8 or compatible driver). AMD cards use ROCm (Linux only; added in v0.9.1). GPU mode is particularly useful in Auto OCR where captures fire continuously — the lower per-capture latency keeps the overlay more responsive.
MangaOCR — For visual novels, manga, and stylized game fonts
MangaOCR is a specialized OCR model developed specifically for Japanese manga and comic text. It's trained on a massive dataset of scanned manga pages, which means it has seen thousands of examples of the hand-drawn, stylized, and non-standard fonts that appear in comics — and by extension, in many visual novels and artistically styled games.
MangaOCR recognizes fonts that PaddleOCR misreads
The core advantage over PaddleOCR is tolerance for font variation. Where PaddleOCR expects clean, printed characters, MangaOCR was trained to handle characters drawn freehand, with deliberate imperfections and stylistic flourishes. A kanji that looks like an artist drew it rather than a typographer designed it is MangaOCR's home territory.
Practical examples where MangaOCR outperforms PaddleOCR:
- Visual novels with custom hand-lettered fonts
- Older console games (SFC, PS1-era) with low-resolution pixel fonts
- Doujin RPGs with non-standard font choices
- Games with text rendered at angles or with decorative effects
On-demand vs continuous mode
MangaOCR offers two operating modes. Continuous mode keeps the model loaded in memory and processes captures as they arrive — faster per capture, but uses more RAM. On-demand mode (added in v0.9.1) loads the model only when a capture is triggered, releasing memory between recognitions. On-demand mode is recommended for machines with less than 8GB RAM.
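The trade-off between the two modes is the standard eager-vs-lazy loading pattern. The classes below are a hypothetical sketch of that pattern, not YomiNinja's real code — `load_model` stands in for whatever actually loads MangaOCR:

```python
class ContinuousEngine:
    """Continuous mode: pay the model-load cost once, keep the RAM."""
    def __init__(self, load_model):
        self._model = load_model()      # loaded up front, stays resident

    def recognize(self, image):
        return self._model(image)


class OnDemandEngine:
    """On-demand mode: the model is resident only while a capture
    is being recognized, trading latency for a smaller footprint."""
    def __init__(self, load_model):
        self._load_model = load_model

    def recognize(self, image):
        model = self._load_model()      # load just-in-time
        try:
            return model(image)
        finally:
            del model                   # release memory between captures
```

With a stub loader the difference is visible directly: `ContinuousEngine` calls the loader once no matter how many captures arrive, while `OnDemandEngine` calls it once per capture.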
Windows and Linux only
MangaOCR is not available on macOS. macOS users should use Apple Vision Framework (for standard text) or Google Cloud Vision (for difficult fonts) instead.
Google Cloud Vision — Maximum accuracy, requires API key
Google Cloud Vision is the highest-accuracy option in YomiNinja, backed by Google's production OCR infrastructure. It handles unusual fonts, low contrast, and complex layouts more robustly than any local engine.
Google Cloud Vision sends captures to Google's servers
Unlike PaddleOCR and MangaOCR, Cloud Vision is not a local model. Each capture is sent as an image to Google's Vision API, which returns text recognition results. This means it requires an internet connection and has a small per-request latency (typically 300–800ms depending on network conditions).
The free tier covers moderate use
Google Cloud Vision offers 1,000 free API calls per month. For casual gaming sessions, this is often enough. Beyond the free tier, pricing is $1.50 per 1,000 requests. For heavy Auto OCR use, the cost can accumulate — but switching to a local engine for normal gameplay and only using Cloud Vision for difficult games is a practical approach.
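Using the numbers above (1,000 free calls per month, then $1.50 per 1,000), a back-of-envelope estimate shows how quickly Auto OCR burns through the free tier. The capture interval below is an assumed example, not a YomiNinja default:

```python
def monthly_cost(requests, free_calls=1000, price_per_1000=1.50):
    """Estimated monthly Cloud Vision OCR bill in USD."""
    billable = max(0, requests - free_calls)
    return billable / 1000 * price_per_1000

# Auto OCR firing every 2 seconds for a single 2-hour session:
captures = 2 * 60 * 60 // 2    # 3600 captures
cost = monthly_cost(captures)  # about $3.90 — one session already
                               # exceeds the entire monthly free tier
```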
Setting up a Google Cloud Vision API key
- Go to Google Cloud Console and create a project.
- Enable the Cloud Vision API for your project.
- Create an API key under APIs & Services → Credentials.
- Paste the API key into YomiNinja's OCR settings under Cloud Vision.
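YomiNinja handles the API calls once the key is saved, but if you want to sanity-check a new key independently, the Vision REST endpoint accepts a simple JSON body. The helper below only builds that body — actually sending it requires network access and a valid key:

```python
import base64

VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_text_detection_request(image_bytes, language_hint="ja"):
    """JSON body for a Cloud Vision TEXT_DETECTION call.

    POST this to f"{VISION_ENDPOINT}?key=YOUR_API_KEY" to verify
    that a freshly created key works before pasting it into YomiNinja.
    """
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "TEXT_DETECTION"}],
            "imageContext": {"languageHints": [language_hint]},
        }]
    }
```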
Google Lens — Cloud accuracy without an API key
Google Lens provides OCR through Google's Lens service, which is accessible without a Cloud Platform API key. This makes it a convenient fallback for users who want cloud-quality accuracy without the Cloud Console setup.
Google Lens accuracy is comparable to Cloud Vision for most content
For standard Japanese game text, Google Lens and Google Cloud Vision give similar results. The main differences are in edge cases: Cloud Vision has better handling of very low contrast text, and Cloud Vision's bounding box coordinates are more precise — which matters for character-level hover detection.
Rate limiting is the primary concern
Google Lens doesn't provide an official API — YomiNinja accesses it through a web interface. Google may rate-limit or block requests during heavy use, which can cause captures to fail silently. If you're experiencing dropouts, switch to Cloud Vision with an API key or fall back to a local engine.
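Silent failures are easiest to tolerate with a fallback chain — try Lens first, then drop to another engine when a request errors out or comes back empty. The sketch below is a hypothetical illustration of that pattern; the stub engines are invented for the example:

```python
def recognize_with_fallback(image, engines):
    """Try each (name, engine) pair in order.

    An engine signals failure by raising (e.g. an HTTP 429 from a
    rate-limited Lens request) or by returning an empty result.
    """
    for name, engine in engines:
        try:
            text = engine(image)
        except Exception:
            continue                    # rate limit, network error, ...
        if text:
            return name, text
    return None, ""

def lens_stub(image):                   # pretend Lens is rate-limited
    raise RuntimeError("HTTP 429: too many requests")

def paddle_stub(image):                 # local engine still answers
    return "はじまりの村"

result = recognize_with_fallback(None, [("lens", lens_stub),
                                        ("paddle", paddle_stub)])
# result == ("paddle", "はじまりの村")
```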
Apple Vision Framework — The native macOS choice
Apple Vision Framework is the native OCR engine built into macOS, available to YomiNinja on Apple Silicon and Intel Macs. It runs entirely on-device using Apple's Neural Engine where available, providing fast, private text recognition without any external API or GPU setup.
Apple Vision supports vertical Japanese text
One of Apple Vision's standout features for Japanese content is explicit support for vertical text recognition. Many classic Japanese games, VNs, and some UI elements use vertical text layout (top-to-bottom, right-to-left). PaddleOCR handles this inconsistently; Apple Vision recognizes it natively.
Language support depends on macOS version
Apple Vision's language support for Japanese text varies between macOS versions. Newer macOS releases (Ventura and later) have significantly improved Japanese model quality. On older macOS versions, accuracy may be lower than PaddleOCR. Check the Apple documentation for your specific macOS version's supported languages.
Not available on Windows or Linux
Apple Vision Framework is an Apple OS API and cannot run outside macOS. Windows and Linux users should use PaddleOCR or MangaOCR.
Decision guide
Which engine for your situation
- Standard printed fonts, any platform → PaddleOCR
- Stylized, hand-drawn, or pixel fonts (Windows/Linux) → MangaOCR
- Maximum accuracy and you have an API key → Google Cloud Vision
- Cloud accuracy without API setup → Google Lens
- macOS, especially vertical text → Apple Vision
Performance
GPU acceleration: CUDA and ROCm
YomiNinja supports GPU-accelerated OCR for both NVIDIA and AMD hardware, added in v0.9.1:
- NVIDIA (CUDA): requires CUDA 11.8 or a compatible driver. Supported on Windows and Linux. Works with PaddleOCR and MangaOCR.
- AMD (ROCm): Linux only. Works with PaddleOCR. ROCm setup requires a compatible AMD GPU and the ROCm software stack.
GPU mode is most beneficial in Auto OCR, where YomiNinja continuously captures and processes frames. On CPU-only, each capture takes 200–400ms. With GPU, this drops below 100ms — the overlay feels noticeably more responsive.
For manual hotkey use (one capture at a time), the CPU speed difference is rarely perceptible during normal gameplay pacing.
Try it yourself
Start with PaddleOCR and go from there
Download YomiNinja and test OCR accuracy on your specific game. Switch engines in one click from the settings panel.