🎨 PixelRAG: The NEW Free AI That Can Read 8+ Million Docs

📺 Watch the full tutorial on YouTube

PixelRAG is a lightweight Visual Retrieval-Augmented Generation (RAG) pipeline that needs no heavy GPU. Traditional RAG extracts plain text and loses tables, formatting, and structural context; PixelRAG instead renders documents into visual page tiles, retrieves the most relevant tile, and answers queries directly from its visual content.
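To make the "retrieve the most relevant tile" step concrete, here is a minimal sketch. It assumes each tile has already been turned into an embedding vector by some encoder; the encoder, the tile IDs, and the toy vectors are hypothetical. In this project, the actual scoring happens inside the hosted PixelRAG API. The sketch ranks tiles by cosine similarity to the query vector and returns the best match.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_best_tile(query_vec: Sequence[float],
                       tile_vecs: dict[str, Sequence[float]]) -> tuple[str, float]:
    """Return (tile_id, score) for the tile embedding closest to the query."""
    return max(
        ((tile_id, cosine(query_vec, vec)) for tile_id, vec in tile_vecs.items()),
        key=lambda pair: pair[1],
    )


# Toy 3-dimensional embeddings (hypothetical); a real system would obtain
# them from a vision-language encoder, which this sketch does not specify.
tiles = {
    "page1_tile0": [0.9, 0.1, 0.0],
    "page1_tile1": [0.1, 0.8, 0.3],
    "page2_tile0": [0.0, 0.2, 0.9],
}
best_id, score = retrieve_best_tile([0.05, 0.9, 0.2], tiles)
print(best_id, round(score, 3))
```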

This project is a minimal, production-ready implementation that queries the official live PixelRAG hosted API (which indexes 8.28M Wikipedia articles as screenshot tiles), downloads the matching high-resolution visual tile, and answers queries using a local Ollama LLM.
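The answering step can be sketched against Ollama's local REST API. `POST /api/generate` with `model`, `prompt`, and `stream` fields, and the `response` field in the reply, are part of Ollama's documented API. The prompt template and the helper names below are hypothetical, not taken from `pixelrag_mvp.py`. The sketch also assumes the tile's content is already available as text (for example via OCR), because image-based answering is listed below as a future feature.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(question: str, tile_text: str) -> str:
    """Ground the question in the retrieved tile's content (hypothetical template)."""
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{tile_text}\n\n"
        f"Question: {question}\nAnswer:"
    )


def make_generate_request(prompt: str, model: str = "gemma4:e2b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_ollama(question: str, tile_text: str, model: str = "gemma4:e2b") -> str:
    """Send the grounded prompt to a running Ollama server and return the answer text."""
    req = make_generate_request(build_prompt(question, tile_text), model)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

With Ollama running and the model pulled, `ask_ollama("Who wrote Hamlet?", tile_text)` returns the model's answer as a string. Setting `stream` to `False` keeps the example simple by returning one JSON object instead of a stream of chunks.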

🚀 Quick Start (Windows PowerShell)

Follow these exact steps to set up, install, run, and test the project.

📋 1. Prerequisites

Ensure you have Python 3.10+ installed and Ollama running, then pull the required Ollama model:

ollama pull gemma4:e2b
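`ollama list` shows which models are already pulled. If you want the same check in Python, here is a minimal sketch using Ollama's `GET /api/tags` endpoint, which lists local models under a `models` key with a `name` such as `gemma4:e2b`. The helper names are hypothetical, and treating a bare name as `:latest` is an assumption that mirrors Ollama's default tag.

```python
import json
import urllib.request


def has_model(tags: dict, wanted: str) -> bool:
    """Check an /api/tags response for a model; a bare name implies the ':latest' tag."""
    if ":" not in wanted:
        wanted += ":latest"
    return any(m.get("name") == wanted for m in tags.get("models", []))


def fetch_tags(host: str = "http://localhost:11434") -> dict:
    """List locally pulled models; raises URLError if the Ollama server is not running."""
    with urllib.request.urlopen(f"{host}/api/tags", timeout=5) as resp:
        return json.loads(resp.read())
```

For example, `has_model(fetch_tags(), "gemma4:e2b")` returns `True` once the pull above has finished.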

⚙️ 2. Installation

Navigate to the project directory, then create a virtual environment and install the dependencies:

python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt

📊 3. Run the Pipeline & Generate Report

Run the script to execute the search and write the final execution log directly to ideal_output.md:

python app.py

📁 File Structure & Explanations

📝 README.md: Project setup instructions, technical specifications, and use cases.
⚙️ requirements.txt: Pinned list of minimal, fast-installing dependencies.
⚡ pixelrag_mvp.py: Core Visual RAG engine (37 lines). Queries the live Wikipedia PixelRAG API, downloads the matching tile, and runs the retrieval from the terminal.
📊 app.py: Report generator (9 lines). Calls the core engine and generates the matching ideal_output.md report.
📄 ideal_output.md: Generated report showing the exact retrieval logs and answer.
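The report-generation step can be sketched as follows. This is a hypothetical layout and does not reproduce the real app.py or the exact format of ideal_output.md; the function names and report sections are illustrative. Rendering is kept separate from file I/O so the Markdown can be checked without touching disk.

```python
from pathlib import Path


def render_report(question: str, tile_url: str, answer: str, log_lines: list[str]) -> str:
    """Render one retrieval run as Markdown (hypothetical layout)."""
    lines = [
        "# PixelRAG Run Report",
        "",
        f"**Question:** {question}",
        "",
        f"**Retrieved tile:** {tile_url}",
        "",
        "## Retrieval log",
        *(f"- {line}" for line in log_lines),
        "",
        "## Answer",
        answer,
    ]
    return "\n".join(lines) + "\n"


def write_report(question: str, tile_url: str, answer: str,
                 log_lines: list[str], path: str = "ideal_output.md") -> Path:
    """Write the rendered report to disk and return its path."""
    out = Path(path)
    out.write_text(render_report(question, tile_url, answer, log_lines), encoding="utf-8")
    return out
```

A driver script would call the core engine first and then pass the question, tile location, answer, and log lines to `write_report`.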

💡 5 Real-World Use Cases

📊 Financial Auditing: Retrieving visual cells/tables from complex financial statements where text parsers merge columns and scramble tabular data.
🗂️ Slide Deck Content QA: Slicing presentation slides into quadrant tiles to query structural infographics, bullet lists, and visual diagrams.
📐 Engineering Blueprint Queries: Slicing high-resolution schematics into spatial grid tiles and retrieving specific sub-components based on user queries.
📸 Web Screenshot RAG: Rendering snapshots of complex dashboard layouts so that visual context (headers, sidebars, charts) is preserved during retrieval.
🔬 Research Paper Navigation: Querying multi-column scientific publications (e.g. arXiv PDFs) where text flow is often interrupted by floating figures or footnotes.
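Several of these use cases slice a page into a fixed grid, such as slide quadrants or schematic cells. Here is a minimal sketch of that slicing. The function name and the grid parameters are hypothetical. It computes `(left, top, right, bottom)` pixel boxes and spreads integer rounding across the tiles so they cover the page with no gaps or overlaps.

```python
def grid_boxes(width: int, height: int, rows: int, cols: int) -> list[tuple[int, int, int, int]]:
    """Split a width x height page into rows x cols non-overlapping tiles.

    Boundaries use integer division, so the tiles cover the page exactly
    even when width/height are not divisible by cols/rows.
    """
    boxes = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((left, top, right, bottom))
    return boxes


# Quadrant tiles for a 100 x 80 slide, in row-major order.
print(grid_boxes(100, 80, 2, 2))
```

The box order matches the 4-tuple that Pillow's `Image.crop` expects, so each tile could be cut out with `page.crop(box)` if Pillow is used for rendering. Pillow is an assumption here and is not a listed dependency.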

🔮 5 Future Features

👁️ Native Visual Embedding Integration: Support for native vision-language embeddings like Qwen3-VL-Embedding to replace the text/OCR fallback entirely.
🌐 Dynamic Playwright Web Capturing: Direct web page rendering to visual tiles using headless Playwright with custom viewports.
🧠 Ollama Multimodal Generation: Direct image-based question answering using Ollama vision models (e.g. minicpm-v4.6 or llava).
✂️ Overlapping Sliding-Window Tiling: Implementing overlapping grids to avoid cutting words or images at tile boundaries.
🔍 Hierarchical Visual Retrieval: A multi-stage search that retrieves the whole page first, then zooms in on the most relevant sub-tile.
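Two of these planned features, overlapping sliding-window tiling and hierarchical retrieval, can be sketched in a few lines. This is a sketch under stated assumptions, not the project's design: the function names, the `pages` dictionary layout, and the use of a plain dot product as the similarity function are all hypothetical. Overlapping windows step by `tile - overlap` pixels and shift the last window back so the page edge is always covered. Coarse-to-fine search picks the best page first and then scores only that page's sub-tiles.

```python
def sliding_windows(length: int, window: int, stride: int) -> list[tuple[int, int]]:
    """1-D (start, end) spans of size `window`, stepping by `stride`.

    The final span is shifted back to end exactly at `length`, so the page
    edge is covered without padding.
    """
    if stride <= 0:
        raise ValueError("stride must be positive (overlap must be smaller than the tile)")
    if window >= length:
        return [(0, length)]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] + window < length:
        starts.append(length - window)
    return [(s, s + window) for s in starts]


def overlapping_boxes(width: int, height: int, tile_w: int, tile_h: int,
                      overlap: int) -> list[tuple[int, int, int, int]]:
    """(left, top, right, bottom) tiles whose neighbours share at least `overlap` px."""
    xs = sliding_windows(width, tile_w, tile_w - overlap)
    ys = sliding_windows(height, tile_h, tile_h - overlap)
    return [(x0, y0, x1, y1) for (y0, y1) in ys for (x0, x1) in xs]


def dot(a, b) -> float:
    """Plain dot product, standing in for any similarity function."""
    return sum(x * y for x, y in zip(a, b))


def coarse_to_fine(query, pages: dict, score=dot) -> tuple[str, int]:
    """Hierarchical retrieval: choose the best page, then the best tile within it.

    `pages` maps page_id -> {"page_vec": [...], "tile_vecs": [[...], ...]}
    (a hypothetical layout used only for this sketch).
    """
    best_page = max(pages, key=lambda pid: score(query, pages[pid]["page_vec"]))
    tiles = pages[best_page]["tile_vecs"]
    best_tile = max(range(len(tiles)), key=lambda i: score(query, tiles[i]))
    return best_page, best_tile
```

Overlap means a word or figure cut at one tile's edge appears whole in its neighbour. The coarse-to-fine pass only scores sub-tiles of one page instead of every tile in the index.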

🏷️ Keywords

PixelRAG, Visual RAG, Document Scanner, Wikipedia Search, Ollama, Gemma 4, Multimodal AI, RAG Pipeline, Python RAG, Local LLM, PDF Tiling, Computer Vision, Information Retrieval, AI Search Engine
