At a Glance
- Getting text off a receipt is a solved problem. Understanding what that text means — pairing prices to products, attributing discounts, parsing weights — is where the real engineering lives
- Our pipeline separates AI perception (Vision API + Gemini classify lines) from deterministic logic (a custom semantic engine pairs prices, merges discounts, and resolves quantities) — AI does the seeing, code does the thinking
- Store-specific Receipt Identities treat “MILK 2% GAL” at Walmart and “MILK 2% GAL” at Whole Foods as different products, because they are — different prices, different histories
- A Matching Memory system learns from user confirmations, so the same item never needs re-matching — and fuzzy matching handles the OCR variations that make exact matching fail
- We built this for Zupply, our smart grocery app. This post is the full blueprint so you can build one too
Every line on a grocery receipt — the abbreviated product names, the cryptic discount codes, the tax indicators — parsed, matched, and stored as structured data. No manual entry. No spreadsheets.
At Zovia, we build Zupply — a smart grocery app that turns paper receipts into shopping intelligence. Scan a receipt, and within seconds you have structured line items with prices, quantities, discounts, store identification, and purchase history. Scan enough receipts and you can see price trends, spending patterns, and exactly how much you’re saving at each store.
The technical challenge isn’t the scanning. It’s everything that happens after.
This post breaks down the full pipeline — the architecture, the semantic engine, the matching system, and the design decisions — so you can build something similar for your own domain.
The Problem: OCR Is the Easy Part
Google Cloud Vision will give you the text on a receipt with high accuracy. That’s a solved problem. But here’s what it gives you:
```
PUBLIX SUPER MARKETS INC
STORE #1234
MILK 2% GAL 3.99
CKHN BRST BNLS 7.49 F
SAVE 1.50
YOU PAID 5.99
GRT VLU EGGS LG 2.79 F
3 @ 0.93
BAN ORGANIC 1.89 F
1.23 lb @ 1.49/lb
SUBTOTAL 14.66
TAX 0.00
TOTAL 14.66
```
Now answer these questions:
- Which prices belong to which products?
- Is `SAVE 1.50` a discount on the chicken or the milk?
- What does `3 @ 0.93` mean for the eggs?
- Is `1.23 lb @ 1.49/lb` a weight or a price?
- What do the `F` suffixes mean?
- Is `CKHN BRST BNLS` a product name or noise?
Every grocery chain formats differently. Publix abbreviates aggressively. Walmart uses item codes. Costco uses two-column layouts. European receipts use comma decimals. Some stores put the price on the same line as the product, others on the next line.
A receipt scanner that just extracts text is a camera with extra steps. The value is in the understanding.
The Architecture: AI as Eyes, Code as Brain
Early in development, we faced a fork: let the AI do everything (send the receipt to an LLM and ask it to extract structured data), or split the work between AI and deterministic code.
We split it. This turned out to be the most important architectural decision in the pipeline.
```
Receipt Image
      │
      ▼
┌─────────────┐
│ Vision API  │ ← OCR: image → raw text + key-value pairs
└──────┬──────┘
      │
      ▼
┌─────────────┐
│   Gemini    │ ← Classification: each line → type (product, price, discount, weight, etc.)
└──────┬──────┘
      │
      ▼
┌─────────────────────┐
│   Semantic Engine   │ ← Logic: price pairing, discount merge, quantity parsing, totals
│   (deterministic)   │
└──────┬──────────────┘
      │
      ▼
┌─────────────────────┐
│ Matching + Storage  │ ← Identity resolution, memory lookup, database save
└─────────────────────┘
```
Vision API extracts raw text and key-value pairs (store name, date, totals). Gemini classifies each line — is this a product, a price, a discount, a weight entry, a subtotal? It acts as “eyes only,” labeling what each line is without trying to figure out what it means.
The semantic engine — a custom, deterministic JavaScript module — takes those classified lines and applies the business logic: pairing prices to products, attributing discounts to the right items, parsing quantities and weights, detecting tax indicators, and handling edge cases like two-column OCR interleaving.
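To make the hand-off concrete, here is the rough shape of what the engine receives, as a sketch. The field names are illustrative, not the actual schema:

```javascript
// Illustrative shape of the classified lines Gemini hands to the engine.
// Each line is labeled with what it IS; the engine decides what it MEANS.
const classifiedLines = [
  { text: "CKHN BRST BNLS", type: "product" },
  { text: "7.49 F", type: "price" },
  { text: "SAVE 1.50", type: "discount" },
  { text: "1.23 lb @ 1.49/lb", type: "weight" },
];
```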
Why Not Let the AI Do It All?
Three reasons.
Testability. When the engine has a bug in discount attribution, we write a test with mock classified lines, fix the logic, and verify. No prompt tuning. No hoping the model generalizes from the fix. Deterministic code is testable code.
Maintainability. When a new receipt format appears (say, a store that puts discounts two lines below the product instead of one), we add a rule to the engine. We don’t retrain, re-prompt, or worry about regressions on other formats. The fix is surgical and isolated.
Speed. The engine processes classified lines in milliseconds. An LLM round-trip for every line item would add seconds and cost. The AI runs once to classify; the engine runs once to understand. Two calls total, not dozens.
The principle: use AI where pattern recognition is hard (reading messy text, classifying ambiguous lines), use deterministic code where the rules are known (price always follows product, discount always follows the item it applies to).
The Semantic Engine: Turning Lines Into Items
The engine is where the real complexity lives. It receives an array of classified lines from Gemini — each tagged as product, price, discount, weight, subtotal, tax, total, or other — and transforms them into structured line items.
Price Pairing
The core challenge: matching each product to its price.
Same-line pairing is straightforward — MILK 2% GAL 3.99 has the product and price together. The engine extracts the trailing number as the price and everything before it as the product name.
Cross-line pairing is trickier. When a product line and a price line arrive separately, the engine uses a simple but reliable rule: the earliest unpaired product gets the next price. This works because receipts are sequential — products and their prices appear in order, even when split across lines.
```
Line 1: "CKHN BRST BNLS" → type: product (unpaired)
Line 2: "7.49 F"         → type: price → pairs with line 1
```
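A minimal sketch of this FIFO pairing rule, assuming the classified-line shape described above (function and field names are ours, not Zupply's):

```javascript
// Cross-line price pairing: the earliest unpaired product claims
// the next price line. Receipts are sequential, so FIFO is safe.
function pairPrices(lines) {
  const items = [];
  const unpaired = []; // queue of products still waiting for a price
  for (const line of lines) {
    if (line.type === "product") {
      const item = { name: line.text, price: null };
      items.push(item);
      unpaired.push(item);
    } else if (line.type === "price" && unpaired.length > 0) {
      // oldest waiting product wins
      unpaired.shift().price = line.value;
    }
  }
  return items;
}
```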
Quantity Extraction
The engine recognizes five quantity patterns, checked in priority order:
| Pattern | Example | Result |
|---|---|---|
| Parenthesized | (4) PAPER TOWELS | qty: 4 |
| QTY keyword | 4 QTY PAPER TOWELS | qty: 4 |
| Item code prefix | 4 004900012345 TOWELS | qty: 4, itemCode: 004900012345 |
| Leading number | 4 PAPER TOWELS | qty: 4 (if 1-99) |
| Multi-buy | 3 @ $1.99 | qty: 3, unitPrice: 1.99 |
The item code pattern is critical for stores like Walmart that print UPC codes on receipts. Without it, 1 004900012345 BOUNTY would be parsed as quantity 4900012345.
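A priority-ordered extractor for these patterns might look like the sketch below. The regexes approximate the table; the real rules differ, and the note about item codes is why the long-digit-run check comes before the plain leading-number check:

```javascript
// Sketch of priority-ordered quantity extraction (regexes illustrative).
function extractQuantity(text) {
  let m;
  // (4) PAPER TOWELS
  if ((m = text.match(/^\((\d{1,3})\)\s+/))) return { qty: +m[1] };
  // 4 QTY PAPER TOWELS
  if ((m = text.match(/^(\d{1,3})\s+QTY\s+/i))) return { qty: +m[1] };
  // 1 004900012345 BOUNTY — a UPC-like digit run right after the quantity
  if ((m = text.match(/^(\d{1,2})\s+(\d{10,14})\s+/)))
    return { qty: +m[1], itemCode: m[2] };
  // 4 PAPER TOWELS — leading number followed by a word, capped at 1-99
  if ((m = text.match(/^(\d{1,2})\s+[A-Za-z]/))) {
    const qty = +m[1];
    if (qty >= 1 && qty <= 99) return { qty };
  }
  // 3 @ $1.99
  if ((m = text.match(/^(\d{1,3})\s*@\s*\$?(\d+\.\d{2})/)))
    return { qty: +m[1], unitPrice: +m[2] };
  return { qty: 1 }; // default when no pattern matches
}
```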
Discount Attribution by Position
This is a subtle but important design choice. When the engine encounters a discount line, it needs to know which product it applies to.
Some approaches try to match discounts by product name — “MILK COUPON” applies to “MILK.” But receipt discount text is often generic: “SAVE 1.50”, “MFR COUPON”, “YOU SAVED.” No product reference at all.
Our engine attributes discounts by position: a discount applies to the nearest preceding product. This is how receipts actually work — the discount prints directly after the item it discounts.
```
CKHN BRST BNLS 7.49   ← product
SAVE 1.50             ← discount → applies to chicken (preceding product)
YOU PAID 5.99         ← net price confirmation
```
The engine also infers the discount type from keywords in the text:
| Keyword | Discount Type |
|---|---|
| coupon | coupon |
| member, club | member |
| sale, save | sale |
| bogo, promo | promotion |
| (none) | instant |
The result: originalPrice: 7.49, price: 5.99, discount: 1.50, discountType: "sale".
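Both rules together can be sketched in a few lines, again with illustrative names (the production engine handles more edge cases):

```javascript
// Keyword-based discount type inference, following the table above.
function inferDiscountType(text) {
  const t = text.toLowerCase();
  if (t.includes("coupon")) return "coupon";
  if (t.includes("member") || t.includes("club")) return "member";
  if (t.includes("sale") || t.includes("save")) return "sale";
  if (t.includes("bogo") || t.includes("promo")) return "promotion";
  return "instant";
}

// Position-based attribution: a discount line applies to the most
// recent product seen before it.
function applyDiscounts(lines) {
  const items = [];
  let last = null; // nearest preceding product
  for (const line of lines) {
    if (line.type === "product") {
      last = { name: line.text, price: line.price, originalPrice: line.price };
      items.push(last);
    } else if (line.type === "discount" && last) {
      last.price = +(last.price - line.amount).toFixed(2);
      last.discount = (last.discount || 0) + line.amount; // stack multiple discounts
      last.discountType = inferDiscountType(line.text);
    }
  }
  return items;
}
```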
Weight and Unit Price Parsing
Produce and deli items are sold by weight. The engine extracts weight entries and associates them with the preceding product:
```
BAN ORGANIC 1.89
1.23 lb @ 1.49/lb
```
Becomes: name: "BAN ORGANIC", price: 1.89, weight: 1.23, weightUnit: "lb", unitPrice: 1.49.
The parser handles multiple formats: 2.40 lb, 2.40/lb, 2.40kg, and rate prices like $1.49/lb or 3 @ $1.99/pc.
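A sketch of parsing the weight-and-rate format (the backreference keeps the two units consistent; the real parser covers the additional formats listed above):

```javascript
// Parse a "1.23 lb @ 1.49/lb" line into weight, unit, and unit price.
// Illustrative regex; units and formats beyond lb/kg/oz/g need more rules.
function parseWeightLine(text) {
  const m = text.match(
    /^(\d+(?:\.\d+)?)\s*(lb|kg|oz|g)\s*@\s*\$?(\d+(?:\.\d+)?)\s*\/\s*\2$/i
  );
  if (!m) return null;
  return { weight: +m[1], weightUnit: m[2].toLowerCase(), unitPrice: +m[3] };
}
```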
Tax Indicator Detection
Many grocery chains encode tax status with a trailing letter after the price:
| Suffix | Meaning |
|---|---|
F | Food (tax exempt) |
T | Taxable |
N | Non-taxable |
X | Tax exempt |
The engine strips these before parsing the price and preserves them as metadata. This matters for tax calculation accuracy — if a user’s receipt shows $0.00 tax but the line items sum to a taxable amount, the tax indicators explain why.
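The strip-then-preserve step can be sketched like this (the metadata labels are ours, mapped from the table above):

```javascript
// Strip a trailing tax indicator (F/T/N/X) from a price token and keep
// it as metadata. Labels are illustrative, per the suffix table.
const TAX_CODES = { F: "food_exempt", T: "taxable", N: "non_taxable", X: "exempt" };

function parsePriceToken(token) {
  const m = token.trim().match(/^(\d+\.\d{2})\s*([FTNX])?$/);
  if (!m) return null;
  return { price: +m[1], taxIndicator: m[2] ? TAX_CODES[m[2]] : null };
}
```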
Number Normalization
A receipt from a European store might print 1.234,56 (one thousand two hundred thirty-four and 56 cents). An American receipt prints 1,234.56. The engine detects the format by analyzing decimal and comma positions and normalizes everything to standard floating-point numbers before any arithmetic.
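One common heuristic for this detection: whichever separator appears last is the decimal point. A sketch (real receipts need more care than this):

```javascript
// Normalize "1.234,56" (European) and "1,234.56" (US) to a float.
// Heuristic: the rightmost of comma/dot is the decimal separator.
function normalizeNumber(raw) {
  const s = raw.trim();
  const lastComma = s.lastIndexOf(",");
  const lastDot = s.lastIndexOf(".");
  if (lastComma > lastDot) {
    // European: dots group thousands, comma is the decimal point
    return parseFloat(s.replace(/\./g, "").replace(",", "."));
  }
  // US: commas group thousands, dot is the decimal point
  return parseFloat(s.replace(/,/g, ""));
}
```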
Store Identification: A 4-Tier Confidence System
Knowing which store a receipt came from is essential — it determines price context, enables store-specific product matching, and powers cross-store price comparisons. But identifying the store from a receipt is unreliable if you depend on a single signal.
We use a 4-tier system, each with an assigned confidence level:
| Tier | Signal | Confidence | Example |
|---|---|---|---|
| A | Receipt address | Very High | “1234 Main St, Anytown FL” → geocode → Publix #1234 |
| B | Receipt company name | High | “PUBLIX SUPER MARKETS” → Places search → nearest Publix |
| C | User device location | Medium | GPS → nearby grocery stores → best match |
| D | User confirmation | Confirmed | User picks from a list |
The system tries each tier in order and stops at the first success. If it reaches Tier C (medium confidence), the response includes alternate nearby stores so the user can quick-swap if the guess was wrong.
Why not just use Tier A every time? Because many receipts don’t print a full address. Some print only the store number. Some print nothing identifiable at all. The tiered approach degrades gracefully — you always get a store identification, with a confidence score that tells the user how much to trust it.
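The try-in-order pattern is simple to express. In this sketch, each tier supplies a resolver function; names and shapes are hypothetical:

```javascript
// Tiered store identification: try resolvers in confidence order,
// stop at the first hit, and carry that tier's confidence label.
function identifyStore(receipt, resolvers) {
  for (const { tier, confidence, resolve } of resolvers) {
    const store = resolve(receipt);
    if (store) return { store, tier, confidence };
  }
  return null; // nothing matched — fall through to user confirmation (Tier D)
}
```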
Matching Memory: The System That Learns
Raw receipt text is messy. CKHN BRST BNLS is chicken breast boneless. GRT VLU EGGS LG is Great Value large eggs. Humans can decode these abbreviations; software needs help.
Store-Specific Receipt Identities
The foundational concept is the Receipt Identity — a canonical representation of a product at a specific store. The same product at different stores is a different Receipt Identity, because the price, the abbreviation, and the purchase context are different.
Why store-specific? Because “MILK 2% GAL” at Walmart costs $2.99 and “MILK 2% GAL” at Whole Foods costs $4.29. Treating them as the same product would make price history meaningless. Every Receipt Identity carries its own price history — monthly averages, min/max, sample counts — scoped to the store where it was observed.
The 4-Tier Matching Lookup
When a new line item arrives, the system tries to match it against known products:
| Tier | Source | Match Type | Signal |
|---|---|---|---|
| 1 | User’s personal memory | Exact match | Highest — user confirmed this before |
| 2 | User’s personal memory | Fuzzy match (≥ 85%) | High — likely OCR variation of a confirmed item |
| 3 | Global identities at this store | Exact match | Medium — other users matched this text |
| 4 | Global identities at this store | Fuzzy match (≥ 90%) | Lower — cross-user fuzzy match, higher threshold |
Personal memory has a lower fuzzy threshold (85%) than global (90%) because the user has already confirmed the base mapping — OCR variations of their own past items are more trustworthy than cross-user matches.
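The lookup order can be sketched as below. Here the memories are plain `Map`s and `similarity` is injected; the real system queries a database:

```javascript
// 4-tier match: personal exact → personal fuzzy (≥0.85) →
// global exact → global fuzzy (≥0.90). Illustrative, in-memory version.
function matchItem(text, personal, global, similarity) {
  if (personal.has(text)) return { match: personal.get(text), tier: 1 };
  for (const [key, val] of personal)
    if (similarity(text, key) >= 0.85) return { match: val, tier: 2 };
  if (global.has(text)) return { match: global.get(text), tier: 3 };
  for (const [key, val] of global)
    if (similarity(text, key) >= 0.9) return { match: val, tier: 4 };
  return null; // unmatched — ask the user
}
```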
Fuzzy Matching: Handling OCR Variations
Exact matching breaks on OCR variations. The same product scanned twice might appear as:
```
MILK 2% GAL
MLK 2% GAL
MILK 2 PERCENT GALLON
```
A fuzzy matching function scores the similarity between two normalized strings. At 85%+ similarity from a user’s own history, we accept the match. This handles the small OCR errors and abbreviation variations that make exact matching insufficient — without the maintenance burden of a synonym dictionary.
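A common baseline for such a scorer is a Levenshtein distance normalized by string length — sketched below; the production scorer may differ:

```javascript
// Edit distance between two strings (insert/delete/substitute, cost 1 each).
function levenshtein(a, b) {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
  return dp[a.length][b.length];
}

// Similarity in [0, 1]: 1 means identical, 0 means nothing in common.
function similarity(a, b) {
  if (a === b) return 1;
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}
```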
How Memory Grows
When a user confirms a suggested match (“Yes, CKHN BRST BNLS is chicken breast boneless”), that mapping is stored in their personal Matching Memory. Next time they scan a receipt from the same store with the same text, Tier 1 matches instantly — no AI, no fuzzy matching, no user interaction.
Over time, each user’s memory becomes a personalized dictionary of their grocery vocabulary. Heavy Publix shoppers build a Publix dictionary. Costco regulars build a Costco dictionary. The system gets faster and more accurate with every scan.
The Credit Safety Problem
Receipt scanning costs money — Vision API and Gemini calls aren’t free. We give free users a limited number of scans per month. This creates a critical UX constraint: never charge a user for a failed scan.
If the image is blurry and Vision can’t extract text — no credit deducted. If Vision succeeds but Gemini can’t classify any line items — no credit deducted. If the network drops during the database save — no credit deducted.
The implementation is simple but strict: credits are only deducted after the receipt is successfully saved to the database. Every step before that is a free trial. The pipeline fails fast at each stage:
1. Check credits → Insufficient? Stop. No charge.
2. Vision OCR → No text found? Stop. No charge.
3. Gemini classify → No line items? Stop. No charge.
4. Engine process → Malformed data? Stop. No charge.
5. Save to database → Success? NOW deduct credit.
This sounds obvious, but the naive implementation — deduct first, refund on failure — is fragile. Network timeouts, partial failures, and race conditions can leave users charged for nothing. Deduct-last is simpler and correct.
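The deduct-last flow, as a sketch with hypothetical stage functions (each throws on failure, so no failure path ever reaches the deduction):

```javascript
// Fail-fast, deduct-last: every stage before the save is a free trial.
// `ops` bundles the pipeline stages; each throws on failure.
function scanReceipt(user, image, ops) {
  if (user.credits < 1) throw new Error("insufficient credits"); // no charge
  const text = ops.ocr(image);       // throws if no text found
  const lines = ops.classify(text);  // throws if no line items
  const items = ops.process(lines);  // throws on malformed data
  ops.save(user, items);             // throws if the save fails
  user.credits -= 1;                 // deduct ONLY after a successful save
  return items;
}
```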
Duplicate Detection
Users accidentally scan the same receipt twice. Maybe they tapped “scan” and weren’t sure it worked. Maybe they’re cleaning out a wallet and forget which ones they’ve already scanned.
The system fingerprints each receipt as storeName + total + itemCount and checks for matches within a 30-day window. If a duplicate is found, the user sees the existing receipt and can confirm whether the re-scan was intentional.
The fingerprint is deliberately coarse — it catches the common case (same receipt scanned twice) without false positives on legitimate similar receipts (two different Publix trips with similar totals but different items would need matching item counts too).
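The fingerprint-and-window check fits in a few lines; a sketch (field names are illustrative):

```javascript
// Coarse receipt fingerprint: store + total + item count.
function fingerprint(receipt) {
  return `${receipt.storeName}|${receipt.total.toFixed(2)}|${receipt.itemCount}`;
}

// Flag a re-scan if any receipt in the last `windowDays` shares the fingerprint.
function isDuplicate(receipt, recent, windowDays = 30) {
  const cutoff = Date.now() - windowDays * 24 * 60 * 60 * 1000;
  const fp = fingerprint(receipt);
  return recent.some((r) => r.scannedAt >= cutoff && fingerprint(r) === fp);
}
```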
Results
The pipeline processes receipts from major US grocery chains:
- Average scan time: 30-60 seconds (end to end)
- Vision API latency: 3-5 seconds
- Gemini classification: 2-4 seconds
- Engine + matching: 5-10 seconds
The semantic engine handles same-line and cross-line price pairing, five quantity formats, position-based discount attribution, weight and unit price parsing, tax indicator detection, and number format normalization. The matching system learns from every user confirmation, building store-specific product dictionaries that improve with each scan.
Design Decisions Worth Stealing
These are the choices we’d make again. Consider them for your own receipt scanning pipeline — or any document understanding system.
AI for perception, deterministic code for logic. This is the big one. LLMs are incredible at reading messy text and classifying ambiguous content. They’re terrible at consistent arithmetic and rule application. Split the work accordingly. Your AI extracts and classifies; your code applies business rules. You get the best of both worlds: AI flexibility for the hard perception problems, deterministic reliability for the logic.
Position-based discount attribution. Don’t try to match discounts to products by name. Receipt discount text is too generic. Use position — discounts follow the items they apply to. This is how the physical receipt is laid out, and it works across every format we’ve encountered.
Store-scoped product identities. If your domain involves the same item at different venues (grocery stores, pharmacies, restaurants), scope your product identities to the venue. “Same name, different context” is a different product. This unlocks meaningful price comparisons and accurate history.
Fuzzy matching with confidence tiers. Exact matching is too brittle for OCR output. Fuzzy matching with no guardrails produces false positives. The solution is tiered thresholds — looser for high-confidence sources (user’s own history), stricter for lower-confidence sources (global cross-user data).
Deduct-last for metered features. If your app charges for API calls, deduct credits at the end of the pipeline, not the beginning. Fail-fast + deduct-last = users never pay for failures. The trust this builds is worth the engineering simplicity it costs.
Fingerprint-based duplicate detection. Don’t try to compare full receipt content for duplicates — it’s expensive and fragile. A coarse fingerprint (store + total + item count) catches 95%+ of accidental re-scans with near-zero false positives.
What’s Next
The pipeline continues to evolve:
- Cross-store price comparison. Store-specific Receipt Identities enable “you paid $3.99 for milk at Publix, but it’s $2.99 at Walmart this month” — the data is already there.
- Category intelligence. With enough scans, spending patterns emerge by category — dairy, produce, snacks — without the user manually categorizing anything.
- Receipt format expansion. Every new grocery chain is a set of new abbreviation patterns and layout quirks. The engine’s rule-based architecture makes these incremental additions, not rewrites.
The hard lesson from building this: the gap between “we can read the receipt” and “we understand the receipt” is where all the engineering lives. OCR is table stakes. The semantic engine, the matching memory, and the store-specific identities — that’s where a receipt scanner becomes a product.
Built at Zovia Studio. We ship apps that families use every day, and we take the “every day” part seriously.