Case study · 2024
Retrieval infrastructure
Enterprise grade hybrid RAG gateway
Opinionated ingestion + retrieval façade unifying lexical + dense signals for marketplace scale (stub brief).
Rolling index rebuild P95 6m12s vs 54m baseline
- Role
- Principal architect
- Stack
- OpenSearch k-NN · Ray batch jobs · ONNX rerank · GPU pools
Context
Multimarkts-scale scenario: creatives expect millisecond-ranked inspiration grids but legacy rebuild jobs froze catalogs for almost an hour when refreshing embeddings nightly.
Approach
Introduced chunked Ray pipelines, ONNX CPU rerank to shed GPU bursts, staged shadow indices with traffic mirroring before cutover, structured observability dashboards for recall@k and drift.
Outcomes
- Embedding refresh P95 6m12s vs legacy ≈54m wall clock (shadow index + parallelism).
- Human eval recall@10 +0.11 vs dense-only retrieval on multilingual inventory.
- Hybrid RAG
- Observability