Case study · 2024

Retrieval infrastructure

Enterprise grade hybrid RAG gateway

Opinionated ingestion + retrieval façade unifying lexical + dense signals for marketplace scale (stub brief).

Rolling index rebuild P95 6m12s vs 54m baseline

Role
Principal architect
Stack
OpenSearch k-NN · Ray batch jobs · ONNX rerank · GPU pools

Context

Multimarkts-scale scenario: creatives expect millisecond-ranked inspiration grids but legacy rebuild jobs froze catalogs for almost an hour when refreshing embeddings nightly.

Approach

Introduced chunked Ray pipelines, ONNX CPU rerank to shed GPU bursts, staged shadow indices with traffic mirroring before cutover, structured observability dashboards for recall@k and drift.

Outcomes

  • Embedding refresh P95 6m12s vs legacy ≈54m wall clock (shadow index + parallelism).
  • Human eval recall@10 +0.11 vs dense-only retrieval on multilingual inventory.