Case study · 2021

NLP · Editor workflow

Text summarization with transformers

Summarization assistant for long-form Spanish newsroom copy with human-in-the-loop review.

Latency P95 −62% vs. monolithic baseline · human approval 4.3/5

Role
Lead builder
Stack
PyTorch · HuggingFace · FastAPI · bilingual tokenization

Context

Editors needed faithful abstractive summaries with no hallucinated quotes. The newsroom could not afford a black-box API with zero provenance, and peak-hour latency blew past the CMS timeout.

Approach

Fine-tuned a transformer stack with constrained decoding and deterministic seeding, fed by an extractive pre-filter and served through a FastAPI service that streams partial summaries for progressive review. Added lightweight capture of editor feedback to continuously recalibrate thresholds.
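As an illustration of the extractive pre-filter idea, here is a minimal sketch that keeps only the most salient sentences before anything reaches the summarizer. The scoring rule (average document frequency of longer words), the function names `split_sentences` and `prefilter`, and the `keep` and `min_word_len` parameters are all assumptions made for this example; the production filter is not described in this write-up and likely differed.

```python
import re
from collections import Counter
from typing import List

# Hypothetical helpers for illustration; not the project's actual pre-filter.
_SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
_WORD = re.compile(r"\w+")


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after terminal punctuation followed by whitespace."""
    return [s.strip() for s in _SENTENCE_SPLIT.split(text) if s.strip()]


def prefilter(text: str, keep: int = 3, min_word_len: int = 4) -> List[str]:
    """Return the `keep` highest-scoring sentences, in original document order."""
    sentences = split_sentences(text)
    # Document-level frequency of longer words, a crude stand-in for salience.
    freq = Counter(
        w.lower() for w in _WORD.findall(text) if len(w) >= min_word_len
    )

    def score(sentence: str) -> float:
        words = [
            w.lower() for w in _WORD.findall(sentence) if len(w) >= min_word_len
        ]
        if not words:
            return 0.0
        return sum(freq[w] for w in words) / len(words)

    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )
    chosen = sorted(ranked[:keep])  # restore document order
    return [sentences[i] for i in chosen]
```

Returning the sentences in document order, rather than score order, keeps the filtered text readable for an editor and preserves the narrative sequence the downstream model sees.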

Outcomes

  • P95 latency down 62% against the previous monolithic service, achieved by batching token work and caching attention-friendly windows.
  • Editorial uplift: qualitative score 4.3 / 5 in pilot with three national desks.
  • Hallucination incidents reduced through quote-locking on named entities detected in the pre-filter stage.
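One way to picture the quote-locking safeguard is as a verbatim check: any quoted span in a generated summary must appear word for word in the source article, otherwise it is flagged for the editor. The sketch below is an assumption about how such a check could look, not the project's implementation; the names `extract_quotes` and `unsupported_quotes` are hypothetical, and it checks quoted spans only, leaving out the named-entity detection mentioned above.

```python
import re
from typing import List

# Matches text inside «…», “…”, or "…", quote styles common in Spanish copy.
_QUOTED = re.compile(r"«([^»]+)»|“([^”]+)”|\"([^\"]+)\"")


def _normalize(text: str) -> str:
    """Collapse whitespace so line wraps do not cause false mismatches."""
    return " ".join(text.split())


def extract_quotes(text: str) -> List[str]:
    """Return the contents of every quoted span, in order of appearance."""
    quotes = []
    for match in _QUOTED.finditer(text):
        # Exactly one of the three alternation groups is set per match.
        quoted = next(g for g in match.groups() if g is not None)
        quotes.append(_normalize(quoted))
    return quotes


def unsupported_quotes(source: str, summary: str) -> List[str]:
    """Return quotes in the summary that do not appear verbatim in the source."""
    haystack = _normalize(source)
    return [q for q in extract_quotes(summary) if q not in haystack]
```

A summary that returns an empty list passes the check; anything else can be routed to human review rather than silently published.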

Links