Case study · 2021
NLP · Editor workflow
Text summarization with transformers
Summarization assistant for long-form Spanish newsroom copy with human-in-the-loop review.
Latency P95 −62% vs. monolithic baseline · human approval 4.3/5
- Role: Lead builder
- Stack: PyTorch · HuggingFace · FastAPI · bilingual tokenization
Context
Editors needed faithful abstractive summaries without hallucinated quotes. The newsroom could not afford a black-box API with zero provenance, and latency during peak hours blew past their CMS timeout.
Approach
A fine-tuned transformer stack with constrained decoding, deterministic seeding, and an extractive pre-filter, served by a FastAPI service that streams partial summaries for progressive review. Lightweight capture of editor feedback continuously recalibrates the thresholds.
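To make the extractive pre-filter step concrete, here is a minimal sketch of one way such a filter could work: score each sentence by the average frequency of its content words and keep the top few in their original order. The function names, the tiny stopword list, and the frequency-based scoring are illustrative assumptions, not the project's actual implementation.

```python
import re
from collections import Counter

# Tiny illustrative Spanish stopword list (assumption); a real list would be far longer.
STOPWORDS = {
    "el", "la", "los", "las", "de", "del", "en", "y", "a", "que",
    "un", "una", "por", "con", "se", "su", "al", "es", "lo", "para",
}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase word tokens with stopwords removed.
    return [w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS]


def extractive_prefilter(text, keep=3):
    """Return the `keep` highest-scoring sentences, in document order."""
    sentences = split_sentences(text)
    freq = Counter(tok for s in sentences for tok in tokenize(s))

    def score(sentence):
        toks = tokenize(sentence)
        # Average term frequency, so long sentences are not favored by length alone.
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    # Ties are broken by position, which keeps the output deterministic.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    return [sentences[i] for i in sorted(ranked[:keep])]
```

With `keep=2`, a short article about a mayor's budget that also contains an unrelated weather sentence keeps the two budget sentences and drops the weather one. The filtered sentences would then be passed to the generative model, which is outside the scope of this sketch.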
Outcomes
- Latency P95 −62% against the previous monolithic service by batching token work and caching attention-friendly windows.
- Editorial uplift: qualitative score 4.3 / 5 in pilot with three national desks.
- Hallucination incidents reduced by quote-locking named entities detected in the pre-filter stage.
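The quote-locking idea can be illustrated with a small check that flags any quoted span in a generated summary that does not appear verbatim in the source article. This is a simplified sketch under stated assumptions: the function names are hypothetical, the named-entity detection from the pre-filter stage is not modeled, and the real system may enforce the lock during decoding rather than as a post-check.

```python
import re

# Matches text inside straight, curly, or angular (Spanish) quotation marks.
QUOTE_PATTERN = re.compile(r'"([^"]+)"|“([^”]+)”|«([^»]+)»')


def normalize(text):
    # Collapse whitespace so line wraps do not cause false mismatches.
    return " ".join(text.split())


def extract_quotes(text):
    # Exactly one capture group is set per match; pick that one.
    return [
        normalize(next(g for g in m.groups() if g))
        for m in QUOTE_PATTERN.finditer(text)
    ]


def unlocked_quotes(source, summary):
    """Return quotes in the summary that are not verbatim in the source."""
    haystack = normalize(source)
    return [q for q in extract_quotes(summary) if q not in haystack]
```

A summary whose quotes all appear in the source yields an empty list; any other result could be routed to an editor for review before publication.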
Links
- Summarization
- Transformers