Amazon Nova Forge Demonstrates 17% F1 Improvement

The AWS China Applied Science team evaluated Amazon Nova Forge by fine-tuning models on a 15,372-sample Voice of Customer (VOC) dataset labeled with a four-level taxonomy of 1,420 categories. Their data-mixing supervised fine-tuning (SFT) approach improved F1 on the VOC task by 17% while keeping MMLU scores and instruction-following ability close to baseline, indicating reduced catastrophic forgetting.
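Data-mixing SFT generally means blending general-purpose training samples into the domain-specific set so the model keeps rehearsing broad skills while it learns the new task. The source does not state the mixing ratio, the general-data source, or how Nova Forge exposes this, so the following is only a hypothetical Python sketch of the idea: `mix_sft_data` and `general_fraction` are assumed names, not part of any Nova Forge API.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def mix_sft_data(domain: Sequence[T], general: Sequence[T],
                 general_fraction: float, seed: int = 0) -> List[T]:
    """Hypothetical helper: return a shuffled SFT mix in which roughly
    `general_fraction` of the samples are drawn from `general`."""
    if not 0.0 <= general_fraction < 1.0:
        raise ValueError("general_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Choose n_general so that n_general / (len(domain) + n_general) ~= general_fraction.
    n_general = round(len(domain) * general_fraction / (1.0 - general_fraction))
    # We cannot draw more general samples than are available.
    n_general = min(n_general, len(general))
    # Keep every domain sample; subsample general data without replacement.
    mixed = list(domain) + rng.sample(list(general), n_general)
    rng.shuffle(mixed)
    return mixed

if __name__ == "__main__":
    # Placeholder records; the 50,000-sample general pool is an assumption.
    voc = [("voc", i) for i in range(15372)]
    general = [("general", i) for i in range(50000)]
    mix = mix_sft_data(voc, general, general_fraction=0.3, seed=42)
    print(len(mix), sum(1 for src, _ in mix if src == "voc"))
```

Keeping all domain samples and only subsampling the general pool is one simple design choice; real pipelines may instead weight sources per batch or tune the ratio against held-out general benchmarks such as MMLU.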
Scoring Rationale
Strong official evaluation demonstrating both domain gains and retention of general capabilities; limited novelty, as the work is largely a service-specific comparative implementation.

