Sun Finance Automates ID Extraction and Fraud Detection

According to an AWS blog post co-authored with Sun Finance staff, Sun Finance built an AI-powered identity verification pipeline using Amazon Bedrock, Amazon Textract, and Amazon Rekognition. The post reports that extraction accuracy improved from 79.7% to 90.8%, per-document costs fell by 91%, and processing time dropped from as long as 20 hours to under 5 seconds. The engagement included a 32-day AWS Generative AI Innovation Center phase and a technical handover, with Sun Finance taking the solution to production and going live on January 22, 2026. The post also describes a serverless fraud detection component built on vector similarity search, and notes that combining specialized OCR with LLM structuring outperformed either tool alone.
What happened
According to an AWS blog post co-authored with Sun Finance staff, Sun Finance built an identity verification (IDV) pipeline that combines `Amazon Textract`, `Amazon Rekognition`, and `Amazon Bedrock`. The post reports the deployment improved extraction accuracy from 79.7% to 90.8%, reduced per-document costs by 91%, and shortened processing time from as long as 20 hours to under 5 seconds. The blog states the AWS Generative AI Innovation Center engagement ran 32 days from kickoff through final presentation, followed by a 26-day technical handover and a 35-business-day push to production, with the system going live on January 22, 2026.
Technical details
According to the AWS post, the solution pairs specialized OCR for structured fields with LLM-based structuring to extract and normalize data, and it implements serverless fraud detection using vector similarity search to surface near-duplicate or suspicious identity artifacts. The blog frames the architecture as serverless and cloud-native and highlights that the hybrid approach outperformed using OCR or LLMs alone for Sun Finance's dataset and regional document complexity.
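The hybrid pattern the post describes, deterministic OCR extraction backed by LLM structuring, can be sketched roughly as below. This is a minimal illustration, not Sun Finance's implementation: the field names and alias table are hypothetical, and the `call_llm_normalizer` stub stands in for what would be an Amazon Bedrock model invocation in the architecture the blog describes.

```python
import json

# Canonical schema the pipeline maps noisy OCR output onto (hypothetical).
CANONICAL_FIELDS = ["full_name", "document_number", "date_of_birth"]

def call_llm_normalizer(raw_fields: dict) -> dict:
    """Stub for the LLM structuring step. In the architecture the post
    describes, this would be an Amazon Bedrock model call that maps noisy
    OCR key-value pairs onto the canonical schema; here a small alias
    table simulates it so the sketch stays self-contained."""
    aliases = {
        "name": "full_name",
        "doc no": "document_number",
        "document no.": "document_number",
        "dob": "date_of_birth",
        "birth date": "date_of_birth",
    }
    normalized = {}
    for key, value in raw_fields.items():
        canonical = aliases.get(key.strip().lower())
        if canonical and canonical not in normalized:
            normalized[canonical] = value.strip()
    return normalized

def extract_identity(ocr_fields: dict) -> dict:
    """Hybrid extraction: keep deterministic OCR keys that already match
    the canonical schema, and route the rest through LLM structuring --
    the 'specialized OCR plus LLM structuring' pattern."""
    record = {k: v for k, v in ocr_fields.items() if k in CANONICAL_FIELDS}
    leftovers = {k: v for k, v in ocr_fields.items() if k not in CANONICAL_FIELDS}
    record.update(call_llm_normalizer(leftovers))
    return record

# Example: noisy Textract-style key-value output from an ID card.
raw = {"Doc No": "AB1234567", "DOB": "1990-05-14", "full_name": "JANE DOE"}
print(json.dumps(extract_identity(raw), sort_keys=True))
```

The design point is grounding: fields the OCR already extracted deterministically pass through untouched, so the LLM only normalizes what it must, which limits hallucination risk.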
Editorial analysis
Industry-pattern observations: Combining high-accuracy OCR with LLM-based post-processing is a common pattern for large-scale document automation because OCR preserves layout and deterministic extraction while LLMs help normalize, infer, and map noisy text to canonical records. Similarly, using vector similarity search as part of fraud controls is increasingly common for detecting near-duplicates and identity reuse at scale.
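The near-duplicate detection pattern can be illustrated with a small sketch. This assumes identity artifacts (faces, document photos) have already been embedded as vectors; the threshold and IDs are hypothetical, and a production system would query a vector store rather than scan a Python dict.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def flag_near_duplicates(query, index, threshold=0.95):
    """Return IDs of stored identity artifacts whose embeddings sit
    suspiciously close to the incoming one -- a signal of identity
    reuse or near-duplicate documents."""
    return [doc_id for doc_id, vec in index.items()
            if cosine_similarity(query, vec) >= threshold]

# Toy index of previously seen application embeddings (hypothetical).
index = {
    "app-001": [0.12, 0.98, 0.05],
    "app-002": [0.90, 0.10, 0.40],
}
print(flag_near_duplicates([0.11, 0.99, 0.04], index))  # → ['app-001']
```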
Context and significance
Editorial analysis: The reported improvements (double-digit accuracy gains, a drop from multi-hour to sub-second latency, and a 91% per-document cost cut) illustrate why lenders and fintechs are adopting generative-AI-assisted pipelines for IDV. For practitioners, the case underscores practical priorities: orchestration between specialized extractors and LLMs, production-grade latency and cost optimization, and integration of similarity search for fraud signals.
What to watch
For practitioners: monitor manual-review rate, end-to-end false positive and false negative rates, model drift on regional document types, latency and cost under real traffic, and auditability for compliance. Observers should also watch how vendors surface tooling for grounding LLM outputs to deterministic OCR fields and how vector search scales with historical identity stores.
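The monitoring metrics named above can be computed from decision logs along these lines. The input shape and field names are illustrative, not from the post:

```python
def pipeline_metrics(decisions):
    """Compute manual-review rate and false positive/negative rates from
    a list of (predicted_fraud, actual_fraud, sent_to_review) tuples.
    FPR is over actual negatives; FNR is over actual positives."""
    total = len(decisions)
    review = sum(1 for _, _, r in decisions if r)
    fp = sum(1 for p, a, _ in decisions if p and not a)
    fn = sum(1 for p, a, _ in decisions if not p and a)
    negatives = sum(1 for _, a, _ in decisions if not a)
    positives = sum(1 for _, a, _ in decisions if a)
    return {
        "manual_review_rate": review / total,
        "false_positive_rate": fp / negatives if negatives else 0.0,
        "false_negative_rate": fn / positives if positives else 0.0,
    }

# (predicted_fraud, actual_fraud, sent_to_manual_review) -- toy sample.
sample = [(True, True, True), (False, False, False),
          (True, False, True), (False, False, False)]
print(pipeline_metrics(sample))
```

Tracking these per document type and region would also surface the model drift the paragraph above warns about.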
Scoring Rationale
This is a practical, production-grade case study showing measurable gains from combining OCR, LLMs, and vector search for IDV. It is useful for practitioners designing ID pipelines but not a frontier-model or platform breakthrough.


