AWS builds multimodal embeddings pipeline for aerial imagery search

According to an AWS machine learning blog post, the team worked with Vexcel, an aerial imagery provider that collects high-resolution data across 45+ countries and territories, to turn billions of aerial pixels into a natural-language-searchable knowledge base. The post describes an architecture built on Amazon Bedrock and Amazon OpenSearch Serverless, an evaluation methodology using ground truth, and four experiments comparing embedding models, fusion strategies, caption integration, and search methods. The blog reports that Amazon Nova Multimodal Embeddings delivered the highest F1 scores across both benchmark queries in the evaluation. AWS writes that the work evolved into a searchable imagery product.
What happened
AWS and Vexcel evaluated methods for making large aerial-imagery libraries searchable in natural language, according to the AWS machine learning blog post. The post reports that Vexcel supplies orthomosaic imagery, oblique multi-angle imagery, and elevation models across 45+ countries and territories. It documents an architecture implemented on Amazon Bedrock and Amazon OpenSearch Serverless, an evaluation methodology built on ground truth, and four controlled experiments comparing embedding models, fusion strategies, caption integration, and search approaches. The post states that Amazon Nova Multimodal Embeddings produced the highest F1 scores across the two benchmark query sets used in the evaluation, and that the work later evolved into a searchable imagery product.
Technical details
The AWS blog outlines a pipeline that combines multimodal embedding extraction, LLM captioning of image tiles, and vector search indexing in OpenSearch Serverless. The experiments measured retrieval performance across millions of tiles and tested fusion strategies for multi-view data, caption integration versus embedding-only search, and retrieval ranking heuristics. The post attributes the top performance in its benchmarks to Amazon Nova Multimodal Embeddings, and it documents the evaluation design and metric choices to support reproducibility.
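To make the F1 metric concrete, the following is a minimal sketch of how retrieval results for a single query might be scored against a ground-truth set of relevant tiles. It is not taken from the AWS post: the function name `precision_recall_f1` and the tile IDs are hypothetical, and the post's actual scoring procedure (for example, any cutoff on the number of results) is not described in this summary.

```python
from typing import Iterable, Set, Tuple


def precision_recall_f1(
    retrieved: Iterable[str], relevant: Set[str]
) -> Tuple[float, float, float]:
    """Score one query's retrieved tile IDs against ground-truth relevant IDs."""
    retrieved_set = set(retrieved)
    true_positives = len(retrieved_set & relevant)

    # Guard against empty result lists and empty ground truth.
    precision = true_positives / len(retrieved_set) if retrieved_set else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical tile IDs, for illustration only.
ground_truth = {"tile_001", "tile_002", "tile_003", "tile_004"}
results = ["tile_001", "tile_002", "tile_009"]

p, r, f1 = precision_recall_f1(results, ground_truth)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Here two of the three returned tiles are relevant and two of the four relevant tiles are found, so precision is 2/3 and recall is 1/2. Comparing embedding models on the same queries and the same ground truth is what makes such scores comparable across experiments.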
Editorial analysis - technical context
Multimodal embeddings plus lightweight captioning and vector search reduce the need to train per-feature CV models, an industry pattern increasingly adopted for large unlabeled visual corpora. For practitioners, this workflow emphasizes scalable embedding pipelines, multi-view fusion design, and the tradeoffs between caption-enhanced semantic search and raw visual embeddings when ground truth is costly to produce.
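The tradeoff between caption-enhanced and raw visual embeddings can be illustrated with a toy example. The sketch below fuses an image embedding and a caption embedding by weighted averaging, then ranks tiles by cosine similarity to a query vector. This is an assumption made for illustration, not the fusion strategy the AWS post evaluated: the helper names (`fuse`, `search`), the `image_weight` parameter, and the three-dimensional vectors are all hypothetical, and the sketch assumes both embeddings live in the same vector space, as they would when produced by one multimodal model. A production system would delegate the nearest-neighbor search to a vector index rather than scanning a dictionary.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def fuse(
    image_vec: Sequence[float],
    caption_vec: Sequence[float],
    image_weight: float = 0.5,
) -> List[float]:
    """Weighted average of unit-length image and caption embeddings (assumed scheme)."""
    if len(image_vec) != len(caption_vec):
        raise ValueError("embeddings must have the same dimension")
    img = l2_normalize(image_vec)
    cap = l2_normalize(caption_vec)
    fused = [image_weight * a + (1.0 - image_weight) * b for a, b in zip(img, cap)]
    return l2_normalize(fused)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def search(
    query_vec: Sequence[float], index: Dict[str, List[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Brute-force top-k search over an in-memory index of fused vectors."""
    scored = [(tile_id, cosine(query_vec, vec)) for tile_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy 3-dimensional vectors standing in for real embeddings.
index = {
    "tile_a": fuse([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    "tile_b": fuse([0.0, 1.0, 0.0], [0.0, 1.0, 0.0]),
    "tile_c": fuse([0.0, 0.0, 1.0], [0.1, 0.0, 0.9]),
}

top = search([1.0, 0.0, 0.0], index, k=2)
print(top)
```

Setting `image_weight` to 1.0 reduces this to embedding-only search, so the same harness can compare both variants against a fixed ground-truth set.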
Industry context
Companies with large geospatial repositories face similar operational scale and labeling constraints. Public reporting frames this AWS example as a template for combining managed model hosting (Amazon Bedrock), vector indices (OpenSearch Serverless), and bespoke evaluation to validate embedding choices at scale.
What to watch
For observers, relevant indicators include published evaluation assets or open benchmarks from AWS and Vexcel, further details on the fusion algorithms, latency and cost figures for OpenSearch Serverless indexing at millions-of-tiles scale, and whether independent tests with comparable embedding models replicate the reported F1 gains.
Scoring Rationale
A well-documented AWS case study with Vexcel showing multimodal embeddings and vector search applied to geospatial retrieval at scale, with controlled benchmark experiments across embedding models. More substantive than a typical vendor guide because it includes evaluation methodology and performance comparisons, but remains a vendor-published application rather than an independent research contribution.


