Jane Goodall Institute Digitizes Gombe Records with AI
The Jane Goodall Institute (JGI) is building an AI-powered platform to digitize and search decades of Gombe field records. Per a JGI press release, JGI USA won a 2025 AWS Imagine Grant in the Pathfinder, Generative AI category, receiving up to $200,000 in unrestricted funding, up to $100,000 in AWS promotional credits, and support from the AWS Generative AI Innovation Center. Business Insider reports JGI began using AI in 2025 to accelerate digitization of roughly 500,000 pages of handwritten notes, and JGI says the archive spans 65 years of primate research. AI Business reports AWS has also committed $1 million from its Generative AI Innovation Fund toward digitization. Sources describe the effort as the Gombe AI Research Platform; goals include identifying individual chimpanzees, extracting behavioral signals from video, and converting multilingual handwritten notes into searchable data for researchers.
What happened
Per the Jane Goodall Institute USA press release, JGI USA was named a winner of the 2025 AWS Imagine Grant in the Pathfinder, Generative AI category, and will work with AWS to transform more than 65 years of primate-behavior data from the Gombe Stream Research Center into an AI-powered research platform. The release states the grant provides up to $200,000 in unrestricted funding, up to $100,000 in AWS promotional credits, and implementation support from the AWS Generative AI Innovation Center. Business Insider reports JGI began using AI in 2025 to accelerate digitization of roughly 500,000 pages of handwritten field notes, and AI Business reports AWS has committed $1 million from its Generative AI Innovation Fund toward broader digitization.
Technical details
AI Business reports the proof-of-concept work uses multimodal large language models and embedding models on AWS and Amazon SageMaker, combined with prompt engineering, to convert handwritten notes, film, photos, and audio into structured, searchable records. The effort, developed as the Gombe AI Research Platform, aims to identify individual chimpanzees in video, extract behavioral annotations, and index multilingual notes recorded in English and Swahili, per Business Insider and an AWS podcast.
Editorial analysis - technical context
Organizations digitizing long-term ecological datasets often pair handwriting-tuned optical character recognition (OCR) with multimodal embeddings to link text, imagery, and video. Such pipelines typically need substantial human review, because field notebooks combine low-resource languages, high inter-observer variability, and domain-specific shorthand. As an industry-pattern observation, the practical workstreams for similar systems are dataset curation and labeling, handwriting-adapted OCR, entity resolution for individual animals, and retrieval systems built on embeddings.
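To make the retrieval workstream concrete, here is a minimal sketch of embedding-style search over transcribed notes. It is not JGI's or AWS's implementation, which the coverage does not publish. The `embed` function is a toy bag-of-words stand-in for a learned embedding model, and the note IDs and note texts are invented for illustration, not drawn from the Gombe archive.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.

    Assumption: a real system would call a learned embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, notes, top_k=2):
    """Rank notes by similarity to the query; ties break on note ID."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(text)), note_id)
              for note_id, text in notes.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(note_id, round(score, 3)) for score, note_id in scored[:top_k]]


# Hypothetical transcribed notes, invented for this example.
NOTES = {
    "note-001": "adult female grooming infant near the stream",
    "note-002": "two males displaying and charging on the ridge",
    "note-003": "juvenile using a stick tool at a termite mound",
}

if __name__ == "__main__":
    print(search("termite tool use", NOTES, top_k=1))
```

The same ranking structure would apply if `embed` returned dense vectors from a multimodal model, which is what would let one query match text, images, and video frames.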
Context and significance
Longitudinal observational datasets like the Gombe archive are scientifically valuable because they enable cross-generational analyses of behavior, social structure, and ecological change. Making them discoverable with generative and retrieval-augmented techniques can accelerate hypothesis generation and meta-analysis. The project also sits within a broader trend of philanthropic and hyperscaler support for nonprofit AI work, such as AWS's Imagine Grants and Generative AI Innovation Fund.
Reporting and quotes
Business Insider quoted JGI's vice president of conservation science, Lilian Pintea, framing AI as a continuation of the organization's long history of adopting new technology. AI Business attributed the description of multimodal LLM and embedding work on AWS and Amazon SageMaker to Taimur Rashid.
Limitations of reporting
The coverage reports grant amounts, high-level technical aims, and scope estimates, but it includes no detailed performance metrics, dataset-release plans, or reproducible model code. Observers should watch for published methods covering handwriting-OCR accuracy and identity-matching precision, for data-governance and provenance practices, and for community and ethics commitments, all of which will affect reproducibility and adoption.
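For readers unfamiliar with the two metrics named above, the sketch below shows one common way each is computed: character error rate (edit distance divided by reference length) for OCR output, and precision (correct assertions divided by all assertions made) for identity matching. The function names and data shapes are assumptions for illustration. No source says the project uses these exact definitions.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, 1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, 1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the length of the human transcription."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def identity_precision(predicted, truth):
    """Share of asserted identities that are correct.

    predicted maps clip ID to a predicted identity, or None when the
    system abstains; truth maps clip ID to the labeled identity.
    """
    asserted = [(clip, who) for clip, who in predicted.items() if who is not None]
    if not asserted:
        return 0.0
    correct = sum(1 for clip, who in asserted if truth.get(clip) == who)
    return correct / len(asserted)


if __name__ == "__main__":
    print(character_error_rate("abcd", "abxd"))
    print(identity_precision({"c1": "A", "c2": "B", "c3": None},
                             {"c1": "A", "c2": "C", "c3": "D"}))
```

Precision alone rewards a system that abstains often, so published results would ideally report recall or coverage alongside it.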
Key Points
- Generative AI is being applied to a 65-year conservation archive, turning analog field notes and footage into searchable, multi-decade datasets.
- Editorial analysis: cloud-backed multimodal pipelines combining handwriting OCR and embeddings are the practical path to linking text, image, and video in long-term studies.
- Editorial analysis: grant-funded nonprofit-hyperscaler collaborations accelerate digitization but raise scrutiny on provenance, governance, and reproducibility.
Scoring Rationale
This is a substantive, well-sourced application of multimodal generative AI to an exceptionally long-term ecological archive, with concrete cloud grant support and clear governance and reproducibility implications for practitioners. It is a domain-specific deployment rather than a frontier-model release, placing it solidly in the mid range.