Tutor Intelligence launches Data Factory 1 in Watertown

Tutor Intelligence has opened Data Factory 1 (DF1), a Watertown, Massachusetts facility that the company and multiple outlets describe as the largest robot data factory in the United States, according to reporting by Manufacturing Dive and The Robot Report. Per Manufacturing Dive and Hoodline, DF1 houses 100 semi-humanoid robots named Sonny, occupies roughly 35,000 square feet in a renovated mill, and uses a mix of onsite staff and remote teleoperators to collect training data. The Robot Report says the robots train a vision-language-action model called Ti0, and that Tutor raised $34 million in Series A funding in December 2025. Industry context: this is an example of companies prioritizing real-world, human-supervised data collection at fleet scale rather than relying solely on simulation.
What happened
Tutor Intelligence launched Data Factory 1 (DF1), a training facility in Watertown, Massachusetts, that holds 100 semi-humanoid robots, according to Manufacturing Dive and The Robot Report. Manufacturing Dive reported on a facility tour on April 22 and quoted CEO Josh Gruenstein; The Robot Report published additional technical detail on May 5. Hoodline and local TV coverage report that the site is about 35,000 square feet in a renovated mill and that the company describes DF1 as the largest robot data factory in the United States. Hoodline also cites a Tutor company page claiming that DF1 can produce roughly 10,000 hours of training data per week.
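As a rough sanity check on the reported figures, the short Python sketch below divides the claimed weekly output by the fleet size. It assumes, which the coverage does not state, that all 10,000 hours come from the 100 robots at DF1, that the load is split evenly, and that an "hour of training data" means one robot-hour of operation.

```python
# Figures as reported in the coverage; the even split across robots is an assumption.
ROBOTS = 100
HOURS_PER_WEEK = 10_000


def hours_per_robot(total_hours, robots, days=7):
    """Return (hours per robot per week, hours per robot per day)."""
    per_week = total_hours / robots
    per_day = per_week / days
    return per_week, per_day


per_week, per_day = hours_per_robot(HOURS_PER_WEEK, ROBOTS)
print(f"{per_week:.0f} h/robot/week, {per_day:.1f} h/robot/day")
```

Under those assumptions each robot would need to log about 100 hours a week, or roughly 14.3 hours a day. If the company counts hours differently (for example, per camera stream), the per-robot load would be lower.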
Technical details
Per The Robot Report, the DF1 fleet is training a vision-language-action model the company calls Ti0. Manufacturing Dive described each robot as a bimanual manipulator, mounted to a stationary box and equipped with four cameras (head, chest, and two claws); employees and remote teleoperators supervise and correct behaviors as the robots practice object manipulation. The Robot Report and Manufacturing Dive both describe the operation as combining large-scale human supervision with onboard perception to capture real-world manipulation data rather than relying primarily on simulation.
Editorial analysis - technical context
Companies building physical-robot training pipelines often confront a trade-off between scale in simulation and fidelity in the real world. Industry observers note that real-world collection, especially with human-in-the-loop corrective labels, yields richer edge-case data for manipulation tasks but raises operational costs and engineering overhead for instrumentation, remote tooling, and data infrastructure. For modelers, feeding Ti0-style VLA systems with dense, time-series sensor logs plus corrective teleoperation traces can improve generalization on contact-rich tasks compared with purely synthetic data.
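To make the idea of corrective teleoperation traces concrete, here is a minimal Python sketch in the spirit of intervention-based imitation learning. It is not Tutor's pipeline: the `Step` record, the `to_training_samples` function, the string stand-ins for observations and actions, and the `correction_weight` parameter are all hypothetical. It also assumes that a step with no operator override was acceptable, which a real pipeline would need to verify.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Step:
    observation: str                       # stand-in for camera frames and joint state
    policy_action: str                     # action the robot's policy proposed
    operator_action: Optional[str] = None  # set only when a teleoperator overrode it


def to_training_samples(
    episode: List[Step], correction_weight: float = 2.0
) -> List[Tuple[str, str, float]]:
    """Turn one episode into (observation, target_action, weight) samples.

    Where an operator intervened, the operator's action becomes the target
    and is upweighted; otherwise the policy's own action is kept as the target.
    """
    samples = []
    for step in episode:
        if step.operator_action is not None:
            samples.append((step.observation, step.operator_action, correction_weight))
        else:
            samples.append((step.observation, step.policy_action, 1.0))
    return samples


episode = [
    Step("frame_0", "reach"),
    Step("frame_1", "grasp_high", operator_action="grasp_low"),
    Step("frame_2", "lift"),
]
samples = to_training_samples(episode)
```

In this toy episode only the second step was corrected, so its target becomes the operator's `grasp_low` with weight 2.0, while the other two steps keep the policy's action at weight 1.0. Upweighting corrections is one plausible design choice among several; discarding uncorrected steps from failed episodes is another.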
Context and significance
DF1 illustrates a broader shift toward fleet-scale, on-premise data infrastructure for physical AI, mirroring patterns seen in autonomous vehicle and robotics labs that prioritize closed-loop data collection. The Robot Report notes Tutor's background as an MIT CSAIL spinout and reports the company raised $34 million in Series A funding in December 2025, which contextualizes DF1 as a capital-intensive step toward production-grade training. Manufacturing Dive's reporting of CEO Josh Gruenstein's comment that DF1 is "an instrument of discovery" frames the facility as an R&D-first deployment rather than a turnkey automation service.
What to watch
Observers should monitor:
- whether DF1 produces reproducible gains in pick-and-place and kitting metrics when transferred to partner plants
- the type and volume of labeled teleoperation traces that feed Ti0
- how Tutor integrates cloud training partners and accelerators; public coverage cites partnerships or toolchains including AWS for cloud compute and NVIDIA CUDA for modeling, per The Robot Report
- adoption signals from logistics and e-commerce operators in pilot disclosures or case studies reported by trade press
For practitioners
The DF1 model underscores that high-quality manipulation datasets currently require substantial physical infrastructure and human supervision. Teams attempting similar programs should budget for instrumentation (multi-camera rigs, synchronized loggers), remote teleoperation tooling, and data pipelines that turn per-episode corrective traces into training signals compatible with VLA architectures like Ti0.
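One small piece of the "synchronized loggers" requirement can be sketched as nearest-timestamp alignment across camera streams. The Python example below is illustrative only: the stream names (chosen to echo the four-camera layout described in the coverage), the millisecond timestamps, and the `nearest` and `align` helpers are assumptions, not details of Tutor's tooling.

```python
from bisect import bisect_left
from typing import Dict, List, Optional


def nearest(timestamps: List[int], t: int) -> Optional[int]:
    """Return the value in a sorted list closest to t, or None if the list is empty."""
    if not timestamps:
        return None
    i = bisect_left(timestamps, t)
    candidates = timestamps[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda x: abs(x - t))


def align(streams: Dict[str, List[int]], reference: str, tolerance_ms: int) -> List[Dict[str, int]]:
    """For each reference timestamp, pick the nearest frame from every other stream.

    A reference frame is dropped if any stream has no frame within tolerance_ms.
    """
    aligned = []
    for t in streams[reference]:
        row = {reference: t}
        for name, ts in streams.items():
            if name == reference:
                continue
            match = nearest(ts, t)
            if match is None or abs(match - t) > tolerance_ms:
                break
            row[name] = match
        else:
            # Reached only when no stream broke out of the loop.
            aligned.append(row)
    return aligned


# Hypothetical per-camera frame timestamps in milliseconds (each list sorted).
streams = {
    "head": [0, 100, 200],
    "chest": [5, 104, 260],
    "left_claw": [2, 98, 203],
    "right_claw": [-3, 101, 199],
}
aligned = align(streams, reference="head", tolerance_ms=20)
```

Here the head frames at 0 ms and 100 ms each find a match in every other stream, while the frame at 200 ms is dropped because the closest chest frame is 60 ms away. Hardware-triggered capture would avoid this post-hoc matching, but software alignment of this kind is a common fallback when cameras free-run.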
All quoted statements and the facility description derive from reporting by Manufacturing Dive, The Robot Report, and local press coverage; where the company's own materials are cited (for the 10,000 hours claim and square footage), those figures appear in company posts and local coverage referenced by Hoodline and CBS affiliates.
Scoring Rationale
Notable because DF1 demonstrates a capital-intensive, real-world data collection model for robot manipulation at fleet scale; relevant to practitioners building physical-AI datasets. It is important but not a frontier-model release.
