Startups Pay Workers to Film Household Chores

Reporting by The Verge, MIT Technology Review, the Los Angeles Times, Wired, NPR/WMUK, and others documents a growing gig-economy trade: startups are paying people to record egocentric (first-person) video of household tasks so that robotics companies can train models on it. MIT Technology Review reports that Micro1, a US firm, has hired thousands of contract workers across more than 50 countries. Reported pay examples include $15 per hour in Nigeria (MIT Technology Review) and $80 for two hours in Los Angeles (Los Angeles Times). Reporting also describes companies collating those clips, running hand-tracking and labeling pipelines, and selling packaged datasets to robotics developers (NPR/WMUK, MIT Technology Review).
What happened
Reporting across The Verge, MIT Technology Review, the Los Angeles Times, Wired, and NPR/WMUK documents a growing market in so-called egocentric data for robotics. Per MIT Technology Review, Micro1, a US data-collection firm, has contracted thousands of workers in more than 50 countries to record everyday tasks such as folding laundry, washing dishes, and cooking. The Los Angeles Times reports that local crews distributing phone head-mounts in L.A. paid at least one participant $80 for two hours of footage. MIT Technology Review cites a pay example of $15 per hour for a worker in Nigeria. NPR/WMUK and Wired describe journalists and gig workers uploading short first-person videos that are then cleaned, labeled, and sold or licensed to robotics developers.
Technical details
Editorial analysis - technical context: Industry reporting explains why egocentric video matters to robotics research. First-person clips show hand-object interactions, fine-grained grasps, occlusions, and real-world lighting and friction, all of which are hard to obtain at scale from existing public corpora. Sources describe the downstream processing: automated hand-tracking, segmentation, and human annotation are applied to raw footage before it is packaged as training data for manipulation and imitation-learning pipelines (NPR/WMUK, MIT Technology Review, The Verge). For practitioners, this pattern highlights the practical tradeoff between clean but narrow simulated or lab-collected motion-capture data and noisy, diverse in-the-wild video that contains realistic variability.
Context and significance
Multiple outlets frame the trend as a response to a shortage of scalable, real-world interaction data for humanoid and manipulation models. MIT Technology Review and The Verge link demand to efforts by robotics labs and firms, and MIT Technology Review names companies such as Tesla, Figure AI, and Agility Robotics as actors racing to improve physical-world capabilities. Reporting also flags socioeconomic and ethical issues: workers in lower-income countries are being recruited for remote recording gigs, often asked to mount phones on their heads or hands, and some interviewees requested anonymity because they were not authorized to speak (MIT Technology Review). The Los Angeles Times and Wired raise privacy and consent concerns, noting the intimate nature of household footage and the uneven compensation described across reports.
What to watch
Editorial analysis: Observers should follow three categories of indicators. First, dataset provenance and licensing practices as companies package and resell labeled egocentric datasets, including whether terms require explicit informed consent and how personally identifying information is handled. Second, regulatory and platform responses, such as data-protection inquiries or marketplace rule changes, given the privacy focus in reporting. Third, technical performance signals: improvements in robotic manipulation benchmarks trained on in-the-wild egocentric data versus lab-captured or simulated datasets, which will show whether this data modality materially accelerates capabilities.
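To make the first indicator concrete, here is a minimal sketch of how a dataset buyer might audit provenance metadata before training. It assumes a hypothetical per-clip record; the field names (`consent_obtained`, `license_terms`, `pii_reviewed`) are illustrative and do not come from any vendor or schema described in the reporting.

```python
from dataclasses import dataclass


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance metadata for one clip (all fields are assumptions)."""
    clip_id: str
    consent_obtained: bool  # recorder gave explicit informed consent
    license_terms: str      # resale/licensing terms; empty string means unknown
    pii_reviewed: bool      # footage was checked for personally identifying information


def provenance_gaps(records):
    """Return {clip_id: [missing items]} for clips with incomplete provenance."""
    gaps = {}
    for record in records:
        missing = []
        if not record.consent_obtained:
            missing.append("consent")
        if not record.license_terms.strip():
            missing.append("license")
        if not record.pii_reviewed:
            missing.append("pii_review")
        if missing:
            gaps[record.clip_id] = missing
    return gaps
```

A clip that appears in the result would be held back until the missing items are resolved; which items actually block use is a policy decision this sketch does not settle.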
Practical implications for practitioners
For ML engineers and dataset managers, the rise of paid egocentric-data collection raises operational questions about quality control, annotation throughput, and distributional biases. Datasets aggregated from many countries can increase the diversity of environments and hand morphologies, but reporting warns about uneven instruction-following by recorders and variable camera alignment, both of which increase preprocessing and annotation burdens (MIT Technology Review, Wired). For teams building manipulation models, the tradeoff is between noisy, diverse footage that can improve generalization and the higher annotation cost required to produce usable labels.
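The quality-control concerns above can be illustrated with a rough triage sketch. It assumes a hypothetical `Clip` record carrying a reviewer's instruction-following flag, an estimated camera tilt, and the fraction of frames in which a hand tracker found hands. The field names and thresholds are placeholders for illustration, not values drawn from the reporting.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Clip:
    """Hypothetical per-clip metadata (every field name is an assumption)."""
    clip_id: str
    country: str
    followed_instructions: bool  # reviewer judgment: did the recorder do the requested task?
    camera_tilt_deg: float       # estimated deviation from the intended mount angle
    hands_visible_ratio: float   # fraction of frames in which hands were detected


def triage(clips, max_tilt_deg=20.0, min_hands_ratio=0.6):
    """Split clips into (usable, rejected); the default thresholds are illustrative."""
    usable, rejected = [], []
    for clip in clips:
        ok = (
            clip.followed_instructions
            and abs(clip.camera_tilt_deg) <= max_tilt_deg
            and clip.hands_visible_ratio >= min_hands_ratio
        )
        (usable if ok else rejected).append(clip)
    return usable, rejected


def country_share(clips):
    """Return each country's share of the given clips, to expose skew after filtering."""
    counts = Counter(clip.country for clip in clips)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()} if total else {}
```

Comparing `country_share` before and after `triage` is one way to check whether quality filtering quietly removes the geographic diversity that motivated the collection in the first place.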
Quoted perspectives and open questions
Reporting includes first-person accounts and pay examples, but some articles did not include on-the-record statements from firms involved in the collection and sale of these videos. That gap leaves unresolved questions about contracts, data resale terms, and downstream buyers for many datasets described in the coverage.
Scoring Rationale
This trend matters to practitioners because it supplies a scarce training signal for manipulation and humanoid models while introducing operational, ethical, and provenance challenges. Coverage by multiple major outlets and the named company examples raise the story above a niche development.


