362 Australian children found in one AI dataset
Human Rights Watch examined less than 0.0001% of a single training dataset. They found children identified by name, age, and school. The dataset builders blamed the parents.
What Human Rights Watch found
July 2024. Researchers at Human Rights Watch examined LAION-5B, one of the largest open AI training datasets in existence: 5.85 billion image-caption pairs scraped from the public internet, each a link to an image paired with its caption text. They analysed less than 0.0001% of the data, which is fewer than 5,850 of those pairs.
In that tiny sample, they found 362 identifiable Australian children and 358 identifiable Brazilian children. Identifiable means named, aged, located. Real children in a dataset used to train AI systems worldwide.
What the photos showed
- Newborns with full names and birth dates
- Preschoolers with full names, ages, and school names
- Primary school students at school events
- Children in swimwear
- First Nations children from multiple communities
One specific case documented by Human Rights Watch: two boys, aged 3 and 4, whose full names, ages, and Perth preschool were all given in the photo's caption in the dataset. These children were individually identifiable and locatable.
Where the photos came from
School uploads. Personal blogs. Photo-sharing sites. Professional photography websites. YouTube videos marked as "unlisted" (hidden from search and browsing, viewable only by people with the link, yet still scraped). Some of the original photos had been removed from public search engines before being scraped. It did not matter. The scraper captured them anyway.
Permanent
Once a model has been trained on an image, what it learned from that image is encoded in its weights and cannot be removed. Human Rights Watch stated it directly:
"AI models that were trained on the earlier dataset cannot forget the now-removed images." (Human Rights Watch, July 2024)
Deleting the original source photo changes nothing for models already trained. LAION took the dataset offline temporarily in late 2023 after child sexual abuse material was discovered in it. A cleaned version was later republished. The models trained on the earlier version still contain what they learned. There is no undo.
LAION's response
LAION, the organisation that created the dataset, responded to the Human Rights Watch findings by blaming parents:
"Parents should behave responsibly and not post private sensitive data related to their children on [the] public internet." (LAION, response to Human Rights Watch findings, 2024)
Schools post these photos. Departments of education require school social media accounts to be public. Parents are given a binary consent form that says nothing about scraping or AI training. And the dataset builders blame the parents.
The academic evidence
Researchers at the University of Utah and Carnegie Mellon University conducted a large-scale analysis of school Facebook Pages. Their findings:
- 18 million school Facebook posts analysed
- 4.9 million identifiable student images found
- 726,000 of these images included students' first and last names alongside their location
Their conclusion: schools represent "the largest existing collection of publicly accessible, identifiable images of minors."
Not social media companies. Not photo-sharing platforms. Schools. Because school Facebook Pages are public by policy, post frequently, use high-quality photos, and routinely identify children by name.
UN Convention violations
A 2024 peer-reviewed study found that these practices violate three articles of the UN Convention on the Rights of the Child:
- Article 3: The best interests of the child must be a primary consideration in all actions concerning children
- Article 12: Children have the right to express views on matters affecting them
- Article 16: Children have the right to privacy
Australia ratified this Convention in 1990. The children in these datasets had no say in the collection of their images, no knowledge it was occurring, and no mechanism to object.
Recent developments
- Brazil enacts landmark child data protection legislation, becoming the first country to specifically address children's data in AI training.
- Amazon discovers child sexual abuse material in AI training data used for its own models.
- Tennessee teenagers sue xAI over AI-generated exploitative images created from school photos.
HRW recommendations
Human Rights Watch called on governments to:
- Adopt child data protection laws that specifically address AI training
- Prohibit the scraping of children's personal data for AI development
- Ban the non-consensual digital replication of minors' likenesses
- Establish accessible justice mechanisms for children harmed by these practices
Australia's Children's Online Privacy Code is due December 2026. Whether it addresses these recommendations depends on what happens during consultation, which closes June 2026.
Sources
- Australia: Children's Photos Used to Train AI – Human Rights Watch, July 2024
- LAION-5B dataset documentation – LAION
- University of Utah and Carnegie Mellon University study on school Facebook Pages
- UN Convention on the Rights of the Child – OHCHR
- Protecting Children's Rights in the Age of Generative AI – Human Rights Watch
- Amazon discovers CSAM in AI training data – Reuters, January 2026
- Tennessee teenagers sue xAI over AI-generated images – TechCrunch, March 2026
Last reviewed: April 2026