What Human Rights Watch found

July 2024. Researchers at Human Rights Watch examined LAION-5B, one of the largest open AI training datasets in existence: 5.85 billion image-caption pairs scraped from the public internet. They analysed less than 0.0001% of the data.

In that tiny sample, they found 362 identifiable Australian children and 358 identifiable Brazilian children. Identifiable means named, aged, located. Real children in a dataset used to train AI systems worldwide.
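To put the sample size in perspective, the figures above can be turned into a rough upper bound. The short Python sketch below is illustrative only. It assumes the 0.0001% figure applies to the full 5.85 billion pairs, and the names DATASET_PAIRS, SAMPLE_SHARE and max_sample_size are invented for this example, not taken from Human Rights Watch's methodology.

```python
from fractions import Fraction

# Figures as stated in the text above.
DATASET_PAIRS = 5_850_000_000          # 5.85 billion image-caption pairs
SAMPLE_SHARE = Fraction(1, 1_000_000)  # 0.0001% written as an exact fraction

def max_sample_size(total: int, share: Fraction) -> int:
    """Return the largest whole number of items covered by `share` of `total`."""
    return int(total * share)

# "Less than 0.0001%" means the reviewed sample was smaller than this bound.
upper_bound = max_sample_size(DATASET_PAIRS, SAMPLE_SHARE)

# Identifiable children reported in that sample (Australian + Brazilian).
identified = 362 + 358

print(f"Sample reviewed: fewer than {upper_bound:,} pairs")
print(f"Identifiable children reported: {identified}")
```

On those assumptions, the reviewed sample was smaller than 5,850 image-caption pairs out of 5.85 billion.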

What the photos showed

  • Newborns with full names and birth dates
  • Preschoolers with full names, ages, and school names
  • Primary school students at school events
  • Children in swimwear
  • First Nations children from multiple communities

One case documented by Human Rights Watch: two boys, aged 3 and 4, whose full names, ages, and Perth preschool all appeared in the dataset caption. These children were individually identifiable and locatable.

Where the photos came from

School uploads. Personal blogs. Photo-sharing sites. Professional photography websites. YouTube videos marked as "unlisted" (not public, not private, but still scraped). Some of the original photos had been removed from public search engines before being scraped. It did not matter. The scraper captured them anyway.

Permanent

Once images are processed into AI model weights during training, they cannot be removed. Human Rights Watch stated it directly:

"AI models that were trained on the earlier dataset cannot forget the now-removed images."
Human Rights Watch, July 2024

Deleting the original source photo changes nothing for models already trained. LAION took the dataset offline in late 2023 after child sexual abuse material was discovered in it, and later republished it. The models trained on the earlier version still contain what they learned. There is no undo.

LAION's response

LAION, the organisation that created the dataset, responded to the Human Rights Watch findings by blaming parents:

"Parents should behave responsibly and not post private sensitive data related to their children on [the] public internet."
LAION, response to Human Rights Watch findings, 2024

Schools post these photos. Departments of education require the accounts to be public. Parents are given a binary consent form that mentions none of this. And the dataset builders blame the parents.

The academic evidence

Researchers at the University of Utah and Carnegie Mellon University conducted a large-scale analysis of school Facebook Pages. Their findings:

  • 18 million school Facebook posts analysed
  • 4.9 million identifiable student images found
  • 726,000 included students' first and last names alongside their location

Their conclusion: schools represent "the largest existing collection of publicly accessible, identifiable images of minors."

Not social media companies. Not photo-sharing platforms. Schools. Because school Facebook Pages are public by policy, post frequently, use high-quality photos, and routinely identify children by name.

UN Convention violations

A 2024 peer-reviewed study found that these practices violate three articles of the UN Convention on the Rights of the Child:

  • Article 3: The best interests of the child must be a primary consideration in all actions concerning children
  • Article 12: Children have the right to express views on matters affecting them
  • Article 16: Children have the right to privacy

Australia ratified this Convention in 1990. The children in these datasets had no say in the collection of their images, no knowledge it was occurring, and no mechanism to object.

Recent developments

September 2025

Brazil enacts landmark child data protection legislation, becoming the first country to specifically address children's data in AI training.

January 2026

Amazon discovers child sexual abuse material in AI training data used for its own models.

March 2026

Tennessee teenagers sue xAI over AI-generated exploitative images created from school photos.

HRW recommendations

Human Rights Watch called on governments to:

  • Adopt child data protection laws that specifically address AI training
  • Prohibit the scraping of children's personal data for AI development
  • Ban the non-consensual digital replication of minors' likenesses
  • Establish accessible justice mechanisms for children harmed by these practices

Australia's Children's Online Privacy Code is due in December 2026. Whether it addresses these recommendations depends on the outcome of the consultation, which closes in June 2026.

Last reviewed: April 2026