Human Rights Watch identified 190 Australian children from less than 0.0001% of a single dataset. Further investigation raised the total to 362. The real number is vastly higher.
In June and July 2024, Human Rights Watch (HRW) published findings from investigations into LAION-5B, a dataset containing 5.85 billion image-caption pairs scraped from the public internet and used to train popular AI image generators including Stable Diffusion.
HRW researchers manually reviewed a sample of just 5,850 image links, less than 0.0001% of the total dataset. In that tiny sample, they initially found 170 Brazilian children (June 2024) and 190 Australian children (July 2024). By September 2024, continued investigation had pushed the confirmed totals higher still, with the Australian count alone rising to 362 children.
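To see why a sample that small implies a far larger true number, a back-of-the-envelope calculation is useful. This is illustrative only: it assumes the reviewed links are broadly representative of the full dataset, which HRW did not claim, and naive linear scaling overstates precision.

```python
# Back-of-the-envelope extrapolation (illustrative only; assumes the reviewed
# sample is broadly representative of the whole dataset, which HRW did not claim).
dataset_size = 5_850_000_000   # image-caption pairs in LAION-5B
sample_size = 5_850            # links HRW researchers manually reviewed
children_found = 170 + 190     # Brazilian + Australian children found in those samples

sample_fraction = sample_size / dataset_size
print(f"Sample fraction: {sample_fraction:.6%} of the dataset")

# Naive linear scaling of the observed rate to the full dataset.
naive_estimate = children_found / sample_size * dataset_size
print(f"Naive extrapolation: ~{naive_estimate:,.0f} children's images")
```

Even if the true rate across the rest of the dataset were ten thousand times lower than in the reviewed sample, that arithmetic would still leave tens of thousands of children's images in the full dataset.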
The photos captured every stage of childhood:
"Two boys, ages 3 and 4, grinning from ear to ear as they hold paintbrushes in front of a colourful mural." Human Rights Watch, July 2024
The accompanying caption in the dataset revealed both children's full names and ages, and the name of the preschool they attend in Perth. Anyone with access to the dataset could identify these children, know where they go to school, and know what they look like.
The children's photos in the dataset came from a range of online sources.
HRW explicitly identified school uploads as one of the sources of children's photos in the dataset. This is not a hypothetical risk. Children's photos posted by schools on public platforms have been confirmed to end up in AI training datasets.
In December 2023, after the Stanford Internet Observatory found over 1,000 verified instances of child sexual abuse material in LAION-5B, the dataset was taken offline. LAION released a cleaned version called Re-LAION-5B in August 2024, which removed CSAM links and the children's photos identified by HRW. But HRW was clear about the limitation:
"AI models that were trained on the earlier dataset cannot forget the now-removed images." Human Rights Watch, September 2024
AI models don't store individual photos as retrievable files. They learn statistical patterns from millions of images during training. Once a child's photo has been processed into a model's weights, there is no practical way to pull it back out or delete its influence. The patterns learned from that child's face, body, and identifying information are permanently embedded in every model trained on the dataset.
This means that every model already trained on the original dataset still carries what it learned from those children's photos, no matter how the dataset itself is cleaned or edited afterwards.
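A minimal toy sketch makes this concrete (purely illustrative: ordinary gradient descent on a single number, not how Stable Diffusion or any real image generator is trained). Every training example nudges the same shared weight, so the finished model is a blended summary of all of them, and editing the dataset afterwards leaves an already-trained model untouched:

```python
# Toy illustration (not how image generators are trained): a trained model's
# weights are a blended function of every example it saw, so removing an
# example from the dataset later does not change a model already trained on it.

def train(examples, steps=1000, lr=0.01):
    """Fit a single weight w so that w * x approximates y, by gradient descent."""
    w = 0.0
    for _ in range(steps):
        for x, y in examples:
            grad = 2 * (w * x - y) * x   # gradient of squared error for this example
            w -= lr * grad               # every example nudges the same shared weight
    return w

dataset = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]   # pretend the last pair is a child's photo
w_trained = train(dataset)

cleaned_dataset = dataset[:-1]        # "remove" that example from the dataset afterwards
w_retrained = train(cleaned_dataset)  # only a full retrain from scratch would differ

print(f"weight trained on the full dataset:  {w_trained:.4f}")   # unaffected by the later removal
print(f"weight retrained on cleaned data:    {w_retrained:.4f}")
```

Only retraining from scratch on the cleaned data produces a model free of that example's influence, which is exactly why HRW stresses that models already trained on the earlier dataset cannot forget the removed images.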
When confronted with the findings, LAION's response was to shift blame to families:
"Any information obtained by Human Rights Watch is publicly available, though for some reason unknown to us, they would like to pretend it is not." LAION, in response to Human Rights Watch, 2024
And further:
Parents should "behave responsibly and not post private sensitive data related to their children on [the] public internet, where it can be easily collected." LAION, 2024
The organisation that scraped billions of images, including photos of three-year-olds at preschool, says parents should have known better. Many of these photos were not even posted by parents. They were posted by schools, in good faith, using standard processes.
The tech companies built the systems, scraped the data, and then blamed families. That is why every parent and every school needs to understand what is happening, so we can protect our children together.
Researchers from the University of Utah and Carnegie Mellon University analysed approximately 18 million Facebook posts by US schools and school districts and found:
"The posts we studied may represent the largest existing collection of publicly accessible, identifiable images of minors. It is likely that the photos are being accessed by a range of actors, including government agencies, predictive policing companies, and those with nefarious intent." Rosenberg et al., University of Utah / Carnegie Mellon University
A 2024 peer-reviewed study published in Computers and Education Open found that schools across the UK, US, Australia, and Europe are publishing children's images online without adequately protecting their rights.
The problem has not slowed down. If anything, the evidence of harm has escalated.
Human Rights Watch recommended that the Australian Government adopt stronger legal protections for children's personal data, including prohibiting the scraping of children's data into AI systems.
Until those protections exist, the only defence schools have is to stop making children's photos publicly accessible.