Technical, legal, and ethical challenges of generative artificial intelligence: an analysis of the governance of training data and copyrights

Published in Discover Artificial Intelligence, 2025

Abstract

This article examines the legal, technical, and ethical challenges of generative AI, focusing on the governance of training data and copyright compliance. It addresses the growing tension between AI development and content creators’ rights, particularly regarding the unauthorized use of copyrighted material for model training. By analyzing regulatory frameworks in the United States, European Union, Japan, and Brazil, the study highlights how existing mechanisms (such as fair use, text and data mining (TDM) exceptions, and hybrid models) remain inadequate to resolve the opacity and legal uncertainty surrounding AI training datasets. Drawing on insights from Henderson, Yu, Narayanan, and Kapoor, the paper demonstrates that the absence of transparency not only compromises legal accountability but also exacerbates epistemic risks and distributive asymmetries. Adopting a comparative legal-philosophical methodology, the study proposes governance solutions centered on mandatory transparency obligations, ethical compensation schemes for rights holders, and robust audit mechanisms. These recommendations aim to balance incentives for innovation with fairness, sustainability, and the protection of intellectual property in the AI-driven economy.

BibTeX

@article{pasetti2025technical,
  title={Technical, legal, and ethical challenges of generative artificial intelligence: an analysis of the governance of training data and copyrights},
  author={Pasetti, Marcelo and Santos, James William and Corr{\^e}a, Nicholas Kluge and de Oliveira, Nythamar and Barbosa, Camila Palhares},
  journal={Discover Artificial Intelligence},
  volume={5},
  number={1},
  pages={193},
  year={2025},
  publisher={Springer}
}