Nicholas Kluge

To foster open development of neural text generation in Portuguese, we developed both GigaVerbo, a concatenation of deduplicated Portuguese text corpora amounting to 200 billion tokens, and Tucano, a series of decoder-transformers natively pre-trained in Portuguese. All byproducts of our study, including the source code used for training and evaluation, are openly released on GitHub and Hugging Face.

Certified AI is a project sponsored by KI.NRW and carried out in partnership with the Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), the German Federal Office for Information Security (BSI), the University of Cologne, RWTH Aachen University, the German Institute for Standardization (DIN), and numerous DAX-30 and other companies. It aims to develop procedures to examine generally accepted standards for AI systems and their verification, and to explore business models for AI certification.

Large language models (LLMs) have significantly advanced natural language processing, but their progress has not been equal across languages. Most LLMs are trained in high-resource languages like English, and multilingual models generally underperform monolingual ones. Additionally, aspects of their multilingual foundation, such as computational demands and licensing regimes, sometimes restrict the byproducts they produce. Hence, we developed the TeenyTinyLlama pair: two compact models for Brazilian Portuguese text generation.

Ethical Problem-Solving (EPS) is a framework aimed at promoting the development of safe and ethical artificial intelligence. It is divided into an evaluation stage (performed via Algorithmic Impact Assessment tools) and a recommendation stage. Both stages are distinct steps in a human-centered Ethics as a Service (EaaS) framework.

Worldwide AI Ethics is a systematic literature review conducted by AIRES researchers. Building on the work of other meta-analysts, this study analyzes 200 documents related to AI ethics and governance and presents a collection of typologies used to classify our sample, all condensed into an interactive, freely accessible online tool.

Aira is a series of chatbots built via instruction tuning and Direct Preference Optimization (DPO). This series was developed to help researchers explore the challenges related to the alignment problem. All models and datasets developed are part of my doctoral dissertation, "Dynamic Normativity: Necessary and Sufficient Conditions for Value Alignment."

The Teeny-tiny Castle is a repository of educational tools for AI Ethics and Safety research. It is a Python-based course on how to create and use tools for addressing certain safety issues in AI (e.g., interpretability, sustainability, fairness, robustness). It also includes an introductory course on Machine Learning.

We have built a web application to automate an experiment in moral philosophy. This experiment aims to find out how ethical guidelines influence software developers' decision-making. We also seek to investigate different ways to "teach ethics" in STEM fields.

This project aims to create a library of published machine learning models. The library would contain information about these models, their capabilities, and the possible risks (together with ethical concerns) regarding their development and use. The aim is to map the biggest problems and threats that contemporary AI poses to our society.

NK-Correa.

Research Projects

Tucano

Certified AI (Zertifizierte KI)

TeenyTinyLlama

Ethical Problem-Solving

Worldwide AI Ethics

Aira

Teeny-tiny Castle

Codes of Ethics in IT

Model Library
