Research Projects
Tucano
To foster the open development of neural text generation in Portuguese, we developed GigaVerbo, a
concatenation of deduplicated Portuguese text corpora amounting to 200 billion tokens, and Tucano, a series of
decoder-transformers natively pre-trained in Portuguese. All byproducts of our study, including the source code used for training and evaluation,
are openly released on GitHub and Hugging Face.
Certified AI (Zertifizierte KI)
The project aims to develop procedures to examine generally accepted standards for AI systems and their verification and to explore business
models for AI certification.
Certified AI is a project sponsored by KI.NRW and carried out in partnership with the Fraunhofer Institute for Intelligent
Analysis and Information Systems (IAIS), the German Federal Office for Information Security (BSI), the University of Cologne, RWTH Aachen University,
the German Institute for Standardization (DIN), and numerous DAX-30 and other companies.
TeenyTinyLlama
Large language models (LLMs) have significantly advanced natural language processing, but this progress
has not been equal across languages. While most LLMs are trained in high-resource languages like English,
multilingual models generally underperform monolingual ones. Additionally, aspects of their multilingual
foundation, like computational demands and licensing regimes, sometimes restrict the byproducts they produce.
Hence, we developed the TeenyTinyLlama pair: two compact models for Brazilian Portuguese text generation.
Ethical Problem-Solving
Ethical Problem-Solving (EPS) is a
framework aimed at promoting the development of safe and ethical artificial intelligence.
It is divided into an evaluation stage (performed via Algorithmic Impact Assessment tools) and a recommendation stage. These
stages represent distinct steps in a human-centered Ethics as a Service (EaaS) framework.
Worldwide AI Ethics
Worldwide AI Ethics is a systematic literature review conducted by AIRES researchers.
Building on the work of other meta-analysts, this study analyzes
200 documents related to AI ethics and governance and presents a collection of
typologies used to classify our sample, all condensed into an interactive,
freely accessible online tool.
Aira
Aira is a series of chatbots built via instruction-tuning and Direct Preference Optimization (DPO). This series was developed to help researchers explore the
challenges related to the alignment problem. All models and datasets developed are part of my doctoral dissertation,
"Dynamic Normativity: Necessary and Sufficient Conditions for Value Alignment."
Teeny-tiny Castle
The Teeny-tiny Castle is a
repository of educational tools for AI Ethics and Safety research.
It is a Python-based course on how to create and use tools for addressing certain safety issues
in AI (e.g., interpretability, sustainability, fairness, robustness). It also includes an
introductory course on Machine Learning.
Codes of Ethics in IT
We have
built a web application to automate an experiment in moral philosophy.
This experiment aims to find out how ethical guidelines influence software developers'
decision-making. We also seek to investigate different ways to
"teach ethics" in STEM fields.
Model Library
This project aims to create a
library of published machine learning models. This library
would contain information about these models, their capabilities, and the possible risks
(together with ethical concerns) regarding their development and use. The aim is to map
the biggest problems and threats that contemporary AI poses to our society.