Teeny-Tiny Castle: Learning Resources for AI Ethics and Safety
AI Ethics and Safety are (relatively) new fields, and much of the development community is still unfamiliar with their tools (and how to use them). To address this problem, we created the Teeny-Tiny Castle, an open-source repository of “Educational tools for AI Ethics and Safety Research”. Here, developers can find many examples of how to address various problems raised in the literature (e.g., algorithmic discrimination, model opacity).
The Teeny-Tiny Castle repository contains several examples of how to work ethically and safely with AI, focusing mainly on issues related to Accountability & Sustainability, Interpretability, Robustness/Adversarial Machine Learning, and Fairness. Each topic is worked through examples tied to some of the most common contemporary AI applications (e.g., Computer Vision, Natural Language Processing, Classification & Forecasting). If you are new to the field, the Teeny-Tiny Castle also includes an introductory course on ML.
Machine Learning Introduction Course 📈
Whether you’re a beginner or looking to refresh your skills, this course covers a range of essential topics in machine learning. From setting up your own workstation with Visual Studio Code to deploying a forecasting model as an API with FastAPI (a minimal serving sketch follows the table below), each tutorial provides hands-on experience and practical knowledge.
| Tutorial | GitHub | Colab |
|---|---|---|
| Build your own workstation with Visual Studio Code | LINK | 👈 |
| Introduction to Python | LINK | |
| Basic Pandas, Scikit-learn, and NumPy tutorial | LINK | |
| Gradient Descent from scratch | LINK | |
| Linear Regression with gradient descent from scratch | LINK | |
| Multi-Layer Perceptron with NumPy | LINK | |
| Feed-Forward Neural Network from scratch with NumPy | LINK | |
| Introduction to Keras and TensorFlow using the Fashion-MNIST dataset | LINK | |
| Introduction to PyTorch using the Digit-MNIST dataset | LINK | |
| Hyperparameter optimization with KerasTuner | LINK | |
| Dataset processing with TFDS | LINK | |
| Experimentation tracking with TensorBoard | LINK | |
| Introduction to recommendation systems | LINK | |
| Introduction to time series forecasting and XGBoost | LINK | |
| Text classification with Transformers | LINK | |
| Sequence-to-sequence modeling with RNNs and Transformers | LINK | |
| Text-generation with the GPT architecture | LINK | |
| Introduction to Reinforcement Learning | LINK | |
| Creating ML apps with Gradio | LINK | |
| Deploying a forecasting model as an API with FastAPI | LINK | |
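As a taste of the last tutorial in the list, here is a minimal sketch of how a trained forecasting model might be served as an API with FastAPI. The model file, feature names, and endpoint are illustrative assumptions for this sketch, not the repository’s actual code.

```python
# Minimal sketch: serving a forecasting model with FastAPI.
# "forecasting_model.joblib" and the lagged features are hypothetical placeholders.
import joblib
import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Forecasting API (sketch)")

# Hypothetical artifact produced by a forecasting tutorial (e.g., an XGBoost regressor).
model = joblib.load("forecasting_model.joblib")

class ForecastRequest(BaseModel):
    # Illustrative lagged features; adapt these to whatever your model was trained on.
    lag_1: float
    lag_2: float
    lag_3: float

@app.post("/predict")
def predict(request: ForecastRequest) -> dict:
    features = np.array([[request.lag_1, request.lag_2, request.lag_3]])
    prediction = model.predict(features)
    return {"forecast": float(prediction[0])}

# Run locally with: uvicorn main:app --reload
```

Keeping the request schema explicit with Pydantic also gives you FastAPI’s automatic OpenAPI documentation for free.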
Accountability and Sustainability ♻️
Learn how to generate model cards for transparent model reporting, explore the environmental impact of your models with CO2 emission reports from CodeCarbon (a minimal tracking sketch follows the table below), and navigate the accuracy-versus-sustainability dilemma.
| Tutorial | GitHub | Colab |
|---|---|---|
| Accountability through Model Reporting | LINK | |
| Tracking carbon emissions and power consumption with CodeCarbon | LINK | |
| Architectural choices in computer vision and their impact on energy consumption | LINK | |
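To illustrate the kind of tracking used in these tutorials, here is a minimal sketch of measuring emissions with CodeCarbon around a training run. The model and synthetic data are placeholders; only the tracker usage reflects CodeCarbon’s start/stop API.

```python
# Minimal sketch: tracking emissions with CodeCarbon during model training.
from codecarbon import EmissionsTracker
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder data and model standing in for a real training job.
X, y = make_classification(n_samples=5_000, n_features=20, random_state=42)

tracker = EmissionsTracker(project_name="teeny-tiny-castle-sketch")
tracker.start()

# Any training loop can go between start() and stop().
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X, y)

emissions_kg = tracker.stop()  # estimated emissions in kg of CO2-equivalent
print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```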
Interpretability with CV 🖼️
Understanding and interpreting the decisions made by machine learning models is essential for building trust and making informed decisions. These tutorials explore various techniques for interpretability in computer vision: introducing convolutional neural networks with CIFAR-10, feature visualization, activation maximization, saliency mapping (a minimal sketch follows the table below), and using LIME for interpretation. Each tutorial provides insights into the inner workings of CV models.
| Tutorial | GitHub | Colab |
|---|---|---|
| Creating computer vision models for image classification | LINK | |
| Activation Maximization in CNNs | LINK | |
| Introduction to saliency mapping with CNNs | LINK | |
| Applying LIME to CNNs | LINK | |
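As a flavor of the saliency-mapping tutorial, here is a minimal sketch of a vanilla gradient saliency map for a Keras CNN. The untrained placeholder model and random image are assumptions; in the tutorials, a trained CIFAR-10 classifier takes their place.

```python
# Minimal sketch: vanilla gradient saliency mapping for a Keras CNN.
import numpy as np
import tensorflow as tf

# Placeholder CNN (assumption): any trained tf.keras classifier works the same way.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

image = tf.convert_to_tensor(np.random.rand(1, 32, 32, 3).astype("float32"))

with tf.GradientTape() as tape:
    tape.watch(image)
    logits = model(image)
    top_class = tf.argmax(logits[0])
    score = tf.gather(logits[0], top_class)  # score of the predicted class

# The saliency map is the magnitude of the gradient of that score with
# respect to the input pixels, reduced over the color channels.
gradients = tape.gradient(score, image)
saliency = tf.reduce_max(tf.abs(gradients), axis=-1)[0]
print(saliency.shape)  # (32, 32)
```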
Interpretability with NLP 📚
Unravel the complexities of natural language processing models and gain insights into their decision-making processes. From sentiment analysis and applying LIME explanations to LSTMs, to exploring integrated gradients, interpreting BERT models, Word2Vec models, and embedding models, each tutorial provides a deep dive into NLP interpretability (a minimal LIME-for-text sketch follows the table below).
| Tutorial | GitHub | Colab |
|---|---|---|
| Creating language models for text classification | LINK | |
| Applying LIME explanations to shallow language models | LINK | |
| Applying integrated gradients to Language Models | LINK | |
| Explaining DistilBERT with integrated gradients | LINK | |
| Training and exploring Word2Vec models | LINK | |
| Exploring Language Model’s Embeddings | LINK | |
| Text mining on text datasets | LINK | |
| Dissecting a GPT model | LINK | |
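Below is a minimal sketch of the kind of LIME explanation applied in these tutorials. The tiny toy sentiment corpus and the shallow TF-IDF + logistic regression pipeline are stand-ins for the tutorials’ models.

```python
# Minimal sketch: explaining a text classifier with LIME.
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus (assumption) standing in for a real sentiment dataset.
texts = [
    "I loved this movie, it was wonderful",
    "what a fantastic and moving film",
    "terrible plot and awful acting",
    "I hated every minute of it",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# A shallow model: TF-IDF features plus logistic regression.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance(
    "a wonderful film with awful pacing",
    pipeline.predict_proba,  # must return class probabilities for a list of texts
    num_features=4,
)
print(explanation.as_list())  # word-level contributions to the prediction
```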
Interpretability with Tabular Classifiers 📊
Gain a deeper understanding of classification and prediction models for tabular data through interpretability techniques. Explore how to apply explanation techniques to tabular classifiers to uncover insights into their decision-making processes (a minimal dalex sketch follows the table below).
| Tutorial | GitHub | Colab |
|---|---|---|
| Applying model-agnostic explanations to classifiers with dalex | LINK | |
| Exploring models trained on the COMPAS Recidivism Racial Bias dataset | LINK | |
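For a quick taste of the dalex workflow, here is a minimal sketch using a synthetic tabular dataset as a stand-in for the real data (such as COMPAS) used in the tutorials.

```python
# Minimal sketch: model-agnostic explanations for a tabular classifier with dalex.
import dalex as dx
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (assumption) with named columns for readable explanations.
X, y = make_classification(n_samples=1_000, n_features=6, random_state=42)
X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(6)])

model = RandomForestClassifier(random_state=42).fit(X, y)

# dalex wraps the fitted model together with its data into an Explainer object.
explainer = dx.Explainer(model, X, y, label="random_forest_sketch")

# Permutation-based variable importance (a global, model-agnostic explanation).
importance = explainer.model_parts()
print(importance.result)

# Break-down of a single prediction (a local explanation).
local = explainer.predict_parts(X.iloc[[0]])
print(local.result)
```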
Machine Learning Fairness ⚖️
Advancing the discourse on machine learning fairness, the following tutorials delve into diverse facets of this crucial domain. From applying fairness metrics on datasets like Credit Card and Adult Census to enforcing fairness with tools like AIF360, these tutorials guide you through the intricate landscape of addressing biases in machine learning models (a minimal fairness-metrics sketch follows the table below).
| Tutorial | GitHub | Colab |
|---|---|---|
| Applying fairness metrics on the Credit Card Dataset | LINK | |
| Applying fairness metrics on the Adult Census Dataset | LINK | |
| Enforcing fairness with AIF360 | LINK | |
| Applying the principle of Ceteris paribus | LINK | |
| Applying fairness metrics on the CelebA dataset | LINK | |
| Investigating biases in text generation models | LINK | |
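Here is a minimal sketch of computing group fairness metrics with AIF360 on synthetic data. The protected attribute (`sex`), its group encoding, and the deliberately biased labels are illustrative assumptions, not taken from the tutorials’ datasets.

```python
# Minimal sketch: group fairness metrics with AIF360 on synthetic data.
import numpy as np
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "sex": rng.integers(0, 2, size=1_000),          # 1 = privileged group (assumption)
    "income": rng.normal(50_000, 10_000, 1_000),    # generic numeric feature
})
# Deliberately biased labels so the metrics have something to detect:
# the privileged group receives the favorable outcome more often.
p_favorable = np.where(df["sex"] == 1, 0.6, 0.3)
df["label"] = (rng.random(1_000) < p_favorable).astype(int)

dataset = BinaryLabelDataset(
    favorable_label=1,
    unfavorable_label=0,
    df=df,
    label_names=["label"],
    protected_attribute_names=["sex"],
)

metric = BinaryLabelDatasetMetric(
    dataset,
    unprivileged_groups=[{"sex": 0}],
    privileged_groups=[{"sex": 1}],
)
# Statistical parity difference: P(favorable | unprivileged) - P(favorable | privileged).
print("Statistical parity difference:", metric.statistical_parity_difference())
# Disparate impact: the ratio of the same two probabilities (1.0 means parity).
print("Disparate impact:", metric.disparate_impact())
```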
Adversarial Machine Learning 🐱💻
Within these tutorials, we navigate the intricate landscape of adversarial attacks, how they work, and how to thwart them. Explore the dark arts of exploiting pickle serialization, create adversarial examples with SecML and TextAttack, and apply the fast gradient sign method against convolutional neural networks (a minimal FGSM sketch follows the table below).
| Tutorial | GitHub | Colab |
|---|---|---|
| Exploiting pickle serialization | LINK | |
| Creating adversarial examples with SecML | LINK | |
| Applying the fast gradient sign method against CNNs | LINK | |
| Creating adversarial examples with TextAttack | LINK | |
| Extraction attacks via model cloning | LINK | |
| Demonstrating poisoning attacks | LINK | |
| Adversarial training for computer vision models | LINK | |
| Adversarial training for language models | LINK | |
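To make the core attack concrete, here is a minimal sketch of the fast gradient sign method (FGSM) against a Keras CNN. The untrained placeholder model, random image, and arbitrary label are assumptions; the tutorials attack trained image classifiers.

```python
# Minimal sketch: the fast gradient sign method (FGSM) against a Keras CNN.
import numpy as np
import tensorflow as tf

# Placeholder CNN (assumption): any trained tf.keras classifier works the same way.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

image = tf.convert_to_tensor(np.random.rand(1, 32, 32, 3).astype("float32"))
label = tf.constant([3])  # arbitrary "true" class for the sketch
epsilon = 0.01            # perturbation budget

with tf.GradientTape() as tape:
    tape.watch(image)
    logits = model(image)
    loss = loss_fn(label, logits)

# FGSM: take one step in the direction of the sign of the loss gradient
# with respect to the input, then clip back to the valid pixel range.
gradient = tape.gradient(loss, image)
adversarial_image = tf.clip_by_value(image + epsilon * tf.sign(gradient), 0.0, 1.0)

print("Original prediction:   ", int(tf.argmax(model(image), axis=-1)[0]))
print("Adversarial prediction:", int(tf.argmax(model(adversarial_image), axis=-1)[0]))
```

The essential point is that a single signed-gradient step within a small ε-budget is often enough to change a model’s prediction, which is what the adversarial training tutorials then try to defend against.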
