-
The Mismeasure of Man and Models
Evaluating Allocational Harms in Large Language Models
-
Adjectives Can Reveal Gender Biases Within NLP Models
We extend the WinoBias dataset with gender-associated adjectives and reveal underlying gender bias in the GPT-3.5 model.
-
Balancing Tradeoffs between Fickleness and Obstinacy in NLP Models
We introduce Balanced Adversarial Training (BAT) to train models that are robust to both fickle and obstinate adversarial examples.