Today’s AI systems are remarkably capable but often untrustworthy: we don’t really know why they make decisions or whether we can rely on them, especially when they are trained or hosted by others. This seminar will explore emerging work in which ideas from cryptography and complexity theory — fields that have developed a rich set of tools for reasoning formally about security — can bring mathematical rigor to AI trust and safety.
Participation requires a background in, or comfort with, theoretical computer science (proofs rather than experiments). Some background in cryptography or machine learning is ideal but not necessary, provided you are willing and able to catch up on the basic definitions from these fields as they come up.
The course requirements include reading, understanding, and presenting one paper in class, as well as three short (two to three paragraphs) reflection assignments over the course of the semester.
The seminar will run for 12 weeks. Each week, two students will prepare a pair of related one-hour talks on the same paper or on closely related papers. In some weeks, the natural split is between different parts of a single paper; in other weeks, the pair should divide a main paper together with one related or background paper.
Students do not need to already know all of the machine-learning or cryptography background required for every topic. However, before picking a topic, it is a good idea to look at the reference material below and make sure you are comfortable catching up on whatever background your topic requires.