Hello and welcome to Changlin Li's (very bare bones) homepage.
I'm interested in AI safety; in practice this means working more on areas like interpretability than on pretraining.
Here are some pedagogical projects I've been working on:
- Illustrating inner alignment failure with a simple RL model
- Trying to use the circuits approach to interpret neural nets trained on MNIST digits