|
Dialogue Action Tokens: Steering Language Models in Goal-Directed Dialogue with a Multi-Turn Planner
Kenneth Li*, Yiming Wang*, Fernanda Viégas, Martin Wattenberg
Preprint
arXiv | Code
A small planner model is trained using reinforcement learning to steer large language models in multi-round dialogues.
|
|
Designing a Dashboard for Transparency and Control of Conversational AI
Yida Chen, Aoyu Wu, Trevor DePodesta, Catherine Yeh, Kenneth Li, Nicholas Castillo Marin, Oam Patel, Jan Riecke, Shivam Raval, Olivia Seow, Fernanda Viégas, Martin Wattenberg
Preprint
arXiv | Code | Project Page
We design and evaluate a dashboard interface for visualizing and controlling the internal user model in a conversational LLM.
|
|
Measuring and Controlling Instruction (In)Stability in Language Model Dialogs
Kenneth Li, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
COLM, 2024 (Oral)
arXiv | Code | The Gradient
As a dialogue grows longer, a chatbot stops following its system prompt surprisingly quickly, within 8 rounds.
|
|
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener
ICML, 2024
arXiv | Code
Through rejection sampling, we leverage a language model's own discriminative capability to boost its generative capability.
|
|
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Kenneth Li*, Oam Patel*, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
NeurIPS, 2023 (Spotlight)
arXiv | Code | Stand-alone Model
By manipulating the activations of a language model, we can compel it to tell the truth it knows but otherwise hides.
|
|
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
Kenneth Li, Aspen Hopkins, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg
ICLR, 2023 (Oral)
arXiv | Code | Demo | The Gradient | Scientific American | The Atlantic | Nature News | Andrew Ng
In a transformer trained on Othello transcripts, we uncover an interpretable and controllable world model of the game board.
|
|