Kenneth Li

I am a Ph.D. student at Harvard SEAS, advised by Hanspeter Pfister.

I have a broad interest in Machine Learning, particularly in unsupervised techniques with applications to vision, language, and audio. I enjoy reading about Linguistics, Neuroscience, and Psychology. Collaborations and casual chats are welcome.

Contact: ke_li [at]
Google Scholar  |  LinkedIn  |  Twitter  |  Github

Towards Tokenized Human Dynamics Representation
Kenneth Li, Xiao Sun, Zhirong Wu, Fangyun Wei, Stephen Lin
[Arxiv] | [Code] | [Demo]
By self-supervised acton discovery, we convert human dynamics understanding into a language problem.

Pose Recognition with Cascade Transformers
Ke Li*, Shijie Wang*, Xiang Zhang*, Yifan Xu, Weijian Xu, Zhuowen Tu
CVPR, 2021
[Arxiv] | [Code] | [Video]
By using the Transformer architecture in an end-to-end fashion, we can clean up the heuristic designs that have long bedeviled pose estimation models.

Unsupervised Discriminative Learning of Sounds for Audio Event Classification
Sascha Hornauer, Ke Li, Stella X. Yu, Shabnam Ghaffarzadegan, Liu Ren
ICASSP, 2021
[Arxiv] | [Slides]
A self-supervised learning model that can transfer knowledge across audio datasets and deliver performance on par with ImageNet pre-training.

Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text
Difei Gao*, Ke Li*, Ruiping Wang, Shiguang Shan, Xilin Chen
CVPR, 2020
[Arxiv] | [Code] | [Supp] | [Video]
Text spotted in everyday images can be rare, polysemous, and ambiguous, but we can pin down its semantics through cross-modality message passing.

Design and source code from Jon Barron's website
Last update: 11/2021