Ellis Brown
I'm a PhD candidate in Computer Vision and Machine Learning at New York University, currently working on multimodal AI systems and representation learning. My research focuses on developing more capable and efficient vision-language models that can better understand and interact with the real world.
I work closely with Professor Saining Xie and collaborate with researchers at Meta AI. My recent projects include Cambrian-1, a comprehensive exploration of vision-centric multimodal LLMs, and Internet Explorer, a novel approach for targeted representation learning using the open web. I'm particularly interested in how we can leverage internet-scale data and self-supervised learning to build more robust visual understanding systems.
My work spans several interconnected areas, including multimodal learning, self-supervised representation learning, and zero-shot generalization. I aim to develop AI systems that can effectively bridge the gap between the digital and physical worlds while maintaining transparency and reproducibility. A key focus is making these systems more efficient and accessible to the broader research community through open-source implementations and comprehensive documentation.
Publications
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong, Ellis L Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie
NeurIPS 2024 (Oral)
V-IRL: Grounding Virtual Intelligence in Real Life
Jihan Yang, Runyu Ding, Ellis L Brown, Xiaojuan Qi, Saining Xie
ECCV 2024
Your Diffusion Model is Secretly a Zero-Shot Classifier
Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis L Brown, Deepak Pathak
ICCV 2023
Internet Explorer: Targeted Representation Learning on the Open Web
Alexander C. Li, Ellis L Brown, Alexei A. Efros, Deepak Pathak
ICML 2023
Internet Curiosity: Directed Unsupervised Learning on Uncurated Internet Data
Alexander C. Li, Ellis L Brown, Alexei A. Efros, Deepak Pathak
ECCV Workshops 2022
SpatioTemporal Template-based Search: An Architecture to Model Human Search for Spatiotemporal Targets
Ellis L Brown, N. Warford, M. Kunda
Advances in Cognitive Systems