Ellis Brown

I'm a PhD candidate in Computer Vision and Machine Learning at New York University, currently working on multimodal AI systems and representation learning. My research focuses on developing more capable and efficient vision-language models that can better understand and interact with the real world.

I work closely with Professor Saining Xie and collaborate with researchers at Meta AI. My recent projects include Cambrian-1, a fully open, vision-centric exploration of multimodal LLMs, and Internet Explorer, an approach to targeted representation learning on the open web. I'm particularly interested in how internet-scale data and self-supervised learning can be used to build more robust visual understanding systems.

My work spans several interconnected areas, including multimodal learning, self-supervised representation learning, and zero-shot generalization. I aim to develop AI systems that bridge the digital and physical worlds while remaining transparent and reproducible. A key focus is making these systems more efficient and more accessible to the broader research community through open-source implementations and thorough documentation.

Publications

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Shengbang Tong, Ellis L Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie

NeurIPS 2024 (Oral)
