xAI Grok 1.5 Vision Leverages Tesla FSD to Understand the Real World

Grok-1.5V is competitive with existing frontier multimodal models across a number of domains, ranging from multi-disciplinary reasoning to understanding documents, science diagrams, charts, screenshots, and photographs. Grok can also understand our physical world: it outperforms its peers on xAI's new RealWorldQA benchmark, which measures real-world spatial understanding. For all of the benchmark datasets, xAI evaluated Grok in a zero-shot setting without chain-of-thought prompting. This leading real-world understanding builds on Tesla data and the work on FSD (Full Self-Driving).

Recent reports indicate that Tesla has 100,000 Nvidia H100 chips or more, which would amount to roughly 400 exaflops of FP8 compute. Elon Musk has said that xAI will need 100,000 Nvidia H100 chips to train Grok 3. It is clear that xAI will push future versions of Grok toward real-world AI leadership in understanding video and audio.
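As a rough sanity check on that 400-exaflop figure, assuming Nvidia's headline number of about 4 petaflops of sparse FP8 throughput per H100 (dense FP8 is roughly half that), the arithmetic works out as follows:

```python
# Back-of-the-envelope aggregate FP8 compute for a 100,000-GPU H100 fleet.
# Assumes ~4 PFLOPS sparse FP8 per H100 (Nvidia's headline spec).
NUM_GPUS = 100_000
FP8_PFLOPS_PER_GPU = 4  # petaflops per H100, sparse FP8 (assumption)

total_exaflops = NUM_GPUS * FP8_PFLOPS_PER_GPU / 1_000  # 1 exaflop = 1,000 petaflops
print(f"{total_exaflops:.0f} exaflops")  # -> 400 exaflops
```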

Writing Code from a Hand-Drawn Flowchart
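One of xAI's demos shows Grok-1.5V turning a photo of a hand-drawn flowchart into working code. As an illustrative sketch only (the number-guessing flowchart and this Python are assumptions, not xAI's exact input or output), the generated program might look like:

```python
import random

# Hypothetical output for a hand-drawn flowchart of a number-guessing game:
# pick a secret number, then loop, hinting "higher"/"lower" until it is found.
def guessing_game(low: int = 1, high: int = 10) -> None:
    secret = random.randint(low, high)
    while True:
        guess = int(input(f"Guess a number between {low} and {high}: "))
        if guess < secret:
            print("Higher!")
        elif guess > secret:
            print("Lower!")
        else:
            print("Correct!")
            break

if __name__ == "__main__":
    guessing_game()
```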

Calories from Food Labels
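In the food-label demo, Grok reads nutrition facts off a photographed label and does the serving math. A minimal sketch of that arithmetic (the calorie and serving values here are hypothetical, not from xAI's example):

```python
# Hypothetical values read off a nutrition label; the model's real work is
# extracting these numbers from the photo before multiplying.
calories_per_serving = 105
servings_per_container = 5

total_calories = calories_per_serving * servings_per_container
print(f"Whole container: {total_calories} calories")  # -> 525 calories
```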

3 thoughts on “xAI Grok 1.5 Vision Leverages Tesla FSD to Understand the Real World”

  1. Grok has been spun up crazy fast. And there are so many competing models that it makes me think there is no secret sauce behind OpenAI, just a lot of data and compute. If that’s true, these models are trending toward becoming commodities.

  2. These multimodal things are rather impressive. They’re the closest thing to HAL9000 we’ve got.

    When the actual year 2000 arrived and neither Tycho lunar bases nor Discovery missions were available, I expected computers would actually follow the path set in 2001: A Space Odyssey and give us talking systems with human-like understanding.

    Well, in part they did, growing exponentially in capability; but no, they remained profoundly dumb machines. Until now.
