At Cloud Next, its annual user conference, Google Cloud today announced the launch of the fifth generation of its tensor processing units (TPUs) for AI training and inferencing. Google announced the fourth version of its custom processors in 2021, but it only became available to developers in 2022.
The company notes that it built this edition of the chip with a focus on efficiency. Compared to the previous generation, this version promises to deliver a 2x improvement in training performance per dollar and a 2.5x improvement in inferencing performance per dollar.
“This is the most cost-efficient and accessible cloud TPU to date,” Mark Lohmeyer, the VP and GM for compute and ML infrastructure at Google Cloud, said in a press conference ahead of today’s announcement.
Lohmeyer also stressed that the company ensured that users would be able to scale their TPU clusters beyond what was previously possible.
“We’re enabling our customers to easily scale their AI models beyond the physical boundaries of a single TPU pod or a single TPU cluster,” he explained. “So in other words, a single large AI workload can now span multiple physical TPU clusters, scaling to literally tens of thousands of chips — and doing so very cost-effectively. As a result, across cloud GPUs and cloud TPUs, we’re really giving our customers a lot of choice and flexibility and optionality to meet the needs of the broad set of AI workloads that we see emerging.”
In addition to the next generation of TPUs, Google also announced today that next month, it will make Nvidia’s H100 GPUs generally available to developers as part of its A3 series of virtual machines.