Published: Thu, May 18, 2017
Money | By Armando Alvarado

Google's new TPUs are here to accelerate AI training

Google has designed and deployed a second generation of its Tensor Processing Unit (TPU) and is giving access to the machine-learning ASIC as a cloud service for commercial customers and researchers.

At the company's I/O 2017 developer conference, Google unveiled the second-generation Tensor Processing Unit, which delivers a substantial jump in computing power. Google also uses the computational power of TPUs every time someone enters a query into its search engine.

"Each of these new TPU devices delivers up to 180 teraflops of floating-point performance".

Google is also making the Research Cloud available to accelerate the pace of machine-learning research, and plans to share it with entities like Harvard Medical School.

Though machine learning is normally carried out on GPUs made by NVIDIA, Google has chosen to build some of its own hardware and optimize it to work well with its software. A TPU pod contains 64 TPUs and provides up to 11.5 petaflops. The original TPU was essentially an 8-bit integer processor, which topped out at 92 teraops. The newest TPU can still run software for leafing through hundreds of thousands of images or search terms and learning to organize pictures or suggest websites without explicit programming, but it is now both much faster and capable of floating-point computation, which means it is suitable for training neural networks, too.

Just over a week ago, Nvidia launched its latest Tesla V100 accelerator, featuring the Volta GV100 GPU, which for the first time puts dedicated Tensor Cores in the silicon. Google has yet to share inference performance figures for the new Cloud TPU. Nvidia even named Google Cloud as a notable customer in its latest annual report.
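
The pod figure is consistent with the per-device number. A quick back-of-the-envelope check, using only the figures quoted in this article (180 teraflops per device, 64 devices per pod, 92 teraops for the first-gen chip), is sketched below.

```python
# Back-of-the-envelope check of the TPU figures quoted above.
# All numbers come from the article itself.
TFLOPS_PER_TPU = 180    # second-gen device, floating point
DEVICES_PER_POD = 64

pod_tflops = TFLOPS_PER_TPU * DEVICES_PER_POD
print(f"Pod peak: {pod_tflops} TFLOPS = {pod_tflops / 1000:.2f} PFLOPS")
# Prints 11.52 PFLOPS, matching the "up to 11.5 petaflops" claim.

# Rough per-device jump over the first-gen chip's 92 teraops. Note this
# compares floating-point ops against 8-bit integer ops, so it is not a
# like-for-like measure.
GEN1_TERAOPS = 92
print(f"Per-device ratio vs. gen 1: {TFLOPS_PER_TPU / GEN1_TERAOPS:.1f}x")
```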

The V100, by the way, delivers up to 120 teraflops of mixed-precision (16-bit and 32-bit) matrix math. If the TPU's 180 teraflops are of the same caliber, then Google has surged ahead of the competition, at least in raw flops. Of course, other aspects of the hardware, like memory capacity, bandwidth, cache buffering, and instruction/thread management, can significantly affect application performance, so at this point there is no definitive way to know whether the TPU truly has a performance edge.
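
To make the mixed-precision distinction concrete, here is a minimal NumPy sketch (matrix sizes and data are invented for illustration) of the scheme Volta's Tensor Cores implement: 16-bit inputs with products accumulated in 32-bit floats, contrasted with keeping everything in 16 bits.

```python
# Illustration of mixed-precision matrix math: fp16 inputs, fp32
# accumulation. Shapes and values are arbitrary, for demonstration only.
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((256, 256)).astype(np.float16)
b = rng.standard_normal((256, 256)).astype(np.float16)

# All-fp16 path: the product is computed and stored in 16-bit floats.
c_fp16 = a @ b

# Mixed-precision path: same fp16 inputs, but multiply-accumulate in
# fp32 (the "16-bit and 32-bit" matrix math described above).
c_mixed = a.astype(np.float32) @ b.astype(np.float32)

# The fp16-only result drifts from the fp32-accumulated one.
err = np.abs(c_fp16.astype(np.float32) - c_mixed).max()
print(f"max abs difference, fp16 vs mixed precision: {err:.4f}")
```

The appeal of the mixed approach is that low-precision inputs halve memory traffic while the 32-bit accumulator keeps long dot products from losing accuracy.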

Businesses will be able to use the new chips through Google's Cloud Platform, though the company hasn't provided exact details on what form those services will take.

These Cloud TPUs, available to companies via Google Compute Engine, can handle larger machine-learning workloads than their first-generation counterparts, which launched at Google's annual developer conference the previous year.

For now, it looks like the TPU only supports the TensorFlow deep learning framework, which makes sense given that the framework was originally developed by Google and has been its framework of choice internally. "If we can get the time for each experiment down from weeks to days or hours, this improves the ability for everybody doing machine learning to iterate more quickly and do more experiments," says Google senior fellow Jeff Dean.
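
For a sense of the code such a speedup applies to, here is a minimal, era-appropriate TensorFlow 1.x training loop (the model, shapes, and dummy data are invented for illustration, and none of the Cloud-TPU-specific device setup is shown):

```python
# Minimal TensorFlow 1.x training loop. Model, shapes, and data are
# invented for illustration; Cloud-TPU-specific setup is omitted.
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 784], name="inputs")
y = tf.placeholder(tf.int64, shape=[None], name="labels")

# A tiny two-layer classifier.
hidden = tf.layers.dense(x, 128, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, 10)

loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Dummy batch standing in for real training data.
    batch_x = np.random.rand(32, 784).astype(np.float32)
    batch_y = np.random.randint(0, 10, size=32)
    for _ in range(100):
        _, loss_val = sess.run([train_op, loss],
                               feed_dict={x: batch_x, y: batch_y})
    print("final loss on dummy batch:", loss_val)
```

Every pass through sess.run here is the kind of work the new chips accelerate; shrinking that per-step time is what turns weeks of experimentation into days or hours.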
