Tachyum Prodigy Chip Now Has 192 Universal Cores

This week, Tachyum said that by using the latest electronic design automation (EDA) tools it has managed to squeeze 50% more cores into its Prodigy processor while increasing die size by only 20%. The 192-core chip does not appear to exist in silicon yet, and the company did not say when it plans to start sampling or shipping these processors to interested parties.

Last year Tachyum sued Cadence for providing IP that did not meet its expectations and had to switch to IP from another provider or providers. Because of this, it also had to change its RTL simulation and layout tools. The company did not disclose which EDA tools it uses for Prodigy development, but it claims that the new set of programs enabled it to tweak various parameters, resulting in a 50% increase in core count (from 128 to 192), an increase in L2/L3 cache from 128MB to 192MB, and a jump in SERDES count from 64 to 96 per chip. The processor's die size increased from 500 mm² to 600 mm², or by 20%.
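The scaling figures above are easy to check. The short Python sketch below recomputes the percentage increases from the numbers Tachyum quoted. The core-density comparison (cores per mm²) is our own derived metric, not one the company has published, and the variable names are purely illustrative.

```python
def pct_increase(old, new):
    """Return the percentage increase going from old to new."""
    return (new - old) / old * 100.0

# (before, after) values as quoted by Tachyum
specs = {
    "cores": (128, 192),
    "l2_l3_cache_mb": (128, 192),
    "serdes": (64, 96),
    "die_area_mm2": (500, 600),
}

growth = {name: pct_increase(old, new) for name, (old, new) in specs.items()}
for name, pct in growth.items():
    print(f"{name}: +{pct:.0f}%")

# Derived metric (our assumption, not a Tachyum figure): cores per mm^2
density_old = specs["cores"][0] / specs["die_area_mm2"][0]
density_new = specs["cores"][1] / specs["die_area_mm2"][1]
density_gain = pct_increase(density_old, density_new)
print(f"core density: {density_old:.3f} -> {density_new:.3f} cores/mm^2 (+{density_gain:.0f}%)")
```

In other words, if the quoted figures hold, the new layout packs roughly a quarter more cores into each square millimeter, with cache and SERDES counts growing in step with the core count.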

Tachyum asserts that although it could squeeze even more of its universal cores within the 858 mm² reticle limit, the performance of all cores would then be constrained by memory bandwidth, even when paired with 16 DDR5 channels operating at a 7200MT/s data transfer rate.
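To put the memory bandwidth argument in perspective, here is a back-of-the-envelope estimate of theoretical peak bandwidth and how it divides among cores. It assumes a 64-bit (8-byte) data width per DDR5 channel, which Tachyum has not confirmed for Prodigy, and it ignores real-world efficiency losses, so treat the results as upper bounds rather than measured figures.

```python
CHANNELS = 16               # DDR5 channels, per Tachyum
TRANSFER_RATE_MT_S = 7200   # mega-transfers per second, per Tachyum
BYTES_PER_TRANSFER = 8      # assumption: 64-bit data width per channel

# Theoretical peak: channels x transfers/s x bytes per transfer
peak_mb_s = CHANNELS * TRANSFER_RATE_MT_S * BYTES_PER_TRANSFER
peak_gb_s = peak_mb_s / 1000

# Bandwidth available per core if every core streams from memory at once
per_core_gb_s = {cores: peak_gb_s / cores for cores in (128, 192)}

print(f"peak bandwidth: {peak_gb_s:.1f} GB/s")
for cores, bw in per_core_gb_s.items():
    print(f"{cores} cores: {bw:.1f} GB/s per core")
```

Under these assumptions, going from 128 to 192 cores cuts the per-core share of peak bandwidth by a third, and adding still more cores would thin it out further, which is consistent with the company's reasoning for stopping at 192.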

"We have achieved better results and timing with our new EDA physical design tools," said Dr. Radoslav Danilak, founder and CEO of Tachyum. "[…] while we did not have any choice but to change EDA tools, our physical design (PD) team worked hard to redo physical design and optimizations with the new set of PD tools, as we approach volume-level production."

Tachyum's Prodigy is a versatile processor with up to 192 unique 64-bit VLIW cores that boast two 1024-bit vector units, a 4096-bit matrix unit, a 64KB instruction cache, a 64KB data cache, and a 1MB L2 cache. Interestingly, unused L2 caches from other cores can be repurposed as a supplemental L3 cache.

When Prodigy runs native code, proper compiler optimizations can enable 4-way out-of-order processing (despite the fact that VLIW is meant to be in-order). Furthermore, Prodigy's instruction set architecture allows for enhanced parallelism through specialized 'poison bits.'

Perhaps the most interesting peculiarity of the Prodigy processor is that, according to Tachyum, it can emulate x86, Arm, CUDA and RISC-V binaries without compromising performance. Despite the challenges VLIW processors have faced in the past when emulating x86 code, Tachyum is optimistic about Prodigy's performance, even though certain translations might cause a 30-40% drop.