SRAM-based processor-in-memory (PIM) AI chip proves a speedy AI inference performer
Renesas has developed a new processor-in-memory (PIM) technology for AI inference acceleration in low-power edge devices. A test chip for the SRAM-based technology achieved 8.8 TOPS/W running convolutional neural network (CNN) workloads, the type of algorithm most commonly used for image recognition.
Renesas’ existing generation of AI inference accelerators, based on its dynamically reconfigurable processor (DRP) architecture, achieves on the order of 1 TOPS/W, enough to enable real-time image recognition at the end node. The new PIM technology is almost an order of magnitude better in terms of TOPS/W and could be the basis for implementing incremental learning at the endpoint, the company said.
Processor-in-memory is an increasingly popular technology for AI inference workloads, which involve adding up vast amounts of input data multiplied by weight factors. In PIM devices, multiply-accumulate (MAC) operations are performed in the memory itself as the data is read out.
In the Renesas device, memory cells storing weights can be multiplied by input data by controlling each cell’s output switch, which gates the cell’s output current. Measuring the current in the bit line then effectively adds all the outputs together. Performing MAC operations in memory in this way eliminates the bottleneck between memory and processor by avoiding unnecessary data transfer.
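In software terms, each bit line implicitly computes a dot product: the sum of stored weights multiplied by the applied inputs. The following sketch models that behaviour with plain integers (the values and function name are illustrative, not taken from the Renesas design):

```python
# Model of the multiply-accumulate a PIM bit line performs implicitly.
# Weights live "in memory"; reading a column with inputs applied yields
# the sum of weight*input products as an analog current, modelled here
# as an integer sum.

def bitline_mac(weights, inputs):
    """Model one bit line: accumulate weight * input over all cells."""
    assert len(weights) == len(inputs)
    return sum(w * x for w, x in zip(weights, inputs))

weights = [1, 0, -1, 1]   # ternary weights stored in the cells
inputs  = [1, 1, 0, 1]    # binary input activations on the word lines
print(bitline_mac(weights, inputs))  # -> 2
```

In the real device this sum appears as a single analog current measurement, so the accumulation costs no explicit data movement.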
In an exclusive interview with EETimes, Koichi Nose, senior principal engineer at Renesas Electronics, explained the new techniques that have been used to improve accuracy and reduce power consumption.
“Traditional processor-in-memory technology cannot achieve adequate accuracy for large scale computation,” said Nose, highlighting the workarounds traditionally required to circumvent poor reliability caused by process variations. “Binary data is also inadequate to express some complex neural networks... it causes degradation of accuracy.”
The new PIM technology is ternary, meaning each cell has three states: -1, 0 or 1. This allows more complex data to be expressed than with binary cells, Nose explained.
If the ternary memory cell holds +1 or -1, current can flow into the bit line, but if the cell stores a 0, no current flows, which helps keep power consumption low.
“Also, the weight data can easily be expanded to an arbitrary number of bits,” said Nose. “The weight data in a neural network is multi-bit information, a zero or a large plus or minus value. It’s difficult to express multi-bit signed information in binary cells. The proposed memory circuit can easily express arbitrary signed bit operation by utilising the combination of a ternary cell and a simple digital calculation block... since this can support different required calculation accuracy on a per-user basis, the user can optimise the balance between accuracy and power consumption.”
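One plausible way to build arbitrary-precision signed weights from ternary digits, as Nose describes, is to express a weight as a sum of ternary digits scaled by powers of two, with a simple digital block shifting and adding the per-digit results. The encoding below (a non-adjacent-form style decomposition) is our illustration of the idea, not Renesas’ documented scheme:

```python
# Assumption: a multi-bit signed weight w is decomposed as
# w = sum(d_i * 2**i) with each digit d_i in {-1, 0, +1}, so each digit
# maps onto one ternary cell and a digital block recombines the results.

def to_signed_ternary(w, n_digits):
    """Decompose integer w into n_digits digits d_i in {-1, 0, 1}."""
    digits = []
    for _ in range(n_digits):
        r = w % 2
        if r == 1 and (w % 4) == 3:   # pick -1 so the magnitude keeps shrinking
            digits.append(-1)
            w = (w + 1) // 2
        else:
            digits.append(r)
            w = (w - r) // 2
    assert w == 0, "w out of range for n_digits"
    return digits

def from_digits(digits):
    """Digital recombination: shift-and-add the per-digit contributions."""
    return sum(d * (1 << i) for i, d in enumerate(digits))

digits = to_signed_ternary(-5, 4)
print(digits, from_digits(digits))  # round-trips back to -5
```

Because the digit count is a free parameter, precision can be traded against power, matching the per-user accuracy/power balance Nose describes.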
Traditional PIM topologies use ADCs to convert the bit-line current into the output data value, but while ADCs are effective, they are power-hungry and take up valuable chip area, Nose said.
Renesas’ PIM technology instead uses a 1-bit sense amplifier from the standard SRAM macro as a comparator, in combination with replica cells (equivalent to the current generation part of a memory cell) whose current can be controlled flexibly. Comparing the replica cell current with the ternary cell current effectively detects the current output of the ternary cell.
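A 1-bit comparator plus an adjustable replica current is enough to digitise the bit-line value by successive comparison, much like a SAR ADC. The sketch below is our illustration of that principle (the search strategy is assumed, not taken from the published circuit):

```python
# Digitising a bit-line current with only a 1-bit comparator: binary-search
# a controllable replica current until it matches the bit-line current.
# Currents are modelled as integers in arbitrary units.

def quantize(bitline_current, max_current):
    """Binary-search the replica current that equals the bit-line current."""
    lo, hi = 0, max_current
    while lo < hi:
        replica = (lo + hi) // 2
        if bitline_current > replica:   # the 1-bit comparator decision
            lo = replica + 1
        else:
            hi = replica
    return lo

print(quantize(13, 64))  # -> 13
```

The appeal is that the sense amplifier already present in the SRAM macro does the comparing, so no dedicated multi-bit ADC (with its power and area cost) is needed.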
Zero-detectors also help reduce power consumption. If the MAC operation result is equal to zero, operation of the comparators is stopped to save energy.
“In a [typical] neural network circuit, almost all the nodes are assigned to zero; only a small amount of neurons are activated, about 1%. So almost all the calculation results are assigned to zero,” said Nose. “Activating the zero-detector circuit shuts down the comparator and contributes to the reduction of power consumption. By combining the comparator AD converter technology and the zero-detector technology, power can be reduced by one order of magnitude.”
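The power saving from zero detection can be modelled simply: when the detector sees a zero MAC result (the common case, given roughly 99% inactive neurons), the comparator readout is skipped entirely. A minimal sketch of this gating logic, with illustrative values:

```python
# Model of zero-detector gating: comparator-based readout is only
# activated for non-zero MAC results.

def readout(mac_results):
    """Return (outputs, number of comparator activations)."""
    comparator_activations = 0
    outputs = []
    for r in mac_results:
        if r == 0:                 # zero-detector fires: comparator stays off
            outputs.append(0)
        else:
            comparator_activations += 1
            outputs.append(r)      # stand-in for the comparator-based ADC step
    return outputs, comparator_activations

outs, active = readout([0, 0, 3, 0, -1, 0, 0, 0])
print(active)  # -> 2 comparator activations instead of 8
```

With ~1% of neurons active, the comparator runs for roughly one result in a hundred, which is where the order-of-magnitude power reduction Nose cites comes from.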
In SRAM arrays, manufacturing process variations frequently lead to failures: errors occur when data is written to individual cells whose electrical properties deviate significantly from nominal due to those variations.
“To avoid this problem, we use the same feature of the neural network — almost all the nodes are assigned to zero,” he said. “We can avoid calculation errors by shuffling the data so that zeros are stored in the [adversely] affected cells.”
In the ternary memory cells, if a zero is stored, no current flows in the bit line, so the summation result is not dependent on the cell current.
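The remapping idea can be sketched as a permutation that places zero weights at the positions of flagged cells; since a cell storing zero draws no bit-line current, a faulty cell holding a zero cannot corrupt the sum. This is our illustration of the concept, not Renesas' actual remapping algorithm:

```python
# Permute weight placement so zero weights land in cells flagged as faulty.
# Because order does not affect the MAC sum, the inputs only need to be
# permuted identically on the word lines.

def remap(weights, faulty):
    """Place zero weights at faulty positions; return (layout, permutation).

    permutation[i] is the original index of the weight stored at cell i.
    """
    zeros = [i for i, w in enumerate(weights) if w == 0]
    assert len(faulty) <= len(zeros), "not enough zeros to cover faulty cells"
    perm = [None] * len(weights)
    used = set()
    for pos, zi in zip(sorted(faulty), zeros):
        perm[pos] = zi              # park a zero weight in the bad cell
        used.add(zi)
    rest = iter(i for i in range(len(weights)) if i not in used)
    for pos in range(len(weights)):
        if perm[pos] is None:
            perm[pos] = next(rest)  # fill remaining cells in original order
    return [weights[i] for i in perm], perm

layout, perm = remap([2, 0, -1, 0, 5], faulty=[0, 4])
print(layout)  # zeros now sit at positions 0 and 4
```

This works precisely because, as Nose notes, the overwhelming majority of neural network weights are zero, so there are almost always enough zeros to cover the defective cells.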
How are adversely affected cells identified?
“We are developing some other error cell detection methods, but in this chip, we use a simple methodology,” he said. “We measure the output of the neural network and check whether the outcome is correct, to identify error cells [which do not store] the correct output value.”
Renesas’ 3 × 3 mm test chip is built on 12nm process technology and consists of four clusters, each of which can run a different AI process simultaneously. In each cluster, neural network weight data is stored in the PIM block and MAC operation results are stored in a standard SRAM block.
The test chip contains 4 Mb of PIM computational memory and 1.5 MB of SRAM, enough to evaluate a compact CNN without using external memory. This chip has achieved 8.8 TOPS/W power efficiency.
A simple demo of the test chip in a prototype AI module, also incorporating a small battery, microcontroller, camera and other peripherals, showed that inference for real-time person detection could be achieved while consuming only 5 mW.
>> This article was originally published on our sister site, EE Times: "PIM Techniques Boost AI Inference to 8.8 TOPS/W."