By Sally Ward-Foxton, EETimes (January 13, 2022)
Ceva has revamped its NeuPro AI accelerator engine IP, adding specialized co-processors for Winograd transforms and sparsity operations and a general-purpose vector processing unit alongside the engine’s MAC array. The new generation engine, NeuPro-M, can boost performance 5-15X (depending on the exact workload) compared to Ceva’s second generation NeuPro-S core (released Sept 2019). For example, ResNet-50 performance was improved 4.9X without using the specialized engines – boosted to 14.3X when using specialized co-processors, according to Ceva. Results for Yolo-v3 showed similar speedups. The core’s power efficiency is expected to be 24 TOPS/Watt for 1.25 GHz operation.
The NeuPro-M engine architecture allows for parallel processing on two levels — between the engines (if multiple engines are used), and within the engines themselves. The main MAC array has 4000 MACs capable of mixed precision operation (2-16 bits). Alongside this are new, specialized co-processors for some AI tasks. Local memory in each engine breaks the dependence on the core shared memory and on external DDR; the co-processors in each engine can work in parallel on the same memory, though they sometimes transfer data from one to another directly (without passing through memory). The size of this local memory is configurable based on network size, input image size, number of engines in the design and customers’ DDR latency and bandwidth.