This FFT circuit has architectural characteristics not found in other FFT implementations. In particular, the locality, simplicity, and regularity of the processing core keep interconnect delays below cell delays, so clock speeds can approach the limits of the FPGA fabric, e.g., worst-case Fmax above 500 MHz in 65 nm FPGA technology. The short critical paths also lower power dissipation. Additionally, a novel "base-4" algorithm reduces the number of clock cycles needed per FFT to fewer than the number of points in the transform. Finally, the circuit includes a low-overhead hybrid floating-point feature that increases dynamic range for a given fixed-point word size.
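The dynamic-range claim can be illustrated with a small numerical sketch. The text does not describe how the hybrid floating-point feature works, so the code below is only an assumption for illustration: it uses a generic shared-exponent scheme (one power-of-two scale factor per block of samples, in the style of block floating point) and compares it with plain fixed-point at the same word size. The function names, the 8-bit word size, and the test signal are all hypothetical and are not taken from the circuit.

```python
import math


def quantize_fixed(values, word_bits):
    """Round values in [-1, 1) to signed fixed-point with word_bits total bits."""
    scale = 1 << (word_bits - 1)
    lo, hi = -scale, scale - 1
    return [max(lo, min(hi, round(v * scale))) / scale for v in values]


def quantize_shared_exponent(values, word_bits):
    """Assumed scheme: scale the whole block by one power of two so its peak
    lands in [0.5, 1), quantize, then undo the scaling.
    Returns the reconstructed values and the shared shift (exponent)."""
    peak = max(abs(v) for v in values)
    if peak == 0.0:
        return list(values), 0
    shift = -math.floor(math.log2(peak)) - 1
    scaled = [v * 2.0 ** shift for v in values]
    quantized = quantize_fixed(scaled, word_bits)
    return [q * 2.0 ** -shift for q in quantized], shift


def max_error(reference, approx):
    """Largest absolute difference between two equal-length sequences."""
    return max(abs(r - a) for r, a in zip(reference, approx))


# Hypothetical low-level input: a sine wave with amplitude 0.001.
signal = [0.001 * math.sin(2 * math.pi * k / 16) for k in range(16)]

fixed = quantize_fixed(signal, word_bits=8)
shared, shift = quantize_shared_exponent(signal, word_bits=8)

err_fixed = max_error(signal, fixed)
err_shared = max_error(signal, shared)

print("shared shift:", shift)
print("max error, plain fixed-point:", err_fixed)
print("max error, shared exponent:  ", err_shared)
```

In this sketch the plain 8-bit quantizer rounds the whole low-level signal to zero, while the shared-exponent version keeps most of the word's resolution at the cost of storing one extra small integer per block. Whether the circuit's feature applies its exponent per block, per stage, or per sample is not stated in the text.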