Tuesday, July 6, 2010
Results from Inline Asm / Codelets
I thought I would post the initial speed results achieved using the latest inline asm patch for neon-simd.h . The results are decent - a 2x mflop improvement in some cases - but I'm still aiming to get approximately a 10x speedup on average, which will require some pretty intensive assembly coding. The results below are ony for powers of two, just for a quick overview. Also, I'm trying to completely eliminate in-place transforms from the benchmark, since fftw always performs simd transforms out-of-place (as far as I can tell), and out-of-place transforms are generally faster anyway.