I'm waiting for another benchmark to finish to publish my (rather lengthy) weekly update. It should be done within the next hour. Meanwhile, here's a small sample of many more graphs to come.
update-20100812 (am): right before I was about to publish the rest of the graphs, I ran into a really strange (probably alignment-related) bug that was causing fftwff to stop at N=32. I've sinced fixed the bug, but also realized that I completely neglected optimized memory copies! FFTW terms these as 'rank-0' transforms and you can probably imagine that they're absolutely essential for pretty much everything. This morning I identified the appropriate files to fix, and will get to it tonight after spending the rest of the day on my thesis. For the impatient, I gracefully refer you to my git repository.
update-20100814 (pm): ... found the bug... oddly enough it was not due to misalignment. libavcodec/fft.c:75 in ff_init_ff_cos_tabs(). I rebuilt ffmpeg it with hardcoded tables.
update-20100816 (am): i've since fixed several other bugs to do with alignment and vector strides, added the beginnings of memory copy acceleration (for rank-0 transforms and buffered operations), and am (yet again) generating lots of lovely graphs.