Mark Borgerding
4c458be5e9
checkpoint -- I don't think I've broken anything (yet) adding 2d fft.
2003-11-04 23:25:49 +00:00
Mark Borgerding
ee3094a0e4
benchmark utilities
2003-11-04 02:11:00 +00:00
Mark Borgerding
4ebf0b5aca
added a CHANGELOG
2003-11-04 02:09:53 +00:00
Mark Borgerding
2788fba0bd
added a CHANGELOG
2003-11-04 02:09:48 +00:00
Mark Borgerding
8b4e3bacca
minor comments and added some primes
2003-11-04 02:00:01 +00:00
Mark Borgerding
6c8049cc75
slight changes to Makefile
2003-11-04 01:01:37 +00:00
Mark Borgerding
7b4de0aa11
a little faster
2003-11-03 04:30:50 +00:00
Mark Borgerding
ad4ee571aa
faster radix5
2003-11-03 04:04:01 +00:00
Mark Borgerding
0403fb3e4a
radix 5 a little optimized
2003-11-03 03:48:34 +00:00
Mark Borgerding
3c0c0431e2
radix 5 works, but is 6x slower than fftw
2003-11-03 03:03:16 +00:00
Mark Borgerding
85764e6437
radix 5 doesn't work, but I think it should.
...
just a checkpoint commit
2003-11-01 16:48:33 +00:00
Mark Borgerding
8ac63adc77
modified time benchmark to repeat same buffer over and over to avoid IO bottlenecks and get more consistent numbers.
2003-11-01 04:44:50 +00:00
Mark Borgerding
471803ca08
removed unused macro
2003-11-01 04:26:02 +00:00
Mark Borgerding
7b7aefe7c4
moved scratch buffer to stack variable
2003-11-01 03:59:43 +00:00
Mark Borgerding
28551899e2
radix 4 faster
2003-11-01 03:49:53 +00:00
Mark Borgerding
d1df249536
radix3 fixed point now works
2003-10-31 04:01:09 +00:00
Mark Borgerding
b1969544a6
radix 3 still doesn't work for fixed
2003-10-30 03:00:49 +00:00
Mark Borgerding
d4f87befda
re-added radix 3 butterfly
2003-10-30 02:02:29 +00:00
Mark Borgerding
ca4c74e07c
Whoops, one should not test with input of all zeros
2003-10-29 04:29:01 +00:00
Mark Borgerding
97b18f3fef
comments
2003-10-27 04:02:11 +00:00
Mark Borgerding
d9fcda04b6
version 0.2 upload to sf
2003-10-26 19:29:36 +00:00
Mark Borgerding
ecb1a76974
added zip creation to tarball make target
2003-10-26 04:25:18 +00:00
Mark Borgerding
1db3d91ee5
getting ready for next release
2003-10-26 04:07:32 +00:00
Mark Borgerding
52b4b9ab5c
*** empty log message ***
2003-10-18 01:45:26 +00:00
Mark Borgerding
c239ba2c1c
slight code cleanup, comments
2003-10-18 01:39:36 +00:00
Mark Borgerding
bca7fd5151
compiles with -ansi -pedantic
2003-10-18 01:23:34 +00:00
Mark Borgerding
e2470b3a03
*** empty log message ***
2003-10-18 00:33:38 +00:00
Mark Borgerding
a3d3217ae6
*** empty log message ***
2003-10-18 00:32:54 +00:00
Mark Borgerding
6f8bcedc24
radix 3 fixed point still broken
2003-10-17 02:59:32 +00:00
Mark Borgerding
31d4214f44
radix 3 seems to be pretty fast
...
fixed point broken for some reason
2003-10-17 02:34:22 +00:00
Mark Borgerding
73744b908c
checkpoint
...
fixed does not currently work for radix 3
2003-10-17 01:26:14 +00:00
Mark Borgerding
317f11e66e
starting point for radix 3
...
'make test' output
### testing SNR for 2187 point FFTs
#### DOUBLE
snr_t2f = 292.51
snr_f2t = 304.97
#### FLOAT
snr_t2f = 143.46
snr_f2t = 138.03
#### SHORT
snr_t2f = 49.257
snr_f2t = 16.294
#### timing 10000 x 2187 point FFTs
#### DOUBLE
Elapsed:0:05.05 user:3.60 sys:0.54
#### FLOAT
Elapsed:0:02.41 user:1.85 sys:0.23
#### SHORT
Elapsed:0:04.02 user:3.13 sys:0.08
2003-10-17 00:11:19 +00:00
Mark Borgerding
d6ae498630
took the bitwise and out of the switch case -- may have prevented optimization
2003-10-15 03:45:24 +00:00
Mark Borgerding
5f0efe8f17
pretty happy with radix 2 and radix 4
...
next up is radix 3, or maybe 5
2003-10-15 03:38:05 +00:00
Mark Borgerding
9504aa79c1
Fixed generic mixed radix butterfly
...
'make test' output:
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 296.95
snr_f2t = 317.25
#### FLOAT
snr_t2f = 147.96
snr_f2t = 145.14
#### SHORT
snr_t2f = 52.414
snr_f2t = 22.438
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:03.56 user:2.63 sys:0.19
#### FLOAT
Elapsed:0:01.35 user:1.07 sys:0.10
#### SHORT
Elapsed:0:01.70 user:1.37 sys:0.06
2003-10-15 02:52:34 +00:00
Mark Borgerding
0424734e9d
radix 4 now about as fast as original version
...
'make test' output:
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 296.78
snr_f2t = 317.11
#### FLOAT
snr_t2f = 145.28
snr_f2t = 143.51
#### SHORT
snr_t2f = 52.409
snr_f2t = 22.174
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:03.43 user:2.68 sys:0.25
#### FLOAT
Elapsed:0:01.39 user:1.08 sys:0.11
#### SHORT
Elapsed:0:02.01 user:1.39 sys:0.09
2003-10-15 01:52:13 +00:00
Mark Borgerding
f609401471
about to make some changes -- just wanted a checkpoint
2003-10-15 00:05:50 +00:00
Mark Borgerding
2ae7e0f1f2
radix 4 works but slow
2003-10-14 02:47:25 +00:00
Mark Borgerding
6b76490456
Fixed point works
...
'make test' output:
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 296.51
snr_f2t = 315.25
#### FLOAT
snr_t2f = 146.39
snr_f2t = 142.86
#### SHORT
snr_t2f = 58.077
snr_f2t = 27.897
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:01.53 user:1.06 sys:0.26
#### FLOAT
Elapsed:0:01.29 user:0.98 sys:0.12
#### SHORT
Elapsed:0:02.08 user:1.65 sys:0.03
2003-10-14 01:09:33 +00:00
Mark Borgerding
8460f1f8f5
added optimization for radix 2
...
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 296.29
snr_f2t = 314.48
#### FLOAT
snr_t2f = 146.48
snr_f2t = 143.03
#### SHORT
snr_t2f = -30.269
snr_f2t = -60.442
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:02.77 user:2.22 sys:0.13
#### FLOAT
Elapsed:0:01.65 user:1.35 sys:0.07
#### SHORT
Elapsed:0:02.44 user:2.00 sys:0.06
2003-10-14 00:38:58 +00:00
Mark Borgerding
0d6d61cfce
reduced calling parameters
...
negligible performance impact
2003-10-11 23:07:16 +00:00
Mark Borgerding
0d44569b3b
made one single malloc for all buffers
...
no noticeable performance gain
2003-10-11 23:00:12 +00:00
Mark Borgerding
f93a0258df
Simplified some inner loop calcs
...
'make test' output:
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 295.34
snr_f2t = 308.77
#### FLOAT
snr_t2f = 146.93
snr_f2t = 143.56
#### SHORT
snr_t2f = 54.799
snr_f2t = 24.562
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:10.69 user:8.71 sys:0.20
#### FLOAT
Elapsed:0:04.40 user:3.42 sys:0.11
#### SHORT
Elapsed:0:05.62 user:4.77 sys:0.04
2003-10-11 22:45:35 +00:00
Mark Borgerding
911d29d139
changed from static function that wasn't inlining very well to a macro
...
'make test' output:
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 295.70
snr_f2t = 308.53
#### FLOAT
snr_t2f = 146.91
snr_f2t = 143.58
#### SHORT
snr_t2f = 54.677
snr_f2t = 24.668
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:11.38 user:9.15 sys:0.24
#### FLOAT
Elapsed:0:04.18 user:3.39 sys:0.14
#### SHORT
Elapsed:0:06.03 user:4.75 sys:0.15
2003-10-11 22:41:17 +00:00
Mark Borgerding
11983e5056
used += on complex components
...
dramatic speedup -- 'make test' output:
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 295.63
snr_f2t = 307.82
#### FLOAT
snr_t2f = 146.25
snr_f2t = 143.37
#### SHORT
snr_t2f = 54.694
snr_f2t = 24.470
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:16.06 user:12.72 sys:0.25
#### FLOAT
Elapsed:0:04.63 user:3.79 sys:0.13
#### SHORT
Elapsed:0:05.77 user:4.56 sys:0.07
2003-10-11 22:39:40 +00:00
Mark Borgerding
043da3b65d
avoid last recursive call
...
'make test' output:
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 295.35
snr_f2t = 308.32
#### FLOAT
snr_t2f = 146.71
snr_f2t = 143.02
#### SHORT
snr_t2f = 54.718
snr_f2t = 24.494
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:23.05 user:18.95 sys:0.24
#### FLOAT
Elapsed:0:06.45 user:5.17 sys:0.10
#### SHORT
Elapsed:0:05.59 user:4.72 sys:0.06
2003-10-11 14:43:13 +00:00
Mark Borgerding
7ec9402d5b
Fixed point works (in the loosest sense of the word "works")
...
Fixed point sums are divided by 2 each stage. This will never overflow for radix 2 FFTs.
For mixed radix, it may overflow, but will usually give better SNR.
'make test' output:
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 295.30
snr_f2t = 308.25
#### FLOAT
snr_t2f = 146.92
snr_f2t = 143.25
#### SHORT
snr_t2f = 54.645
snr_f2t = 24.677
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:25.96 user:19.77 sys:0.22
#### FLOAT
Elapsed:0:06.62 user:5.48 sys:0.11
#### SHORT
Elapsed:0:06.01 user:4.75 sys:0.12
2003-10-11 14:34:01 +00:00
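The scaling note in commit 7ec9402d5b (sums divided by 2 each stage) can be illustrated with a rough worst-case bound. This is only a sketch of the arithmetic, not the library's code: the function names are hypothetical, 16-bit samples are assumed for the SHORT build, and the model ignores rounding and the real/imaginary component details of a complex butterfly.

```python
INT16_MAX = 32767  # assumed sample width for the SHORT (fixed point) case


def worst_case_after_stage(radix, peak=INT16_MAX):
    """Hypothetical helper: worst-case magnitude of a butterfly that sums
    `radix` inputs, each of magnitude `peak`, then halves the sum once."""
    return (radix * peak) // 2


def overflows_int16(value):
    """True when the magnitude no longer fits in a signed 16-bit sample."""
    return value > INT16_MAX


if __name__ == "__main__":
    for radix in (2, 3, 4, 5):
        worst = worst_case_after_stage(radix)
        print(radix, worst, overflows_int16(worst))
```

Under this simplified model only radix 2 stays in range in the worst case. Larger radixes can exceed it, which matches the "may overflow" caveat. Dividing by 2 rather than by the radix discards fewer low-order bits per stage, which is one plausible reading of the "usually give better SNR" remark.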
Mark Borgerding
61571342a5
uses lookup table for twiddle factors
...
'make test' output: (Elapsed time is inflated, RealPlayer was running at the time)
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 295.41
snr_f2t = 307.88
#### FLOAT
snr_t2f = 144.63
snr_f2t = 143.48
#### SHORT
snr_t2f = -30.111
snr_f2t = -61.637
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:25.19 user:20.22 sys:0.30
#### FLOAT
Elapsed:0:07.16 user:6.00 sys:0.09
#### SHORT
Elapsed:0:05.89 user:4.66 sys:0.11
2003-10-11 13:38:37 +00:00
Mark Borgerding
30c4ee30f5
Dog slow, but does mixed radix!
...
'make test' output :
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 295.52
snr_f2t = 307.98
#### FLOAT
snr_t2f = 144.62
snr_f2t = 143.23
#### SHORT
snr_t2f = -31.515
snr_f2t = -60.836
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:44.17 user:35.11 sys:0.27
#### FLOAT
Elapsed:0:24.22 user:19.66 sys:0.16
#### SHORT
Elapsed:0:30.39 user:25.07 sys:0.09
2003-10-11 02:21:48 +00:00
Mark Borgerding
08be1d86b4
works on Fout in-place
2003-10-10 21:30:18 +00:00