58 Commits

Author SHA1 Message Date
Mark Borgerding
8ac63adc77 modified time benchmark to repeat same buffer over and over to avoid IO bottlenecks and get more consistent numbers. 2003-11-01 04:44:50 +00:00
Mark Borgerding
471803ca08 removed unused macro 2003-11-01 04:26:02 +00:00
Mark Borgerding
7b7aefe7c4 moved scratch buffer to stack variable 2003-11-01 03:59:43 +00:00
Mark Borgerding
28551899e2 radix 4 faster 2003-11-01 03:49:53 +00:00
Mark Borgerding
d1df249536 radix3 fixed point now works 2003-10-31 04:01:09 +00:00
Mark Borgerding
b1969544a6 radix 3 still doesn't work for fixed 2003-10-30 03:00:49 +00:00
Mark Borgerding
d4f87befda re-added radix 3 butterfly 2003-10-30 02:02:29 +00:00
Mark Borgerding
ca4c74e07c Whoops, one should not test with input of all zeros 2003-10-29 04:29:01 +00:00
Mark Borgerding
97b18f3fef comments 2003-10-27 04:02:11 +00:00
Mark Borgerding
d9fcda04b6 version 0.2 upload to sf 2003-10-26 19:29:36 +00:00
Mark Borgerding
ecb1a76974 added zip creation to tarball make target 2003-10-26 04:25:18 +00:00
Mark Borgerding
1db3d91ee5 getting ready for next release 2003-10-26 04:07:32 +00:00
Mark Borgerding
52b4b9ab5c *** empty log message *** 2003-10-18 01:45:26 +00:00
Mark Borgerding
c239ba2c1c slight code cleanup, comments 2003-10-18 01:39:36 +00:00
Mark Borgerding
bca7fd5151 compiles with -ansi -pedantic 2003-10-18 01:23:34 +00:00
Mark Borgerding
e2470b3a03 *** empty log message *** 2003-10-18 00:33:38 +00:00
Mark Borgerding
a3d3217ae6 *** empty log message *** 2003-10-18 00:32:54 +00:00
Mark Borgerding
6f8bcedc24 radix 3 fixed point still broken 2003-10-17 02:59:32 +00:00
Mark Borgerding
31d4214f44 radix 3 seems to be pretty fast
fixed point broken for some reason
2003-10-17 02:34:22 +00:00
Mark Borgerding
73744b908c check point
fixed does not currently work for radix 3
2003-10-17 01:26:14 +00:00
Mark Borgerding
317f11e66e starting point for radix 3
'make test' output
### testing SNR for  2187 point FFTs
#### DOUBLE
snr_t2f = 292.51
snr_f2t = 304.97
#### FLOAT
snr_t2f = 143.46
snr_f2t = 138.03
#### SHORT
snr_t2f = 49.257
snr_f2t = 16.294

#### timing 10000 x 2187 point FFTs
#### DOUBLE
Elapsed:0:05.05 user:3.60 sys:0.54
#### FLOAT
Elapsed:0:02.41 user:1.85 sys:0.23
#### SHORT
Elapsed:0:04.02 user:3.13 sys:0.08
2003-10-17 00:11:19 +00:00
Mark Borgerding
d6ae498630 took the bitwise and out of the switch case -- may have prevented optimization 2003-10-15 03:45:24 +00:00
Mark Borgerding
5f0efe8f17 pretty happy with radix 2 and radix 4
next up is radix 3, or maybe 5
2003-10-15 03:38:05 +00:00
Mark Borgerding
9504aa79c1 Fixed generic mixed radix butterfly
'make test' output:
### testing SNR for  1024 point FFTs
#### DOUBLE
snr_t2f = 296.95
snr_f2t = 317.25
#### FLOAT
snr_t2f = 147.96
snr_f2t = 145.14
#### SHORT
snr_t2f = 52.414
snr_f2t = 22.438

#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:03.56 user:2.63 sys:0.19
#### FLOAT
Elapsed:0:01.35 user:1.07 sys:0.10
#### SHORT
Elapsed:0:01.70 user:1.37 sys:0.06
2003-10-15 02:52:34 +00:00
Mark Borgerding
0424734e9d radix 4 now about as fast as original version
'make test' output:
### testing SNR for  1024 point FFTs
#### DOUBLE
snr_t2f = 296.78
snr_f2t = 317.11
#### FLOAT
snr_t2f = 145.28
snr_f2t = 143.51
#### SHORT
snr_t2f = 52.409
snr_f2t = 22.174

#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:03.43 user:2.68 sys:0.25
#### FLOAT
Elapsed:0:01.39 user:1.08 sys:0.11
#### SHORT
Elapsed:0:02.01 user:1.39 sys:0.09
2003-10-15 01:52:13 +00:00
Mark Borgerding
f609401471 about to make some changes -- just wanted a checkpoint 2003-10-15 00:05:50 +00:00
Mark Borgerding
2ae7e0f1f2 radix 4 works but slow 2003-10-14 02:47:25 +00:00
Mark Borgerding
6b76490456 Fixed point works
'make test' output:

### testing SNR for  1024 point FFTs
#### DOUBLE
snr_t2f = 296.51
snr_f2t = 315.25
#### FLOAT
snr_t2f = 146.39
snr_f2t = 142.86
#### SHORT
snr_t2f = 58.077
snr_f2t = 27.897

#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:01.53 user:1.06 sys:0.26
#### FLOAT
Elapsed:0:01.29 user:0.98 sys:0.12
#### SHORT
Elapsed:0:02.08 user:1.65 sys:0.03
2003-10-14 01:09:33 +00:00
Mark Borgerding
8460f1f8f5 added optimization for radix 2
### testing SNR for  1024 point FFTs
#### DOUBLE
snr_t2f = 296.29
snr_f2t = 314.48
#### FLOAT
snr_t2f = 146.48
snr_f2t = 143.03
#### SHORT
snr_t2f = -30.269
snr_f2t = -60.442

#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:02.77 user:2.22 sys:0.13
#### FLOAT
Elapsed:0:01.65 user:1.35 sys:0.07
#### SHORT
Elapsed:0:02.44 user:2.00 sys:0.06
2003-10-14 00:38:58 +00:00
Mark Borgerding
0d6d61cfce reduced calling parameters
negligible performance impact
2003-10-11 23:07:16 +00:00
Mark Borgerding
0d44569b3b made one single malloc for all buffers
no noticeable performance gain
2003-10-11 23:00:12 +00:00
Mark Borgerding
f93a0258df Simplified some inner loop calcs
'make test' output:
### testing SNR for  1024 point FFTs
#### DOUBLE
snr_t2f = 295.34
snr_f2t = 308.77
#### FLOAT
snr_t2f = 146.93
snr_f2t = 143.56
#### SHORT
snr_t2f = 54.799
snr_f2t = 24.562

#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:10.69 user:8.71 sys:0.20
#### FLOAT
Elapsed:0:04.40 user:3.42 sys:0.11
#### SHORT
Elapsed:0:05.62 user:4.77 sys:0.04
2003-10-11 22:45:35 +00:00
Mark Borgerding
911d29d139 changed from static function that wasn't inlining very well to a macro
'make test' output:
### testing SNR for  1024 point FFTs
#### DOUBLE
snr_t2f = 295.70
snr_f2t = 308.53
#### FLOAT
snr_t2f = 146.91
snr_f2t = 143.58
#### SHORT
snr_t2f = 54.677
snr_f2t = 24.668

#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:11.38 user:9.15 sys:0.24
#### FLOAT
Elapsed:0:04.18 user:3.39 sys:0.14
#### SHORT
Elapsed:0:06.03 user:4.75 sys:0.15
2003-10-11 22:41:17 +00:00
Mark Borgerding
11983e5056 used += on complex components
dramatic speedup -- 'make test' output:
### testing SNR for  1024 point FFTs
#### DOUBLE
snr_t2f = 295.63
snr_f2t = 307.82
#### FLOAT
snr_t2f = 146.25
snr_f2t = 143.37
#### SHORT
snr_t2f = 54.694
snr_f2t = 24.470

#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:16.06 user:12.72 sys:0.25
#### FLOAT
Elapsed:0:04.63 user:3.79 sys:0.13
#### SHORT
Elapsed:0:05.77 user:4.56 sys:0.07
2003-10-11 22:39:40 +00:00
Mark Borgerding
043da3b65d avoid last recursive call
'make test' output:
### testing SNR for  1024 point FFTs
#### DOUBLE
snr_t2f = 295.35
snr_f2t = 308.32
#### FLOAT
snr_t2f = 146.71
snr_f2t = 143.02
#### SHORT
snr_t2f = 54.718
snr_f2t = 24.494

#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:23.05 user:18.95 sys:0.24
#### FLOAT
Elapsed:0:06.45 user:5.17 sys:0.10
#### SHORT
Elapsed:0:05.59 user:4.72 sys:0.06
2003-10-11 14:43:13 +00:00
Mark Borgerding
7ec9402d5b Fixed point works (in the loosest sense of the word "works")
Fixed-point sums are divided by 2 at each stage.  This will never overflow for radix-2 FFTs.
For mixed radix, it may overflow, but it will usually give better SNR.

'make test' output:

### testing SNR for  1024 point FFTs
#### DOUBLE
snr_t2f = 295.30
snr_f2t = 308.25
#### FLOAT
snr_t2f = 146.92
snr_f2t = 143.25
#### SHORT
snr_t2f = 54.645
snr_f2t = 24.677

#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:25.96 user:19.77 sys:0.22
#### FLOAT
Elapsed:0:06.62 user:5.48 sys:0.11
#### SHORT
Elapsed:0:06.01 user:4.75 sys:0.12
2003-10-11 14:34:01 +00:00
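
The scaling rule in the message for 7ec9402d5b can be illustrated with a rough worst-case bound. This is only a sketch under stated assumptions, not code from the repository: it assumes a radix-r butterfly sums r terms that are each no larger than the current bound, and it ignores complex-component details and rounding. The function name `worst_case_growth` is hypothetical.

```python
# Hypothetical model of per-stage scaling; not taken from the repository.

def worst_case_growth(radices, scale=0.5):
    """Upper bound on magnitude growth across all stages.

    Assumes a radix-r butterfly sums r terms, each no larger than the
    current bound, and that the sum is then multiplied by `scale`.
    """
    growth = 1.0
    for r in radices:
        growth *= r * scale
    return growth


radix2_1024 = [2] * 10  # 1024 = 2**10, ten radix-2 stages
radix3_2187 = [3] * 7   # 2187 = 3**7, seven radix-3 stages

print(worst_case_growth(radix2_1024))  # bound stays at 1.0
print(worst_case_growth(radix3_2187))  # bound exceeds 1.0, so overflow is possible
```

Under this simplified model, halving exactly cancels the growth of a two-term sum, while a radix-3 or radix-4 stage can outgrow the scaling. Scaling by less than 1/r keeps more significant bits, which is consistent with the message's remark about SNR.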
Mark Borgerding
61571342a5 uses lookup table for twiddle factors
'make test' output (elapsed time is inflated; RealPlayer was running at the time):

### testing SNR for  1024 point FFTs
#### DOUBLE
snr_t2f = 295.41
snr_f2t = 307.88
#### FLOAT
snr_t2f = 144.63
snr_f2t = 143.48
#### SHORT
snr_t2f = -30.111
snr_f2t = -61.637

#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:25.19 user:20.22 sys:0.30
#### FLOAT
Elapsed:0:07.16 user:6.00 sys:0.09
#### SHORT
Elapsed:0:05.89 user:4.66 sys:0.11
2003-10-11 13:38:37 +00:00
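
As a rough illustration of the twiddle lookup table mentioned in 61571342a5, the sketch below precomputes the complex exponentials once and indexes into the table instead of calling trigonometric functions in the inner loop. It is an assumption-laden example, not the library's code: it uses a naive O(n^2) DFT rather than an FFT, and the names `make_twiddles` and `dft_with_table` are hypothetical.

```python
import cmath


def make_twiddles(n, inverse=False):
    """Precompute exp(-/+ 2*pi*i*k/n) for k = 0..n-1 (hypothetical helper)."""
    sign = 1.0 if inverse else -1.0
    return [cmath.exp(sign * 2j * cmath.pi * k / n) for k in range(n)]


def dft_with_table(x, twiddles):
    """Naive DFT that looks up twiddles instead of recomputing them.

    The product k*t is reduced modulo n because the exponentials are
    periodic with period n.
    """
    n = len(x)
    return [
        sum(x[t] * twiddles[(k * t) % n] for t in range(n))
        for k in range(n)
    ]


table = make_twiddles(4)
spectrum = dft_with_table([1, 1, 1, 1], table)
print(spectrum)  # energy concentrated in bin 0, other bins near zero
```

The same table can be reused for every transform of that length, so the cost of computing the exponentials is paid once rather than on every call.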
Mark Borgerding
30c4ee30f5 Dog slow, but does mixed radix!
'make test' output :

### testing SNR for  1024 point FFTs
#### DOUBLE
snr_t2f = 295.52
snr_f2t = 307.98
#### FLOAT
snr_t2f = 144.62
snr_f2t = 143.23
#### SHORT
snr_t2f = -31.515
snr_f2t = -60.836

#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:44.17 user:35.11 sys:0.27
#### FLOAT
Elapsed:0:24.22 user:19.66 sys:0.16
#### SHORT
Elapsed:0:30.39 user:25.07 sys:0.09
2003-10-11 02:21:48 +00:00
Mark Borgerding
08be1d86b4 works on Fout in-place 2003-10-10 21:30:18 +00:00
Mark Borgerding
edf93e8540 closer 2003-10-10 21:24:46 +00:00
Mark Borgerding
93de2a9410 about to try to split up k into two loops 2003-10-10 21:03:50 +00:00
Mark Borgerding
66b0646c9c *** empty log message *** 2003-10-10 02:04:59 +00:00
Mark Borgerding
18e5e8e360 failed attempt -- DOES NOT WORK! 2003-10-10 02:04:42 +00:00
Mark Borgerding
a2cca1b70e working towards mixed radix 2003-10-10 00:47:17 +00:00
Mark Borgerding
1330c4b3d4 python code for prototyping 2003-10-09 20:28:41 +00:00
Mark Borgerding
502211bc6a broken
working towards mixed radix
decomposition seems to work (for 2)
indices scrambled
2003-08-16 23:40:14 +00:00
Mark Borgerding
c9ff98b2c9 pick the peak frequency from a stereo input 2003-08-14 00:48:51 +00:00
Mark Borgerding
570f23d821 *** empty log message *** 2003-08-14 00:40:01 +00:00
Mark Borgerding
4add8dbbb6 simplified testing 2003-08-13 01:54:58 +00:00
Mark Borgerding
1cd00ce9f5 simplified testing (hopefully) 2003-08-13 01:54:21 +00:00