Mark Borgerding
|
3a5791f203
|
slightly faster
|
2003-11-07 03:23:19 +00:00 |
|
Mark Borgerding
|
1486d89291
|
faster
|
2003-11-07 03:04:52 +00:00 |
|
Mark Borgerding
|
e9095a161c
|
generic butterfly slightly slower -- hmmm
|
2003-11-07 02:39:49 +00:00 |
|
Mark Borgerding
|
a296b09dbf
|
wrong alloc size
|
2003-11-07 01:06:44 +00:00 |
|
Mark Borgerding
|
cb5312efdc
|
2d fft seems to work
|
2003-11-06 03:59:31 +00:00 |
|
Mark Borgerding
|
4c458be5e9
|
checkpoint -- I don't think I've broken anything (yet) adding 2d fft.
|
2003-11-04 23:25:49 +00:00 |
|
Mark Borgerding
|
ee3094a0e4
|
benchmark utilities
|
2003-11-04 02:11:00 +00:00 |
|
Mark Borgerding
|
4ebf0b5aca
|
added a CHANGELOG
|
2003-11-04 02:09:53 +00:00 |
|
Mark Borgerding
|
2788fba0bd
|
added a CHANGELOG
|
2003-11-04 02:09:48 +00:00 |
|
Mark Borgerding
|
8b4e3bacca
|
minor comments and added some primes
|
2003-11-04 02:00:01 +00:00 |
|
Mark Borgerding
|
6c8049cc75
|
slight changes to Makefile
|
2003-11-04 01:01:37 +00:00 |
|
Mark Borgerding
|
7b4de0aa11
|
a little faster
|
2003-11-03 04:30:50 +00:00 |
|
Mark Borgerding
|
ad4ee571aa
|
faster radix5
|
2003-11-03 04:04:01 +00:00 |
|
Mark Borgerding
|
0403fb3e4a
|
radix 5 a little optimized
|
2003-11-03 03:48:34 +00:00 |
|
Mark Borgerding
|
3c0c0431e2
|
radix 5 works, but is 6x slower than fftw
|
2003-11-03 03:03:16 +00:00 |
|
Mark Borgerding
|
85764e6437
|
radix 5 doesn't work, but I think it should.
just a checkpoint commit
|
2003-11-01 16:48:33 +00:00 |
|
Mark Borgerding
|
8ac63adc77
|
modified time benchmark to repeat same buffer over and over to avoid IO bottlenecks and get more consistent numbers.
|
2003-11-01 04:44:50 +00:00 |
|
Mark Borgerding
|
471803ca08
|
removed unused macro
|
2003-11-01 04:26:02 +00:00 |
|
Mark Borgerding
|
7b7aefe7c4
|
moved scratch buffer to stack variable
|
2003-11-01 03:59:43 +00:00 |
|
Mark Borgerding
|
28551899e2
|
radix 4 faster
|
2003-11-01 03:49:53 +00:00 |
|
Mark Borgerding
|
d1df249536
|
radix3 fixed point now works
|
2003-10-31 04:01:09 +00:00 |
|
Mark Borgerding
|
b1969544a6
|
radix 3 still doesn't work for fixed
|
2003-10-30 03:00:49 +00:00 |
|
Mark Borgerding
|
d4f87befda
|
re-added radix 3 butterfly
|
2003-10-30 02:02:29 +00:00 |
|
Mark Borgerding
|
ca4c74e07c
|
Whoops, one should not test with input of all zeros
|
2003-10-29 04:29:01 +00:00 |
|
Mark Borgerding
|
97b18f3fef
|
comments
|
2003-10-27 04:02:11 +00:00 |
|
Mark Borgerding
|
d9fcda04b6
|
version 0.2 upload to sf
|
2003-10-26 19:29:36 +00:00 |
|
Mark Borgerding
|
ecb1a76974
|
added zip creation to tarball make target
|
2003-10-26 04:25:18 +00:00 |
|
Mark Borgerding
|
1db3d91ee5
|
getting ready for next release
|
2003-10-26 04:07:32 +00:00 |
|
Mark Borgerding
|
52b4b9ab5c
|
*** empty log message ***
|
2003-10-18 01:45:26 +00:00 |
|
Mark Borgerding
|
c239ba2c1c
|
slight code cleanup, comments
|
2003-10-18 01:39:36 +00:00 |
|
Mark Borgerding
|
bca7fd5151
|
compiles with -ansi -pedantic
|
2003-10-18 01:23:34 +00:00 |
|
Mark Borgerding
|
e2470b3a03
|
*** empty log message ***
|
2003-10-18 00:33:38 +00:00 |
|
Mark Borgerding
|
a3d3217ae6
|
*** empty log message ***
|
2003-10-18 00:32:54 +00:00 |
|
Mark Borgerding
|
6f8bcedc24
|
radix 3 fixed point still broken
|
2003-10-17 02:59:32 +00:00 |
|
Mark Borgerding
|
31d4214f44
|
radix 3 seems to be pretty fast
fixed point broken for some reason
|
2003-10-17 02:34:22 +00:00 |
|
Mark Borgerding
|
73744b908c
|
check point
fixed does not currently work for radix 3
|
2003-10-17 01:26:14 +00:00 |
|
Mark Borgerding
|
317f11e66e
|
starting point for radix 3
'make test' output
### testing SNR for 2187 point FFTs
#### DOUBLE
snr_t2f = 292.51
snr_f2t = 304.97
#### FLOAT
snr_t2f = 143.46
snr_f2t = 138.03
#### SHORT
snr_t2f = 49.257
snr_f2t = 16.294
#### timing 10000 x 2187 point FFTs
#### DOUBLE
Elapsed:0:05.05 user:3.60 sys:0.54
#### FLOAT
Elapsed:0:02.41 user:1.85 sys:0.23
#### SHORT
Elapsed:0:04.02 user:3.13 sys:0.08
|
2003-10-17 00:11:19 +00:00 |
|
Mark Borgerding
|
d6ae498630
|
took the bitwise and out of the switch case -- may have prevented optimization
|
2003-10-15 03:45:24 +00:00 |
|
Mark Borgerding
|
5f0efe8f17
|
pretty happy with radix 2 and radix 4
next up is radix 3, or maybe 5
|
2003-10-15 03:38:05 +00:00 |
|
Mark Borgerding
|
9504aa79c1
|
Fixed generic mixed radix butterfly
'make test' output:
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 296.95
snr_f2t = 317.25
#### FLOAT
snr_t2f = 147.96
snr_f2t = 145.14
#### SHORT
snr_t2f = 52.414
snr_f2t = 22.438
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:03.56 user:2.63 sys:0.19
#### FLOAT
Elapsed:0:01.35 user:1.07 sys:0.10
#### SHORT
Elapsed:0:01.70 user:1.37 sys:0.06
|
2003-10-15 02:52:34 +00:00 |
|
Mark Borgerding
|
0424734e9d
|
radix 4 now about as fast as original version
'make test' output:
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 296.78
snr_f2t = 317.11
#### FLOAT
snr_t2f = 145.28
snr_f2t = 143.51
#### SHORT
snr_t2f = 52.409
snr_f2t = 22.174
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:03.43 user:2.68 sys:0.25
#### FLOAT
Elapsed:0:01.39 user:1.08 sys:0.11
#### SHORT
Elapsed:0:02.01 user:1.39 sys:0.09
|
2003-10-15 01:52:13 +00:00 |
|
Mark Borgerding
|
f609401471
|
about to make some changes -- just wanted a checkpoint
|
2003-10-15 00:05:50 +00:00 |
|
Mark Borgerding
|
2ae7e0f1f2
|
radix 4 works but slow
|
2003-10-14 02:47:25 +00:00 |
|
Mark Borgerding
|
6b76490456
|
Fixed point works
'make test' output:
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 296.51
snr_f2t = 315.25
#### FLOAT
snr_t2f = 146.39
snr_f2t = 142.86
#### SHORT
snr_t2f = 58.077
snr_f2t = 27.897
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:01.53 user:1.06 sys:0.26
#### FLOAT
Elapsed:0:01.29 user:0.98 sys:0.12
#### SHORT
Elapsed:0:02.08 user:1.65 sys:0.03
|
2003-10-14 01:09:33 +00:00 |
|
Mark Borgerding
|
8460f1f8f5
|
added optimization for radix 2
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 296.29
snr_f2t = 314.48
#### FLOAT
snr_t2f = 146.48
snr_f2t = 143.03
#### SHORT
snr_t2f = -30.269
snr_f2t = -60.442
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:02.77 user:2.22 sys:0.13
#### FLOAT
Elapsed:0:01.65 user:1.35 sys:0.07
#### SHORT
Elapsed:0:02.44 user:2.00 sys:0.06
|
2003-10-14 00:38:58 +00:00 |
|
Mark Borgerding
|
0d6d61cfce
|
reduced calling parameters
negligible performance impact
|
2003-10-11 23:07:16 +00:00 |
|
Mark Borgerding
|
0d44569b3b
|
made one single malloc for all buffers
no noticeable performance gain
|
2003-10-11 23:00:12 +00:00 |
|
Mark Borgerding
|
f93a0258df
|
Simplified some inner loop calcs
'make test' output:
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 295.34
snr_f2t = 308.77
#### FLOAT
snr_t2f = 146.93
snr_f2t = 143.56
#### SHORT
snr_t2f = 54.799
snr_f2t = 24.562
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:10.69 user:8.71 sys:0.20
#### FLOAT
Elapsed:0:04.40 user:3.42 sys:0.11
#### SHORT
Elapsed:0:05.62 user:4.77 sys:0.04
|
2003-10-11 22:45:35 +00:00 |
|
Mark Borgerding
|
911d29d139
|
changed from static function that wasn't inlining very well to a macro
'make test' output:
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 295.70
snr_f2t = 308.53
#### FLOAT
snr_t2f = 146.91
snr_f2t = 143.58
#### SHORT
snr_t2f = 54.677
snr_f2t = 24.668
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:11.38 user:9.15 sys:0.24
#### FLOAT
Elapsed:0:04.18 user:3.39 sys:0.14
#### SHORT
Elapsed:0:06.03 user:4.75 sys:0.15
|
2003-10-11 22:41:17 +00:00 |
|
Mark Borgerding
|
11983e5056
|
used += on complex components
dramatic speedup -- 'make test' output:
### testing SNR for 1024 point FFTs
#### DOUBLE
snr_t2f = 295.63
snr_f2t = 307.82
#### FLOAT
snr_t2f = 146.25
snr_f2t = 143.37
#### SHORT
snr_t2f = 54.694
snr_f2t = 24.470
#### timing 10000 x 1024 point FFTs
#### DOUBLE
Elapsed:0:16.06 user:12.72 sys:0.25
#### FLOAT
Elapsed:0:04.63 user:3.79 sys:0.13
#### SHORT
Elapsed:0:05.77 user:4.56 sys:0.07
|
2003-10-11 22:39:40 +00:00 |
|