mirror of https://github.com/mborgerding/kissfft.git, synced 2025-10-30 15:54:39 -04:00

Speed:
    * Experiment with compiler flags.
        Special thanks to Oscar Lesta, who suggested some gcc flags that
        make a big difference. They shave 10-15% off execution time on
        some systems.  Try some combination of (adjusting -march to your CPU):
                -march=pentiumpro
                -ffast-math
                -fomit-frame-pointer

    * If the input data has no imaginary component, use the kiss_fftr code under tools/.
      Real FFTs are roughly twice as fast as complex ones.
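
To see where the real-input saving comes from, here is a minimal, self-contained sketch. It does not call kiss_fftr. For real input, bin k of the DFT is the complex conjugate of bin N-k, so only N/2+1 bins carry information. The `cpx` type, the `W4` table, and both functions are hypothetical names made up for this illustration; the size 4 was picked only because its twiddles are exact integers.

```c
/* Hypothetical stand-in for a complex sample; not the kissfft type. */
typedef struct { int r, i; } cpx;

/* Exact 4-point twiddles W^m = exp(-2*pi*j*m/4): 1, -j, -1, +j. */
static const cpx W4[4] = { {1, 0}, {0, -1}, {-1, 0}, {0, 1} };

/* Naive 4-point DFT of a real-valued signal; fills all 4 bins. */
static void dft4_real(const int x[4], cpx X[4])
{
    for (int k = 0; k < 4; ++k) {
        X[k].r = 0;
        X[k].i = 0;
        for (int n = 0; n < 4; ++n) {
            cpx w = W4[(k * n) % 4];
            X[k].r += x[n] * w.r;   /* x[n] is real, so one multiply per part */
            X[k].i += x[n] * w.i;
        }
    }
}

/* Returns 1 if X[k] == conj(X[4-k]) for k = 1..3, i.e. the bins above
 * N/2 repeat information already present in the bins below it. */
static int is_conjugate_symmetric(const cpx X[4])
{
    for (int k = 1; k < 4; ++k) {
        if (X[k].r != X[4 - k].r || X[k].i != -X[4 - k].i)
            return 0;
    }
    return 1;
}
```

A real-input transform can therefore store and compute only bins 0..N/2; the remaining bins follow from conjugation.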

    * If you can rearrange your code to do 4 FFTs in parallel and you are on a recent Intel or AMD machine,
      then you might want to experiment with the USE_SIMD code.
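
"Rearranging" for 4 parallel FFTs mostly means interleaving four independent signals so that each vector element holds one sample from each signal. The sketch below shows that packing step in plain C, with no intrinsics. It assumes that with USE_SIMD each scalar becomes a 4-float vector; check kiss_fft.h before relying on that. `pack4`, `unpack4`, and `LANES` are hypothetical names, and a real SIMD buffer would also need suitable (typically 16-byte) alignment, which is not handled here.

```c
#include <stddef.h>

#define LANES 4  /* assumption: one vector register holds 4 floats */

/* Interleave four signals of length n: out[LANES*i + lane] = signal[lane][i].
 * Apply once to the real parts and once to the imaginary parts. */
static void pack4(const float *s0, const float *s1,
                  const float *s2, const float *s3,
                  size_t n, float *out)
{
    for (size_t i = 0; i < n; ++i) {
        out[LANES * i + 0] = s0[i];
        out[LANES * i + 1] = s1[i];
        out[LANES * i + 2] = s2[i];
        out[LANES * i + 3] = s3[i];
    }
}

/* Inverse of pack4: split the interleaved buffer back into four signals. */
static void unpack4(const float *packed, size_t n,
                    float *d0, float *d1, float *d2, float *d3)
{
    for (size_t i = 0; i < n; ++i) {
        d0[i] = packed[LANES * i + 0];
        d1[i] = packed[LANES * i + 1];
        d2[i] = packed[LANES * i + 2];
        d3[i] = packed[LANES * i + 3];
    }
}
```

The packing and unpacking passes cost time too, so the SIMD path pays off best when the data can be produced and consumed in interleaved form.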

Reducing code size:
    * Remove some of the butterflies. There are currently butterflies optimized for radices
        2, 3, 4, and 5.  It is worth mentioning that you can still use FFT sizes that contain
        these factors; they just won't be quite as fast.  You can decide for yourself
        whether to keep radix 2 or 4.  If you do some work in this area, let me
        know what you find.
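
Before removing a butterfly, it helps to know which radices your FFT sizes actually use. The helper below is a hypothetical sketch, not code from kissfft: it splits a size into radices, trying 4 first, then 2, then odd numbers. That preference order is an assumption; compare it with kf_factor in kiss_fft.c before drawing conclusions.

```c
/* Hypothetical helper: split n into radices, preferring 4, then 2, then
 * 3, 5, 7, ...  Writes up to max radices and returns how many were found. */
static int factor_radices(int n, int *radices, int max)
{
    int count = 0;
    int p = 4;

    while (n > 1) {
        while (n % p != 0) {
            switch (p) {
                case 4:  p = 2;  break;
                case 2:  p = 3;  break;
                default: p += 2; break;
            }
            if (p * p > n)
                p = n;          /* no smaller factor is left, so n is prime */
        }
        n /= p;
        if (count < max)
            radices[count] = p;
        ++count;
    }
    return count;
}

/* Radices that the text above says have an optimized butterfly. */
static int has_optimized_butterfly(int radix)
{
    return radix == 2 || radix == 3 || radix == 4 || radix == 5;
}
```

If none of your sizes ever produces a given radix, the butterfly for it is a candidate for removal.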

    * For platforms where ROM/code space is more plentiful than RAM,
      consider creating a hardcoded kiss_fft_state. In other words, decide which
      FFT size(s) you want and make a structure with the correct factors and twiddles.
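
As a rough illustration of a hardcoded state, the sketch below defines a constant table for an 8-point forward FFT. The `fft8_state` and `cpx_f` types are hypothetical and only modeled on the idea of "factors plus twiddles"; the real kiss_fft_state layout lives in the kissfft sources and a hardcoded copy must match it exactly. The (radix, remaining length) pairing of the factors and the exp(-2*pi*j*k/8) twiddle convention are assumptions as well.

```c
/* Hypothetical stand-ins; the real types are defined by kissfft. */
typedef struct { float r, i; } cpx_f;

typedef struct {
    int   nfft;         /* transform size */
    int   inverse;      /* 0 = forward */
    int   factors[4];   /* assumed (radix, remaining length) pairs */
    cpx_f twiddles[8];  /* assumed exp(-2*pi*j*k/8), k = 0..7 */
} fft8_state;

/* A const object can typically be placed in ROM/flash by the toolchain,
 * so no RAM or run-time trigonometry is spent on building it. */
static const fft8_state FFT8_FORWARD = {
    8,
    0,
    { 4, 2, 2, 1 },
    {
        {  1.0f,         0.0f        },
        {  0.70710678f, -0.70710678f },
        {  0.0f,        -1.0f        },
        { -0.70710678f, -0.70710678f },
        { -1.0f,         0.0f        },
        { -0.70710678f,  0.70710678f },
        {  0.0f,         1.0f        },
        {  0.70710678f,  0.70710678f }
    }
};

/* Sanity check: the stored radices must multiply back to nfft. */
static int fft8_factors_ok(const fft8_state *st)
{
    return st->factors[0] * st->factors[2] == st->nfft;
}
```

A table like this is usually generated once on a desktop build, by printing the state that the normal allocation routine produces, and then pasted into the embedded source.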

    * Frank van der Hulst offered numerous suggestions for smaller code size and correct operation
      on embedded targets.  "I'm happy to help anyone who is trying to implement KISSFFT on a micro"

      Some of these were rolled into the mainline code base:
        - using long casts to promote intermediate results of short*short multiplication
        - delaying allocation of buffers that are sometimes unused.
      In some cases, it may be desirable to limit capability in order to better suit the target:
        - predefining the twiddle tables for the desired fft size.
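
The long-cast item above matters on targets where int is 16 bits: there, short*short is computed in 16 bits and overflows before the result can be scaled back. A minimal sketch of a widened Q15 multiply follows. `q15_mul` is a hypothetical name, int32_t stands in for the "long" cast mentioned above, and the code assumes the compiler shifts negative values arithmetically, which is implementation-defined in C.

```c
#include <stdint.h>

/* Q15 fixed-point multiply: treats a and b as fractions in [-1, 1)
 * scaled by 2^15, and returns their product in the same format. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    /* Widen BEFORE multiplying so the product is formed in 32 bits. */
    int32_t prod = (int32_t)a * b;

    /* Add half an LSB for rounding, then drop the 15 fraction bits.
     * No saturation: -1 * -1 does not fit in Q15 and is not handled. */
    return (int16_t)((prod + ((int32_t)1 << 14)) >> 15);
}
```

For example, 16384 represents 0.5 in Q15, so q15_mul(16384, 16384) yields 8192, which represents 0.25.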