mirror of
				https://github.com/mborgerding/kissfft.git
				synced 2025-11-04 09:15:15 -05:00 
			
		
		
		
	
		
			
				
	
	
		
			36 lines
		
	
	
		
			1.8 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			36 lines
		
	
	
		
			1.8 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
Speed:
 | 
						|
    * experiment with compiler flags
 | 
						|
        Special thanks to Oscar Lesta. He suggested some compiler flags 
 | 
						|
        for gcc that make a big difference. They shave 10-15% off
 | 
						|
        execution time on some systems.  Try some combination of:
 | 
						|
                -march=pentiumpro
 | 
						|
                -ffast-math
 | 
						|
                -fomit-frame-pointer
 | 
						|
 | 
						|
    * If the input data has no imaginary component, use the kiss_fftr code under tools/.
 | 
						|
      Real ffts are roughly twice as fast as complex.
 | 
						|
 | 
						|
    * If you can rearrange your code to do 4 FFTs in parallel and you are on a recent Intel or AMD machine,
 | 
						|
    then you might want to experiment with the USE_SIMD code.
 | 
						|
    
 | 
						|
 | 
						|
Reducing code size:
 | 
						|
    * remove some of the butterflies. There are currently butterflies optimized for radices
 | 
						|
        2,3,4,5.  It is worth mentioning that you can still use FFT sizes that contain 
 | 
						|
        these factors, they just won't be quite as fast.  You can decide for yourself 
 | 
						|
        whether to keep radix 2 or 4.  If you do some work in this area, let me 
 | 
						|
        know what you find.
 | 
						|
 | 
						|
    * For platforms where ROM/code space is more plentiful than RAM,
 | 
						|
     consider creating a hardcoded kiss_fft_state. In other words, decide which 
 | 
						|
     FFT size(s) you want and make a structure with the correct factors and twiddles.
 | 
						|
 | 
						|
    * Frank van der Hulst offered numerous suggestions for smaller code size and correct operation 
 | 
						|
    on embedded targets.  "I'm happy to help anyone who is trying to implement KISSFFT on a micro"
 | 
						|
 | 
						|
    Some of these were rolled into the mainline code base:
 | 
						|
        - using long casts to promote intermediate results of short*short multiplication
 | 
						|
        - delaying allocation of buffers that are sometimes unused.
 | 
						|
    In some cases, it may be desirable to limit capability in order to better suit the target:
 | 
						|
        - predefining the twiddle tables for the desired fft size.  
 |