[SOLVED] VHDL Based N-point FFT with fixed point package really slow, suggestions or improvements?

mhmert · Aug 17, 2023

Hi!

I'm a student which not much real-life experience in VHDL and FPGA programming but I do have some basic knowledge (I think). Anyhow, I would like to learn by asking this question.
I have an FPGA to take samples from an 10 bit ADC, so I take a 10 bit sample every t time.

Those samples are put into an sfixed type from the fixed_pkg package in the ieee_proposed library, as a "real" number.
I have created a type "complex" with two sfixed(18, -12) vectors in it. The sample is the first one and the 2nd sfixed is (others => '0') for now.

I want to apply some signal processing so I need to apply the hilbert transform to obtain the analytical signal. I do so by storing the "complex" sample in an array of 256 complex samples. Those 256 samples are submitted to an 256-point FFT, the transform is applied and then they are submitted to an 256-point IFFT.

This process is very, very slow in simulation and synthesis does take like a day or so.
- Is this to be expected?
- would this effect its performance when put onto the FPGA
- How could I improve my "code"*
- Is there a better way to achieve what I'm trying to do?

* VHDL, is it defined as programming / coding? if not, what then?

My code: (reduced to 8-point for simplicity)

(VHDL) Taking samples:

Code:

architecture Behavioral of Sampler is
begin
    process(Trigger) is
        variable var_shiftBuffer : complex_array_256 := (others => ((others => '0'), (others => '0')));
        variable var_sample : integer := 0;
    begin
        if rising_edge(Trigger) then
                 
            var_sample := 0;
            if D0 = '1' then var_sample := 1; end if;
            if D1 = '1' then var_sample := var_sample + 2; end if;
            if D2 = '1' then var_sample := var_sample + 4; end if;
            if D3 = '1' then var_sample := var_sample + 8; end if;
            if D4 = '1' then var_sample := var_sample + 16; end if;
            if D5 = '1' then var_sample := var_sample + 32; end if;
            if D6 = '1' then var_sample := var_sample + 64; end if;
            if D7 = '1' then var_sample := var_sample + 128; end if;
            if D8 = '1' then var_sample := var_sample + 256; end if;
            if D9 = '1' then var_sample := var_sample + 512; end if;
         
            for i in 0 to 254 loop
                var_shiftBuffer(i) := var_shiftBuffer(i + 1);
            end loop;

            var_shiftBuffer(255) := (to_sfixed(var_sample, 18, -12), (others => '0'));
         
            Samples <= var_shiftBuffer;
         
        end if;
    end process;

end Behavioral;

(VHDL) Hilbert (8 point)

Code:

architecture Behavioral of Hilbert8 is
    signal sig_fftOut : complex_array_8 := (others => ((others => '0'), (others => '0')));
    signal sig_ifftIn : complex_array_8 := (others => ((others => '0'), (others => '0')));
    constant const_MultTwo : complex := (to_sfixed(2, 16, -8), (others => '0'));

    component FFT8 is
        port (
            x : in complex_array_8;
            Y : out complex_array_8
        );
    end component;
 
    component IFFT8 is
        port (
            Y : in complex_array_8;
            x : out complex_array_8
        );
    end component;
 
 
begin
    --    -- Take FFT
    comp_fft : FFT8 port map (realArrayIn, sig_fftOut);

    -- -- perform Hilbert
    sig_ifftIn(0) <= sig_fftOut(0);
    sig_ifftIn(1) <= mult(sig_fftOut(1), const_MultTwo);
    sig_ifftIn(2) <= mult(sig_fftOut(2), const_MultTwo);
    sig_ifftIn(3) <= mult(sig_fftOut(3), const_MultTwo);
    sig_ifftIn(4) <= sig_fftOut(4);

 
    -- -- Take IFFT
    comp_ifft : IFFT8 port map (sig_ifftIn, complexArrayOut);

end Behavioral;

(VHDL) 8 Point FFT

Code:

architecture Behavioral of FFT8 is

    constant const_N : integer := 8;

    signal signal_1 : complex_array_8 := (others => ((others => '0'), (others => '0')));
    signal signal_2 : complex_array_8 := (others => ((others => '0'), (others => '0')));
    signal signal_3 : complex_array_8 := (others => ((others => '0'), (others => '0')));


    --    W_N^k = cos(2*pi*k/N) - j * sin(2*pi*k/N)
    constant const_W : complex_array_4 := (
        (to_sfixed(1.0, 18, -12),to_sfixed(0.0, 18, -12)),
        (to_sfixed(0.7071, 18, -12),to_sfixed(-0.7071, 18, -12)),
        (to_sfixed(0.0, 18, -12),to_sfixed(-1.0, 18, -12)),
        (to_sfixed(-0.7071, 18, -12),to_sfixed(-0.7071, 18, -12))
    );

    constant const_div : complex := (to_sfixed(0.125, 18, -12), (others => '0'));


    component butterfly is
        port(
            s1,s2 : in complex;         --inputs
            w :in complex;              -- phase factor
            g1,g2 :out complex          -- outputs
        );
    end component;
begin
    --Stage 1
    comp_bf_1_0 : butterfly port map(x(0), x(4), const_W(0), signal_2(0), signal_2(1));
    comp_bf_1_1 : butterfly port map(x(2), x(6), const_W(0), signal_2(2), signal_2(3));
    comp_bf_1_2 : butterfly port map(x(1), x(5), const_W(0), signal_2(4), signal_2(5));
    comp_bf_1_3 : butterfly port map(x(3), x(7), const_W(0), signal_2(6), signal_2(7));
    --Stage 2
    comp_bf_2_0 : butterfly port map(signal_2(0), signal_2(2), const_W(0), signal_3(0), signal_3(2));
    comp_bf_2_1 : butterfly port map(signal_2(1), signal_2(3), const_W(2), signal_3(1), signal_3(3));
    comp_bf_2_2 : butterfly port map(signal_2(4), signal_2(6), const_W(0), signal_3(4), signal_3(6));
    comp_bf_2_3 : butterfly port map(signal_2(5), signal_2(7), const_W(2), signal_3(5), signal_3(7));
    --Stage 3
    comp_bf_3_0 : butterfly port map(signal_3(0), signal_3(4), const_W(0), Y(0), Y(4));
    comp_bf_3_1 : butterfly port map(signal_3(1), signal_3(5), const_W(1), Y(1), Y(5));
    comp_bf_3_2 : butterfly port map(signal_3(2), signal_3(6), const_W(2), Y(2), Y(6));
    comp_bf_3_3 : butterfly port map(signal_3(3), signal_3(7), const_W(3), Y(3), Y(7));
end Behavioral;

(VHDL) Butterfly function

Code:

architecture Behavioral of butterfly is
begin
--butterfly equations.
g1 <= add(s1,mult(s2,w));
g2 <= sub(s1,mult(s2,w));
end Behavioral;

(VHDL) Complex functions

Code:

package fft_pkg is

    type complex is
         record
              r : sfixed(18 downto -12);
              i : sfixed(18 downto -12);
         end record;

    type complex_array_256 is array (0 to 255) of complex;
    type complex_array_128 is array (0 to 127) of complex;
    type complex_array_64 is array (0 to 63) of complex;
    type complex_array_32 is array (0 to 31) of complex;
    type complex_array_16 is array (0 to 15) of complex;
    type complex_array_8 is array (0 to 7) of complex;
    type complex_array_4 is array (0 to 3) of complex;

    function add (n1,n2 : complex) return complex;
    function sub (n1,n2 : complex) return complex;
    function mult (n1,n2 : complex) return complex;
    function fftAbs(x : complex) return integer;
    function positiveValues(x : complex) return boolean;
--    function complex_shift_right (x : complex;  n : natural) return complex;

end fft_pkg;

package body fft_pkg is

    --addition of complex numbers
    function add (n1,n2 : complex) return complex is
        variable sum : complex;

        begin
        sum.r:=resize(n1.r + n2.r, 18, -12);
        sum.i:=resize(n1.i + n2.i, 18, -12);
        return sum;
    end add;

    --subtraction of complex numbers.
    function sub (n1,n2 : complex) return complex is

        variable diff : complex;

        begin
        diff.r:=resize(n1.r - n2.r, 18, -12);
        diff.i:=resize(n1.i - n2.i, 18, -12);
        return diff;
    end sub;

    --multiplication of complex numbers.
    function mult (n1,n2 : complex) return complex is

        variable prod : complex;

        begin
        prod.r:= resize((n1.r * n2.r) - (n1.i * n2.i), 18, -12);
        prod.i:= resize((n1.r * n2.i) + (n1.i * n2.r), 18, -12);
        return prod;
    end mult;
 
--    function complex_shift_right (x : complex;  n : natural) return complex is
--    begin
--        return (SHIFT_RIGHT(x.r, n), SHIFT_RIGHT(x.i, n));
--    end complex_shift_right;
 
    --absolute values
    function fftAbs (x : complex) return integer is
        variable a : sfixed(32 downto -12);
        begin
            a := resize(x.r * x.r + x.i * x.i, 32, -12);
        return to_integer(signed(to_slv(a(32 downto 0))));
    end fftAbs;
 
    function positiveValues(x : complex) return boolean is
        begin
        if x.r > to_sfixed(0.0, 18, -12) and x.i > to_sfixed(0.0, 18, -12) then
            return true;
        else
            return false;
        end if;
    end positiveValues;

end fft_pkg;

kaz1 · Aug 17, 2023

This is typical of universities guiding or misguiding their students.
fft/ifft/hilbert are too involved for a student. Even in industry we don't design from scratch. I know universities want to teach concepts but they should direct them for industry.

Anyway, can't you use FFT ip instead of importing wholesale code that you or we can't handle?
get FFT ip core set to 8 bits. and you are left with some practical steps to complete.
Pass input to FFT ip then for Hilbert set all negative frequencies to zero then pass through iFFT.
The iFFT can use FFT core by just swapping Re/Im at inputs and swapping back at output.

mhmert · Aug 17, 2023

kaz1 said:
This is typical of universities guiding or misguiding their students.
fft/ifft/hilbert are too involved for a student. Even in industry we don't design from scratch. I know universities want to teach concepts but they should direct them for industry.

Anyway, can't you use FFT ip instead of importing wholesale code that you or we can't handle?
get FFT ip core set to 8 bits. and you are left with some practical steps to complete.
Pass input to FFT ip then for Hilbert set all negative frequencies to zero then pass through iFFT.
The iFFT can use FFT core by just swapping Re/Im at inputs and swapping back at output.

Couldn't agree more, although it was useful for me to learn how that actually works in a practical manner.

But I'd love to use "ip" although I'm not sure what it actually means? But if I would, would they work with the method I'm using for complex numbers, and 256 of them? would you have a link to an example of such an ip?

Thanks for your help anyway!

kaz1 · Aug 17, 2023

mhmert said:
Couldn't agree more, although it was useful for me to learn how that actually works in a practical manner.

But I'd love to use "ip" although I'm not sure what it actually means? But if I would, would they work with the method I'm using for complex numbers, and 256 of them? would you have a link to an example of such an ip?

Thanks for your help anyway!

FFT ip cores come with vivado or quartus license and may also be free open core available around. I assume vendor core is designed as FFT butterfly but the code is encrypted.
You instantiate it as any component. read the data sheet and it will be clear. It directly accepts Re/Im inputs plus some start, valid...etc signals for framing.

mhmert · Aug 20, 2023

Hi, could you help me out a bit here. I'm using ISE up till now because my device is a nexys 2. For your suggestion, would I need vivado or can I still use ISE?

Since nexys2 is not supported in vivado.
If I can use ISE, could you give me a little pointer in how to apply ip in my solution? Not that familiar with the software at all.

Thanks, I appreciate your help very much!

kaz1 · Aug 20, 2023

mhmert said:
Hi, could you help me out a bit here. I'm using ISE up till now because my device is a nexys 2. For your suggestion, would I need vivado or can I still use ISE?

Since nexys2 is not supported in vivado.
If I can use ISE, could you give me a little pointer in how to apply ip in my solution? Not that familiar with the software at all.

Thanks, I appreciate your help very much!

ISE supports FFT (assuming ip is licensed). This link might help:

Intro Software Walkthrough: Fast Fourier Transforms and the Xilinx FFT IP Core - Technical Articles

This article will explain some of the most important settings and design parameters for the Xilinx FFT IP core and function as a basic walkthrough of the Fast Fourier Transform interface.

www.allaboutcircuits.com

mhmert · Aug 21, 2023

Thanks, got me started!

What I am wondering; I'm using fixed_pkg to have fractional numbers like 1.5 and 2.34. How can I use that in the ip core?

In the fixed pkg I use sfixed(18, -12) leaving a sort of vector(?) with a 31 bit length. Should I transform the sfixed to an std_logic_vector and back after the FFT? How should I interpret those types?

FvM · Aug 21, 2023

Spartan 3E 500k used on Nexys 2 is a rather small FPGA (providing about 10k logic cells and 20 dedicated multipliers). I doubt that it's sufficient to implement the FFT.

kaz1 · Aug 21, 2023

mhmert said:
Thanks, got me started!

What I am wondering; I'm using fixed_pkg to have fractional numbers like 1.5 and 2.34. How can I use that in the ip core?

In the fixed pkg I use sfixed(18, -12) leaving a sort of vector(?) with a 31 bit length. Should I transform the sfixed to an std_logic_vector and back after the FFT? How should I interpret those types?

I will use std_logic only. wire up the 10 bits from ADC (real only) to FFT, set Im input to zeros.
decide pulses for frame start and valid per FFT frame.
Your FFT will be set to 10 bits inputs, 256 frame size.
You can choose a scaling but the easiest would be fixed point rather than BFP. truncate output to 10 bits or so. Set all negative bins to zero. pass through iFFT.
If you clock is fast enough you can use one FFT block for both FFT & iFFT sharing in time.

mhmert · Aug 21, 2023

kaz1 said:
I will use std_logic only. wire up the 10 bits from ADC (real only) to FFT, set Im input to zeros.
decide pulses for frame start and valid per FFT frame.
Your FFT will be set to 10 bits inputs, 256 frame size.
You can choose a scaling but the easiest would be fixed point rather than BFP. truncate output to 10 bits or so. Set all negative bins to zero. pass through iFFT.
If you clock is fast enough you can use one FFT block for both FFT & iFFT sharing in time.

Clear, thanks! I got stuff working.

I do, however, have a few things I'm not quite sure about atm.

I'm using (for testing purposes) 8 point FFT, version 7.1, unscaled and 10 bit input.

when having the MSB at 1, it means the FFT Takes it as signed -> negative so I think I'll have to pad the input to 11 bits with a MSB of 0?

I might apply the scaling later when doing Hilbert but for testing and understanding purposes now I am doing 8 point fft first.

Regarding my prior question I pad the input with the MSB being zero and then it seems to work way better. However.

Given my input:

Code:

import numpy as np
x = [511, 1013, 310, 94, 890, 762, 20, 488]
[print(i) for i in np.fft.fft(x)]

I expect (obtained from python)

0: (4088+0j)
1: (77.0838738653232-188.88373029032368j)
2: (1071-1193j)
3: (-835.0838738653232+391.1162697096763j)
4: (-626+0j)
5: (-835.0838738653232-391.1162697096763j)
6: (1071+1193j)
7: (77.0838738653232+188.88373029032368j)

But from the simulation I get:

0: (4085+0j)
1: (-80-187j)
2: (-1193-1072j)
3: (865+313j)
4: (625+0j)
5: (866-313j)
6: (-1193+1072j)
7: (-79+187j)

In almost all, the signs are incorrect and in 2 and 6, somehow the two numbers seem to have switched around.
Is there any logical reason for this or, should I not worry about that when applying Hilbert? Why (not)?

kaz1 · Aug 21, 2023

FFT takes in fixed point signed so just wire 10 bits as is. No need to change MSB. But map bits correctly as xilinx fft has got some bizarre obsolete MSBs

Your results are way wrong.
You should get same as standard FFT but with minor differences due to ip internal truncation.

mhmert · Aug 21, 2023

I feel sorry that I keep asking, but as for now this is my only method and I hope you'll be willing to help.
I don't think I'm that far off but now I'm wondering then what might be causing the wrong results.

I've included three screenshots in which I show my fft ip core setup and also my VHDL code that actually runs the simulation with this FFT.

Code:

----------------------------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use ieee.numeric_std.all;
use IEEE.STD_LOGIC_UNSIGNED.ALL;


entity Test is
    Port ( Clk : in  STD_LOGIC );
end Test;

architecture Behavioral of Test is

        signal sig_val0 : std_logic_vector(10 downto 0) := "00111111111"; --511
        signal sig_val1 : std_logic_vector(10 downto 0) := "01111110100";    --1012
        signal sig_val2 : std_logic_vector(10 downto 0) := "00100110101"; --309
        signal sig_val3 : std_logic_vector(10 downto 0) := "00001011110"; --94
        signal sig_val4 : std_logic_vector(10 downto 0) := "01101111010"; --890
        signal sig_val5 : std_logic_vector(10 downto 0) := "01011111010"; --762
        signal sig_val6 : std_logic_vector(10 downto 0) := "00000010100"; --20
        signal sig_val7 : std_logic_vector(10 downto 0) := "00111100111"; --487
    
        COMPONENT FFT8 is
            port(
                clk : in STD_LOGIC := 'X';
                start : in STD_LOGIC := 'X';
                fwd_inv : in STD_LOGIC := 'X';
                fwd_inv_we : in STD_LOGIC := 'X';
                scale_sch_we : in STD_LOGIC := 'X';
                rfd : out STD_LOGIC;
                busy : out STD_LOGIC;
                edone : out STD_LOGIC;
                done : out STD_LOGIC;
                dv : out STD_LOGIC;
                xn_re : in STD_LOGIC_VECTOR ( 10 downto 0 );
                xn_im : in STD_LOGIC_VECTOR ( 10 downto 0 );
                scale_sch : in STD_LOGIC_VECTOR ( 3 downto 0 );
                xn_index : out STD_LOGIC_VECTOR ( 2 downto 0 );
                xk_index : out STD_LOGIC_VECTOR ( 2 downto 0 );
                xk_re : out STD_LOGIC_VECTOR ( 10 downto 0 );
                xk_im : out STD_LOGIC_VECTOR ( 10 downto 0 )
            );
        end component;
        
        signal sig_i_start, sig_i_scale_sch_we : std_logic := '0';
        signal sig_o_rfd, sig_o_busy, sig_o_edone, sig_o_done, sig_o_dv : std_logic := '0';
        
        signal sig_i_xn_re, sig_i_xn_im : std_logic_vector(10 downto 0) := (others => '0');
        signal sig_i_scale_sch : std_logic_vector( 3 downto 0 ) := (others => '0');
        
        signal sig_o_xn_index, sig_o_xk_index : std_logic_vector(2 downto 0) := (others => '0');
        signal sig_o_re, sig_o_im : std_logic_vector(10 downto 0) := (others => '0');
        
begin

    comp_fft8 : FFT8 port map (
        Clk,
        '1',
        '1',
        '1',
        sig_i_scale_sch_we,
        sig_o_rfd,
        sig_o_busy,
        sig_o_edone,
        sig_o_done,
        sig_o_dv,
        sig_i_xn_re,
        sig_i_xn_im,
        sig_i_scale_sch,
        sig_o_xn_index,
        sig_o_xk_index,
        sig_o_re,
        sig_o_im
    );
    
    process(Clk) is
        variable var_counter : integer := 0;
    begin
        if rising_edge(Clk) then
            sig_i_start <= '1';
            
            if sig_o_rfd = '1' then
                if var_counter = 0 then
                    sig_i_xn_re <= sig_val0;
                elsif var_counter = 1 then
                    sig_i_xn_re <= sig_val1;
                elsif var_counter = 2 then
                    sig_i_xn_re <= sig_val2;
                elsif var_counter = 3 then
                    sig_i_xn_re <= sig_val3;
                elsif var_counter = 4 then
                    sig_i_xn_re <= sig_val4;
                elsif var_counter = 5 then
                    sig_i_xn_re <= sig_val5;
                elsif var_counter = 6 then
                    sig_i_xn_re <= sig_val6;
                elsif var_counter = 7 then
                    sig_i_xn_re <= sig_val7;
                end if;
                var_counter := var_counter + 1;
            end if;
            if var_counter > 7 then
                var_counter := 0;
            end if;           
        end if;
    end process;
end Behavioral;

kaz1 · Aug 21, 2023

Make sure your setup is correct:
input: 10 bits (you set it to 11, for now it is ok)
output: select unscaled
make sure your data starts when fft is ready (it seems you start too early)

mhmert · Aug 22, 2023

I got it solved. I had my offset incorrect. By default it was set to zero in the settings but in my VHDL I could never get it at zero, so in the settings I changed it into 3 and I matched it to 3 in VHDL as well. Now it is correct. Thanks for your help setting this up!

Welcome to EDAboard.com

[SOLVED] VHDL Based N-point FFT with fixed point package really slow, suggestions or improvements?

mhmert

Newbie level 4

mhmert

kaz1

Full Member level 6

mhmert

Newbie level 4

kaz1

Full Member level 6

mhmert

Newbie level 4

kaz1

Full Member level 6

Intro Software Walkthrough: Fast Fourier Transforms and the Xilinx FFT IP Core - Technical Articles

mhmert

Newbie level 4

FvM

Super Moderator

kaz1

Full Member level 6

mhmert

Newbie level 4

kaz1

Full Member level 6

mhmert

Newbie level 4

kaz1

Full Member level 6

mhmert

Newbie level 4

Similar threads

Part and Inventory Search

Welcome to EDABoard.com

Sponsor

Connect with us

Online statistics

Forum statistics