+ Post New Thread
Results 1 to 13 of 13
  1. #1
    Member level 3
    Points: 788, Level: 6

    Join Date
    May 2015
    Posts
    59
    Helped
    0 / 0
    Points
    788
    Level
    6

    how to locate the addresses of 1's in std_logic_vector

    I have a std_logic_vector signal, namely A of 2048 bits, 2047 downto 0, I want to locate the positions (addresses) of '1' bits in the vector. These addresses need to be stored into an array B in ascending order i.e. address 0 should contain the address of first 1 encountered as we go 0 to 2047 of signal A. But this whole process should take me as minimum clock cycles as possible.

    •   Alt9th September 2017, 05:19

      advertising

        
       

  2. #2
    Advanced Member level 3
    Points: 4,711, Level: 16

    Join Date
    Feb 2015
    Posts
    781
    Helped
    230 / 230
    Points
    4,711
    Level
    16

    Re: how to locate the addresses of 1's in std_logic_vector

    One solution is to have a shift register that allows 4 (or more) bits to be examined per cycle. this allows 4 compares/writes per cycle, finishing the input in 512 cycles in all cases. 4 is chosen as each BRAM in modern FPGAs has 2 ports each and this probably uses 2 BRAM, so 4 writes per cycle.

    If average case performance is needed, and the input is sparse, other methods that are more aggressive can be used.


    1 members found this post helpful.

  3. #3
    Advanced Member level 5
    Points: 34,868, Level: 45
    Achievements:
    7 years registered

    Join Date
    Jun 2010
    Posts
    6,388
    Helped
    1861 / 1861
    Points
    34,868
    Level
    45

    Re: how to locate the addresses of 1's in std_logic_vector

    Quote Originally Posted by vGoodtimes View Post
    One solution is to have a shift register that allows 4 (or more) bits to be examined per cycle. this allows 4 compares/writes per cycle, finishing the input in 512 cycles in all cases. 4 is chosen as each BRAM in modern FPGAs has 2 ports each and this probably uses 2 BRAM, so 4 writes per cycle.

    If average case performance is needed, and the input is sparse, other methods that are more aggressive can be used.
    While your solution probably works nicely at some very high clock rate, The OP originally asked for as few clocks as possible.
    It would be quite easy just to do it in a single clock cycle. But with a very very slow clock (maybe 2MHz if you're lucky), and probably slower with much higher resources than your suggestion, but it didnt meet the OPs requirements.

    Code:
    process(clk)
      variable b_addr : natural;
    begin
      if rising_edge(clk) then
        b_addr := 0;
        for i in 0 to 2047 loop
          if ip(i) = '1' then
            addresses(b_addr) <= i;
            b_addr := b_addr + 1;
          end if;
        end loop;
      end if;
    end process;
    Coded in a single clock, as the OP asked.


    1 members found this post helpful.

  4. #4
    Advanced Member level 3
    Points: 4,711, Level: 16

    Join Date
    Feb 2015
    Posts
    781
    Helped
    230 / 230
    Points
    4,711
    Level
    16

    Re: how to locate the addresses of 1's in std_logic_vector

    @TrickyDicky:
    Every time you make a post about HW design not being the same as SW design, I will post a link to this thread.



  5. #5
    Advanced Member level 5
    Points: 34,868, Level: 45
    Achievements:
    7 years registered

    Join Date
    Jun 2010
    Posts
    6,388
    Helped
    1861 / 1861
    Points
    34,868
    Level
    45

    Re: how to locate the addresses of 1's in std_logic_vector

    Quote Originally Posted by vGoodtimes View Post
    @TrickyDicky:
    Every time you make a post about HW design not being the same as SW design, I will post a link to this thread.
    Sometimes it can be like software design :D



  6. #6
    Member level 3
    Points: 788, Level: 6

    Join Date
    May 2015
    Posts
    59
    Helped
    0 / 0
    Points
    788
    Level
    6

    Re: how to locate the addresses of 1's in std_logic_vector

    TrickyDicky... It would be quite easy just to do it in a single clock cycle. But with a very very slow clock (maybe 2MHz if you're lucky), and probably slower with much higher resources than your suggestion, but it didnt meet the OPs requirements.
    I tried to implement a small code as per your recommendations
    Code:
    ----------------------------------------------------------------------------------
    -- Company: 
    -- Engineer: 
    -- 
    -- Create Date:    23:11:28 09/09/2017 
    -- Design Name: 
    -- Module Name:    clk_check - Behavioral 
    -- Project Name: 
    -- Target Devices: 
    -- Tool versions: 
    -- Description: 
    --
    -- Dependencies: 
    --
    -- Revision: 
    -- Revision 0.01 - File Created
    -- Additional Comments: 
    --
    ----------------------------------------------------------------------------------
    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    
    -- Uncomment the following library declaration if using
    -- arithmetic functions with Signed or Unsigned values
    use IEEE.NUMERIC_STD.ALL;
    
    -- Uncomment the following library declaration if instantiating
    -- any Xilinx primitives in this code.
    --library UNISIM;
    --use UNISIM.VComponents.all;
    
    entity clk_check is
        Port ( a : in  STD_LOGIC_VECTOR (3 downto 0);
               clk : in  STD_LOGIC;
               y : out  STD_LOGIC_VECTOR (3 downto 0));
    end clk_check;
    
    architecture Behavioral of clk_check is
    signal ip : STD_LOGIC_VECTOR (15 downto 0):= (OTHERS => '0');
    TYPE ram_type IS ARRAY (15 DOWNTO 0) OF STD_LOGIC_VECTOR (3 downto 0);
    SIGNAL addresses: ram_type;
    begin
    process(clk)
      variable b_addr : INTEGER RANGE 0 TO 15;
      variable ip_temp : STD_LOGIC_VECTOR(15 downto 0):= (OTHERS => '0');
    begin
      if rising_edge(clk) then
    	ip_temp := ip_temp(11 downto 0) & a;
    	ip <= ip_temp;
        b_addr := 0;
        for i in 0 to 15 loop
          if ip(i) = '1' then
            addresses(b_addr) <= std_logic_vector(to_unsigned(i,4));
            b_addr := b_addr + 1;
          end if;
    		y <= addresses(b_addr);
        end loop;
    	 ip_temp := ip;
      end if;
    end process;
    
    end Behavioral;
    When I try to synthesis, it slows down my system so much that I can not do anything. This has never been a case for last two years of my VHDL coding. Also the synthesis is never completed. Please let me know, where am I making the mistake?



    •   Alt10th September 2017, 05:04

      advertising

        
       

  7. #7
    Advanced Member level 3
    Points: 4,711, Level: 16

    Join Date
    Feb 2015
    Posts
    781
    Helped
    230 / 230
    Points
    4,711
    Level
    16

    Re: how to locate the addresses of 1's in std_logic_vector

    TD's post was (hopefully) sarcasm. A single cycle design would be very large. Larger than any existing FPGA. Possibly your 16 element version is possible.

    Do you have any assumptions or requirements or targets for resources, clock rates, latency, and area?

    A sane design will take inputs as slow as required and generate outputs as slow as required and store data as little as possible. It might be possible to reduce the size of the input bus -- process fewer elements over more cycles. You might not need an actual array for the output if the outputs can be processed as they are generated. A complex algorithm might not matter if the outputs can't be consumed fast enough.


    1 members found this post helpful.

  8. #8
    Advanced Member level 5
    Points: 34,868, Level: 45
    Achievements:
    7 years registered

    Join Date
    Jun 2010
    Posts
    6,388
    Helped
    1861 / 1861
    Points
    34,868
    Level
    45

    Re: how to locate the addresses of 1's in std_logic_vector

    Quote Originally Posted by rafimiet View Post
    When I try to synthesis, it slows down my system so much that I can not do anything. This has never been a case for last two years of my VHDL coding. Also the synthesis is never completed. Please let me know, where am I making the mistake?
    Yes. Its not a surprise - I suggested it would be very slow, use a lot of resources and be practically useless. I was pointing out the weakness in your original question - my answer met your design spec.
    What are your constraints? what device are you using? what clocks are available? what have you attempted so far (other than this terrible design) and what issues need fixing?
    Without proper specifications, it is all to easy to miss the real problem.



  9. #9
    Member level 3
    Points: 788, Level: 6

    Join Date
    May 2015
    Posts
    59
    Helped
    0 / 0
    Points
    788
    Level
    6

    Re: how to locate the addresses of 1's in std_logic_vector

    @TrickyDicky... What are your constraints? what device are you using? what clocks are available? what have you attempted so far (other than this terrible design) and what issues need fixing?
    I have a vector A(256*256-1 downto 0) that gives me the address to a block of pixels in an image. Starting from LSB to the MSB, when I encounter a '1' bit, I have to move to the next state in the Finite State Machine to process that block of pixels. If I take one bit at a time, it means I have to use 256*256 clock cycles. Which can be even more clock cycles for bigger images (eg. 512*512). Also I know that initially, we will be having only a small number of bits = '1'. With more iterations, the number of '1' bits will increase. All I have to do is to locate the next '1' bit and skip '0' bits in between. Can we do that? Can we locate next '1' bits starting from LSB bit? This can save me a lot of clock cycles.



  10. #10
    Advanced Member level 5
    Points: 34,868, Level: 45
    Achievements:
    7 years registered

    Join Date
    Jun 2010
    Posts
    6,388
    Helped
    1861 / 1861
    Points
    34,868
    Level
    45

    Re: how to locate the addresses of 1's in std_logic_vector

    Why do you have to do it in so few clock cycles? where is the image coming from? is it a live video?
    I dont really understand the need for such low latency. Usually, if it is a video, a 1 frame latency is usually acceptable. And given the pixel rate even at HD is 75Mhz, you can easilly have an on board clock of 3x the pixel rate to given you plety of cycles for image processing.

    So, whats the source? whats the goal? why the need for such a rediculously small latency on such a large image? Why cant it be pipelined?


    1 members found this post helpful.

  11. #11
    Member level 3
    Points: 788, Level: 6

    Join Date
    May 2015
    Posts
    59
    Helped
    0 / 0
    Points
    788
    Level
    6

    Re: how to locate the addresses of 1's in std_logic_vector

    where is the image coming from? is it a live video?
    The source is a still/video camera. It will add up to my work if we can process more than 30 frames a second.
    whats the goal? why the need for such a rediculously small latency on such a large image?
    After acquiring the image, I have to apply 5 levels of Discrete wavelet transform, then around levels for bit-plane coding and then as many levels of decoding and inverse transformation. So it is a full-fledged algorithm. Actually I myself was becoming a bit more cautious for reducing the latency. Because, as I have experienced, it is much easier to code something than debugging the code or trying to improve it once it is completed. I have got all the answers and options I have.



  12. #12
    Advanced Member level 5
    Points: 34,868, Level: 45
    Achievements:
    7 years registered

    Join Date
    Jun 2010
    Posts
    6,388
    Helped
    1861 / 1861
    Points
    34,868
    Level
    45

    Re: how to locate the addresses of 1's in std_logic_vector

    In that case, why are you trying to process the entire image in one go? Why not just process it at the pixel level as each pixel arrives? You should easily be able to do this.

    Don't think about processing the whole image like a software program. Think of it in terms of pixel processing. Processing video in real time is possible all the way up to 4k images.



    •   Alt10th September 2017, 16:52

      advertising

        
       

  13. #13
    Advanced Member level 3
    Points: 4,711, Level: 16

    Join Date
    Feb 2015
    Posts
    781
    Helped
    230 / 230
    Points
    4,711
    Level
    16

    Re: how to locate the addresses of 1's in std_logic_vector

    A priority encoder is an applicable circuit that can work for your application. You can scan blocks of 32 pixels at a time to get the location of the next 1. You can use the bit trick "x <= x & ~(-x & x)" to unset the lsb.

    There are different ways to generate a priority encoder. One version sets the encoded lsb to |((-x & x) & 0xAAAAAAAA), the next msb to |((-x & x) & 0xCCCCCCCC), then next to |((-x & x) & 0xF0F0F0F0), next |((-x & x) & 0xFF00FF00), finally |((-x & x) & 0xFFFF0000). This should run at decently high rates on modern fpgas, and could be pipelined if needed. The final circuit used is just a comparison to 0 to determine if the next 32b should be processed.

    Using this logic, it is possible to skip 32 (or more) 0's per cycle and it takes at least one cycle per 1 encountered.

    You can also have similar logic to skip blocks of 32 even faster. eg, check 8 32b blocks to see if any are non-zero. the result goes to the same encoder structure as above. This can also be pipelined if needed. This allows skipping of multiple 32b blocks per cycle. However, this optimization might not be justified.

    ( (-x & x) generates a vector where only the lsb is set. eg x = 0x12345678. ~x = 0xEDCBA987. -x = 0xEDCBA988. (-x & x) = 0x00000008. only the lsb remains.)

    As mentioned, if you can process pixels as they come in, that would be easier.


    (edit, I also shouldn't have to mention that the 256*256 bit image should be in a BRAM and not a large data bus. This works well as you can have 32b outputs from a BRAM. This size requires ~8 BRAM on modern devices. At the same time, you did try to synthesize the clearly troll design so I'll mention this just in case.)



--[[ ]]--