Using read before write RAM for Histogram calculation

K-J · Apr 15, 2016

std_match said:
I don't see a big problem with reading the data from a CPU or similar. Overclocking the RAM has already been mentioned. Another option is to connect the write port of a second RAM in parallel with the existing RAM. Both RAMs will have the same content, and the read port of the second RAM is free to use for reading out the histogram data.

If one is reading out the histogram while it is still being calculated, there are higher level issues to be resolved instead.

Kevin Jennings
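
The mirrored-RAM option quoted above might look roughly like the following. This is only a sketch and not code from the thread: the entity name mirrored_ram, the generics, the unsigned port types, and the ra_cpu/dout_cpu port names are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical wrapper: two memories share one write port, so they always
-- hold the same data, and each memory has its own read port.
entity mirrored_ram is
  generic (
    ADDR_W : positive := 8;   -- assumed address width
    DATA_W : positive := 16   -- assumed bin width
  );
  port (
    clk      : in  std_logic;
    -- shared write port, driven by the histogram update logic
    wren     : in  std_logic;
    wa       : in  unsigned(ADDR_W-1 downto 0);
    din      : in  unsigned(DATA_W-1 downto 0);
    -- read port used by the histogram update logic
    ra       : in  unsigned(ADDR_W-1 downto 0);
    dout     : out unsigned(DATA_W-1 downto 0);
    -- independent read port for the CPU (or similar) readout
    ra_cpu   : in  unsigned(ADDR_W-1 downto 0);
    dout_cpu : out unsigned(DATA_W-1 downto 0)
  );
end entity mirrored_ram;

architecture rtl of mirrored_ram is
  type ram_t is array (0 to 2**ADDR_W-1) of unsigned(DATA_W-1 downto 0);
  signal ram_hist : ram_t := (others => (others => '0'));
  signal ram_copy : ram_t := (others => (others => '0'));
begin
  p_ram : process (clk) is
  begin
    if rising_edge(clk) then
      -- every write goes to both memories
      if wren = '1' then
        ram_hist(to_integer(wa)) <= din;
        ram_copy(to_integer(wa)) <= din;
      end if;
      -- registered reads; a read of an address being written in the same
      -- cycle returns the old value (read before write)
      dout     <= ram_hist(to_integer(ra));
      dout_cpu <= ram_copy(to_integer(ra_cpu));
    end if;
  end process p_ram;
end architecture rtl;
```

The cost is a second block of memory; the benefit is that the readout never competes with the histogram update for a read port.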

vGoodtimes · Apr 16, 2016

In addition to the given solution to the extra cycle of latency there are two other good solutions. The first is based on caching. The second is based on filtering the input and passing an amount and address. Each has possible advantages and disadvantages. These two methods can also be used as power optimizations as they reduce the need to read/write to the BRAM.

The remaining issues are how the RAM contents are transferred to the next stage and how they are reset. There are again several solutions. The common ones are to stop data collection during the readout of the histogram, or to double (or triple) buffer the histogram. (For this problem, note that the BRAMs might have the ability to do a combined "read addra and write 0" operation.)
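
As a rough illustration of the double-buffering option, one way to do it is to prepend a bank bit to the bin address and toggle that bit between collection windows. The sketch below is an assumption-laden example, not a tested design: the entity name pingpong_bank, the swap pulse, and all port names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical bank selector for a double-buffered histogram.
-- One half of the address space collects counts while the other half
-- is read out; "swap" exchanges the two halves.
entity pingpong_bank is
  generic (
    ADDR_W : positive := 8    -- assumed bin address width
  );
  port (
    clk    : in  std_logic;
    rst    : in  std_logic;
    swap   : in  std_logic;   -- assumed one-cycle pulse between windows
    acc_ra : in  unsigned(ADDR_W-1 downto 0);  -- bin address of new sample
    out_ra : in  unsigned(ADDR_W-1 downto 0);  -- bin address for readout
    acc_a  : out unsigned(ADDR_W downto 0);    -- banked collection address
    out_a  : out unsigned(ADDR_W downto 0)     -- banked readout address
  );
end entity pingpong_bank;

architecture rtl of pingpong_bank is
  signal bank : std_logic := '0';
begin
  p_bank : process (clk) is
  begin
    if rising_edge(clk) then
      if rst = '1' then
        bank <= '0';
      elsif swap = '1' then
        bank <= not bank;
      end if;
    end if;
  end process p_bank;

  -- collection uses the active bank, readout uses the other one
  acc_a <= bank & acc_ra;
  out_a <= (not bank) & out_ra;
end architecture rtl;
```

If acc_a is what feeds the update pipeline as its input address, the bank bit travels with each sample, so writes still in flight at a swap land in the bank they were read from. The sketch leaves open how the readout side gets its own read port and how the readout bank is cleared before it is reused.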

K-J · Apr 16, 2016

vGoodtimes said:
In addition to the given solution to the extra cycle of latency there are two other good solutions. The first is based on caching. The second is based on filtering the input and passing an amount and address. Each has possible advantages and disadvantages. These two methods can also be used as power optimizations as they reduce the need to read/write to the BRAM.

I'd like to hear more about the implementation of both of these.

Kevin

vGoodtimes · Apr 16, 2016

The filtering version is a bit easier to type up, so here is what it would look like:

Code:

-------------------------------------------------------------------------------
--{{{ filter process
-- This could be in a different module if you want.  I've written it that way
-- so it might be possible to remove some registers if you want.
-- I also include the first-cycle-after-reset logic.
-- This can also be written to aggressively write and avoid the accumulator.
--
-- addr_in | addr_dly | acc | addr_flt | acc_flt | vld_flt | enabled
-- --------+----------+-----+----------+---------+---------+--------
-- A       | X     !  | 0   | X        | X       | 0       | 0
-- B       | A     !  | 1   | X        | 0       | 0       | 1       
-- B       | B     =  | 1   | A        | 1       | 1       | 1       
-- C       | B     !  | 2   | B        | 1       | 0       | 1       
-- D       | C     !  | 1   | B        | 2       | 1       | 1       
-- D       | D     =  | 1   | C        | 1       | 1       | 1       
-- D       | D     =  | 2   | D        | 1       | 0       | 1       
-- E       | D     !  | 3   | D        | 2       | 0       | 1       
-- F       | E     !  | 1   | D        | 3       | 1       | 1       
-- F       | F     =  | 1   | E        | 1       | 1       | 1       
p_filter : process (clk) is
begin
  if rising_edge(clk) then
    enabled  <= '1';
    addr_dly <= addr_in;

    addr_flt <= addr_dly;
    acc_flt  <= acc;
    vld_flt  <= '0';

    -- pre-accumulate (deferred write version)
    if addr_dly = addr_in then
      acc      <= acc + 1;
    else
      acc      <= to_unsigned(1, acc'length);
      vld_flt  <= enabled;
    end if;
    
    -- also account for the first input after reset.
    if rst = '1' then
      acc     <= to_unsigned(0, acc'length);
      enabled <= '0';
    end if;

  end if;
end process p_filter;
--}}}

-------------------------------------------------------------------------------
--{{{ simple dual port now has rden/wren
-- Notice that the ram is only accessed when needed.  This is an optimization
-- to reduce power.
-- 
-- addr_flt | acc_flt | vld_flt | acc_wp | ra | rden | dout | din   | wa | wren
-- ---------+---------+---------+--------+----+------+------+-------+----+-----
-- A        | 1       | 1       | X      | A  | 1    | X    | X+X   | X  | X  
-- B        | 1       | 0       | 1      | B  | 0    | A    | A+1   | A  | 1   
-- B        | 2       | 1       | 1      | B  | 1    | A    | A+1 X | B  | 0   
-- C        | 1       | 1       | 2      | C  | 1    | B    | B+2   | B  | 1   
-- D        | 1       | 0       | 1      | D  | 0    | C    | C+1   | C  | 1   
-- D        | 2       | 0       | 1      | D  | 0    | C    | C+1 X | D  | 0   
-- D        | 3       | 1       | 2      | D  | 1    | C    | C+2 X | D  | 0   
-- E        | 1       | 1       | 3      | E  | 1    | D    | D+3   | D  | 1   

u_ram : entity work.sdp
  port map (
    clk  => clk,   -- assumed clock port; the sdp entity is not shown in the thread
    dout => dout,
    ra   => ra,
    rden => rden,
    din  => din,
    wa   => wa,
    wren => wren
  );

rden <= vld_flt;
ra   <= addr_flt;

din  <= dout + acc_wp;
wa   <= addr_wp;
wren <= vld_wp;

p_sdp_regs : process(clk) is
begin
  if rising_edge(clk) then
    acc_wp  <= acc_flt;
    addr_wp <= addr_flt;
    vld_wp  <= vld_flt;
  end if;
end process p_sdp_regs;
--}}}
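
The sdp entity instantiated above is not shown in the thread. A minimal sketch of what it might look like follows, assuming unsigned address and data ports, a clk port (which any instantiation has to associate), and a registered read output that holds its value while rden is low. The generics and the initial contents are also assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical simple dual-port RAM: one write port, one read port.
entity sdp is
  generic (
    ADDR_W : positive := 8;   -- assumed address width
    DATA_W : positive := 16   -- assumed data width
  );
  port (
    clk  : in  std_logic;
    -- read port
    dout : out unsigned(DATA_W-1 downto 0);
    ra   : in  unsigned(ADDR_W-1 downto 0);
    rden : in  std_logic;
    -- write port
    din  : in  unsigned(DATA_W-1 downto 0);
    wa   : in  unsigned(ADDR_W-1 downto 0);
    wren : in  std_logic
  );
end entity sdp;

architecture rtl of sdp is
  type ram_t is array (0 to 2**ADDR_W-1) of unsigned(DATA_W-1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  p_ram : process (clk) is
  begin
    if rising_edge(clk) then
      if wren = '1' then
        ram(to_integer(wa)) <= din;
      end if;
      -- dout only updates on an enabled read, so it holds the last value
      -- read while rden is '0'
      if rden = '1' then
        dout <= ram(to_integer(ra));
      end if;
    end if;
  end process p_ram;
end architecture rtl;
```

The hold behaviour matters for the trace table above: on cycles where rden is 0, the dout column keeps the value from the last enabled read.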

vGoodtimes · Apr 17, 2016

The caching version is a bit shorter, but feels a little less natural.

Code:

-------------------------------------------------------------------------------
--{{{ Cached Implementation
-- This is a general method that can be used for more general channelized
-- logic.  For example, if the logic isn't just an adder but instead is a fsm.
--
-- addr_in | addr_dly | rden | din | wa | col | col_dly | dout | cache | ena
-- --------+----------+------+-----+----+-----+---------+------+-------+----
--  A      | X        | 1    | X   | X  | 0   | 0       | X    | X     | 0   
--  B      | A        | 1    | A+1 | A  | 0   | 0 wr    | A  . | X     | 1   
--  B      | B        | 0    | B+1 | B  | 1   | 0       | B  . | A+1   | 1   
--  C      | B        | 1    | B+2 | B  | 0   | 1 wr    | B    | B+1 . | 1   
--  D      | C        | 1    | C+1 | C  | 0   | 0 wr    | C  . | B+2   | 1   
--  D      | D        | 0    | D+1 | D  | 1   | 0       | D  . | C+1   | 1   
--  D      | D        | 0    | D+2 | D  | 1   | 1       | D    | D+1 . | 1   
--  E      | D        | 1    | D+3 | D  | 0   | 1 wr    | D    | D+2 . | 1   
--  F      | E        | 1    | E+1 | E  | 0   | 0 wr    | E  . | D+3   | 1   
--  F      | F        | 0    | F+1 | F  | 1   | 0       | F  . | E+1   | 1
--
-- !!! Warning !!!
-- These connections are correct for the case where the input is valid every 
-- cycle.  The case where inputs are not valid every cycle is easy to mess up.

u_ram : entity work.sdp
  port map (
    clk  => clk,   -- assumed clock port; the sdp entity is not shown in the thread
    dout => dout,
    ra   => ra,
    rden => rden,
    din  => din,
    wa   => wa,
    wren => wren
  );

collision <= enabled when addr_in = addr_dly else '0';
rden      <= not collision;
ra        <= addr_in;

din       <= (dout + 1) when collision_dly = '0' else (cache + 1);
wren      <= (enabled and not collision_dly) 
                 when collision_dly = collision else collision_dly;
wa        <= addr_dly;

p_logic : process (clk) is
begin
  if rising_edge(clk) then
    addr_dly      <= addr_in;
    collision_dly <= collision;
    enabled       <= '1';
    cache         <= din;

    if rst = '1' then
      enabled       <= '0';
      collision_dly <= '0';
    end if;
  end if;
end process;
--}}}

I did these without compiling or simulating, so there might be some build or logic oversights. Both assume the input is valid every cycle, and may be slightly tricky to convert to inputs that are not.

The filtering approach is a little nicer when there is more latency in the read-modify-write path. It can remove the RAM from the critical path as well.

The caching approach is nicer when the "modify" logic is more complex, such as an FSM.

I'll call the other method from earlier posts the "correcting" approach. It is probably the better choice for this application in general.
