Using read before write RAM for Histogram calculation

I don't see a big problem with reading the data from a CPU or similar. Overclocking the RAM has already been mentioned. Another option is to connect the write port of a second RAM in parallel with the existing RAM. Both RAMs will have the same content, and the read port of the second RAM is free to use for reading out the histogram data.
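As a rough illustration of the parallel-RAM idea, the sketch below keeps two memory arrays that share one write port, so the second array's read port is free for a CPU. The entity name `hist_mirror`, the generics, and the `cpu_ra`/`cpu_dout` ports are assumptions made for this example and do not come from the thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch: two RAM arrays written in lockstep.
-- All names and widths here are assumptions, not part of the original design.
entity hist_mirror is
  generic (
    ADDR_W : positive := 8;
    DATA_W : positive := 16
  );
  port (
    clk      : in  std_logic;
    -- shared write port, driven by the histogram update logic
    wren     : in  std_logic;
    wa       : in  unsigned(ADDR_W - 1 downto 0);
    din      : in  unsigned(DATA_W - 1 downto 0);
    -- read port used by the histogram read-modify-write loop
    ra       : in  unsigned(ADDR_W - 1 downto 0);
    dout     : out unsigned(DATA_W - 1 downto 0);
    -- independent read port for the CPU
    cpu_ra   : in  unsigned(ADDR_W - 1 downto 0);
    cpu_dout : out unsigned(DATA_W - 1 downto 0)
  );
end entity hist_mirror;

architecture rtl of hist_mirror is
  type ram_t is array (0 to 2**ADDR_W - 1) of unsigned(DATA_W - 1 downto 0);
  signal ram_a : ram_t := (others => (others => '0'));
  signal ram_b : ram_t := (others => (others => '0'));
begin
  p_ram : process (clk) is
  begin
    if rising_edge(clk) then
      -- every write goes to both arrays, so their contents stay identical
      if wren = '1' then
        ram_a(to_integer(wa)) <= din;
        ram_b(to_integer(wa)) <= din;
      end if;
      -- registered reads; a same-cycle write to the read address
      -- returns the old value in this model
      dout     <= ram_a(to_integer(ra));
      cpu_dout <= ram_b(to_integer(cpu_ra));
    end if;
  end process p_ram;
end architecture rtl;
```

The cost is a second block RAM; the benefit is that the CPU read never competes with the update loop for a port.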
If one is reading out the histogram while it is still being calculated, there are higher level issues to be resolved instead.

Kevin Jennings
 

In addition to the given solution to the extra cycle of latency there are two other good solutions. The first is based on caching. The second is based on filtering the input and passing an amount and address. Each has possible advantages and disadvantages. These two methods can also be used as power optimizations as they reduce the need to read/write to the BRAM.

The remaining issues are how the RAM contents are transferred to the next stage and how they are reset. There are again several solutions. The common ones are to stop data collection while the histogram is read out, or to double (or triple) buffer the histogram. (For this problem, note that the BRAMs might be able to do a combined "read addra and write 0" operation.)
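One possible shape for the double-buffering option is sketched below: a `bank` bit selects which of two RAMs collects data while the other is read out. The entity name `hist_pingpong`, the `swap` pulse, and all port names are assumptions for illustration. The sketch also assumes `swap` is only pulsed while no update or readout is in flight, and it does not clear the idle RAM, which still has to happen before that RAM collects again.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical steering logic for two histogram RAMs (ping-pong buffering).
-- The write address and write data can be wired to both RAMs directly,
-- so only the read addresses, write enables and read data are steered here.
entity hist_pingpong is
  generic (
    ADDR_W : positive := 8;
    DATA_W : positive := 16
  );
  port (
    clk          : in  std_logic;
    rst          : in  std_logic;
    swap         : in  std_logic;  -- one-cycle pulse between collection windows
    -- histogram update side
    ra           : in  unsigned(ADDR_W - 1 downto 0);
    wren         : in  std_logic;
    dout         : out unsigned(DATA_W - 1 downto 0);
    -- readout side
    rd_addr      : in  unsigned(ADDR_W - 1 downto 0);
    rd_data      : out unsigned(DATA_W - 1 downto 0);
    -- connections to the two RAMs
    ra0, ra1     : out unsigned(ADDR_W - 1 downto 0);
    wren0, wren1 : out std_logic;
    dout0, dout1 : in  unsigned(DATA_W - 1 downto 0)
  );
end entity hist_pingpong;

architecture rtl of hist_pingpong is
  -- '0': RAM 0 collects and RAM 1 is read out; '1': the roles are swapped
  signal bank : std_logic := '0';
begin
  p_bank : process (clk) is
  begin
    if rising_edge(clk) then
      if rst = '1' then
        bank <= '0';
      elsif swap = '1' then
        bank <= not bank;
      end if;
    end if;
  end process p_bank;

  -- the collecting RAM sees the update address, the idle RAM the readout address
  ra0 <= ra when bank = '0' else rd_addr;
  ra1 <= ra when bank = '1' else rd_addr;

  -- only the collecting RAM is ever written by the update logic
  wren0 <= wren when bank = '0' else '0';
  wren1 <= wren when bank = '1' else '0';

  -- route read data back to the matching consumer
  dout    <= dout0 when bank = '0' else dout1;
  rd_data <= dout1 when bank = '0' else dout0;
end architecture rtl;
```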
 

In addition to the given solution to the extra cycle of latency there are two other good solutions. The first is based on caching. The second is based on filtering the input and passing an amount and address. Each has possible advantages and disadvantages. These two methods can also be used as power optimizations as they reduce the need to read/write to the BRAM.
I'd like to hear more about the implementation of both of these.

Kevin
 

The filtering version is a bit easier to type up, so here is what it would look like:
Code:
-------------------------------------------------------------------------------
--{{{ filter process
-- This could be in a different module if you want.  I've written it that way
-- so it might be possible to remove some registers if you want.
-- I also include the first-cycle-after-reset logic.
-- This can also be written to aggressively write and avoid the accumulator.
--
-- addr_in | addr_dly | acc | addr_flt | acc_flt | vld_flt | enabled
-- --------+----------+-----+----------+---------+---------+--------
-- A       | X     !  | 0   | X        | X       | 0       | 0
-- B       | A     !  | 1   | X        | 0       | 0       | 1       
-- B       | B     =  | 1   | A        | 1       | 1       | 1       
-- C       | B     !  | 2   | B        | 1       | 0       | 1       
-- D       | C     !  | 1   | B        | 2       | 1       | 1       
-- D       | D     =  | 1   | C        | 1       | 1       | 1       
-- D       | D     =  | 2   | D        | 1       | 0       | 1       
-- E       | D     !  | 3   | D        | 2       | 0       | 1       
-- F       | E     !  | 1   | D        | 3       | 1       | 1       
-- F       | F     =  | 1   | E        | 1       | 1       | 1       
p_filter : process (clk) is
begin
  if rising_edge(clk) then
    enabled  <= '1';
    addr_dly <= addr_in;

    addr_flt <= addr_dly;
    acc_flt  <= acc;
    vld_flt  <= '0';

    -- pre-accumulate (deferred write version)
    if addr_dly = addr_in then
      acc      <= acc + 1;
    else
      -- a new address starts a new run, so the previous run's count is final
      acc      <= to_unsigned(1, acc'length);
      vld_flt  <= enabled;
    end if;
    
    -- also account for the first input after reset.
    if rst = '1' then
      acc     <= to_unsigned(0, acc'length);
      enabled <= '0';
    end if;

  end if;
end process p_filter;
--}}}

-------------------------------------------------------------------------------
--{{{ simple dual port now has rden/wren
-- Notice that the ram is only accessed when needed.  This is an optimization
-- to reduce power.
-- 
-- addr_flt | acc_flt | vld_flt | acc_wp | ra | rden | dout | din   | wa | wren
-- ---------+---------+---------+--------+----+------+------+-------+----+-----
-- A        | 1       | 1       | X      | A  | 1    | X    | X+X   | X  | X  
-- B        | 1       | 0       | 1      | B  | 0    | A    | A+1   | A  | 1   
-- B        | 2       | 1       | 1      | B  | 1    | A    | A+1 X | B  | 0   
-- C        | 1       | 1       | 2      | C  | 1    | B    | B+2   | B  | 1   
-- D        | 1       | 0       | 1      | D  | 0    | C    | C+1   | C  | 1   
-- D        | 2       | 0       | 1      | D  | 0    | C    | C+1 X | D  | 0   
-- D        | 3       | 1       | 2      | D  | 1    | C    | C+2 X | D  | 0   
-- E        | 1       | 0       | 3      | E  | 0    | D    | D+3   | D  | 1   

u_ram : entity work.sdp
  port map (
    dout => dout,
    ra   => ra,
    rden => rden,
    din  => din,
    wa   => wa,
    wren => wren
  );

rden <= vld_flt;
ra   <= addr_flt;

din  <= dout + acc_wp;
wa   <= addr_wp;
wren <= vld_wp;

p_sdp_regs : process(clk) is
begin
  if rising_edge(clk) then
    acc_wp  <= acc_flt;
    addr_wp <= addr_flt;
    vld_wp  <= vld_flt;
  end if;
end process p_sdp_regs;
--}}}
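
The `work.sdp` entity itself is not shown in the post. A minimal behavioral sketch that matches the port names used above and the hold-when-`rden`-is-low behavior implied by the `dout` column of the table might look like the following. The `clk` port and the `ADDR_W`/`DATA_W` generics are assumptions (the posted port maps do not list a clock), so an instantiation would need `clk => clk` added.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical simple dual-port RAM with separate read and write enables.
entity sdp is
  generic (
    ADDR_W : positive := 8;   -- assumed width
    DATA_W : positive := 16   -- assumed width
  );
  port (
    clk  : in  std_logic;     -- assumed; not present in the posted port maps
    ra   : in  unsigned(ADDR_W - 1 downto 0);
    rden : in  std_logic;
    dout : out unsigned(DATA_W - 1 downto 0);
    wa   : in  unsigned(ADDR_W - 1 downto 0);
    wren : in  std_logic;
    din  : in  unsigned(DATA_W - 1 downto 0)
  );
end entity sdp;

architecture rtl of sdp is
  type ram_t is array (0 to 2**ADDR_W - 1) of unsigned(DATA_W - 1 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  p_ram : process (clk) is
  begin
    if rising_edge(clk) then
      if wren = '1' then
        ram(to_integer(wa)) <= din;
      end if;
      -- registered read with one cycle of latency;
      -- dout keeps its previous value while rden = '0'
      if rden = '1' then
        dout <= ram(to_integer(ra));
      end if;
    end if;
  end process p_ram;
end architecture rtl;
```

In this model a read and a write to the same address in the same cycle return the old data. As far as the tables show, neither version reads and writes the same address in one cycle, so that choice should not matter here.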
 

The caching version is a bit shorter, but feels a little less natural:
Code:
-------------------------------------------------------------------------------
--{{{ Cached Implementation
-- This is a general method that can be used for more general channelized
-- logic.  For example, if the logic isn't just an adder but instead is a fsm.
--
-- addr_in | addr_dly | rden | din | wa | col | col_dly | dout | cache | ena
-- --------+----------+------+-----+----+-----+---------+------+-------+----
--  A      | X        | 1    | X   | X  | X   | 0       | X    | X     | 0   
--  B      | A        | 1    | A+1 | A  | 0   | 0 wr    | A  . | X     | 1   
--  B      | B        | 0    | B+1 | B  | 1   | 0       | B  . | A+1   | 1   
--  C      | B        | 1    | B+2 | B  | 0   | 1 wr    | B    | B+1 . | 1   
--  D      | C        | 1    | C+1 | C  | 0   | 0 wr    | C  . | B+2   | 1   
--  D      | D        | 0    | D+1 | D  | 1   | 0       | D  . | C+1   | 1   
--  D      | D        | 0    | D+2 | D  | 1   | 1       | D    | D+1 . | 1   
--  E      | D        | 1    | D+3 | D  | 0   | 1 wr    | D    | D+2 . | 1   
--  F      | E        | 1    | E+1 | E  | 0   | 0 wr    | E  . | D+3   | 1   
--  F      | F        | 0    | F+1 | F  | 1   | 0       | F  . | E+1   | 1
--
-- !!! Warning !!!
-- These connections are correct for the case where the input is valid every 
-- cycle.  The case where inputs are not valid every cycle is easy to mess up.

u_ram : entity work.sdp
  port map (
    dout => dout,
    ra   => ra,
    rden => rden,
    din  => din,
    wa   => wa,
    wren => wren
  );

collision <= enabled when addr_in = addr_dly else '0';
rden      <= not collision;
ra        <= addr_in;

din       <= (dout + 1) when collision_dly = '0' else (cache + 1);
wren      <= (enabled and not collision_dly) 
                 when collision_dly = collision else collision_dly;
wa        <= addr_dly;

p_logic : process (clk) is
begin
  if rising_edge(clk) then
    addr_dly      <= addr_in;
    collision_dly <= collision;
    enabled       <= '1';
    cache         <= din;

    if rst = '1' then
      enabled       <= '0';
      collision_dly <= '0';
    end if;
  end if;
end process;
--}}}

I did these without compiling or simulating. There might be some build or logic oversights. They both assume the input is valid every cycle and may be slightly tricky to convert.

The filtering approach is a little nicer when there is more latency in the read-modify-write path. It can also remove the RAM from the critical path.

The caching approach is nicer when the "modify" logic is more complex, like an fsm.

I'll call the other method from earlier posts the "correcting" approach. It is probably the better choice for this application in general.
 