Best Way to Implement Shared RAM

groover · Nov 26, 2018

I have a module defining some memory, e.g. an array of 8-bit wide registers. These have read/write access from outside my core ("external") using a typical 8-bit-wide bus.

From inside my core ("internal") I have two modules A and B that also need read/write access, however for some registers they will need to read/write all eight bits and for others they need to read/write a single bit without affecting any other bits.

I could expose in the registers module interface every single bit for read/write but that won't be manageable in the long term as the number of registers and bits might grow over time.

I am thinking that my only option is a multi-port RAM with the following interfaces:

8-bit read/write port 0 for external accesses
8-bit read/write port 1 for internal accesses from module A with an 8-bit mask for writing
8-bit read/write port 2 for internal accesses from module B with an 8-bit mask for writing

The bit mask would mask out any bits that should not be modified during the 8-bit write and would be put onto the internal bus by the writing module.

I could make blocks A and B share access to the RAM but then they each have to know what the other is doing which might get messy.

I am using Verilog.

Is this the usual way to solve this problem? Am I over-complicating it? Thanks!

ads-ee · Nov 26, 2018

If the array of 8-bit wide register values is used somewhere else, then you can't use a RAM, because the values in a RAM are inaccessible unless the correct address is applied to a port. If you don't need parallel access to all byte registers simultaneously, then a RAM might work.

To share this resource you need an arbiter, and if you use a RAM you need to perform read-modify-write operations, as there are no RAMs with per-bit write enables (unless you configure 8 RAMs as Nx1).

If you stick with flip-flops for these registers then you only need an arbiter for the write access to each register. I can see some strange behaviour if sides A and B access the same register and write different values, the first one that writes will only have its data in the register for a very short period of time.

barry · Nov 26, 2018

I think you need to first decide: is this 8 registers, or is it something else? That will determine what your approach should be. If I understand correctly, you have three interfaces: "external", A and B. A dual-port RAM will help, but having three interfaces definitely complicates this.

groover · Nov 26, 2018

Sorry, I shouldn't have used the term 'RAM'. This is implemented using flip-flops, as an array of groups of eight.

Code:

reg [(8 * `NUMBEROFENTRIES) - 1:0] Registers;

It sounds like I am not trying to do something completely wacky, which is good.

ads-ee · Nov 27, 2018

Well now that you stated these are flip-flops...you have to have a three way arbitration not just a two way one. You have External, A, and B ports all trying to access each register.

Look into a simple round robin arbitration scheme to access the write side of the flip-flop registers, the read side (the Q output) of the flip-flop is always accessible to all three ports.

Any port that wants to write to the register file needs to make a request through the arbiter first. If you want to write to different registers from all three ports simultaneously, you would need a separate arbiter for each register, which would use a lot more logic resources.
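As a rough illustration of the round-robin idea, here is a minimal sketch of a single three-requester arbiter in front of the write side. The module and signal names (rr_arbiter3, req, grant, last) are made up for this example, and it assumes one shared clock and a synchronous reset. The grant is combinational, so the granted port can write in the same cycle it requests.

```verilog
// Hypothetical round-robin arbiter for three write requesters.
// req[0] = External, req[1] = A, req[2] = B (assignment is an assumption).
module rr_arbiter3 (
  input  wire       clk,
  input  wire       rst,    // synchronous, active high
  input  wire [2:0] req,    // one request line per port
  output reg  [2:0] grant   // one-hot, or all zero when nobody requests
);
  reg [1:0] last;           // index of the most recently granted port

  // Search for the next requester, starting just after the last winner.
  // A port with nothing to write is simply skipped.
  always @* begin
    grant = 3'b000;
    case (last)
      2'd0:    if      (req[1]) grant = 3'b010;
               else if (req[2]) grant = 3'b100;
               else if (req[0]) grant = 3'b001;
      2'd1:    if      (req[2]) grant = 3'b100;
               else if (req[0]) grant = 3'b001;
               else if (req[1]) grant = 3'b010;
      default: if      (req[0]) grant = 3'b001;
               else if (req[1]) grant = 3'b010;
               else if (req[2]) grant = 3'b100;
    endcase
  end

  // Remember who won so the search starts after them next cycle.
  always @(posedge clk) begin
    if (rst)           last <= 2'd2;  // port 0 is first in line after reset
    else if (grant[0]) last <= 2'd0;
    else if (grant[1]) last <= 2'd1;
    else if (grant[2]) last <= 2'd2;
  end
endmodule
```

Each port would hold its request until it sees its grant bit, and the register write enable would be qualified with that bit. With all three ports requesting continuously, the grants rotate 0, 1, 2, 0, and so on.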

Don't forget to deal with any clock domain issues if these ports use different clock domains. Hopefully you only have one clock domain for all three ports.

KlausST · Nov 27, 2018

Hi,

Writing:
Maybe one of the ports can be preferred ... then this can have direct write access.
While the others (or all) are double buffered for writing.

For reading: all three could access the outputs via (three state) buffers directly.

Klaus

groover · Nov 27, 2018

Thinking about this a bit more I am wondering if I can make some simplifications.

1. A and B modules will never need to write to the same bit.

2. Any bits that A and External can write to or B and External can write to should give External the lowest priority

I was therefore thinking that if both A and B try to write to the same group of eight bits at the same time (each writing to a different bit in the group of eight) I can simply merge the write requests and masks and perform a single write.

And if External and A or External and B try to write to the same group of eight bits at the same time I can use the "last write wins" approach, so always do the external non-blocking write first followed by the merged A/B write?

Same clock is used for all three ports.

barry · Nov 27, 2018

Your last statement is confusing: "do the external non-blocking write first followed by the merged A/B write". How do you propose to "follow" one write with another? Are you going to create a queue? Maybe you could use a handshaking arrangement, although that requires adding more logic to the writing modules. What are the data rates? Obviously, if multiple modules are trying to write continuously, you've got a problem. I think this needs more definition.

vGoodtimes · Nov 27, 2018

I think the last-write-wins is in terms of having multiple reachable assignments in the same clock edge of a process. It would be a short way to write the logic. It implies muxes for the input to each register, which might be acceptable but would be more LUTs than a shared scheme. The LUTs are likely near the registers anyways though.

barry · Nov 27, 2018

vGoodtimes said:
I think the last-write-wins is in terms of having multiple reachable assignments in the same clock edge of a process. It would be a short way to write the logic. It implies muxes for the input to each register, which might be acceptable but would be more LUTs than a shared scheme. The LUTs are likely near the registers anyways though.

Using a mux is not the same as 'following' the last write with another write. That implies queuing writes with a fifo or some other kind of storage mechanism.

pbernardi · Nov 27, 2018

Can't you multiplex the register access in time?

Using three cycles could do the job.

Cycle1: Write/Read A
Cycle2: Write/Read B
Cycle3: Write/Read External
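
A minimal sketch of that three-cycle schedule, assuming a single shared clock and a synchronous reset. The module name tdm_slots and the slot_* outputs are invented for this example and are not from anyone's actual design.

```verilog
// Hypothetical slot generator: a counter cycles 0, 1, 2 and each port
// may only write while its own slot signal is high.
module tdm_slots (
  input  wire clk,
  input  wire rst,       // synchronous, active high
  output wire slot_a,    // Cycle1: module A
  output wire slot_b,    // Cycle2: module B
  output wire slot_ext   // Cycle3: External
);
  reg [1:0] slot;

  always @(posedge clk) begin
    if (rst || slot == 2'd2)
      slot <= 2'd0;            // wrap after the third slot
    else
      slot <= slot + 2'd1;
  end

  assign slot_a   = (slot == 2'd0);
  assign slot_b   = (slot == 2'd1);
  assign slot_ext = (slot == 2'd2);
endmodule
```

Each port would AND its write strobe with its slot signal, so a port has to hold its data for up to three cycles until its turn comes around.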

barry · Nov 27, 2018

pbernardi said:
Can't you multiplex the register access in time?

Using three cycles could do the job.

Cycle1: Write/Read A
Cycle2: Write/Read B
Cycle3: Write/Read External

This method would work, but would again require some kind of signal to each module indicating when data can be written.

We still need more definition from the OP.

ads-ee · Nov 27, 2018

pbernardi said:
Can't you multiplex the register access in time?

Using three cycles could do the job.

Cycle1: Write/Read A
Cycle2: Write/Read B
Cycle3: Write/Read External

This is the same as a simple round robin arbiter: each port gets a grant in circular order, and any port that doesn't have something to write gets skipped.
But as these are registers and not a memory, arbitration for reading the output of the registers is unnecessary.

vGoodtimes · Nov 28, 2018

barry said:
Using a mux is not the same as 'following' the last write with another write. That implies queuing writes with a fifo or some other kind of storage mechanism.

Code:

// this is a mux.
always@(posedge clk) begin
  value <= external_data; // so always do the external non-blocking write first 
  if (merged_write) begin //  followed by the merged A/B write
    value <= merged_data; // the "last write wins" approach
  end
end

groover · Nov 28, 2018

vGoodtimes has exactly what I was thinking. I will post my code when I have tested it this evening or tomorrow. Meanwhile I am working on the masked write part for ports A and B. Essentially what I want to do is a read-modify-write in a single clock cycle. Here is my code:

Code:

Registers[Address_1 * 8 +: 8] <= (Registers[Address_1 * 8 +: 8] & MaskIn_1) | DataIn_1;

I am thinking that this is probably a bad idea, but I am not sure why. I did see that circular assignments can cause implied latches. Is this a bad idea and if so why?

Thanks!

ads-ee · Nov 28, 2018

It won't be a latch if it is in a clocked process.
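
To spell that out with a small sketch: in a clocked block the right-hand side reads the flip-flop outputs as they were before the clock edge, so the feedback path goes through a register. The module below is illustrative only. Its names are invented, and it follows the mask convention of the line posted above (a mask bit of 1 keeps the old bit). Masking the data as well is a small defensive addition that the posted line does not have.

```verilog
// Hypothetical single-register demo of a one-cycle masked read-modify-write.
module masked_write_demo (
  input  wire       clk,
  input  wire       write,
  input  wire [7:0] keep_mask,  // 1 = keep the stored bit, 0 = replace it
  input  wire [7:0] data_in,
  output reg  [7:0] value
);
  // Infers eight flip-flops with AND/OR logic on their D inputs.
  // "value" on the right is the old contents; the result is stored at
  // the clock edge, so there is no latch and no combinational loop.
  always @(posedge clk) begin
    if (write)
      value <= (value & keep_mask) | (data_in & ~keep_mask);
  end

  // By contrast, an always @* block that assigned value only under
  // "if (write)" would infer a latch, and one that computed value from
  // value unconditionally would form a combinational loop.
endmodule
```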

groover · Nov 28, 2018

OK, thanks. Here is what I have come up with. This is inside a clocked block:

Code:

// writing - port 0
if (Write_0) begin
  if (Address_0 < `REG_NUMBEROFREGISTERS) begin
    Registers[Address_0 * 8 +: 8] <= DataIn_0 & WriteMasks[Address_0 * 8 +: 8];
  end
end

// if writing on ports 1 and 2 at the same time and with the same address
if (Write_1 && Write_2 && (Address_1 == Address_2)) begin
  if (Address_1 < `REG_NUMBEROFREGISTERS) begin
    Registers[Address_1 * 8 +: 8] <= (((Registers[Address_1 * 8 +: 8] & MaskIn_1) | DataIn_1) & MaskIn_2) | DataIn_2;
  end
end else begin
  // writing - port 1
  if (Write_1) begin
    if (Address_1 < `REG_NUMBEROFREGISTERS) begin
      Registers[Address_1 * 8 +: 8] <= (Registers[Address_1 * 8 +: 8] & MaskIn_1) | DataIn_1;
    end
  // writing - port 2
  end else if (Write_2) begin
    if (Address_2 < `REG_NUMBEROFREGISTERS) begin
      Registers[Address_2 * 8 +: 8] <= (Registers[Address_2 * 8 +: 8] & MaskIn_2) | DataIn_2;
    end
  end
end

Any comments would be appreciated! Thanks.

ads-ee · Nov 28, 2018

You should really learn how to use parameters instead of `define. defines are really annoying and not very portable and can make reuse impossible.

A couple of observations:
Are you okay with priority encoding between port 1 and 2?
What is supposed to happen when 0 writes to the same address as 1 or 2?
There is no arbitration, so writes from 1 can completely block writes from 2 until 1 stops writing.

groover · Nov 29, 2018

Thanks for the feedback! I need to change it so that ports 1 and 2 can write at the same time if their addresses are different.

If port 0 writes to the same address as 1 or 2 then ports 1 or 2 get priority as I mentioned in a previous post about simplifications.
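
One possible shape for that change, as a sketch rather than tested production code: loop over the registers and build each register's next value in a scratch variable, applying port 0 first and the masked writes from ports 1 and 2 on top. The names below are illustrative, the register count is a parameter instead of a `define, and the port 0 WriteMasks from the earlier code are left out for brevity. One behavioural difference from the earlier code is an assumption on my part: when port 0 and port 1 or 2 hit the same register, port 0's data survives in the bits that ports 1 and 2 do not touch.

```verilog
// Sketch only: port, parameter and signal names are illustrative.
module shared_regs #(
  parameter NUM_REGS = 8,
  parameter ADDR_W   = 4
) (
  input  wire                  clk,
  // port 0: external, full 8-bit writes, lowest priority
  input  wire                  write_0,
  input  wire [ADDR_W-1:0]     addr_0,
  input  wire [7:0]            data_0,
  // port 1: module A, masked writes (mask bit 1 = keep the old bit)
  input  wire                  write_1,
  input  wire [ADDR_W-1:0]     addr_1,
  input  wire [7:0]            mask_1,
  input  wire [7:0]            data_1,
  // port 2: module B, same convention as port 1
  input  wire                  write_2,
  input  wire [ADDR_W-1:0]     addr_2,
  input  wire [7:0]            mask_2,
  input  wire [7:0]            data_2,
  output reg  [8*NUM_REGS-1:0] regs
);
  integer   i;
  reg [7:0] nxt;  // scratch value, only used inside the loop

  always @(posedge clk) begin
    for (i = 0; i < NUM_REGS; i = i + 1) begin
      nxt = regs[i*8 +: 8];                      // start from the old contents
      if (write_0 && addr_0 == i) nxt = data_0;  // lowest priority
      if (write_1 && addr_1 == i) nxt = (nxt & mask_1) | data_1;
      if (write_2 && addr_2 == i) nxt = (nxt & mask_2) | data_2;
      regs[i*8 +: 8] <= nxt;
    end
  end
endmodule
```

Ports 1 and 2 can now write different registers in the same cycle, and when they hit the same register their writes merge, provided they never target the same bit. Addresses at or above NUM_REGS match no loop index and are ignored.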

ads-ee said:
You should really learn how to use parameters instead of `define. defines are really annoying and not very portable and can make reuse impossible.

Here is my issue and perhaps this betrays my software background:

I am constructing here a small set of 8-bit wide "peripheral registers". Inside my core they will be accessed from different modules (e.g. A and B that I mentioned before).

Suppose I have a peripheral register at address 0x04 and in that register bit two is called "TXBUSY". I need to read/write this bit from modules A and B. I don't want to scatter my code with magic numbers or repeat definitions, because not only is it bad practice but I may want to move bits around later on without breaking anything.

My plan was to create a header file of definitions that are included in the descriptions of all modules. For example:

Code:

`define REG_ADDRESS_TXBUSY 4'h4
`define REG_BITNUM_TXBUSY 8'b00000100

`define REG_ADDRESS_COUNTERS 4'h5
`define REG_MASK_ERRORCOUNTER 8'b00001111
`define REG_MASK_TXCOUNTER 8'b11110000

etc. Then in the modules I don't care where the bits are because I can reference them by the defined names.

I figured that once I have all of these register/bit definitions in a single file I may as well add things like:

Code:

`define REG_NUMBEROFREGISTERS 8

to it to keep everything in one place.

I see that for the size of memories, the width of buses, etc. putting values in parameters allows instantiation of different copies of module with different settings, so maybe the number of registers is a special case and can be a parameter, but considering my aim of describing bits in one place how would you recommend I do this without defines?
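
For what it's worth, one define-free option in plain Verilog-2001 is a shared file of localparam declarations that each module pulls in with `include inside its module body. A rough sketch follows, with the declarations written inline so the example stands alone. The file name reg_defs.vh, the module name, and the choice to store bit positions and field widths rather than mask constants are all assumptions made for illustration.

```verilog
// Hypothetical consumer module: reads TXBUSY and the error counter by name.
module txbusy_user (
  input  wire [8*8-1:0] regs,        // flattened array: 8 registers of 8 bits
  output wire           tx_busy,
  output wire [3:0]     error_count
);
  // ---- start of what could live in a shared file, for example a
  // ---- hypothetical "reg_defs.vh" included at this point in each module
  localparam REG_ADDRESS_TXBUSY     = 4;
  localparam REG_BIT_TXBUSY         = 2;
  localparam REG_ADDRESS_COUNTERS   = 5;
  localparam REG_LSB_ERRORCOUNTER   = 0;
  localparam REG_WIDTH_ERRORCOUNTER = 4;
  // ---- end of shared definitions

  // No magic numbers below: moving a bit means editing one localparam.
  assign tx_busy     = regs[REG_ADDRESS_TXBUSY*8 + REG_BIT_TXBUSY];
  assign error_count = regs[REG_ADDRESS_COUNTERS*8 + REG_LSB_ERRORCOUNTER
                            +: REG_WIDTH_ERRORCOUNTER];
endmodule
```

Unlike `define macros, these names are scoped to the module that includes them, so they cannot collide with identically named macros elsewhere in the compile. If SystemVerilog is an option, a package can play the same role.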

Thanks!

KlausST · Nov 29, 2018

Hi,

Suppose I have a peripheral register at address 0x04 and in that register bit two is called "TXBUSY". I need to read/write this bit from modules A and B.

Are you sure that both A and B need to WRITE bit two of this register?

Please show a sketch (hand drawn is OK) about the circuit in your mind.

Klaus
