Best Way to Implement Shared RAM

Status
Not open for further replies.

groover

I have a module defining some memory, e.g. an array of 8-bit wide registers. These have read/write access from outside my core ("external") using a typical 8-bit-wide bus.

From inside my core ("internal") I have two modules A and B that also need read/write access, however for some registers they will need to read/write all eight bits and for others they need to read/write a single bit without affecting any other bits.

I could expose in the registers module interface every single bit for read/write but that won't be manageable in the long term as the number of registers and bits might grow over time.

I am thinking that my only option is a multi-port RAM with the following interfaces:

  • 8-bit read/write port 0 for external accesses
  • 8-bit read/write port 1 for internal accesses from module A with an 8-bit mask for writing
  • 8-bit read/write port 2 for internal accesses from module B with an 8-bit mask for writing

The bit mask would mask out any bits that should not be modified during the 8-bit write and would be put onto the internal bus by the writing module.
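A minimal sketch of the masked write described above, for a single 8-bit register. The module and signal names are hypothetical, and the mask polarity (1 = write this bit, 0 = leave it alone) is an assumption; the opposite polarity works just as well if it is used consistently.

```verilog
// Hypothetical single 8-bit register with a per-bit write mask.
// Assumption: a mask bit of 1 means "update this bit", 0 means "keep it".
module masked_reg (
  input  wire       clk,
  input  wire       rst,      // synchronous reset (assumed)
  input  wire       write,    // write strobe from the internal module
  input  wire [7:0] data_in,  // new data
  input  wire [7:0] mask_in,  // 1 = bit is written, 0 = bit is preserved
  output reg  [7:0] q
);
  always @(posedge clk) begin
    if (rst)
      q <= 8'h00;
    else if (write)
      // Keep the unmasked bits and replace the masked ones.
      q <= (q & ~mask_in) | (data_in & mask_in);
  end
endmodule
```

A full-byte write is then just a write with the mask set to 8'hFF.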

I could make blocks A and B share access to the RAM but then they each have to know what the other is doing which might get messy.

I am using Verilog.

Is this the usual way to solve this problem? Am I over-complicating it? Thanks!
 

If the array of 8-bit register values is used elsewhere in the design, you can't use a RAM: a value in a RAM is inaccessible unless its address is applied to a port. If you don't need parallel access to all of the byte registers simultaneously, a RAM might work.

To share this resource you need an arbiter, and if you use a RAM you need to perform read-modify-write operations, as there are no RAMs with per-bit write enables (unless you configure 8 RAMs as Nx1).
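As a rough illustration of what a read-modify-write on a RAM involves, here is a two-cycle sketch. Everything in it is an assumption for illustration: the names, the synchronous-read RAM, the one-cycle request pulse, and the mask polarity (1 = write the bit). It does not include the arbiter.

```verilog
// Hypothetical read-modify-write wrapper around a small inferred RAM.
// Assumptions: synchronous read, req is a one-cycle pulse, and requests
// that arrive while busy is high are ignored.
module rmw_ram #(
  parameter ADDR_W = 4
) (
  input  wire              clk,
  input  wire              rst,
  input  wire              req,      // start a masked write
  input  wire [ADDR_W-1:0] addr,
  input  wire [7:0]        data_in,
  input  wire [7:0]        mask_in,  // 1 = bit is written
  output reg               busy
);
  reg [7:0]        mem [0:(1<<ADDR_W)-1];
  reg [7:0]        rd_data;
  reg [ADDR_W-1:0] addr_q;
  reg [7:0]        data_q;
  reg [7:0]        mask_q;

  always @(posedge clk) begin
    if (rst) begin
      busy <= 1'b0;
    end else if (!busy) begin
      if (req) begin
        // Cycle 1: read the current byte and capture the request.
        rd_data <= mem[addr];
        addr_q  <= addr;
        data_q  <= data_in;
        mask_q  <= mask_in;
        busy    <= 1'b1;
      end
    end else begin
      // Cycle 2: merge the new bits into the old byte and write it back.
      mem[addr_q] <= (rd_data & ~mask_q) | (data_q & mask_q);
      busy        <= 1'b0;
    end
  end
endmodule
```

The extra cycle and the busy handshake are the cost of using a RAM here; with flip-flops the merge can happen in a single clock.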

If you stick with flip-flops for these registers, then you only need an arbiter for the write access to each register. I can see some strange behaviour if sides A and B access the same register and write different values: the first one that writes will only have its data in the register for a very short period of time.
 
I think you need to first decide: is this 8 registers, or is it something else? That will determine what your approach should be. If I understand correctly, you have three interfaces: "external", A and B. A dual-port RAM will help, but having three interfaces definitely complicates this.
 
Sorry, I shouldn't have used the term 'RAM'. This is implemented using flip-flops, as an array of groups of eight.

Code:
reg [(8 * `NUMBEROFENTRIES) - 1:0] Registers;

It sounds like I am not trying to do something completely wacky, which is good. :)
 

Well now that you stated these are flip-flops...you have to have a three way arbitration not just a two way one. You have External, A, and B ports all trying to access each register.

Look into a simple round-robin arbitration scheme for access to the write side of the flip-flop registers; the read side (the Q output) of the flip-flops is always accessible to all three ports.

Any port that wants to write to the register file needs to make a request through the arbiter first. If you want to access different registers from all three ports simultaneously you would have to have arbiters for each register (requires arbitration of each register separately), which would use a lot more logic resources.
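A sketch of a single round-robin arbiter for the three write ports might look like the following. The module name, the request/grant encoding and the reset behaviour are all assumptions made for illustration.

```verilog
// Hypothetical 3-way round-robin arbiter.
// Assumed encoding: bit 0 = External, bit 1 = A, bit 2 = B.
module rr_arbiter3 (
  input  wire       clk,
  input  wire       rst,
  input  wire [2:0] req,    // one request line per port
  output reg  [2:0] grant   // one-hot, combinational
);
  reg [1:0] last;  // index of the most recently granted port

  always @(*) begin
    grant = 3'b000;  // default: nobody granted (also avoids a latch)
    case (last)
      2'd0:     // port 0 went last, so check 1, then 2, then 0
        if      (req[1]) grant = 3'b010;
        else if (req[2]) grant = 3'b100;
        else if (req[0]) grant = 3'b001;
      2'd1:     // port 1 went last, so check 2, then 0, then 1
        if      (req[2]) grant = 3'b100;
        else if (req[0]) grant = 3'b001;
        else if (req[1]) grant = 3'b010;
      default:  // port 2 went last, so check 0, then 1, then 2
        if      (req[0]) grant = 3'b001;
        else if (req[1]) grant = 3'b010;
        else if (req[2]) grant = 3'b100;
    endcase
  end

  always @(posedge clk) begin
    if (rst)
      last <= 2'd2;  // after reset, port 0 is checked first
    else if (grant[0])
      last <= 2'd0;
    else if (grant[1])
      last <= 2'd1;
    else if (grant[2])
      last <= 2'd2;
  end
endmodule
```

A port holds its request high until it sees its grant bit, then performs its write in that cycle. Per-register arbitration would need one such instance per register.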

Don't forget to deal with any clock domain issues if these ports use different clock domains. Hopefully you only have one clock domain for all three ports.
 

Hi,

Writing:
Maybe one of the ports can be preferred ... then this can have direct write access.
While the others (or all) are double buffered for writing.

For reading: all three could access the outputs via (three state) buffers directly.

Klaus
 

Thinking about this a bit more I am wondering if I can make some simplifications.

1. A and B modules will never need to write to the same bit.

2. For any bits that both A and External, or both B and External, can write to, External should get the lowest priority.

I was therefore thinking that if both A and B try to write to the same group of eight bits at the same time (each writing to a different bit in the group of eight) I can simply merge the write requests and masks and perform a single write.

And if External and A or External and B try to write to the same group of eight bits at the same time I can use the "last write wins" approach, so always do the external non-blocking write first followed by the merged A/B write?
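The merge-then-override idea can be sketched for one 8-bit register as below. The names are hypothetical and the mask polarity (1 = write the bit) is an assumption. It relies on simplification 1, that A and B never drive the same bit.

```verilog
// Hypothetical single register with a full-byte external write and
// merged masked writes from A and B. External has the lowest priority.
module merged_write_reg (
  input  wire       clk,
  input  wire       wr_ext,
  input  wire [7:0] data_ext,
  input  wire       wr_a,
  input  wire [7:0] data_a,
  input  wire [7:0] mask_a,   // 1 = A writes this bit
  input  wire       wr_b,
  input  wire [7:0] data_b,
  input  wire [7:0] mask_b,   // 1 = B writes this bit
  output reg  [7:0] q
);
  // Effective per-bit write enables; all zero when the port is idle.
  wire [7:0] we_a = wr_a ? mask_a : 8'h00;
  wire [7:0] we_b = wr_b ? mask_b : 8'h00;

  always @(posedge clk) begin
    if (wr_ext)
      q <= data_ext;  // assigned first, so it loses to the write below
    if (wr_a || wr_b)
      // Merged A/B write; the last non-blocking assignment wins.
      q <= (q & ~(we_a | we_b)) | (data_a & we_a) | (data_b & we_b);
  end
endmodule
```

Note that when External and A (or B) write in the same cycle, the whole external byte is discarded, including bits that A and B did not touch, because `q` on the right-hand side is still the old value.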

Same clock is used for all three ports.
 

Your last statement is confusing: "do the external non-blocking write first followed by the merged A/B write". How do you propose to "follow" one write with another? Are you going to create a queue? Maybe you could use a handshaking arrangement, although that requires adding more logic to the writing modules. What are the data rates? Obviously, if multiple modules are trying to write continuously, you've got a problem. I think this needs more definition.
 

I think the last-write-wins is in terms of having multiple reachable assignments in the same clock edge of a process. It would be a short way to write the logic. It implies muxes for the input to each register, which might be acceptable but would be more LUTs than a shared scheme. The LUTs are likely near the registers anyways though.
 

Using a mux is not the same as 'following' the last write with another write. That implies queuing writes with a fifo or some other kind of storage mechanism.
 

Can't you multiplex the register access in time?

Using three cycles could do the job.

Cycle1: Write/Read A
Cycle2: Write/Read B
Cycle3: Write/Read External
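A fixed three-slot schedule like this can be generated with a rotating one-hot counter. This is only a sketch; the module name, the slot encoding and the reset value are assumptions.

```verilog
// Hypothetical fixed time-slot generator for three ports.
// Assumed encoding: slot[0] = A, slot[1] = B, slot[2] = External.
module tdm_slots (
  input  wire       clk,
  input  wire       rst,
  output reg  [2:0] slot   // one-hot: exactly one port owns each cycle
);
  always @(posedge clk) begin
    if (rst)
      slot <= 3'b001;                  // start with A's slot
    else
      slot <= {slot[1:0], slot[2]};    // rotate left: A -> B -> External
  end
endmodule
```

Each module would only assert its write strobe while its own slot bit is high, so it has to hold any pending write for up to two cycles.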
 

This method would work, but would again require some kind of signal to each module indicating when data can be written.

We still need more definition from the OP.
 

The three-cycle scheme is the same as a simple round-robin arbiter: each port gets a grant in circular order, and any port that doesn't have something to write gets skipped.
But as these are registers and not a memory, arbitration is unnecessary for reading the register outputs.
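To illustrate why reads need no arbitration with flip-flops, each port can simply have its own read multiplexer on the register outputs. This is a sketch with hypothetical names; it assumes the number of registers is a power of two so that every address is in range.

```verilog
// Hypothetical read side: three independent combinational read muxes.
// Assumption: NUM_REGS == 2**ADDR_W, so no address is out of range.
module read_ports #(
  parameter NUM_REGS = 8,
  parameter ADDR_W   = 3
) (
  input  wire [8*NUM_REGS-1:0] Registers,   // flattened register array
  input  wire [ADDR_W-1:0]     Address_0,
  input  wire [ADDR_W-1:0]     Address_1,
  input  wire [ADDR_W-1:0]     Address_2,
  output wire [7:0]            DataOut_0,
  output wire [7:0]            DataOut_1,
  output wire [7:0]            DataOut_2
);
  // Each port selects its byte independently of the other two.
  assign DataOut_0 = Registers[Address_0 * 8 +: 8];
  assign DataOut_1 = Registers[Address_1 * 8 +: 8];
  assign DataOut_2 = Registers[Address_2 * 8 +: 8];
endmodule
```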
 

Using a mux is not the same as 'following' the last write with another write. That implies queuing writes with a fifo or some other kind of storage mechanism.

Code:
// this is a mux.
always@(posedge clk) begin
  value <= external_data; // so always do the external non-blocking write first 
  if (merged_write) begin //  followed by the merged A/B write
    value <= merged_data; // the "last write wins" approach
  end
end
 
vGoodtimes has exactly what I was thinking. I will post my code when I have tested it this evening or tomorrow. Meanwhile I am working on the masked write part for ports A and B. Essentially what I want to do is a read-modify-write in a single clock cycle. Here is my code:

Code:
Registers[Address_1 * 8 +: 8] <= (Registers[Address_1 * 8 +: 8] & MaskIn_1) | DataIn_1;

I am thinking that this is probably a bad idea, but I am not sure why. I did see that circular assignments can cause implied latches. Is this a bad idea and if so why?

Thanks!
 

It won't be a latch if it is in a clocked process.
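A small side-by-side sketch of the difference (hypothetical names; the mask polarity follows the poster's code, where a 1 in the mask keeps the old bit):

```verilog
// Hypothetical comparison of register feedback and an inferred latch.
module latch_vs_flop (
  input  wire       clk,
  input  wire       en,
  input  wire [7:0] d,
  input  wire [7:0] mask,     // 1 = keep the old bit (as in the post)
  output reg  [7:0] q_ff,
  output reg  [7:0] q_latch
);
  // Clocked block: q_ff on the right-hand side is the flip-flop's
  // current output, so this is an ordinary register with feedback.
  always @(posedge clk)
    if (en)
      q_ff <= (q_ff & mask) | d;

  // Combinational block with no else branch: q_latch has to hold its
  // value while en is low, which synthesis tools infer as a latch.
  always @(*)
    if (en)
      q_latch = d;
endmodule
```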
 
OK, thanks. Here is what I have come up with. This is inside a clocked block:

Code:
// writing - port 0
if (Write_0) begin
  if (Address_0 < `REG_NUMBEROFREGISTERS) begin
    Registers[Address_0 * 8 +: 8] <= DataIn_0 & WriteMasks[Address_0 * 8 +: 8];
  end
end

// if writing on ports 1 and 2 at the same time and with the same address
if (Write_1 && Write_2 && (Address_1 == Address_2)) begin
  if (Address_1 < `REG_NUMBEROFREGISTERS) begin
    Registers[Address_1 * 8 +: 8] <= (((Registers[Address_1 * 8 +: 8] & MaskIn_1) | DataIn_1) & MaskIn_2) | DataIn_2;
  end
end else begin
  // writing - port 1
  if (Write_1) begin
    if (Address_1 < `REG_NUMBEROFREGISTERS) begin
      Registers[Address_1 * 8 +: 8] <= (Registers[Address_1 * 8 +: 8] & MaskIn_1) | DataIn_1;
    end
  // writing - port 2
  end else if (Write_2) begin
    if (Address_2 < `REG_NUMBEROFREGISTERS) begin
      Registers[Address_2 * 8 +: 8] <= (Registers[Address_2 * 8 +: 8] & MaskIn_2) | DataIn_2;
    end
  end
end

Any comments would be appreciated! Thanks.
 

You should really learn how to use parameters instead of `define. defines are really annoying and not very portable and can make reuse impossible.

A couple of observations:
Are you okay with priority encoding between ports 1 and 2?
What is supposed to happen when port 0 writes to the same address as port 1 or 2?
There is no arbitration, so writes from port 1 can completely block writes from port 2 until port 1 stops writing.
 
Thanks for the feedback! I need to change it so that ports 1 and 2 can write at the same time if their addresses are different.
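One possible way to let ports 1 and 2 write different addresses in the same cycle is to decode the writes per register. The sketch below is not the poster's final code: the module name and parameters are hypothetical, it keeps the mask polarity of the earlier code (1 = keep the old bit), it omits the `WriteMasks` filtering on port 0 and any reset, and it assumes ports 1 and 2 never write the same bit.

```verilog
// Hypothetical per-register write decode for the three ports.
module regfile_sketch #(
  parameter NUM_REGS = 8,
  parameter ADDR_W   = 4
) (
  input  wire                  clk,
  input  wire                  Write_0,
  input  wire                  Write_1,
  input  wire                  Write_2,
  input  wire [ADDR_W-1:0]     Address_0,
  input  wire [ADDR_W-1:0]     Address_1,
  input  wire [ADDR_W-1:0]     Address_2,
  input  wire [7:0]            DataIn_0,
  input  wire [7:0]            DataIn_1,
  input  wire [7:0]            DataIn_2,
  input  wire [7:0]            MaskIn_1,   // 1 = keep the old bit
  input  wire [7:0]            MaskIn_2,   // 1 = keep the old bit
  output reg  [8*NUM_REGS-1:0] Registers
);
  genvar g;
  generate
    for (g = 0; g < NUM_REGS; g = g + 1) begin : reg_slice
      // Which ports are writing this particular register?
      wire hit_0 = Write_0 && (Address_0 == g);
      wire hit_1 = Write_1 && (Address_1 == g);
      wire hit_2 = Write_2 && (Address_2 == g);

      // Combine the keep-masks and data of whichever ports hit.
      wire [7:0] keep_bits = (hit_1 ? MaskIn_1 : 8'hFF) &
                             (hit_2 ? MaskIn_2 : 8'hFF);
      wire [7:0] new_bits  = (hit_1 ? DataIn_1 : 8'h00) |
                             (hit_2 ? DataIn_2 : 8'h00);

      always @(posedge clk) begin
        if (hit_0)
          Registers[g*8 +: 8] <= DataIn_0;  // lowest priority
        if (hit_1 || hit_2)
          Registers[g*8 +: 8] <= (Registers[g*8 +: 8] & keep_bits) | new_bits;
      end
    end
  endgenerate
endmodule
```

Out-of-range addresses match no register, so the explicit range checks are no longer needed, and neither port can block the other.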

If port 0 writes to the same address as 1 or 2 then ports 1 or 2 get priority as I mentioned in a previous post about simplifications.

You should really learn how to use parameters instead of `define. defines are really annoying and not very portable and can make reuse impossible.

Here is my issue and perhaps this betrays my software background:

I am constructing here a small set of 8-bit wide "peripheral registers". Inside my core they will be accessed from different modules (e.g. A and B that I mentioned before).

Suppose I have a peripheral register at address 0x04 and in that register bit two is called "TXBUSY". I need to read/write this bit from modules A and B. I don't want to scatter my code with magic numbers or repeat definitions, because not only is it bad practice but I may want to move bits around later on without breaking anything.

My plan was to create a header file of definitions that are included in the descriptions of all modules. For example:

Code:
`define REG_ADDRESS_TXBUSY 4'h4
`define REG_BITNUM_TXBUSY 8'b00000100

`define REG_ADDRESS_COUNTERS 4'h5
`define REG_MASK_ERRORCOUNTER 8'b00001111
`define REG_MASK_TXCOUNTER 8'b11110000

etc. Then in the modules I don't care where the bits are because I can reference them by the defined names.

I figured that once I have all of these register/bit definitions in a single file I may as well add things like:

Code:
`define REG_NUMBEROFREGISTERS 8

to it to keep everything in one place.

I see that for the size of memories, the width of buses, etc. putting values in parameters allows instantiation of different copies of module with different settings, so maybe the number of registers is a special case and can be a parameter, but considering my aim of describing bits in one place how would you recommend I do this without defines?
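One commonly used alternative in plain Verilog is to declare the register map as `localparam` constants, which are scoped to the module rather than global like a `define. The sketch below uses hypothetical module, port and file names; the constant values are the ones from the post.

```verilog
// Hypothetical consumer module using localparams instead of `defines.
module module_a_sketch (
  input  wire [3:0] addr,       // register address being observed
  input  wire [7:0] reg_data,   // contents of that register
  output wire       tx_busy
);
  // These declarations could live in a shared file (for example a
  // hypothetical "reg_defs.vh") and be pulled in at this point with
  // `include, so that every module sees the same definitions.
  localparam [3:0] REG_ADDRESS_TXBUSY    = 4'h4;
  localparam [7:0] REG_MASK_TXBUSY       = 8'b00000100;
  localparam [3:0] REG_ADDRESS_COUNTERS  = 4'h5;
  localparam [7:0] REG_MASK_ERRORCOUNTER = 8'b00001111;
  localparam [7:0] REG_MASK_TXCOUNTER    = 8'b11110000;

  // The bit is referenced by name only, so it can move later.
  assign tx_busy = (addr == REG_ADDRESS_TXBUSY) &&
                   ((reg_data & REG_MASK_TXBUSY) != 8'h00);
endmodule
```

Because a `localparam` must be declared inside a module, the shared file has to be included within each module body rather than at the top of the source file. If SystemVerilog is an option, a package serves the same purpose.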

Thanks!
 
Hi,

Suppose I have a peripheral register at address 0x04 and in that register bit two is called "TXBUSY". I need to read/write this bit from modules A and B.
Are you sure that both A and B need to WRITE bit two of this register?

Please show a sketch (hand drawn is OK) about the circuit in your mind.

Klaus
 
