Multiple users of a DDR interface

Status
Not open for further replies.

shaiko

Hello,

I have a DDR3 controller with a 128 bit wide data interface.

Currently there are 5 users connected to it using the following architecture:

Each "user" has a 16-to-128 width-converting FIFO. The application writes data to the 16-bit side.
Each of the 5 128 bit FIFO outputs is connected via a MUX to a custom arbiter I wrote.
With every clock that the DDR3 controller is ready (wait signal inactive), the arbiter issues a select signal to the MUX, strobes a read signal to the appropriate FIFO and writes the data to a predetermined address range (that matches the FIFO number).
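
As a rough illustration of the arbitration described above, here is a behavioral Python model (not HDL). The round-robin policy, the `REGION_WORDS` region size, and the `Arbiter` class name are assumptions made for this sketch; the post does not say which policy the custom arbiter actually uses.

```python
from collections import deque

NUM_USERS = 5
REGION_WORDS = 1024  # hypothetical size of each user's address range, in 128-bit words


class Arbiter:
    """Behavioral model: at most one 128-bit word is moved per ready clock."""

    def __init__(self, num_users=NUM_USERS):
        self.fifos = [deque() for _ in range(num_users)]  # 128-bit FIFO outputs
        self.offsets = [0] * num_users                    # write pointer per region
        self.last = num_users - 1                         # last user served

    def clock(self, ddr_ready):
        """Return (user, address, data) for this clock, or None if idle."""
        if not ddr_ready:
            return None  # controller asserts wait: nothing is read or written
        n = len(self.fifos)
        for step in range(1, n + 1):
            user = (self.last + step) % n  # assumed round-robin search order
            if self.fifos[user]:
                data = self.fifos[user].popleft()  # models the FIFO read strobe
                addr = user * REGION_WORDS + self.offsets[user]
                self.offsets[user] = (self.offsets[user] + 1) % REGION_WORDS
                self.last = user
                return user, addr, data
        return None  # all FIFOs empty


arb = Arbiter()
arb.fifos[1].extend([0xA0, 0xA1])
arb.fifos[3].append(0xB0)
for ready in (True, False, True, True, True):
    print(arb.clock(ready))
```

The `user` value returned by `clock` plays the role of the MUX select signal; in hardware, that select drives the 128-bit wide multiplexer whose timing is the concern here.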

The design works well at ~120 MHz but I doubt that it'll scale up well.
I expect the 128 bit wide MUX to become a performance bottleneck as the number of users increases...

How did you solve similar problems?
 

Pipeline the multiplexer (and address/control). For maximum performance (and maximum resource usage), peel off some bits of the select, multiplex that subset of inputs and register the output, then multiplex the next set of inputs using the next set of select bits. Assuming a binary select and 6-input LUTs, you should end up with two stages of 4-to-1 multiplexers for a 16-to-1 mux, with registers on the output of each stage.
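
The two-stage structure can be sketched as a cycle-level Python model (a behavioral illustration only, not synthesizable HDL). The class and function names are hypothetical. A 4-to-1 mux has four data inputs plus two select inputs per output bit, which is why it fits a single 6-input LUT.

```python
def mux4(inputs, sel2):
    """4-to-1 multiplexer: only the low two bits of sel2 are used."""
    return inputs[sel2 & 0x3]


class PipelinedMux16:
    """16-to-1 mux split into two registered 4-to-1 stages."""

    def __init__(self):
        self.stage1 = [0, 0, 0, 0]  # registers after the four first-stage muxes
        self.sel_hi = 0             # upper select bits, delayed to match stage 1
        self.out = 0                # register after the second-stage mux

    def clock(self, inputs, sel):
        """Advance one clock edge; return the registered output after the edge."""
        assert len(inputs) == 16 and 0 <= sel < 16
        # Compute all next-state values from the current registers first,
        # so every register updates "simultaneously" like real flip-flops.
        next_out = mux4(self.stage1, self.sel_hi)
        next_stage1 = [mux4(inputs[4 * g:4 * g + 4], sel) for g in range(4)]
        self.out, self.stage1, self.sel_hi = next_out, next_stage1, sel >> 2
        return self.out


mux = PipelinedMux16()
words = [100 + i for i in range(16)]
results = [mux.clock(words, sel) for sel in (5, 9, 14, 0, 0)]
print(results)  # the word chosen by each select appears on the following call
```

The cost of the extra register stage is latency: the select (and any address/control travelling with the data) has to be delayed by the same number of stages, which is what the `sel_hi` register models.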
 
This was my first thought a while ago...and it's probably what I'll do eventually...
https://alteraforum.com/forum/showthread.php?t=54643

But I'm asking about other approaches to this issue...
Obviously, anyone who has tried to connect multiple users to a shared medium with a very wide bus has faced similar design challenges.
So I'm thinking perhaps someone here implemented a different solution than the one I'm considering.
 

I came across a design in which multiple FIFOs had to write data to an external DDR3 memory (in your case it would be 5 FIFOs for 5 users).
The FIFO interface signals were AXIS.
So we had used the AXI Virtual FIFO Controller IP core. I am talking about Xilinx based designs.

You may read here a bit and get some idea : https://www.xilinx.com/products/intellectual-property/axi_virtual_fifo_controller.html

If you look at the spec (PG038, November 18, 2015), the top-level block diagram on page 8 will give you an idea of what is happening. You don't need any custom arbiter or MUX design; everything is scalable.
 
You don't specify what interface you're using.
The Xilinx memory controller offers an AXI4 slave interface, and there is an AXI interconnect IP for multiple masters to a memory slave. The interconnect has several configuration options, such as buffer sizes and packet FIFOs for buffering data. Different masters can have different data widths, and all transactions will be converted inside the interconnect.
Make your life easy, use existing IP for existing interfaces.

Good thing about AXI4 is that it's pretty simple to plug AXIS into it.
 
The Xilinx memory controller offers an AXI4 slave interface, and there is an axi interconnect IP for multiple masters to a Memory slave.
I designed the custom arbiter for an Altera Cyclone V SOC implementation.
At first, I tried to use the native QSYS interconnect but the timing performance and resource usage were VERY VERY POOR. Fmax was under 95MHz.
I was surprised at first - but our ALTERA FAE confirmed that my Qsys design was optimal...

This led me to try and design something of my own. The Fmax I got with the architecture I described is 124MHz with 1/5 the logic usage of the Qsys equivalent!

Now, I'm working on a new design that will be implemented on a much bigger Xilinx Kintex Ultrascale FPGA.
So I'm wondering whether to reuse my custom arbiter or to give the Xilinx interconnect a try...
 

Give the Xilinx stuff a go. We use it at 200 MHz.
 
Ultrascale.
One thing to bear in mind is that the Xilinx stuff is all AXI. I don't know what your custom IP speaks.
 

I dont know what your custom IP speaks.
It speaks Avalon which is almost the same...

Do you know what arbitration scheme the AXI interconnect implements in case of multiple users?
I know that Qsys uses weighted round robin...somewhat inefficient.
 

Arbitration scheme and clock rate -- does this matter that much? You should design any algorithms to avoid small external memory accesses. If short accesses are needed, you should include a caching layer similar to the L1/L2 cache from modern CPUs. External memory interfaces like DDR* favor kilobyte sized accesses when possible.
 

Arbitration scheme and clock rate -- does this matter that much?
The Arbitration algorithm does start to matter when you approach the memory bandwidth limits...
 

The Arbitration algorithm does start to matter when you approach the memory bandwidth limits...

But why are you approaching the limits? Is it because of small transfers? Lots of row changes? It is much easier to fix the access sequence (if possible) to save bandwidth than to build complicated arbitration.
 

But why are you approaching the limits? is it because of small transfers? lots of row changes?
No. I use the memory very efficiently...long bursts and as much spatial locality as possible.
With 5 users, I consume ~40% of the bandwidth - but as the design grows, more users will be added, so I'd like to work as efficiently as possible...
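
A back-of-the-envelope projection of that headroom, as a short Python sketch. It assumes every user carries an equal share of the ~40% figure and that utilization scales linearly with the number of users; neither is stated in the thread, and the function names are hypothetical.

```python
USERS_NOW = 5
UTILIZATION_NOW = 0.40  # ~40% of the memory bandwidth, taken from the post

# Assumption: all users generate the same load.
per_user = UTILIZATION_NOW / USERS_NOW


def utilization(users):
    """Projected fraction of memory bandwidth used by `users` equal users."""
    return users * per_user


def max_users(limit=1.0):
    """Largest user count whose projected utilization stays within `limit`."""
    return int(limit / per_user + 1e-9)  # small epsilon guards against float rounding


for n in (5, 8, 10, 12):
    print(n, f"{utilization(n):.0%}")
```

Under these assumptions each user costs about 8% of the bandwidth. A real DDR3 interface cannot sustain 100% because of refresh, row changes, and read/write turnaround, so the practical ceiling is lower than this linear estimate suggests.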
 
