Designing a DDR2 controller: strobe issues

J90 · Sep 7, 2010

I'm trying to implement a DDR2 controller on a Spartan3 FPGA.

The memory is a single 512Mb chip by Micron (32M x 16, that is the data bus is 16 bits wide), and the following is its datasheet: https://download.micron.com/pdf/datasheets/dram/ddr2/512MbDDR2.pdf

Writing the RAM is quite straightforward, but reading it is giving me nightmares.
The DDR2 uses a source-synchronous interface: the data on the parallel data bus (DQ<15:0>) is not synchronized to the common clock signal (CK). Instead, a separate bidirectional data strobe (DQS) is used to sync the data bus. The x16 version of the RAM (the one I'm using) actually has two data strobes: LDQS for the lower byte of data (DQ<7:0>) and UDQS for the upper byte of data (DQ<15:8>).

The following is taken from page 98 of the previously mentioned datasheet and it illustrates the data valid windows:
[Figure: data valid window timing diagram from the datasheet; image link no longer available]

Now, I can't use a DQS signal as is to sample the data, since its edges are not centered on the data; a shift of about 90 degrees would be necessary. Unfortunately, DQS is not free-running, so a DCM can't be used to provide the required phase shift.
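
For a sense of scale, a 90 degree shift is a quarter of the clock period. A minimal Python sketch of that arithmetic follows; the clock frequencies are illustrative assumptions, not values taken from this thread.

```python
def quarter_period_ns(clock_mhz: float) -> float:
    """Delay in nanoseconds that corresponds to a 90 degree phase shift."""
    period_ns = 1000.0 / clock_mhz  # 1 / f, with f in MHz, gives the period in ns
    return period_ns / 4.0


# Assumed example frequencies; substitute the real memory clock.
for f_mhz in (133.0, 166.0, 200.0):
    print(f"{f_mhz:.0f} MHz: 90 degrees = {quarter_period_ns(f_mhz):.3f} ns")
```

The faster the clock, the smaller this quarter-period target becomes, which is why fixed board-level skew matters more at higher speeds.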

I've searched the net for some examples, and as far as I can see no one actually uses the DQS signals to read data from the RAM. Instead, they use a suitably shifted version of the internal FPGA clock (the same one that drives the CK line of the RAM).
The required shift really depends on the application. For example, one board could have longer data tracks than another, and the resulting difference in skew means the FPGA has to shift the clock by a different amount.
Would measuring the timings in the field and hard-coding a shift value in the FPGA be bad practice?

Besides the manually shifted clock solution (which sounds quite horrible, but you tell me), I have no idea how to accomplish this. Any suggestion will be really appreciated.

Thanks.

TrickyDicky · Sep 7, 2010

I know Altera provide a DDR memory controller IP core, and I'm pretty sure Xilinx do the same. Are you sure you want to write your own?

J90 · Sep 7, 2010

Yes, Xilinx provides an IP core for DDR2 RAMs.
And yes, I'm sure I want to make one of my own, or at least try to make one.

permute · Sep 7, 2010

Xilinx (and, I would assume, Altera) use a calibration algorithm for DDR2. IIRC, it is in some of the JEDEC documents. The basic idea is to assume you can send valid commands and write to the RAM. You then write something that puts a known pattern out on each bit, e.g. 01010011. The data is then re-read as often as possible (taking breaks for refresh commands, for example). The delay for each I/O is swept over its entire range, looking for the range of delays that returns valid data. Once all I/Os have been calibrated, a "cal_done" signal is set.

This method allows for startup calibration, tolerance measurements, and BIST features. In more advanced forms, it can allow the FPGA to re-calibrate the RAM at periodic intervals.
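
The sweep described above can be sketched in Python as a behavioral model. This is only an illustration under stated assumptions: the 64-tap range, the `make_fake_io` hardware stand-in, and the choice to settle on the centre of the passing window are all hypothetical and not taken from Xilinx's or JEDEC's documentation.

```python
from typing import Callable, List, Optional, Tuple

PATTERN = [0, 1, 0, 1, 0, 0, 1, 1]  # the 01010011 example pattern from the post
NUM_TAPS = 64  # assumed size of the delay range; check the target device


def find_valid_window(read_ok: Callable[[int], bool],
                      num_taps: int = NUM_TAPS) -> Optional[Tuple[int, int]]:
    """Sweep every tap and return (first, last) of the longest passing run."""
    best = None
    start = None
    for tap in range(num_taps + 1):  # extra pass closes a run ending at the last tap
        ok = tap < num_taps and read_ok(tap)
        if ok and start is None:
            start = tap
        elif not ok and start is not None:
            if best is None or (tap - 1 - start) > (best[1] - best[0]):
                best = (start, tap - 1)
            start = None
    return best


def calibrate(ios: List[Callable[[int], bool]]) -> List[int]:
    """Return one tap per I/O (window centre, an assumed policy) or raise."""
    taps = []
    for index, read_ok in enumerate(ios):
        window = find_valid_window(read_ok)
        if window is None:
            raise RuntimeError(f"no valid delay window on I/O {index}")
        taps.append((window[0] + window[1]) // 2)
    return taps


def make_fake_io(first_good: int, last_good: int) -> Callable[[int], bool]:
    """Hypothetical stand-in for hardware: readback matches only inside the window."""
    def read_pattern(tap: int) -> List[int]:
        if first_good <= tap <= last_good:
            return list(PATTERN)
        return PATTERN[1:] + PATTERN[:1]  # mis-sampled data modelled as a rotated pattern
    return lambda tap: read_pattern(tap) == PATTERN


taps = calibrate([make_fake_io(10, 30), make_fake_io(14, 36)])
cal_done = True  # only reached if every I/O found a window
print(taps, cal_done)
```

In real hardware the `read_ok` callback would issue a read burst and compare the captured bits against the pattern; repeating each comparison several times per tap would make the window edges less sensitive to jitter.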

J90 · Sep 8, 2010

Thank you permute for your answer.

I've searched the net for documentation on this IIRC algorithm, but I found nothing.
Does anyone have a link or some documentation about it?

FvM · Sep 8, 2010

IIRC = If I remember correctly :grin:

J90 · Sep 8, 2010

Oh my goodness! Shame on me :grin:

Well, I'll keep looking into JEDEC documents, hopefully I'll find something :???:

nachumk · Sep 9, 2010

I'm actually blogging my experience writing a DDR3 controller from scratch: "DDR3 controller - from scratch"

From what I understand so far (take with a grain of salt...):
You should use DQS to sample the data. Assuming a 90 degree shift is not exactly correct. You don't know what the trace length between your controller and the DDR2 is, so DQS won't be phase aligned with your DDR2 clock, and therefore a 90 degree shift is a guess at best. This is why you use the DQS to capture your incoming data.
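
To see why the trace length matters, here is a rough Python sketch of where DQS lands relative to the controller's own clock. The figures are assumptions for illustration only: roughly 6.7 ps/mm of propagation delay is a common ballpark for FR-4, the trace lengths are made up, and `dram_skew_ps` stands in for the DRAM's own clock-to-strobe skew (tDQSCK in the datasheet).

```python
def dqs_phase_deg(clock_mhz: float,
                  ck_trace_mm: float,
                  dqs_trace_mm: float,
                  ps_per_mm: float = 6.7,
                  dram_skew_ps: float = 0.0) -> float:
    """Phase of the returning DQS relative to the controller clock, in degrees.

    CK travels from the FPGA to the DRAM and DQS travels back, so both
    flight times add up before the strobe is seen at the FPGA pins.
    """
    period_ps = 1e6 / clock_mhz
    round_trip_ps = (ck_trace_mm + dqs_trace_mm) * ps_per_mm + dram_skew_ps
    return (round_trip_ps % period_ps) / period_ps * 360.0


# Two hypothetical boards at the same assumed 200 MHz clock.
for length_mm in (50.0, 80.0):
    phase = dqs_phase_deg(200.0, length_mm, length_mm)
    print(f"{length_mm:.0f} mm traces: DQS arrives about {phase:.1f} degrees late")
```

Under these assumptions the two boards differ by tens of degrees, which is the point being made: a shift hard-coded for one layout will not carry over to another.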

To deal with the trace lengths of DQS/DQ to your controller you will need to perform some manual calibration. One way to do this is to run the DDR2 at a slow clock and write data to it. Then you start speeding up your clock and reading the data back from where you wrote it, adjusting IODELAY primitives as needed. This is how you calibrate the reading. Next, once the read is calibrated, you start writing data at high speed and verify that the writes are read back correctly, again adjusting IODELAY primitives. This is how you calibrate the writing. I believe a DDR2 primitive or SERDES primitive will be required for the 90 degree phase shifting and capturing of DDR2 data.

Another good tip is that you can look at Xilinx's DDR2 source code if you generate the DDR2 module in Core Generator. That might give you some tips. (Warning: It's very complicated.)

Nachum

permute · Sep 9, 2010

Indeed. There are a few other things, like determining the correct latency and handling R/W turnaround.

Looking at the website, I'd like to note that IODELAY components are actually controlled independently of IDELAYCTRL. The Clk control input isn't required to be 200 MHz. It can be higher or lower. From what I recall, MIG uses a divided clock to make the calibration state-machine logic easier to route. The 200 MHz clock is only required to go to the IDELAYCTRL.

I suspect there should be a DLL_RESET/WAIT state somewhere in your state machine. This would be used to ensure the DLLs on the DDR3 components are locked.

Your top.ucf also doesn't have timing constraints. You need to have ioclk and ddr3_clk constrained. It's probably best to provide a from-rst-to-regs constraint as well. I also prefer to put IOB attributes on the I/O, to ensure the I/O registers will be added. Without these, you need to use OFFSET constraints more heavily. You use async resets without a synchronous deassertion, but it doesn't look like there are any first-cycle transitions. Truly async resets might take several cycles to deassert at all locations of the FPGA. This could still lead to issues.
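
The "asynchronous assert, synchronous deassert" idea mentioned above can be illustrated with a small behavioral model in Python. This is only a sketch of the concept: a real design would be written in HDL, and the two-stage depth and the class name are assumptions, not something taken from the blog's source.

```python
from typing import List


class ResetSynchronizer:
    """Flip-flop chain model: reset asserts at once, releases on clock edges."""

    def __init__(self, stages: int = 2) -> None:
        self.flops: List[int] = [1] * stages  # 1 = reset asserted; start in reset

    def clock(self, async_reset: int) -> int:
        """Model one rising clock edge and return the synchronized reset."""
        if async_reset:
            # Asynchronous path: every stage is forced into reset immediately.
            self.flops = [1] * len(self.flops)
        else:
            # Synchronous release: a 0 shifts through one stage per edge.
            self.flops = [0] + self.flops[:-1]
        return self.flops[-1]


sync = ResetSynchronizer()
raw_reset = [1, 1, 0, 0, 0]  # external reset released after two edges
outputs = [sync.clock(r) for r in raw_reset]
print(outputs)  # [1, 1, 1, 0, 0]
```

The output stays asserted for one extra edge after the raw reset is released, so every register fed by it sees the release aligned to the clock rather than at an arbitrary moment.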

nachumk · Sep 9, 2010

IDELAYCTRL is required when you use an IODELAY primitive. IDELAYCTRL is only used to connect the required 200 MHz reference clock that is used to create the tap delays controlled through the IODELAY primitive. You can delay DQS or clocks by a variable number of taps by connecting them to an IODELAY primitive. The reason I have the 200 MHz clock going to the IODELAY is b/c my write leveling logic is running in the 200 MHz clock domain - of course it doesn't have to...

I also have DLL reset in the state machine - look at the mode register states.

My website is a work in progress, and I haven't added the next steps yet. Of course there are going to be a lot of constraints that will have to be added in the future.

This is actually the second DDR controller I'm writing; the first one was a DDR2. The DDR3 has only been really interesting where it differs from DDR2. Write leveling is exactly that sort of thing, and it was very interesting to figure out. I will find time to keep working on it in the future, so stay tuned.
