Xilinx Aurora protocol efficiency

Status
Not open for further replies.

shaiko

Hello,

I need to transfer 6.8 Gb/s of data between two Virtex 7 FPGAs.
I want to use the Aurora 64b/66b protocol for that purpose. According to Vivado I can run the Aurora at up-to 11.3 Gb/s on my devices (xc7vx550tffg1927-2 on both sides).

Preferably, I want to avoid aggregating lanes and make do with only one lane.
Do you think it'll be feasible to transfer 6.8 Gb/s of effective data over a ~10 Gb/s Aurora interface ?
 

Short answer yes.

The gaps between data are always filled with idle blocks, so sending less data just means sending more idles.

- - - Updated - - -

It just dawned on me that maybe it's the amount of overhead you are worried about?

With a 10 Gbps raw line rate and 64b/66b encoding, you'll have 10 × 64/66 ≈ 9.697 Gbps of data bandwidth. The Aurora protocol itself uses some of that bandwidth (how much, I'm too lazy to look up), but I'm certain it's not so excessive that it would drop that 9.697 Gbps below 6.8 Gbps.
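To make that arithmetic concrete, here is a minimal Python sketch. It only accounts for the 64b/66b coding overhead, not for Aurora framing, and the function name is made up for illustration.

```python
# Payload capacity of a 64b/66b serial link, before any protocol framing.
def payload_rate_gbps(line_rate_gbps: float) -> float:
    """Each 66-bit block on the wire carries 64 payload bits."""
    return line_rate_gbps * 64 / 66

required = 6.8  # Gb/s, the figure from the original question

# 10.0 is the round number used in this thread; 11.3 is the limit Vivado reported.
for line_rate in (10.0, 11.3):
    capacity = payload_rate_gbps(line_rate)
    print(f"{line_rate:5.1f} Gb/s line -> {capacity:.3f} Gb/s payload, "
          f"{required / capacity:.1%} used by a {required} Gb/s stream")
```

Whatever the framing overhead turns out to be, it has to fit in the remaining headroom.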
 
And what about the overall efficiency?
If we observe the bus for a long period of time and count X bits moving from one side to the other, what percentage of X will be actual data? Are the 2 extra bits (64b/66b) the only overhead?
 

That, plus the Aurora protocol itself, which like I said I'm too lazy to look up.

Just a guess, but I suppose it's probably made up of start and stop K-codes and perhaps one more extra code? So perhaps another 3 × 66 bits for every packet of data?

Okay, from the Aurora document for the v11.0 core (PG074):
[Attachment: Capture.JPG]
Page 15 shows the typical channel frame, which looks like it has a SEP, a count, and then the data bytes.

So it looks like you would be safe assuming a 10% overhead, which means you can expect to get about 9 Gbps through the link.

If the data you send is not a multiple of 64 bits, the last chunk will not fill a full 64-bit word. That results in more overhead unless you add extra logic to pack the start of the next frame into the partially filled word.
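A rough way to see how frame size affects efficiency is sketched below in Python. The model is an assumption, not something taken from PG074: it charges one extra 66-bit block per frame for the separator and pads a partial last word up to a full block, which is probably pessimistic. The names are made up for illustration.

```python
import math

BLOCK_BITS = 66      # bits on the wire per 64b/66b block
PAYLOAD_BYTES = 8    # payload bytes carried by one data block

def frame_efficiency(frame_bytes: int, overhead_blocks: int = 1) -> float:
    """Fraction of wire bits that are user data for a single frame.

    overhead_blocks is an ASSUMPTION (one separator block per frame);
    check the core's product guide for the real framing cost.
    """
    data_blocks = math.ceil(frame_bytes / PAYLOAD_BYTES)  # partial word is padded
    wire_bits = (data_blocks + overhead_blocks) * BLOCK_BITS
    return frame_bytes * 8 / wire_bits

needed = 6.8 / 10.0  # efficiency required on a 10 Gb/s line
for size in (8, 9, 64, 256, 1024):
    eff = frame_efficiency(size)
    verdict = "ok" if eff >= needed else "too small"
    print(f"{size:5d}-byte frame: {eff:.1%} efficient ({verdict})")
```

Under this model very short frames fall below the required efficiency, while frames of a few hundred bytes leave plenty of margin.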

I've never used Aurora, but have used various proprietary transceiver designs in the past, so I have dealt with all of these issues without the benefit of a pre-canned solution.
 
The slower version of Aurora used 8b/10b encoding.
The 10G capable one uses 64b/66b instead.

I don't understand why...
8b/10b yields much stronger scrambling than 64b/66b (per message size), and one would expect higher link speeds to require stronger scrambling.
 

I don't believe the encoding scheme is meant for strong scrambling; it's specifically designed for both clock recovery and DC balancing. In your other question about HDMI using 8b10b: HDMI doesn't use the same 8b10b encoding scheme. It uses a different scheme that is meant for a DC-coupled connection and minimizes transitions, unlike the Aurora standard, which uses AC coupling and the standard 8b10b encoding.

To protect a link you would be better off using some sort of forward error correction and interleaving.
 
You're certainly not going to get good DC balancing with only 2 bits to play with.

Another question on the same subject.
How does the clock recovery with 64b/66b work?
Does the PLL lock to the bit transition that takes the least amount of time and assumes it to be the clock period?
 

Don't know, I assume it finds some sort of beat frequency and locks to it.

Reading the 64b66b wiki page I learned something new about 64b66b: instead of a code table it uses a two-bit preamble, "01" for 64 bits of data and "10" for an 8-bit type field plus 56 bits of data. The data is scrambled with an LFSR to produce a sequence that has an approximately equal number of 1's and 0's. It's definitely not table based like 8b10b.

Take a look at that wiki page, it's an interesting read. Really interesting is the statistical nature of the DC balancing and the run length (guaranteed to be no more than 66 bits, as the preamble forces a transition in every 66-bit block).
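Here is a small Python sketch of that idea: a self-synchronous scrambler plus the "01" data preamble. The polynomial 1 + x^39 + x^58 is the one used by 10 Gigabit Ethernet's 64b/66b coding; that Aurora uses the same one is an assumption here, and the seed and function names are purely illustrative.

```python
import random

TAPS = (39, 58)  # assumed scrambler polynomial: 1 + x^39 + x^58

def scramble(bits, state):
    """Self-synchronous scrambler: every output bit is fed back into the state."""
    hist = list(state)                 # hist[-1] is the most recent scrambled bit
    out = []
    for b in bits:
        s = b ^ hist[-TAPS[0]] ^ hist[-TAPS[1]]
        out.append(s)
        hist = hist[1:] + [s]
    return out

def descramble(bits, state):
    """Inverse: the state is built only from received (scrambled) bits."""
    hist = list(state)
    out = []
    for s in bits:
        out.append(s ^ hist[-TAPS[0]] ^ hist[-TAPS[1]])
        hist = hist[1:] + [s]
    return out

def longest_run(bits):
    """Length of the longest run of identical consecutive bits."""
    best = run = 1
    for prev, cur in zip(bits, bits[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

rng = random.Random(0)                            # fixed seed, illustrative only
seed = [rng.randint(0, 1) for _ in range(58)]
payload = [0] * (64 * 100)                        # all zeros: worst case without scrambling
scrambled = scramble(payload, seed)

# Prepend the "01" data preamble to every 64-bit scrambled word.
wire = []
for i in range(0, len(scrambled), 64):
    wire += [0, 1] + scrambled[i:i + 64]

print("fraction of ones:", sum(wire) / len(wire))
print("longest run     :", longest_run(wire))
```

Because the descrambler's state is built from received bits only, it recovers the payload after 58 bits even when it starts from the wrong state. The balance of ones and zeros is statistical, while the preamble gives a hard bound on run length.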
 
I think the only overhead in the Aurora protocol is related to the framing. This means that the frame size should not be too small if you want to have a small overhead.
If you have a fixed frame size you can pack several of them into a "super frame" that you deliver to the Aurora block.
Watch out for the default DFE adaptation mode when using data with too little randomness. It can be unstable even under ideal circumstances.
The DFE adaptation is optimized for electrical lines with frequency-dependent losses. If you only have short electrical wires (to the other FPGA or to an optical transceiver), you can use the LPM mode instead, where the 8b/10b coding is enough even with non-random data.

For the DFE mode, there should not be repeating patterns shorter than 3000-4000 bits (depending on the transceiver generation).
If you really need the DFE mode and sometimes have repeating patterns, the adaptation should be frozen while the data is non-random.

http://www.xilinx.com/support/answers/56894.html

The simple solution is to use the LPM mode when it is good enough.
 