# Clock domain crossing

#### dora

##### Full Member level 3
Hi Gents,

I have the following engineering problem, which I would like to solve using a CPLD in Verilog.

I have a 16-bit data bus, and a data stream arrives on this bus clocked at f1 = 10 kHz.
I want to pass this data stream to another, asynchronous clock domain at f2 = 10 kHz.
Each word arriving in the f1 domain needs to be passed to the f2 domain,
and if f1 and f2 tick close enough together that the input data is not stable at the f2 tick,
then I want to pass the previous word into f2.
I will try to illustrate this:

Code:
  word0        word1      word2            word3
f1|            |          |                |
f2   |            |            |           |
word0        word1        word2       word2
What would be the easiest way to do this?

Thanks
Dora

##### Super Moderator
Staff member
Based on your description you'll lose some data (e.g. word3 gets skipped when word 4 shows up). Is that really what you want the behavior to be? Or is this a case of the clocks not having the exact same frequency and the edges can drift?

To avoid the skipped data you should double or triple buffer the f1 data and read one of the older data words (which you know is stable). This is basically making yourself a very shallow FIFO with FFs.

#### dora

##### Full Member level 3

Thanks for the response.

f1 and f2 are nominally the same frequency but are not derived from the same source.
So yes, they can drift a bit.
If f1 is 100 ppm faster than f2 at 10 kHz, I will have to drop 1 word each second.
If f1 is 100 ppm slower than f2 at 10 kHz, I will have to repeat 1 word each second.
I don't have any other option short of using long FIFOs and hoping both frequencies average to the same value
before the FIFO gets empty or full, correct?
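
As a rough check of the drift arithmetic above, here is a minimal sketch in Python. It is not part of the original post; the function names and the assumption that the FIFO starts half full with a constant offset are made up for illustration.

```python
def slip_rate_per_second(f_nominal_hz: float, offset_ppm: float) -> float:
    """Words dropped (or repeated) per second for a given frequency offset."""
    return f_nominal_hz * abs(offset_ppm) / 1_000_000


def seconds_until_fifo_slips(depth_words: int, f_nominal_hz: float,
                             offset_ppm: float) -> float:
    """Time until a FIFO that starts half full overflows or runs empty.

    Assumption: the offset is constant, so the fill level moves steadily
    in one direction.
    """
    return (depth_words / 2) / slip_rate_per_second(f_nominal_hz, offset_ppm)


if __name__ == "__main__":
    print(slip_rate_per_second(10_000, 100))          # slips per second at 100 ppm
    print(seconds_until_fifo_slips(16, 10_000, 100))  # seconds for a 16-word FIFO
```

Under these assumptions a deeper FIFO only stretches the interval between slips; it does not remove them while the offset persists.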

Yes, I am OK with skipping or repeating a data word once in a while, but I want to get consistent data at f2.
I mean I want to be sure that every word at f2 is a correct word that was available at f1 at some moment.
I want to be free of metastability.

Taking this into account, do you think this can be implemented in a small CPLD?
Do you have any recommendations in this respect?

Thanks
Dora

##### Super Moderator
Staff member
> I don't have any other option short of using long FIFOs and hoping both frequencies average to the same value before the FIFO gets empty or full, correct?

A long FIFO would only postpone the inevitable loss of data. So you are going to lose data, but if you already know that and the system design is okay with the data loss, then that shouldn't be an issue.

> Yes, I am OK with skipping or repeating a data word once in a while, but I want to get consistent data at f2.
> I mean I want to be sure that every word at f2 is a correct word that was available at f1 at some moment.
> I want to be free of metastability.
>
> Taking this into account, do you think this can be implemented in a small CPLD?
> Do you have any recommendations in this respect?

The problem is that a small CPLD will probably be inadequate. You really need to perform clock domain crossing on the control signals, which means you have to buffer too much data for a small CPLD. You will likely need at least 4 words of storage, given the 2-FF synchronizers and all the control logic that a FIFO would normally need to pass the addresses between domains.

Can you perform the transfer using a higher frequency clock? Then it would be relatively easy to sample both f1 and f2 clocks and capture the f1 data in the higher frequency clock domain and resynchronize it to the f2 clock domain.
For example, assuming rising-edge f1 and f2 and a higher-speed transfer clock:
1. synchronize and falling-edge detect f1
2. sample the f1 data
3. synchronize and falling-edge detect f2
4. update the f2 data.

All transfers will be done on the falling edges of the f1 and f2 clocks, so there won't be any metastability problems with the data being transferred to or from the higher-speed transfer clock. You would likely want to use at least a 10x clock, which gives you 5 clocks per half cycle of the f1/f2 clocks (2 of those clocks will be needed for synchronization and another for edge detection, leaving 2 for capturing the data from the f1 domain).
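
The four steps above can be sketched as a small behavioral model. This is Python rather than Verilog, it does not model metastability itself, and the class name, the 10x ratio, the phase offset and the deliberately large 1000 ppm offset are assumptions chosen only to make the effect visible quickly.

```python
class OversampledBridge:
    """Word transfer from f1 to f2 through one fast clock domain.

    f1 and f2 are sampled as ordinary signals. Each chain holds two
    synchronizer flip-flops followed by one edge-detect flip-flop.
    """

    def __init__(self) -> None:
        self.f1_chain = [1, 1, 1]
        self.f2_chain = [1, 1, 1]
        self.held_word = 0   # word captured from the f1 side
        self.out_word = 0    # word presented to the f2 side

    def tick(self, f1: int, f2: int, f1_data: int) -> None:
        """One rising edge of the fast clock."""
        # Falling edge after synchronization: previous sample high, current low.
        f1_fell = self.f1_chain[2] == 1 and self.f1_chain[1] == 0
        f2_fell = self.f2_chain[2] == 1 and self.f2_chain[1] == 0
        # Both next values come from the old state, like non-blocking assignments.
        next_out = self.held_word if f2_fell else self.out_word
        next_held = f1_data if f1_fell else self.held_word
        self.out_word, self.held_word = next_out, next_held
        self.f1_chain = [f1, self.f1_chain[0], self.f1_chain[1]]
        self.f2_chain = [f2, self.f2_chain[0], self.f2_chain[1]]


def run(n_ticks: int = 20_000, p1: float = 10.0, p2: float = 10.01) -> list:
    """Drive the bridge with a 10x fast clock; return words seen at f2 rising edges."""
    bridge = OversampledBridge()
    seen = []
    prev_f2 = 1
    for n in range(n_ticks):
        t = n + 0.5                          # fast-clock edge time, in fast-clock periods
        f1 = 1 if (t % p1) < p1 / 2 else 0
        f2 = 1 if ((t + 3.0) % p2) < p2 / 2 else 0
        word = int(t // p1)                  # f1 side changes its word on each f1 rising edge
        bridge.tick(f1, f2, word)
        if f2 and not prev_f2:               # f2 rising edge: the f2 side consumes out_word
            seen.append(bridge.out_word)
        prev_f2 = f2
    return seen


if __name__ == "__main__":
    words = run()
    steps = [b - a for a, b in zip(words, words[1:])]
    print("largest step between consecutive f2 words:", max(steps))
```

In this model every word seen on the f2 side is a word that the f1 side actually held, and consecutive words differ by 0 (repeat), 1 (normal) or 2 (one word skipped).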

This kind of problem is why I avoid having interfaces like this. I prefer interfaces that have source synchronous output clocks and are designed for burst operation. Then you don't have to do something ugly like above to transfer data.

#### dora

##### Full Member level 3

Well, I actually may have faster clocks. To simplify the question, I did not present it in its actual form.
What I am actually trying to do is to glue two PCM (TDM) interfaces together.
Both PCMs have:
- a 256 kHz clock
- an 8 kHz frame sync
- 2 x 16-bit time slots (I am interested in the first time slot only, TS0)

The issue is that both must work in master mode.
They both generate clk and frame_sync. The two PCMs are asynchronous to each other.

So what I have to do is read TS0 from one PCM and pass it to the other, and vice versa.
So I actually have one word (TS) at 8 kHz, but at the same time I also have the 256 kHz clocks, which
may be used for oversampling.

Is there a typical solution for this problem?

Thanks
Dora

#### pbernardi

##### Full Member level 2
I see this as very possible.

Let's call the frame syncs FSC1 and FSC2 and the clocks CLK1 and CLK2. I see hardware doing this:

1) 2x shift registers (CLK1/CLK2), 2x counters from 0 to 15 (CLK1/CLK2). When an FSC1/FSC2 edge is detected, data is stored in the shift registers using the counters.
2) Once the counter goes to 0 again, you move the data from the shift register to a buffer (in both cases, meaning 2x 16-bit buffers).
3) When FSC2/FSC1 is detected (note you are in the opposite clock domain now), you read the data from the buffers, serialize it (2x shift registers (CLK2/CLK1) + 2x counters (CLK2/CLK1) again) and send the data.

If you had an FPGA, a BRAM could be used; it can work with two clock domains easily.
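
Steps 1) and 2) on the receive side could look roughly like the following behavioral sketch. It is Python, not synthesizable HDL, and it assumes a one-clock frame sync pulse followed by 16 data bits sent MSB first; the real PCM devices may use a different frame sync alignment. The class and helper names are hypothetical, and the counter counts down from 16 instead of up from 0 to 15.

```python
class Ts0Receiver:
    """Captures the 16 bits that follow a frame sync pulse (time slot 0)."""

    def __init__(self) -> None:
        self.bits_left = 0    # bits still to shift in; 0 means idle
        self.shift_reg = 0
        self.buffer = None    # last complete word, None until the first one arrives

    def clock(self, fsc: int, bit: int) -> None:
        """One rising edge of the receive clock."""
        if self.bits_left > 0:
            self.shift_reg = ((self.shift_reg << 1) | (bit & 1)) & 0xFFFF
            self.bits_left -= 1
            if self.bits_left == 0:
                self.buffer = self.shift_reg   # word complete: move it to the buffer
        if fsc:
            self.bits_left = 16                # frame sync: capture the next 16 bits


def frame_stimulus(word: int) -> list:
    """Build one 32-clock frame: sync pulse, 16 data bits MSB first, idle bits."""
    bits = [(word >> i) & 1 for i in range(15, -1, -1)]
    fsc = [1] + [0] * 31
    data = [0] + bits + [0] * 15
    return list(zip(fsc, data))


if __name__ == "__main__":
    rx = Ts0Receiver()
    for fsc, bit in frame_stimulus(0xA55A):
        rx.clock(fsc, bit)
    print(hex(rx.buffer))
```

The buffer only changes once per frame, which is what gives the other clock domain a long window in which the word is stable.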

##### Super Moderator
Staff member
Typical solution? I've never seen one, but that doesn't mean someone somewhere hasn't designed something very similar. I've never had to do what you seem to be attempting.

I'm not sure from your description, how both master devices can write to each other, who decides which one writes to the other one? What part is this, so I can look at a datasheet?

Regardless, I think my proposal for using an over sampled clock to perform the transfer (synchronously) by synchronizing from and to the PCMs should work and would probably be my first attempt if I was putting it in a smallish device.

> I see this as very possible.
>
> Let's call the frame syncs FSC1 and FSC2 and the clocks CLK1 and CLK2. I see hardware doing this:
>
> 1) 2x shift registers (CLK1/CLK2), 2x counters from 0 to 15 (CLK1/CLK2). When an FSC1/FSC2 edge is detected, data is stored in the shift registers using the counters.
> 2) Once the counter goes to 0 again, you move the data from the shift register to a buffer (in both cases, meaning 2x 16-bit buffers).
> 3) When FSC2/FSC1 is detected (note you are in the opposite clock domain now), you read the data from the buffers, serialize it (2x shift registers (CLK2/CLK1) + 2x counters (CLK2/CLK1) again) and send the data.
>
> If you had an FPGA, a BRAM could be used; it can work with two clock domains easily.

The OP doesn't say whether this is serial or parallel. I thought about double buffering, but there could be issues with drifting clock phase. Neither clock is locked to the other; I'm assuming each device generates its own clock. So you could run into metastability issues: if the end of the shift and the transfer on, say, the FSC1 side updates the buffered register (which FSC2 reads) right at the time when FSC2 has a rising edge, you could end up with a metastable event.

The only way to guarantee that you don't have a metastable event is to make absolutely sure the data stays stable for multiple clock cycles when transferring between clock domains that have the same frequency but may drift in phase due to differences in their ppm specs and jitter. I've usually built small 4-entry FIFOs in those cases, but then they were implemented in an FPGA, so I would use a distributed RAM FIFO to do it (16 deep). I would only resort to something like what I suggested with the oversampled clock when I can't afford the logic required to implement a FIFO. I doubt the OP's CPLD can implement a FIFO; if it can, then it is probably just a really small FPGA.

#### dora

##### Full Member level 3
Hello,

> I'm not sure from your description, how both master devices can write to each other, who decides which one writes to the other one? What part is this, so I can look at a datasheet?

Well, both PCMs have separate data_in and data_out wires, so it is full-duplex communication.
From the technical perspective, the problem is exactly the same in both directions, so we can think about a solution for one direction only.

> The OP doesn't say whether this is serial or parallel. I thought about double buffering, but there could be issues with drifting clock phase. Neither clock is locked to the other; I'm assuming each device generates its own clock. So you could run into metastability issues: if the end of the shift and the transfer on, say, the FSC1 side updates the buffered register (which FSC2 reads) right at the time when FSC2 has a rising edge, you could end up with a metastable event.

In PCM/TDM everything is serial. So we have one wire for each signal: clk, frame_sync, data_in, data_out. And yes, this is the issue I am trying to solve.

Probably I can try to sample frame_sync2 with clk1. If frame_sync2 is detected (I am not sure this can be done in a metastability-free way, as the frame syncs are only one clk period long?)
then the data is transferred to the second domain synchronously with clk1, which should be OK.

Dora

#### pbernardi

##### Full Member level 2
If you can accept a latency of 1-2 frame syncs, you can double buffer the data between the transfers, making the buffers act as a kind of pipeline.

You will also need logic that guarantees that the second buffer is never written while a transfer is in progress. This way I think the metastability can be avoided, at the cost of latency of course.

#### dora

##### Full Member level 3
Hi pbernardi,

Yes, 1-2 frames of latency is acceptable.
Can you elaborate on your idea?
How do I guarantee that the second buffer is never written during the transfer in the CLK2/frame_sync2 domain?

Thanks
Dimitar

#### pbernardi

##### Full Member level 2
Building on my idea from post #6:

Add an additional step between 2) and 3), with an additional buffer. Basically, you have the following (for the f1 to f2 direction; note that you will need the same again for f2 to f1):

1) When FSC1 is detected, de-serialize 16 bits, using a counter from 0 to 15.
2) When the counter is 0 again, move the 16 bits to the 1st buffer.
3) On the next FSC1, move the data from the 1st to the 2nd buffer.
4) On the next FSC1, move the data from the 2nd buffer to a shift register, but ONLY if no transfer is in progress. Transmission from this shift register is triggered by FSC2, so:
a) When FSC2 occurs, it means that for the next 16 clocks you will be transmitting data. Set a flag during this time and, obviously, transmit the data (count 0 to 15 again).
b) If FSC1 occurs and the flag is not active (so you are not transmitting), you move from the 2nd buffer to the shift register.
c) If FSC1 occurs and the flag is active, you need to wait until the flag is cleared. When the flag is cleared, move from the 2nd buffer to the shift register.

The same thing must be done in the opposite direction (f2 to f1).

With the second buffer, you guarantee the data is stable even if you delay the transfer because f2 is transmitting. I am not 100% sure that the second buffer is really required, however. You may check this in your simulation.

#### dora

##### Full Member level 3
Hi pbernardi,

I see the idea.
In my opinion the only weak point is at
>4) On the next FSC1, move the data from the 2nd buffer to a shift register, but ONLY if no transfer is in progress. Transmission from this shift register is triggered by FSC2, so:
What will happen if FSC1 occurs at the same time as FSC2? The check for "no transfer is in progress" has to be done in the CLK1 domain, but the flag is set in the CLK2 domain.
But probably I can set the 'transmission flag' (the one you are talking about) one CLK2 period before the actual transfer. (I can do that because I know that I have 32 CLK2 periods between each FSC2.)
What do you Gents think, will it work?

On to another idea.
If I manage to sample FSC2 with CLK1, then I think I have solved the whole task.

FSC2 is about one CLK1 period long.
CLK1 has about a 50% duty cycle.
What I know is that I will have at least one CLK1 edge (positive or negative) at a point where FSC2 is stable.
So if I sample on both CLK1 edges, I know that one of the samples will be free of metastability.
Of course, it is not a problem to make the decision about the FSC2 sample a few CLK1 clocks later.

Is there any common way this can be solved?

Thanks
Dora

#### pbernardi

##### Full Member level 2
> Hi pbernardi,
>
> What will happen if FSC1 occurs at the same time as FSC2? The check for "no transfer is in progress" has to be done in the CLK1 domain, but the flag is set in the CLK2 domain.
> But probably I can set the 'transmission flag' (the one you are talking about) one CLK2 period before the actual transfer. (I can do that because I know that I have 32 CLK2 periods between each FSC2.)
> What do you Gents think, will it work?

I think this works; you should implement and test the idea.

##### Super Moderator
Staff member
Well, if you go with pbernardi's solution, then you had better use a really good testbench that changes the frequencies of the clocks between CLK1 < CLK2 and CLK1 > CLK2 by a small fraction for an entire clock period or two. Let it sweep across all phases, transferring a PRBS sequence across the two domains. I would also suggest adding code to monitor the data transition and the capturing clock edge, and make the simulation either stop or force the register output to X if the data transitions near the capture clock edge. I actually think this design will have problems, unless you add a bunch of band-aids, baling wire, and duct tape to fix various issues.
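
Two pieces of such a testbench can be sketched as follows. This is a Python model rather than HDL testbench code; the PRBS7 polynomial, the function names, and the timing numbers are assumptions picked for illustration, not requirements from the thread.

```python
def prbs7(seed: int = 0x7F):
    """Yield bits of the PRBS7 sequence (x^7 + x^6 + 1, period 127)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def count_window_violations(n_captures: int, data_period: float,
                            capture_period: float, setup: float,
                            hold: float, phase: float = 0.0) -> int:
    """Count capture edges that land too close to a data transition.

    Data is assumed to change at multiples of data_period; capture edges
    occur at phase + j * capture_period. setup and hold must be smaller
    than half a data period.
    """
    violations = 0
    for j in range(1, n_captures + 1):
        t = phase + j * capture_period
        nearest_data_edge = round(t / data_period) * data_period
        if t - setup <= nearest_data_edge <= t + hold:
            violations += 1
    return violations


if __name__ == "__main__":
    gen = prbs7()
    print([next(gen) for _ in range(16)])
    # Capture clock 1000 ppm slow: the phase sweeps through the data edge.
    print(count_window_violations(3000, 1.0, 1.001, 0.01, 0.01))
```

With a small frequency offset the capture edge eventually sweeps through the forbidden window, so a direct capture of the other domain's register is flagged sooner or later. In an HDL testbench the same check would be the place to stop the simulation or force X.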

I still think my original suggestion of using a higher frequency sampling clock is both simpler and will use less logic. It also has less latency and as the decision to transfer or not is all in 1 clock domain you won't have issues with the clock edges sweeping through the various phases.

#### dora

##### Full Member level 3

Yes, I know what you are saying and I tend to agree with you.
If the BOM cost turns out to be acceptable, we will rather go with the external oscillator solution.

Thanks
Dora

##### Super Moderator
Staff member
Basically to do as pbernardi suggests you would have to guarantee that the data stays stable for 3-4 clock cycles of the receiving clock to allow you to resample it into the receiving clock domain, which then gets transferred to the shift register.

So you have architecturally a c1 shift reg, c1 buffer register (holds data stable for 32 clocks), a toggling transfer signal (switches state for each new word), c2 domain synchronizer (2 FFs) for the toggle signal, c2 edge detector (1 FF) of the synchronized toggle, a c2 buffer register, a c2 shift register.
So you have over 131 registers that are required to implement this (in one direction).
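
The register count can be tallied as below. This is a sketch that assumes 32-bit words (both time slots buffered), which is one reading that matches the "over 131" figure; the bit counters are not included, and the dictionary keys are just labels for the blocks listed above.

```python
def registers_one_direction(word_bits: int) -> dict:
    """Flip-flop tally for the toggle-based transfer, one direction only."""
    return {
        "c1 shift register": word_bits,
        "c1 buffer register": word_bits,
        "transfer toggle": 1,
        "c2 synchronizer": 2,
        "c2 edge detector": 1,
        "c2 buffer register": word_bits,
        "c2 shift register": word_bits,
    }


if __name__ == "__main__":
    for width in (32, 16):
        total = sum(registers_one_direction(width).values())
        print(width, "bit words:", total, "flip-flops plus counters")
```

Under the same assumptions, buffering only the 16-bit TS0 word roughly halves the total.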

If you do it this way you won't have any metastability problems as the transfer from c1 to c2 domain occurs around 4 clocks past the transfer to the c1 buffer register. The dropping of data (if the frequency of c2 < c1) will occur in the c2 buffer as it won't be transferred to the c2 shift register if it gets updated prior to the shift completing. It will also eventually repeat the last transfer (if the frequency of c2 > c1).

As you can see, there is quite a bit of logic required to implement this (for one direction only), so it may still fit in a larger CPLD with 384 macrocells. I haven't worked that version out, but I'm pretty sure you can reduce it to ~100 FFs (one way) by using a faster clock to perform the transfer, as you won't need the two separate double buffers, only a shared double buffer for both sides, since the decision to update the buffer is made from the detected clock phase and the shift register state. This will likely allow you to use a smaller CPLD.

#### dora

##### Full Member level 3

Thanks for the clarification.

On to another idea.
If I manage to sample FSC2 with CLK1, then I think I have solved the whole task.

FSC2 is about one CLK1 period long.
CLK1 has about a 50% duty cycle.
What I know is that I will have at least one CLK1 edge (positive or negative) at a point where FSC2 is stable.
So if I sample on both CLK1 edges, I know that one of the samples will be free of metastability.
Of course, it is not a problem to make the decision about the FSC2 sample a few CLK1 clocks later.

Dora

##### Super Moderator
Staff member
I would advise against that type of design. You'll have to have both negedge and posedge always blocks and a bunch of combinational always blocks to combine the half results of the two opposite-edge-triggered always blocks, which you can't reliably sample with either edge of the clock (as they may toggle at the same frequency as the clock). Or are you planning on using these half-clock-cycle signals as latch enables (not even sure you could build this in a CPLD)? Such a design will by necessity be more complex than a straightforward design using only a single edge.

To put it another way: you can use an easy to simulate, test, and analyze design, or go with a convoluted Rube Goldberg latch-based design that may have a bunch of hidden bugs that will bite you after the design is in the field. Me, I'd rather stay at home on the weekends and enjoy myself than sit in the lab debugging a design that is being recalled due to intermittent field failures.

#### dora

##### Full Member level 3
Let me explain what I have in mind:

Yes, I am thinking about two 'always' blocks for the two CLK1 edges. They look for the positive edge of FSYNC2.
At least one of these blocks should detect the FSYNC2 posedge reliably, as either the positive or the negative edge of CLK1 will land where FSYNC2 is at a stable logic high.
Each edge detector should stretch its estimate of FSYNC2 to at least two CLK1 periods. (This is because all of my additional logic should be in sync
with, for example, the CLK1 posedge, and I want to be able to sample the FSYNC2 estimate.)

I will have to combine the two FSYNC2 estimates using combinational 'or' logic into, let's say, FSYNC2'.

FSYNC2' is in the CLK1 domain, and I want to use it only to know when it is not safe to change the buffer (let's call it BUF1) providing data to the CLK2 domain.
For example, in CLK1 I can count 20 clocks from the posedge of FSYNC2' and then update BUF1 with the new data coming from PCM1 (note that I have 32 clocks between frame syncs in my PCMs, and the data I get/put is located in time slot 0 (TS0), so the 16 bits just after the frame sync; the next 16 bits (TS1) are an idle period for me, and I want to update BUF1 sometime in TS1).

Do you see a reason this wouldn't work?

Dora

##### Super Moderator
Staff member
Doing what you are attempting is almost a guaranteed disaster and means long nights and weekends in the lab frantically fixing an 11th-hour problem found in QA testing.

Why are you insisting on making this so complicated? Remember the KISS rule.

Now that I understand the data format is 16 bits followed by 16 bit times of idle, there are some simplifications that can be done to the design...

1a. in CLK1 domain start a counter using the FSYNC1. (could be a down count from 15, with FSYNC1 loading the counter and the counter stops at 0)
1b. shift the CLK1 data into a 16-bit shift register (and hold)
2. when the counter reaches terminal count, delay 1 CLK1 clock cycle and flip the state of a toggle signal.
3. synchronize the toggle signal onto CLK2 domain.
4. edge detect (both rising and falling) the synchronized toggle signal in CLK2 domain.
5. sample the (stable) data in the 16-bit shift register using CLK2 and the edge detect pulse.
6. now decide (all in CLK2 domain) whether or not you load the CLK2 shift register and transmit the data.

This is a variation on what I previously proposed (for 32-bit data), but no longer requires two separate double buffering schemes. Just a single double buffer on receive side to simplify the decision logic for loading the output shift register. This is much simpler and doesn't rely on trying to implement some multi-clock edge Rube Goldberg design.

By detecting the trailing edge of the shift operation (which then holds the data until the next FSYNC) and converting that into a signal that toggles every 32 clocks, you can transfer that across the CDC boundary and detect the edge of that signal. This will result in 3-4 clock periods of latency (with stable shift register data) when it gets transferred to the CLK2 domain. It also means you'll never get called in to fix a timing problem that suddenly appears in a percentage of the production units.

Your timing should look something like this...

The shft_en would be fsync | (|counter), i.e. fsync ORed with the reduction-OR of the counter bits; as it's slow logic, this won't be a timing problem.
xfer_toggle1 is only shown with the falling edge; on the next cycle it will go from low to high. The toggle would be generated by the trailing edge (falling-edge detection) of shft_en.
Finally, sync_edge_detect is generated using a synchronizer followed by an edge detector. That is the signal used to capture the entire contents of sreg1.
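
The toggle-based scheme in steps 1a to 6 can be checked with a small behavioral model before writing the HDL. The sketch below is Python, works at word level instead of shifting individual bits, and does not model metastability itself; it only checks that sreg1 is never sampled while it is still shifting. The clock periods, the phase offset, the exact clock on which the toggle flips, and the large 2000 ppm offset are assumptions chosen to make a slip happen within a short run.

```python
FRAME_CLOCKS = 32   # 256 kHz clock / 8 kHz frame sync
WORD_BITS = 16      # TS0 only


def simulate(frames: int = 600, p1: float = 1.0, p2: float = 1.002,
             phase2: float = 0.37):
    """Return (words sent on the CLK2 side, unsafe captures of sreg1)."""
    n1 = frames * FRAME_CLOCKS
    n2 = int(n1 * p1 / p2)
    # Merge the rising edges of both clocks into one time-ordered event list.
    edges = [(i * p1, 1) for i in range(n1)]
    edges += [(phase2 + j * p2, 2) for j in range(n2)]
    edges.sort()

    # CLK1 domain state
    count1 = 0
    sreg1 = 0            # None marks "still shifting", i.e. not safe to sample
    xfer_toggle = 0
    # CLK2 domain state
    sync = [0, 0, 0]     # two synchronizer FFs plus one edge-detect FF
    buf2 = 0
    count2 = 0
    sent, unsafe = [], 0

    for _, clk in edges:
        if clk == 1:
            frame, pos = divmod(count1, FRAME_CLOCKS)
            if pos < WORD_BITS - 1:
                sreg1 = None                 # shft_en active, word incomplete
            elif pos == WORD_BITS - 1:
                sreg1 = frame + 1            # last bit shifted in, word now held
            elif pos == WORD_BITS:
                xfer_toggle ^= 1             # one clock after the last bit is shifted in
            count1 += 1
        else:
            if count2 % FRAME_CLOCKS == 0:
                sent.append(buf2)            # FSYNC2: load the transmit shift register
            if sync[1] != sync[2]:           # sync_edge_detect (either edge of the toggle)
                if sreg1 is None:
                    unsafe += 1
                else:
                    buf2 = sreg1
            sync = [xfer_toggle, sync[0], sync[1]]
            count2 += 1
    return sent, unsafe


if __name__ == "__main__":
    sent, unsafe = simulate()
    steps = [b - a for a, b in zip(sent, sent[1:])]
    print("unsafe captures:", unsafe, "largest step:", max(steps))
```

In this model the capture lands a few CLK2 periods after the toggle flips, well inside the idle half of the frame, which is the margin the steps above rely on. Words are numbered from 1 so that 0 means nothing has been received yet.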

It shouldn't take more than 30 minutes to an hour to write the code, and perhaps half a day to write the testbench and simulate the design with various clock phases using faster/slower clock relationships.
