Dora said:
Hi ads-ee,
What you describe looks about the same complexity as what I had in mind.
Apparently you are strongly against using both clock edges, even in the very simple form I explained.

The complexity I'm referring to has nothing to do with the amount of logic required to implement something. It is the amount of effort required to understand every facet of the circuit's operation. Subtle problems are difficult to detect or find in clever/tricky circuits, and they may appear out of nowhere when large production runs are done or different lot codes are used.
Dora said:
A few additional questions.
1. What tool did you use to draw the waveforms?

Visio. I created a set of waveform shapes that I use for drawing timing diagrams.
Dora said:
2. I plan to validate this using an FPGA development board we have in-house.
Is there a free tool that will assess how many resources (FFs, etc.) will be required and eventually suggest the smallest Xilinx/Altera chip that will fit the design?
Obviously I can test in ISE, but is this my only option?

You can count up the FFs and then estimate the number of logic resources (e.g. 4-input LUTs, 6-input LUTs, number of product terms, etc.). Or just pick a big part, synthesize and implement the design, see how many resources it takes, and then rebuild with the smallest part that can hold the logic.
Dora said:
Yes, please share your code. It will help me.

Sorry, I didn't have the file available where I was this weekend.
```verilog
module a (
  output        sdo,
  input         sdi,
  input  [2:1]  fsync,
  input  [2:1]  clk
);

  reg   [3:0]   icnt;
  reg   [15:0]  sr;
  wire          shift_en = fsync[1] | |icnt;
  reg   [1:0]   shf_pipe;
  reg           shf_fedge;
  reg           toggle = 0;

  reg   [1:0]   sync_pipe;
  reg           sample_strb;
  reg   [15:0]  tx_buf;
  reg   [4:0]   ocnt;
  reg   [15:0]  out_sr;

  always @ (posedge clk[1]) begin
    // when fsync arrives in clk[1] domain initialize the counter
    // the counter halts the shifting of sdi data when the s2p
    // conversion is complete. This holds the 16-bit parallel data
    // constant for 16 more clock cycles.
    if (fsync[1]) begin
      icnt <= 15;
    end else if (icnt != 0) begin
      icnt <= icnt -1;
    end

    // serial to 16-bit parallel conversion for transfer to
    // clk[2] domain
    if (shift_en) begin
      sr <= {sr[14:0], sdi};
    end

    // falling edge detector logic to drive a toggle status
    shf_pipe <= {shf_pipe[0], shift_en};
    shf_fedge <= (shf_pipe == 2'b10);

    // toggle status lets us cross clock domains with a single
    // signal that only has one transition per transfer.
    // note: shf_pipe == 2'b10 could be used instead of shf_fedge
    // reducing the FF count by 1. Another FF could be removed by
    // using the !shift_en & shf_pipe[0] only, though that increases
    // the combo logic to toggle generation.
    if (shf_fedge) begin
      toggle <= ~toggle;
    end
  end

  always @ (posedge clk[2]) begin
    // synchronizer and both edge detector to sample the stable
    // parallel sr data.
    sync_pipe <= {sync_pipe[0], toggle};
    sample_strb <= ^sync_pipe;

    // parallel sample transfer buffer from clk[1] to clk[2] domain
    // data is transferred ~3-4 clock cycles after it's stopped shifting.
    // note: ^sync_pipe could be used instead to reduce latency by 1
    // clock cycle or to reduce resource usage by 1 FF.
    if (sample_strb) begin
      tx_buf <= sr;
    end

    // synchronized counter to fsync[2] so we can determine if the
    // output shift register can be loaded.
    if (fsync[2]) begin
      ocnt <= 0;
    end else begin
      ocnt <= ocnt +1;
    end

    // load the output shift reg at the rising edge of fsync[2] and shift out
    // the 16-bits of data to the receiver.
    if (ocnt == 30) begin
      out_sr <= tx_buf;
    end else begin
      out_sr <= {out_sr[14:0], 1'b0};
    end
  end

  assign sdo = out_sr[15];

endmodule
```
```verilog
`timescale 1ps/1ps
module a_tb;

  reg   [2:1]   clk;
  reg   [2:1]   fsync;
  reg   [15:0]  sdi;
  wire          sdo;

  initial begin
    clk[1] = 0;
    forever begin
      clk[1] = #50000000 ~clk[1]; // 100 us period
    end
  end

  initial begin
    clk[2] = 0;
    forever begin
      clk[2] = #55000000 ~clk[2]; // 110 us period
    end
  end

  reg [4:0] cnt1 = 0;
  reg [4:0] cnt2 = 0;

  always @ (posedge clk[1]) begin
    // protocol 1 timer
    cnt1 <= cnt1 +1;

    // generate an fsync
    fsync[1] <= (cnt1 == 0) ? 1'b1 : 1'b0;

    // generate data
    if (cnt1 == 0) begin
      sdi <= $random;
    end else begin
      sdi <= {sdi[14:0], 1'b0};
    end
  end

  always @ (posedge clk[2]) begin
    // protocol 2 timer
    cnt2 <= cnt2 +1;

    // fsync 2 generator
    fsync[2] <= (cnt2 == 0) ? 1'b1 : 1'b0;
  end

  a uut (
    .sdo    (sdo),
    .sdi    (sdi[15]),
    .fsync  (fsync),
    .clk    (clk)
  );

endmodule
```
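The testbench as posted has no stop condition and no waveform dump. A minimal sketch of a companion module that adds both, assuming it is compiled alongside a_tb as a second top-level module. The module name a_tb_ctrl, the file name a_tb.vcd, and the 20 ms run length are arbitrary choices for illustration, not part of the original code:

```verilog
`timescale 1us/1ps

// Hypothetical helper, not part of the posted code.
// Dumps every signal under a_tb and ends the simulation after 20 ms,
// which covers several 32-clock frames in both clock domains
// (3.2 ms per frame at 100 us, 3.52 ms per frame at 110 us).
module a_tb_ctrl;

  initial begin
    $dumpfile("a_tb.vcd");   // VCD output file (name is an assumption)
    $dumpvars(0, a_tb);      // level 0 = a_tb and everything below it
    #20000;                  // 20000 us = 20 ms with this timescale
    $finish;
  end

endmodule
```

Keeping this in a separate module leaves the posted testbench untouched; the same initial block could equally be pasted into a_tb itself, with the delay rescaled to the 1 ps time unit used there.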
Update:
A Xilinx XC9572 fits the uni-directional design I created, so an XC95108 might fit the bidirectional design if the counters are modified so they can be shared between both directions. This would involve modifying the shift_en generation to use a count range instead of a terminal value.
Otherwise the design would fit in the XC95144 with a duplicate instantiated.
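As a rough cross-check on that fit, the registers declared in module a can be tallied by hand. The sketch below only adds up the declared widths. It assumes one flip-flop per register bit and ignores product-term usage, which the fitter also has to satisfy, so the fitter report remains the authority. For reference, the XC9572, XC95108 and XC95144 provide 72, 108 and 144 macrocells respectively, each with one flip-flop.

```verilog
// Illustrative tally only; widths are copied from the declarations in module a.
module ff_tally;

  localparam ICNT        = 4;   // reg [3:0]  icnt
  localparam SR          = 16;  // reg [15:0] sr
  localparam SHF_PIPE    = 2;   // reg [1:0]  shf_pipe
  localparam SHF_FEDGE   = 1;   // reg        shf_fedge
  localparam TOGGLE      = 1;   // reg        toggle
  localparam SYNC_PIPE   = 2;   // reg [1:0]  sync_pipe
  localparam SAMPLE_STRB = 1;   // reg        sample_strb
  localparam TX_BUF      = 16;  // reg [15:0] tx_buf
  localparam OCNT        = 5;   // reg [4:0]  ocnt
  localparam OUT_SR      = 16;  // reg [15:0] out_sr

  localparam TOTAL = ICNT + SR + SHF_PIPE + SHF_FEDGE + TOGGLE +
                     SYNC_PIPE + SAMPLE_STRB + TX_BUF + OCNT + OUT_SR;

  // prints: register bits in module a: 64
  initial $display("register bits in module a: %0d", TOTAL);

endmodule
```

Sixty-four register bits sit under the 72 macrocells of the XC9572, and two unshared copies (128) sit under the 144 of the XC95144, which is consistent with the parts named above.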
Dora said:
Thanks for the code.
Meanwhile I have implemented my version, which seems to work fine.
Typically the PCM interface drives outgoing data on one clock edge and samples incoming data on the opposite edge.
So I had to use a dual-clock-edge design anyway.

Then all I can say is: be thorough in your simulation testing. Don't fudge any of the checking (corner cases, clock phasing for both fast and slow transfers, phase skew for identical clock frequencies, etc.), and then run a full SDF back-annotated gate-level simulation using the same test cases from the functional simulation.
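For readers following along, the drive-on-one-edge, sample-on-the-other arrangement described here can be sketched as below. This is not Dora's implementation, which was not posted. The module name, the load and tx_data ports, and the choice of rising edge for driving and falling edge for sampling are all assumptions made for illustration.

```verilog
// Hypothetical sketch of a dual-edge PCM-style serial port.
// Assumption: data is driven on the rising edge of clk and
// sampled on the falling edge; a real interface may be the reverse.
module pcm_edges (
  input  wire        clk,      // PCM bit clock
  input  wire        load,     // hypothetical: load a new word to transmit
  input  wire [15:0] tx_data,  // hypothetical: parallel word to send
  input  wire        sdi,      // serial data in
  output wire        sdo,      // serial data out
  output reg  [15:0] rx_data   // last 16 bits received, MSB first
);

  reg [15:0] tx_sr;

  // Transmit side: update the outgoing bit on the rising edge.
  always @ (posedge clk) begin
    if (load) begin
      tx_sr <= tx_data;
    end else begin
      tx_sr <= {tx_sr[14:0], 1'b0};
    end
  end

  assign sdo = tx_sr[15];

  // Receive side: sample the incoming bit half a cycle later,
  // on the falling edge, when the sender's data should be stable.
  always @ (negedge clk) begin
    rx_data <= {rx_data[14:0], sdi};
  end

endmodule
```

Each register here is clocked by only one edge, but any signal passed between the posedge and negedge blocks would have just half a clock period to settle, which is one of the things the timing checks recommended above should cover.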
Dora said:
The final implementation will be on an XC95144XL, however, or some other similarly priced CPLD/FPGA.
I tried to fit my design in the XC95144XL and managed, but only with the fitter set to "Optimize Density"; otherwise it doesn't fit.
So right now I am thinking about how to optimize my design.

Then that means your design is more complex/bigger; a single uni-directional version of my design fits with room to spare, at ~45% utilization.
Dora said:
In your code I see
sample_strb <= ^sync_pipe;
Is this an XOR between the two bits of sync_pipe, with the result assigned to sample_strb?
I didn't know about this unary way of using the bitwise operators.

It's a bitwise reduction operator, very handy. Other examples: &data (data is all ones), ~|data (data is all zeros).
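A small simulation-only sketch of the reduction operators mentioned here. The module name and test values are made up for illustration; the operators themselves are standard Verilog.

```verilog
// Illustrative demo of unary reduction operators.
module reduction_demo;

  reg [1:0] sync_pipe;
  reg [3:0] data;

  initial begin
    // ^x XORs all bits of x together: 1 when the two bits differ.
    sync_pipe = 2'b01;
    $display("^2'b01    = %b", ^sync_pipe);   // 1
    sync_pipe = 2'b11;
    $display("^2'b11    = %b", ^sync_pipe);   // 0

    // &x ANDs all bits together: 1 only when every bit is 1.
    data = 4'b1111;
    $display("&4'b1111  = %b", &data);        // 1
    data = 4'b1011;
    $display("&4'b1011  = %b", &data);        // 0

    // ~|x NORs all bits together: 1 only when every bit is 0.
    data = 4'b0000;
    $display("~|4'b0000 = %b", ~|data);       // 1
    data = 4'b0100;
    $display("~|4'b0100 = %b", ~|data);       // 0
  end

endmodule
```

In module a, ^sync_pipe is therefore high for one clk[2] cycle whenever the two synchronizer stages disagree, that is, whenever toggle has just changed state.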
Dora said:
In my implementation I don't share any resources.
I have two modules (with an interface similar to your module "a"), one for each direction.
Obviously, if I implement a single module that handles both directions at once, I will be able to share the counters.
However, this will make my design more complex.
I really like the fact that I split the problem into two simpler unidirectional modules.
Any comments about this?

The tools, when told to "Optimize Density", probably shared the counters to make it fit.
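The two-module approach can be sketched as a thin wrapper that instantiates the posted module a once per direction, with the clock and fsync pairs swapped for the return path. This is only an illustration: the wrapper name and its port names are hypothetical, it needs module a from earlier in the thread in order to compile, and it shares nothing between the two directions.

```verilog
// Hypothetical wrapper: two independent copies of module a,
// one per direction. In module a, index 1 is the receiving
// (serial-in) domain and index 2 is the transmitting domain.
module a_bidir (
  input  wire [2:1] clk,       // clk[1], clk[2] as in module a
  input  wire [2:1] fsync,     // fsync[1], fsync[2] as in module a
  input  wire       sdi_1,     // serial data arriving in domain 1
  output wire       sdo_2,     // same data leaving in domain 2
  input  wire       sdi_2,     // serial data arriving in domain 2
  output wire       sdo_1      // same data leaving in domain 1
);

  // Direction 1 -> 2: connected straight through.
  a dir_1to2 (
    .sdo    (sdo_2),
    .sdi    (sdi_1),
    .fsync  (fsync),
    .clk    (clk)
  );

  // Direction 2 -> 1: swap the pairs so the instance sees
  // domain 2 on its index 1 and domain 1 on its index 2.
  a dir_2to1 (
    .sdo    (sdo_1),
    .sdi    (sdi_2),
    .fsync  ({fsync[1], fsync[2]}),
    .clk    ({clk[1],   clk[2]})
  );

endmodule
```

Each direction stays as easy to reason about as the single module, at the cost of duplicating the counters; whether that cost matters depends on the target part.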