E.g., each module accepts din and din_valid and provides dout and dout_valid. The exact names vary depending on who wrote the module; e.g., Xilinx likes using "ready" and "new_data" in a lot of their cores.
Ah okay, I understand what you mean now.
There are several interface strategies, and really, defining and understanding the interfaces in a design can go a long way toward solving some issues. For example, in the case above a valid input might propagate through a pipeline, or the design might use "valid" as a clock enable. In the first case the input data can stop; in the second case it cannot, because any data still in the pipeline will not be flushed. Sometimes moving to interfaces that process blocks of data at a time can be very advantageous, as the control logic can run very slowly when it only needs to make a decision every (e.g.) 1024 cycles.
In this particular pipeline I have a continuous datastream, where the pipeline output is either used or discarded, depending on some trigger conditions. The discard rate (~ 10%) is low enough and the pipeline depth high enough that this is the best I can think of.
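To make the difference between the two strategies concrete, here is a minimal two-stage sketch of each. The module names, the WIDTH parameter and the stage1_full flag are made up for illustration; only din, din_valid, dout and dout_valid come from the discussion above.

```verilog
// Style A: "valid" travels down the pipeline next to the data.
// Every stage advances on every clock, so data already in flight
// drains out even when the input stops.
module pipe_valid_propagate #(
    parameter WIDTH = 8                 // hypothetical data width
) (
    input                  clk,
    input      [WIDTH-1:0] din,
    input                  din_valid,
    output reg [WIDTH-1:0] dout,
    output reg             dout_valid
);
    reg [WIDTH-1:0] stage1;
    reg             stage1_valid;

    initial begin
        stage1_valid = 1'b0;
        dout_valid   = 1'b0;
    end

    always @(posedge clk) begin
        stage1       <= din;
        stage1_valid <= din_valid;
        dout         <= stage1;
        dout_valid   <= stage1_valid;
    end
endmodule

// Style B: "valid" is the clock enable of the whole pipeline.
// When din_valid goes low, nothing moves: whatever sits in stage1
// stays there until the next valid input pushes it out.
module pipe_valid_as_ce #(
    parameter WIDTH = 8                 // hypothetical data width
) (
    input                  clk,
    input      [WIDTH-1:0] din,
    input                  din_valid,
    output reg [WIDTH-1:0] dout,
    output reg             dout_valid
);
    reg [WIDTH-1:0] stage1;
    reg             stage1_full;        // hypothetical: set once stage1 holds real data

    initial begin
        stage1_full = 1'b0;
        dout_valid  = 1'b0;
    end

    always @(posedge clk) begin
        // dout is only refreshed on an enabled cycle, and only holds
        // real data once stage1 has been filled.
        dout_valid <= din_valid & stage1_full;
        if (din_valid) begin
            stage1      <= din;
            stage1_full <= 1'b1;
            dout        <= stage1;
        end
    end
endmodule
```

In style B the last input word never reaches dout unless another valid word follows it, which is the "will not be flushed" behaviour mentioned above.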
Can you give some examples of nets that are failing?
See below for what I am using now. This very same code would fail if I used only the top-level clock enable. Think "USE_LOCAL_CE=0" on all levels in the code below.
For the BUFGs, you can also use a 1x clock plus both edges of a divided clock, as long as you have logic to start processing on the correct edge (e.g., the first divided clock edge after reset will be either rising or falling, and the logic should be set up to handle either case). This might have other issues in your design.
This would still need an extra clock, right? Which admittedly is better than the previous 2 extra...
I still need the full 375 MHz clock for the fast part, and then I would need a slow clock at 187.5 MHz. I would then use the slow clock posedge for the even clock cycles and the slow clock negedge for the odd clock cycles. It is an idea worth considering for this particular design. Two extra global clocks was a no-go. One extra might just be doable.
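As a rough sketch of what that would look like in code, assuming the slow clock is phase aligned to the fast clock and that din changes once per fast clock cycle (the module and signal names here are made up):

```verilog
// Hypothetical sketch: split a full-rate stream into an "even" and an
// "odd" half using both edges of the divided clock.
module two_phase_capture #(
    parameter WIDTH = 8                 // hypothetical data width
) (
    input                  clk_slow,    // assumed: fast clock divided by 2, phase aligned
    input      [WIDTH-1:0] din,         // assumed: registered on the fast clock
    output reg [WIDTH-1:0] even_q,      // words captured on the slow posedge
    output reg [WIDTH-1:0] odd_q        // words captured on the slow negedge
);
    always @(posedge clk_slow) begin
        even_q <= din;
    end

    always @(negedge clk_slow) begin
        odd_q <= din;
    end
endmodule
```

Which of the two outputs really holds the "even" words depends on the phase of the divided clock after reset, so the downstream logic would still have to handle either case. The paths from the fast clock into these registers would also need matching timing constraints, which this sketch does not cover.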
Of course this does not solve the general problem of how the hell to properly use clock enables.
Suppose that some part of the design is a perfect match for multi-cycle operation on three cycles, and we want full throughput, so that means three copies of the circuit running in parallel. With three there is no such luck as using both edges, so it would mean 1 full speed clock and then 3 slow clocks, each with a 120 degree phase shift.
On the other hand, with clock enables I could use an SRL that is preloaded with 100, 010 and 001 for the respective phases. These SRLs could then be instantiated as several local copies to keep the routing delay to the CE pins low. Again, conceptually simple. Doing this by hand would be a pain to maintain; doing this with the tools is ARRRRGH with my current understanding of the tools.
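Conceptually the preloaded shift register would be something like the sketch below. The module name and the PHASE_INIT parameter are made up, and whether the tools map this onto an SRL or onto three flip-flops depends on the synthesis settings.

```verilog
// Hypothetical sketch: one-hot ring that asserts "ce" once every
// three clock cycles. Three instances with different PHASE_INIT
// values give three enables that never overlap.
module ce_phase3 #(
    parameter [2:0] PHASE_INIT = 3'b100 // assumed legal values: 3'b100, 3'b010, 3'b001
) (
    input  clk,
    output ce
);
    reg [2:0] ring = PHASE_INIT;

    always @(posedge clk) begin
        ring <= {ring[0], ring[2:1]};   // rotate right by one position
    end

    assign ce = ring[0];                // high when the single '1' reaches bit 0
endmodule
```

With PHASE_INIT = 3'b001 the enable is high in the first cycle, with 3'b010 in the second and with 3'b100 in the third, after which the pattern repeats. Placing one such instance near each group of flip-flops is the "several local copies" idea.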
Currently I am just providing the affected modules with the ability to generate their own local clock enables. So like this, using the USE_LOCAL_CE parameter:
Code:
module count_192_ones_ce #(
    parameter USE_LOCAL_CE  = 0, // Generate a local flip-flop for clock enable? 0=NO, 1=YES
              LOCAL_CE_INIT = 0  // Start local clock enable at EVEN or ODD cycle? 0=EVEN, 1=ODD
) (
    input              clk,     // IN: system clock
    input              ce,      // IN: clock enable for multi-cycle path operation
    input      [191:0] ones_in, // IN: taps after synchronization
    output reg [7:0]   count    // OUT: number of ones in the "taps" input. Latency 10 (5 deep, 2 cycles each)
);

    // Check parameter values + initialize output registers
    initial begin
        if ((USE_LOCAL_CE < 0) || (USE_LOCAL_CE > 1)) begin
            $display("DRC ERROR: Illegal value %d for USE_LOCAL_CE parameter. Should be 0 or 1.", USE_LOCAL_CE);
            $finish;
        end
        if ((LOCAL_CE_INIT < 0) || (LOCAL_CE_INIT > 1)) begin
            $display("DRC ERROR: Illegal value %d for LOCAL_CE_INIT parameter. Should be 0 or 1.", LOCAL_CE_INIT);
            $finish;
        end else begin
            count = 0;
        end
    end

    reg local_ce = (LOCAL_CE_INIT);

    always @(posedge clk) begin
        local_ce <= (~local_ce);
    end

    wire active_ce = (USE_LOCAL_CE == 1) ? (local_ce) : (ce); // unused one (either "ce" or "local_ce") will be optimized away

    wire [95:0] part1 = ones_in[95:0];
    wire [95:0] part2 = ones_in[191:96];

    wire [6:0] count_part1;
    wire [6:0] count_part2;

    count_96_ones_ce #(
        .USE_LOCAL_CE  (1),
        .LOCAL_CE_INIT (LOCAL_CE_INIT)
    ) count_96_ones_part1 (
        .clk     (clk),
        .ce      (active_ce),
        .ones_in (part1),
        .count   (count_part1)
    );

    count_96_ones_ce #(
        .USE_LOCAL_CE  (1),
        .LOCAL_CE_INIT (LOCAL_CE_INIT)
    ) count_96_ones_part2 (
        .clk     (clk),
        .ce      (active_ce),
        .ones_in (part2),
        .count   (count_part2)
    );

    always @(posedge clk) begin
        if (active_ce) begin
            count <= (count_part1) + (count_part2);
        end
    end

endmodule // count_192_ones_ce
And then further down the tree the same structure.
Code:
module count_96_ones_ce #(
    parameter USE_LOCAL_CE  = 0, // Generate a local flip-flop for clock enable? 0=NO, 1=YES
              LOCAL_CE_INIT = 0  // Start local clock enable at EVEN or ODD cycle? 0=EVEN, 1=ODD
) (
    input             clk,     // IN: system clock
    input             ce,      // IN: clock enable for multi-cycle path operation
    input      [95:0] ones_in, // IN: taps after synchronization
    output reg [6:0]  count    // OUT: number of ones in the "taps" input. Latency 10 (5 deep, 2 cycles each)
);

    // Check parameter values.
    initial begin
        if ((USE_LOCAL_CE < 0) || (USE_LOCAL_CE > 1)) begin
            $display("DRC ERROR: Illegal value %d for USE_LOCAL_CE parameter. Should be 0 or 1.", USE_LOCAL_CE);
            $finish;
        end
        if ((LOCAL_CE_INIT < 0) || (LOCAL_CE_INIT > 1)) begin
            $display("DRC ERROR: Illegal value %d for LOCAL_CE_INIT parameter. Should be 0 or 1.", LOCAL_CE_INIT);
            $finish;
        end else begin
            count = 0;
        end
    end

    reg local_ce = (LOCAL_CE_INIT);

    always @(posedge clk) begin
        local_ce <= (~local_ce);
    end

    wire active_ce = (USE_LOCAL_CE == 1) ? (local_ce) : (ce); // unused one (either "ce" or "local_ce") will be optimized away

    wire [47:0] part1 = ones_in[47:0];
    wire [47:0] part2 = ones_in[95:48];

    wire [5:0] count_part1;
    wire [5:0] count_part2;

    count_48_ones_ce #(
        .USE_LOCAL_CE  (0),
        .LOCAL_CE_INIT (LOCAL_CE_INIT)
    ) count_48_ones_part1 (
        .clk     (clk),
        .ce      (active_ce),
        .ones_in (part1),
        .count   (count_part1)
    );

    count_48_ones_ce #(
        .USE_LOCAL_CE  (0),
        .LOCAL_CE_INIT (LOCAL_CE_INIT)
    ) count_48_ones_part2 (
        .clk     (clk),
        .ce      (active_ce),
        .ones_in (part2),
        .count   (count_part2)
    );

    always @(posedge clk) begin
        if (active_ce) begin
            count <= (count_part1) + (count_part2);
        end
    end

endmodule // count_96_ones_ce
So at every level I can decide to keep the clock enable from the upper level module, OR to create a local clock enable and propagate that. This does work, but I would prefer something a bit neater. Since I'm pretty new to this, it is the best I could come up with that actually works, so any better approaches are welcome.
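One small step towards "neater" might be to pull the repeated local_ce / active_ce boilerplate into a helper module, so each counter level only instantiates it. A sketch, assuming the same parameter meanings as in the modules above (the name ce_select is made up):

```verilog
// Hypothetical helper: either pass the incoming clock enable through,
// or generate a local one that toggles every clock cycle.
module ce_select #(
    parameter USE_LOCAL_CE  = 0, // 0 = use incoming "ce", 1 = generate a local one
    parameter LOCAL_CE_INIT = 0  // start value of the local enable: 0=EVEN, 1=ODD
) (
    input  clk,
    input  ce,
    output active_ce
);
    generate
        if (USE_LOCAL_CE == 1) begin : gen_local
            // The local flip-flop only exists when it is actually used.
            reg local_ce = LOCAL_CE_INIT;
            always @(posedge clk) begin
                local_ce <= ~local_ce;
            end
            assign active_ce = local_ce;
        end else begin : gen_passthrough
            assign active_ce = ce;
        end
    endgenerate
endmodule

// Possible use inside count_96_ones_ce, replacing the local_ce register
// and the active_ce mux:
//
//   wire active_ce;
//   ce_select #(
//       .USE_LOCAL_CE  (USE_LOCAL_CE),
//       .LOCAL_CE_INIT (LOCAL_CE_INIT)
//   ) ce_select_inst (
//       .clk       (clk),
//       .ce        (ce),
//       .active_ce (active_ce)
//   );
```

This does not change the behaviour, it only keeps the choice between the inherited and the local enable in one place. The generate block also means the unused flip-flop is never created, instead of relying on the tools to optimize it away.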