What is the highest clock frequency supported by block RAM in Spartan-3?

k2w2yut · May 10, 2011

Hello, I am new to FPGAs and Verilog.

My project is to build a simple 16-bit pipelined CPU based on the MIPS architecture.
I use separate instruction and data memories, each with an 8-bit address and a 16-bit data width.

Because these are two big memory arrays, I use block RAM: my memory module reads and writes synchronously, driven by a clock.

After implementing things like data forwarding and a branch predictor, I tested the design in simulation and then with the clock from the built-in clock generator on the Spartan-3 board. I got:

50 MHz for memory (instruction and data)
25 MHz for CPU (derived with a clock counter)

After that I wanted to find the highest clock my design can run at, so I used a DCM:

Code:

  wire                                  clkFXa, clka, locked1;
  DCM dcm1 (.CLKIN(clock), .RST(1'b0), .CLKFB(), .CLK0(), .CLKDV(), .CLKFX(clkFXa), .LOCKED(locked1));
  defparam dcm1.CLK_FEEDBACK       = "NONE";
  defparam dcm1.CLKFX_MULTIPLY     = 4;
  defparam dcm1.CLKFX_DIVIDE       = 5;
  defparam dcm1.CLKIN_PERIOD       = 20;
  BUFG buf1 (.I(clkFXa), .O(clka));

With this I got 40 MHz. I then improved my design, and according to the synthesis report the minimum period is now shorter than before, but I still cannot raise the frequency of either the CPU or the memory.

It is stuck at 40 MHz for the CPU and 50 MHz for the memory.
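
For reference, the DCM's CLKFX output runs at f_CLKIN x CLKFX_MULTIPLY / CLKFX_DIVIDE. The simulation-only sketch below evaluates that ratio for the multiply/divide pairs that appear in this thread, assuming a 50 MHz input (CLKIN_PERIOD = 20). The module and function names are made up for illustration and are not part of the poster's design.

```verilog
// Simulation-only sketch (hypothetical names): evaluates
// f_CLKFX = f_CLKIN * CLKFX_MULTIPLY / CLKFX_DIVIDE.
module dcm_clkfx_calc;

  // CLKFX frequency in MHz for a given input frequency and M/D pair.
  function real clkfx_mhz;
    input real    f_in_mhz;
    input integer m;
    input integer d;
    begin
      clkfx_mhz = f_in_mhz * m / d;
    end
  endfunction

  initial begin
    // 50 MHz board clock assumed, matching CLKIN_PERIOD = 20 (ns).
    $display("M=4,  D=5  -> %0.2f MHz", clkfx_mhz(50.0, 4, 5));
    $display("M=25, D=31 -> %0.2f MHz", clkfx_mhz(50.0, 25, 31));
    $display("M=25, D=25 -> %0.2f MHz", clkfx_mhz(50.0, 25, 25));
    $finish;
  end

endmodule
```

The three pairs work out to 40 MHz, roughly 40.32 MHz and 50 MHz respectively.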

treqer · May 10, 2011

BRAM is high-frequency memory; in Virtex-4 it runs at 400 MHz. For the Spartan-3 this parameter is listed in the DC and Switching Characteristics section of the datasheet.

https://www.xilinx.com/support/documentation/data_sheets/ds529.pdf

Alexium · May 10, 2011

Spartan 3 BRAM works at 100 MHz with quite a margin (according to post-PAR report).

treqer · May 10, 2011

Clock frequency, Spartan-3A:
F_BRAM (block RAM clock frequency): 0 to 320 MHz at speed grade -5, 0 to 280 MHz at speed grade -4

Write the full FPGA name!

k2w2yut · May 10, 2011

Thank you everyone for the answers, and sorry for the incomplete information.

I use:
Family: Spartan-3
Device: XC3S200
Package: FT256
Speed: -4

So the BRAM surely supports a higher frequency than the 50 MHz I am using. But why can't I raise the memory clock (for example, 40 MHz for the CPU and 60 MHz for the memory)?

In the memory module I only have one always block that handles the write and the read.

Thank you, k2w2

mrflibble · May 10, 2011

Read the link that treqer provided:

https://www.xilinx.com/support/documentation/data_sheets/ds529.pdf

Speed grade -4: 280 MHz max for the BRAM.

So the block RAM is not the limiting factor. It sounds like the rest of the design is the limiting factor.

As to why you can only go to 50 MHz ... look in the timing report and check what the slowest paths are.

k2w2yut · May 11, 2011

mrflibble said:
Read the link that treqer provided:

https://www.xilinx.com/support/documentation/data_sheets/ds529.pdf

Speed grade -4: 280 MHz max for the BRAM.

So the block RAM is not the limiting factor. It sounds like the rest of the design is the limiting factor.

As to why you can only go to 50 MHz ... look in the timing report and check what the slowest paths are.

Thank you for your suggestion :)

I have been thinking about this too. I spent 3-4 days optimizing my design and expected it to do better, but this is the result.

This is from the Synthesis Report >> Timing Summary:

Code:

Timing Summary:
---------------
Speed Grade: -4

   Minimum period: 22.679ns (Maximum Frequency: 44.094MHz)
   Minimum input arrival time before clock: 12.525ns
   Maximum output required time after clock: 7.709ns
   Maximum combinational path delay: 11.560ns

//This is from the detail
Delay:               22.679ns (Levels of Logic = 21)
  Source:            pcpuwm1/pcpu/regWriteDst_MEMWB_1 (FF)
  Destination:       pcpuwm1/pcpu/ZF (FF)
  Source Clock:      clock rising
  Destination Clock: clock rising

The source and destination of the slowest path are both in the pcpu module, but my implementation is:

Code:

// clock is 50 MHz

  wire                                  clkFXa, clka, locked1;
  DCM dcm1 (.CLKIN(clock), .RST(1'b0), .CLKFB(), .CLK0(), .CLKDV(), .CLKFX(clkFXa), .LOCKED(locked1));
  defparam dcm1.CLK_FEEDBACK       = "NONE";
  defparam dcm1.CLKFX_MULTIPLY     = 4;
  defparam dcm1.CLKFX_DIVIDE       = 5;
  defparam dcm1.CLKIN_PERIOD       = 20;
  BUFG buf1 (.I(clkFXa), .O(clka));

  // CPU runs on the DCM output clka; the memory runs on the raw board clock
  pcpuwm pcpuwm1 (.clock(clka), .clock_mem(clock), .reset(NBTN[0]), .start(NBTN[1]), .stall(NBTN[2]),
    .sel(SW[4:0]), .y(outgr));

"clka" sent to drive pcpu
"clock" sent to drive memory

So each module has its own clock frequency. As I understand it, I should be able to raise clock_mem, because it is not affected by the CPU clock or by the slowest path in the pcpu module.

Code:

// this is my memory module code
always @(posedge clk) begin
  if (en) begin
    if (we)               // write enable
      ram[addr] <= di;    // update RAM with di (data input)
    do <= ram[addr];      // send data out via do (data output)
  end
end
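
For readers who want to try this in isolation, here is a minimal self-contained sketch of such a memory, assuming the 8-bit address and 16-bit data width described at the start of the thread. The module name and the `dout` port name are chosen here for illustration (the post uses `do`); the rest follows the always block above.

```verilog
// Hypothetical stand-alone version of the memory described in the post:
// 256 words x 16 bits, synchronous write and synchronous read.
module sync_ram_256x16 (
  input  wire        clk,
  input  wire        en,    // memory enable
  input  wire        we,    // write enable
  input  wire [7:0]  addr,
  input  wire [15:0] di,    // data input
  output reg  [15:0] dout   // data output (named "do" in the post)
);

  reg [15:0] ram [0:255];

  always @(posedge clk) begin
    if (en) begin
      if (we)
        ram[addr] <= di;
      // Read-first behaviour: on a write, dout returns the word that
      // was stored at addr before this clock edge.
      dout <= ram[addr];
    end
  end

endmodule
```

Because the read is registered, synthesis tools can usually map this style onto a block RAM rather than distributed RAM; the utilization summary of the implementation report (RAMB16 count) is the place to confirm that.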

So sorry for my noob question again :-(
Thank you,k2w2

treqer · May 11, 2011

spartan 3 (not 3A)
https://www.xilinx.com/support/documentation/data_sheets/ds099.pdf

k2w2yut · May 11, 2011

treqer said:
spartan 3 (not 3A)
https://www.xilinx.com/support/documentation/data_sheets/ds099.pdf

I already looked into it. The document doesn't state the actual clock frequency of the BRAM, but in the Block RAM Timing table (Table 55, page 85) the delays look close to the Spartan-3A's, so it should work at nearly the same frequency. Am I right?

treqer · May 11, 2011

The Spartan-3 is designed for building systems that run at 200 MHz. Calculating the maximum BRAM frequency as 1 / (1.37 + 1.37), which is about 365 MHz if the delays are in ns, gives too large a figure :) But at 200 MHz the memory should work.

Another thing is that the memory layout when multiple modules is one big

permute · May 11, 2011

Try setting up a UCF file, then getting the post-PAR timing. It might show more detail about where things are failing. It also gives a more realistic measure of the design. The synthesis report makes assumptions about routing that might not be true. The results after PAR will generally be a bit lower because of routing issues.
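
A minimal sketch of what such a UCF could contain, assuming the top-level clock port is named `clock` (as in the DCM instantiation earlier in the thread) and runs at 50 MHz. The group name `clock_grp` and the spec name `TS_clock` are arbitrary labels chosen here.

```ucf
# Hypothetical constraint fragment: 50 MHz (20 ns) period on the board clock.
NET "clock" TNM_NET = "clock_grp";
TIMESPEC "TS_clock" = PERIOD "clock_grp" 20 ns HIGH 50%;
```

ISE normally derives the constraints for DCM outputs from a PERIOD constraint on the DCM input, but it is worth checking in the post-PAR timing report that the DCM-generated clocks are actually listed as constrained.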

k2w2yut · May 11, 2011

permute said:
Try setting up a UCF file, then getting the post-PAR timing. It might show more detail about where things are failing. It also gives a more realistic measure of the design. The synthesis report makes assumptions about routing that might not be true. The results after PAR will generally be a bit lower because of routing issues.

This is my latest result.
I use 40.3 MHz for the PCPU and 50 MHz for the memory.

Code:

  wire                                  clkFXa, clka, locked1;
  DCM dcm1 (.CLKIN(clock), .RST(1'b0), .CLKFB(), .CLK0(), .CLKDV(), .CLKFX(clkFXa), .LOCKED(locked1));
  defparam dcm1.CLK_FEEDBACK       = "NONE";
  defparam dcm1.CLKFX_MULTIPLY     = 25;
  defparam dcm1.CLKFX_DIVIDE       = 31;
  BUFG buf1 (.I(clkFXa), .O(clka));

  wire                                  clkFXb, clkb, locked2;
  DCM dcm2 (.CLKIN(clock), .RST(1'b0), .CLKFB(), .CLK0(), .CLKDV(), .CLKFX(clkFXb), .LOCKED(locked2));
  defparam dcm2.CLK_FEEDBACK       = "NONE";
  defparam dcm2.CLKFX_MULTIPLY     = 25;
  defparam dcm2.CLKFX_DIVIDE       = 25;
  BUFG buf2 (.I(clkFXb), .O(clkb));

  pcpuwm pcpuwm1 (.clock(clka), .clock_mem(clkb), .reset(NBTN[0]), .start(NBTN[1]), .stall(NBTN[2]),
    .sel(SW[4:0]), .y(outgr));

and I got this synthesis report:

Code:

Timing Summary:
---------------
Speed Grade: -4

   Minimum period: 20.064ns (Maximum Frequency: 49.842MHz)
   Minimum input arrival time before clock: 14.360ns
   Maximum output required time after clock: 7.709ns
   Maximum combinational path delay: 11.492ns

and this is from the PAR report:

Code:

Release 9.2.04i par J.40
Copyright (c) 1995-2007 Xilinx, Inc.  All rights reserved.

CADPC03::  Wed May 11 14:40:42 2011

par -w -intstyle ise -ol std -t 1 board_map.ncd board.ncd board.pcf 


Constraints file: board.pcf.
Loading device for application Rf_Device from file '3s200.nph' in environment C:\Xilinx92i.
   "board" is an NCD, version 3.1, device xc3s200, package ft256, speed -4

Initializing temperature to 85.000 Celsius. (default - Range: 0.000 to 85.000 Celsius)
Initializing voltage to 1.140 Volts. (default - Range: 1.140 to 1.260 Volts)

INFO:Par:282 - No user timing constraints were detected or you have set the option to ignore timing constraints ("par
   -x"). Place and Route will run in "Performance Evaluation Mode" to automatically improve the performance of all
   internal clocks in this design. The PAR timing summary will list the performance achieved for each clock. Note: For
   the fastest runtime, set the effort level to "std".  For best performance, set the effort level to "high". For a
   balance between the fastest runtime and best performance, set the effort level to "med".

Device speed data version:  "PRODUCTION 1.39 2007-10-19".


Device Utilization Summary:

   Number of BUFGMUXs                        2 out of 8      25%
   Number of DCMs                            2 out of 4      50%
   Number of External IOBs                  33 out of 173    19%
      Number of LOCed IOBs                  33 out of 33    100%

   Number of RAMB16s                         2 out of 12     16%
   Number of Slices                        665 out of 1920   34%
      Number of SLICEMs                      0 out of 960     0%



Overall effort level (-ol):   Standard 
Placer effort level (-pl):    High 
Placer cost table entry (-t): 1
Router effort level (-rl):    Standard 

WARNING:Par:288 - The signal BTN<1>_IBUF has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal BTN<2>_IBUF has no load.  PAR will not attempt to route this signal.
WARNING:Par:288 - The signal BTN<3>_IBUF has no load.  PAR will not attempt to route this signal.

Starting Placer

Phase 1.1
Phase 1.1 (Checksum:98ac4b) REAL time: 2 secs 

Phase 2.7
Phase 2.7 (Checksum:1312cfe) REAL time: 2 secs 

Phase 3.31
Phase 3.31 (Checksum:1c9c37d) REAL time: 2 secs 

Phase 4.2
.....
..
Phase 4.2 (Checksum:26259fc) REAL time: 3 secs 

Phase 5.8
..................................................
........
..................................................
.............
..........
.....
Phase 5.8 (Checksum:aa7b01) REAL time: 9 secs 

Phase 6.5
Phase 6.5 (Checksum:39386fa) REAL time: 9 secs 

Phase 7.18
Phase 7.18 (Checksum:42c1d79) REAL time: 16 secs 

Phase 8.5
Phase 8.5 (Checksum:4c4b3f8) REAL time: 16 secs 

REAL time consumed by placer: 16 secs 
CPU  time consumed by placer: 16 secs 
Writing design to file board.ncd


Total REAL time to Placer completion: 17 secs 
Total CPU time to Placer completion: 17 secs 

Starting Router

Phase 1: 4967 unrouted;       REAL time: 17 secs 

Phase 2: 4706 unrouted;       REAL time: 17 secs 

Phase 3: 2287 unrouted;       REAL time: 18 secs 

Phase 4: 2287 unrouted; (1334)      REAL time: 18 secs 

Phase 5: 2323 unrouted; (0)      REAL time: 19 secs 

Phase 6: 0 unrouted; (5676)      REAL time: 28 secs 

Phase 7: 0 unrouted; (5676)      REAL time: 29 secs 

Updating file: board.ncd with current fully routed design.

Phase 8: 0 unrouted; (3437)      REAL time: 32 secs 

Phase 9: 0 unrouted; (2872)      REAL time: 49 secs 

Phase 10: 0 unrouted; (2872)      REAL time: 49 secs 

Phase 11: 0 unrouted; (0)      REAL time: 50 secs 

WARNING:Route:455 - CLK Net:clock_IBUFG may have excessive skew because 
      6 CLK pins and 0 NON_CLK pins failed to route using a CLK template.
WARNING:Route:455 - CLK Net:clock_counter<10> may have excessive skew because 
      0 CLK pins and 1 NON_CLK pins failed to route using a CLK template.

Total REAL time to Router completion: 50 secs 
Total CPU time to Router completion: 50 secs 

Partition Implementation Status
-------------------------------

  No Partitions were found in this design.

-------------------------------

Generating "PAR" statistics.

**************************
Generating Clock Report
**************************

+---------------------+--------------+------+------+------------+-------------+
|        Clock Net    |   Resource   |Locked|Fanout|Net Skew(ns)|Max Delay(ns)|
+---------------------+--------------+------+------+------------+-------------+
|                clka |      BUFGMUX0| No   |  226 |  0.004     |  1.014      |
+---------------------+--------------+------+------+------------+-------------+
|                clkb |      BUFGMUX3| No   |    2 |  0.000     |  1.011      |
+---------------------+--------------+------+------+------------+-------------+
|         clock_IBUFG |         Local|      |    8 |  0.697     |  1.854      |
+---------------------+--------------+------+------+------------+-------------+
|   clock_counter<10> |         Local|      |   10 |  0.646     |  3.132      |
+---------------------+--------------+------+------+------------+-------------+

* Net Skew is the difference between the minimum and maximum routing
only delays for the net. Note this is different from Clock Skew which
is reported in TRCE timing report. Clock Skew is the difference between
the minimum and maximum path delays which includes logic delays.


   The Delay Summary Report


The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0

   The AVERAGE CONNECTION DELAY for this design is:        1.487
   The MAXIMUM PIN DELAY IS:                               4.911
   The AVERAGE CONNECTION DELAY on the 10 WORST NETS is:   4.476

   Listing Pin Delays by value: (nsec)

    d < 1.00   < d < 2.00  < d < 3.00  < d < 4.00  < d < 5.00  d >= 5.00
   ---------   ---------   ---------   ---------   ---------   ---------
        1596        2034        1124         245          37           0

Timing Score: 0

Asterisk (*) preceding a constraint indicates it was not met.
   This may be due to a setup or hold violation.

------------------------------------------------------------------------------------------------------
  Constraint                                |  Check  | Worst Case |  Best Case | Timing |   Timing   
                                            |         |    Slack   | Achievable | Errors |    Score   
------------------------------------------------------------------------------------------------------
  Autotimespec constraint for clock net clo | SETUP   |         N/A|     4.215ns|     N/A|           0
  ck_IBUFG                                  | HOLD    |     1.124ns|            |       0|           0
------------------------------------------------------------------------------------------------------
  Autotimespec constraint for clock net clo | SETUP   |         N/A|    11.934ns|     N/A|           0
  ck_counter<10>                            | HOLD    |     1.030ns|            |       0|           0
------------------------------------------------------------------------------------------------------
  Autotimespec constraint for clock net clk | SETUP   |         N/A|    21.399ns|     N/A|           0
  a                                         | HOLD    |     0.800ns|            |       0|           0
------------------------------------------------------------------------------------------------------


All constraints were met.
INFO:Timing:2761 - N/A entries in the Constraints list may indicate that the 
   constraint does not cover any paths or that it has no requested value.


Generating Pad Report.

All signals are completely routed.

WARNING:Par:283 - There are 3 loadless signals in this design. This design will cause Bitgen to issue DRC warnings.

Total REAL time to PAR completion: 52 secs 
Total CPU time to PAR completion: 52 secs 

Peak Memory Usage:  141 MB

Placement: Completed - No errors found.
Routing: Completed - No errors found.

Number of error messages: 0
Number of warning messages: 7
Number of info messages: 1

Writing design to file board.ncd



PAR done!

But this attempt did not work :-(

thank you,k2w2

mrflibble · May 11, 2011

Delay: 22.679ns (Levels of Logic = 21)
Source: pcpuwm1/pcpu/regWriteDst_MEMWB_1 (FF)
Destination: pcpuwm1/pcpu/ZF (FF)
Source Clock: clock rising
Destination Clock: clock rising

Well, 21 logic levels is a bit much. That is definitely going to put a limit on your speed.

If this happens to be a counter then 21 logic levels isn't as bad as it sounds. But should this be all combinatorial without a carry chain in there (MUXCY/XORCY on Spartan-3; CARRY4 exists only on newer families), then 21 logic levels is going to be slooooow.

Also, in the timing report right after the bit I just quoted, there is information about how the path delay is built up. This is also useful information. Could you include that next time around? It helps us understand roughly what is going on...

And as permute suggested, set up some basic sensible timing constraints for your design. Without them ISE might give you numbers that are easily misinterpreted.

Now, admittedly, with 21 logic levels your design is probably going to be slow, so you need to take a look at that as well...

k2w2yut said:
So each module has its own clock frequency. As I understand it, I should be able to raise clock_mem, because it is not affected by the CPU clock or by the slowest path in the pcpu module.

That sounds like a wrong assumption right there. That signal regWriteDst_MEMWB is related to the memory interface, right?

Delay: 22.679ns (Levels of Logic = 21)
Source: pcpuwm1/pcpu/regWriteDst_MEMWB_1 (FF)
Destination: pcpuwm1/pcpu/ZF (FF)

k2w2yut · May 11, 2011

Code:

Timing constraint: Default period analysis for Clock 'clock'
  Clock period: 22.679ns (frequency: 44.094MHz)
  Total number of paths / destination ports: 3022404 / 1196
-------------------------------------------------------------------------
Delay:               22.679ns (Levels of Logic = 21)
  Source:            pcpuwm1/pcpu/regWriteDst_MEMWB_1 (FF)
  Destination:       pcpuwm1/pcpu/ZF (FF)
  Source Clock:      clock rising
  Destination Clock: clock rising

  Data Path: pcpuwm1/pcpu/regWriteDst_MEMWB_1 to pcpuwm1/pcpu/ZF
                                Gate     Net
    Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
    ----------------------------------------  ------------
     (RED section)
     FDC:C->Q             14   0.720   1.255  pcpuwm1/pcpu/regWriteDst_MEMWB_1 (pcpuwm1/pcpu/regWriteDst_MEMWB_1)
     LUT4_D:I2->O         17   0.551   1.684  pcpuwm1/pcpu/fwdWB_Reg_Con<1>26 (pcpuwm1/pcpu/fwdWB_Reg_Con<1>26)
     LUT2:I0->O           18   0.551   1.443  pcpuwm1/pcpu/fwdWB_Reg_Con<1>43 (pcpuwm1/pcpu/fwdWB_Reg_Con<1>)
     LUT4_D:I3->O         11   0.551   1.170  pcpuwm1/pcpu/ALUIn1_or0001161_SW0 (N269)
     LUT4:I3->O            1   0.551   0.869  pcpuwm1/pcpu/ALUIn1<2>11 (pcpuwm1/pcpu/ALUIn1<2>11)
     LUT4:I2->O           12   0.551   1.313  pcpuwm1/pcpu/ALUIn1<2>39 (pcpuwm1/pcpu/ALUIn1<2>)
     (ORANGE section)
     LUT2:I1->O            1   0.551   0.000  pcpuwm1/pcpu/Madd_result_addsub0000_lut<2> (pcpuwm1/pcpu/Madd_result_addsub0000_lut<2>)
     MUXCY:S->O            1   0.500   0.000  pcpuwm1/pcpu/Madd_result_addsub0000_cy<2> (pcpuwm1/pcpu/Madd_result_addsub0000_cy<2>)
     MUXCY:CI->O           1   0.064   0.000  pcpuwm1/pcpu/Madd_result_addsub0000_cy<3> (pcpuwm1/pcpu/Madd_result_addsub0000_cy<3>)
     MUXCY:CI->O           1   0.064   0.000  pcpuwm1/pcpu/Madd_result_addsub0000_cy<4> (pcpuwm1/pcpu/Madd_result_addsub0000_cy<4>)
     MUXCY:CI->O           1   0.064   0.000  pcpuwm1/pcpu/Madd_result_addsub0000_cy<5> (pcpuwm1/pcpu/Madd_result_addsub0000_cy<5>)
     MUXCY:CI->O           1   0.064   0.000  pcpuwm1/pcpu/Madd_result_addsub0000_cy<6> (pcpuwm1/pcpu/Madd_result_addsub0000_cy<6>)
     MUXCY:CI->O           1   0.064   0.000  pcpuwm1/pcpu/Madd_result_addsub0000_cy<7> (pcpuwm1/pcpu/Madd_result_addsub0000_cy<7>)
     MUXCY:CI->O           1   0.064   0.000  pcpuwm1/pcpu/Madd_result_addsub0000_cy<8> (pcpuwm1/pcpu/Madd_result_addsub0000_cy<8>)
     XORCY:CI->O           2   0.904   1.216  pcpuwm1/pcpu/Madd_result_addsub0000_xor<9> (pcpuwm1/pcpu/result_addsub0000<9>)
     LUT1:I0->O            1   0.551   0.000  pcpuwm1/pcpu/Madd_result_add0001_Madd_cy<9>_rt (pcpuwm1/pcpu/Madd_result_add0001_Madd_cy<9>_rt)
     MUXCY:S->O            1   0.500   0.000  pcpuwm1/pcpu/Madd_result_add0001_Madd_cy<9> (pcpuwm1/pcpu/Madd_result_add0001_Madd_cy<9>)
     XORCY:CI->O           1   0.904   0.827  pcpuwm1/pcpu/Madd_result_add0001_Madd_xor<10> (pcpuwm1/pcpu/result_add0001<10>)
     (GREEN section)
     LUT4:I3->O            1   0.551   0.827  pcpuwm1/pcpu/result<10>151_SW0_SW0 (N447)
     LUT4:I3->O            2   0.551   0.903  pcpuwm1/pcpu/result<10>151 (pcpuwm1/pcpu/result<10>)
     LUT4:I3->O            1   0.551   0.996  pcpuwm1/pcpu/wZF17 (pcpuwm1/pcpu/wZF17)
     LUT4:I1->O            1   0.551   0.000  pcpuwm1/pcpu/wZF99 (pcpuwm1/pcpu/wZF)
     FDCE:D                    0.203          pcpuwm1/pcpu/ZF
    ----------------------------------------
    Total                     22.679ns (10.176ns logic, 12.503ns route)
                                       (44.9% logic, 55.1% route)

I will try to explain my data path:
The RED section is my data forwarding unit, which detects when to forward data from the W/B stage to the EX stage.
The ORANGE section is my ALU arithmetic operation.
The GREEN section checks and updates the zero flag register.

regWriteDst_MEMWB is the flip-flop that holds the target register (3-bit address) for the W/B stage. It is compared to decide whether the EX stage should use data from the ID stage or forwarded data from the W/B stage.

I agree that it is not completely unrelated to the memory, but it is generated at the start of the pcpu clock cycle and stays unchanged until the next cycle, which should be enough for the rising edge of the memory clock to "catch" the data and operate on it.

Starting from 40/50 MHz, I reduced the delay and then tried slightly higher frequencies such as 40.3/50 or 40/50.3, but neither works.

If you want more information please tell me, thank you
k2w2

mrflibble · May 11, 2011

Thank you, that was indeed the type of information I meant.

Well, I can tell you that without some changes you are not going to reach significantly higher speeds than what you get now. This is just one path (the worst one), so no doubt there are more like it.

I don't know your design, but usually with a cpu + memory interface if you already have it decoupled (since you have two different clocks for it) ... then you can pipeline parts of it.

For the particular path you posted, you have already identified 3 major parts that it can be divided into. So let's take that as an example:

The RED section is my data forwarding unit, which detects when to forward data from the W/B stage to the EX stage.
The ORANGE section is my ALU arithmetic operation.
The GREEN section checks and updates the zero flag register.

So for each of these, register the output. Right after your data forwarding unit sends the data forward to the EX stage, clock this data into flip-flops.

Same for the output of your ALU operation: register the ALU output.

Ditto for your zero flag update.

Now instead of trying to do a lot of work within 1 cycle, you smear out the action over 3 clock cycles. So these 21 logic levels will get divided over these 3 stages. As a small bonus you will get some "free" routing as well, because judging by the percentage and amounts of routing delay things are pretty spread out. By adding these extra flip-flop stages you add some "halfway stations" as it were.

You will have to adjust the surrounding stuff accordingly of course to take this extra latency into account. But I am afraid there are no easy free solutions in this case. Either just accept the 21 logic levels (and the delays that go with that), or find some ways to break this up into stages and change the rest of the design to accommodate these extra stages.
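
As a rough illustration of the three register points described above, here is a minimal sketch. It is not the poster's pcpu module: the module name, port names and widths are hypothetical, and a plain adder stands in for the real ALU.

```verilog
// Hypothetical sketch: one register after each of the three sections
// (forwarding mux, ALU, zero-flag logic). Names are illustrative only.
module ex_pipeline_sketch (
  input  wire        clk,
  input  wire        rst,
  input  wire        fwd_sel,    // 1: use the value forwarded from W/B
  input  wire [15:0] id_data,    // operand from the ID stage
  input  wire [15:0] wb_data,    // operand forwarded from the W/B stage
  input  wire [15:0] operand_b,  // second ALU operand
  output reg  [15:0] alu_result,
  output reg         zf
);

  reg [15:0] alu_in1_r;  // stage 1 register: forwarding mux output
  reg [15:0] alu_in2_r;  // operand B delayed to stay aligned with alu_in1_r

  always @(posedge clk or posedge rst) begin
    if (rst) begin
      alu_in1_r  <= 16'd0;
      alu_in2_r  <= 16'd0;
      alu_result <= 16'd0;
      zf         <= 1'b0;
    end else begin
      // Stage 1: forwarding decision only.
      alu_in1_r  <= fwd_sel ? wb_data : id_data;
      alu_in2_r  <= operand_b;
      // Stage 2: ALU operation only (an add stands in for the real ALU).
      alu_result <= alu_in1_r + alu_in2_r;
      // Stage 3: zero flag computed from the registered ALU result.
      zf         <= (alu_result == 16'd0);
    end
  end

endmodule
```

With this arrangement `alu_result` appears two clock edges after the operands and `zf` three edges after them, which is exactly the extra latency the surrounding hazard and forwarding logic would have to absorb.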

What you can do as a quick test (sort of a feasibility study) is this:

For all the inputs of this module, add shift registers to the inputs (say 4 deep). Then enable register balancing for this module. This will not cost you personally all that much effort (does take some extra time for ISE, but hey, go make some coffee). After it is done with the place & route, check the post PAR timings for this path and see if it is any better.

The design will NOT be functional at that moment but we don't care. That is not the intention of this action. The intention of this action is to find out what kind of timing improvements are doable with some cheap pipelining.
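
A minimal sketch of the kind of input shift register meant here, assuming a 4-deep chain on a 16-bit input. The module and parameter names are made up for illustration, and DEPTH is assumed to be at least 1.

```verilog
// Hypothetical feasibility-test helper: delays an input bus through
// DEPTH flip-flop stages so that retiming has registers to move.
module input_delay_line #(
  parameter WIDTH = 16,
  parameter DEPTH = 4     // must be >= 1
) (
  input  wire             clk,
  input  wire [WIDTH-1:0] d,
  output wire [WIDTH-1:0] q
);

  reg [WIDTH-1:0] stage [0:DEPTH-1];
  integer i;

  always @(posedge clk) begin
    stage[0] <= d;
    for (i = 1; i < DEPTH; i = i + 1)
      stage[i] <= stage[i-1];
  end

  // Output of the last stage, DEPTH clock cycles after the input.
  assign q = stage[DEPTH-1];

endmodule
```

One thing to watch for: synthesis may pack a reset-less chain like this into shift-register primitives instead of separate flip-flops, in which case there is nothing for register balancing to move. If the synthesis report shows that happening, turning off shift register extraction for this test is probably needed.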

Personally I have had mixed results with retiming (register balancing). I get the absolute best results by actually thinking about the design. The tool-based improvements from register balancing range from good to total crap, 100% depending on the quality of the input code I suppose.

You may want to google "xilinx REGISTER_BALANCING" for some info if you are not clear on this...

k2w2yut · May 11, 2011

Thank you, mrflibble

I will take some time to try this and post the result ASAP ^^''

k2w2
