Is there a way to make for loop synthesizable with max loop number variable?

layowblue · Mar 21, 2014

Hi All

First time posting question here...

I want to create a circuit as:
"

Code:

reg [7:0]           mem [0:255];
reg [256*8-1:0] data_o;

for (int i=0; i<= size; i++)            //size[7:0] is a register output that may vary
    data_o[8*i+7 -: 8] <= mem[i];  //

"

I know it's not synthesizable, so I am wondering whether the following code is:

"

Code:

for (int i=0; i<= MAX_SIZE-1; i++)     //MAX_SIZE is a parameter, say 256
    if( i<=size)
        data_o[8*i+7 -: 8] <= mem[i];  //

"

Thanks a lot
Leo

ads-ee · Mar 24, 2014

You can't. Verilog isn't a software programming language, with runtime loop definitions.

Its primary use is to model hardware, and I've yet to see hardware that magically appears and disappears based on an internal register value. If you need a variable amount of hardware, build for the largest size value and only use what you need for smaller size values.

Regards

layowblue · Mar 24, 2014

Thanks for your reply!
I'm still confused. In the second code snippet, I was trying to "build for the largest size value" as you mentioned here.
Do you think that code is synthesizable?

Thanks again

ads-ee · Mar 25, 2014

No, it's still trying to generate or not generate logic at runtime, after you've already synthesized the hardware.

layowblue · Mar 25, 2014

Thanks again for the reply.
If it does not work, I have a real problem to implement what I want to do now.
Let me explain what I wanted to do:
The design requires a FIFO with data_width=32 (Dword) and a certain depth. Each time enqueue is '1', one or more DWs must be written into the FIFO. The number of DWs written in the same cycle is decided by an input signal "wr_size".
(Note that the FIFO is based not on ram, but registers.)

Since the number of valid DW is not controllable and not predictable, I think I need some logic like:
"

Code:

reg [31:0]  mem [0:127];    // 128 entries
wire [31:0] input_DW[0:7];
wire [2:0]   wr_size;
 
mem[wr_start_addr][31:0]               <= input_DW[0][31:0];
mem[wr_start_addr+1][31:0]            <= input_DW[1][31:0];
//....
mem[wr_start_addr+wr_size-1][31:0] <= input_DW[wr_size-1][31:0];

"

So how to implement the idea into something synthesizable?

Or, can I generate a input_DW_valid[0:7] to indicate which DW needs to be written into the FIFO?
something like:
"

Code:

always_comb
begin
    for(int i=0; i<8; i++)
      if(i<wr_size)
          input_DW_valid[i] = 1'b1;
      else
          input_DW_valid[i] = 1'b0; 
end
//...
always_ff @(posedge clk, negedge rstb)
begin : buffer_write
    if (!rstb) begin
        for (int i=0; i<128; i=i+1) mem[i] <= 32'b0;
    end
    else begin
        for (int i=0; i<8; i++)
            if (input_DW_valid[i] == 1'b1)
                mem[wr_start_addr+i][31:0] <= input_DW[i][31:0];
    end
end : buffer_write

"
Will it work?

Thanks a lot!

ads-ee · Mar 25, 2014

What you want to do is implement the worst case number of DWs and keep track of how many you have written. There is no way to create new registers to hold those DWs after you've implemented a design, so you need to have them all implemented up front. This isn't like software with dynamically linked libraries and memory allocation.

Question: why do you have to use registers? That seems rather resource intensive. Do you need access to all the DWs simultaneously?
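
One way to read "implement the worst case and keep track of how many you have written" is sketched below. All 128 registers and all eight write lanes always exist; wr_size only decides which lanes are enabled in a given cycle and how far the write pointer advances. The module name multi_wr_buf, the pointer wr_ptr (standing in for the OP's wr_start_addr), the 4-bit wr_size (so that 8 DWs can be expressed), the wrap-around addressing, and the simple read port are all assumptions made for illustration. Full/empty handling is left out.

```systemverilog
// Sketch only: names, widths, wrap-around and the read port are assumptions.
module multi_wr_buf #(
    parameter int DEPTH = 128,   // assumed power of two so the pointer wraps cleanly
    parameter int LANES = 8
) (
    input  logic                     clk,
    input  logic                     rstb,
    input  logic                     enqueue,
    input  logic [$clog2(LANES):0]   wr_size,             // 0..LANES DWs this cycle
    input  logic [31:0]              input_DW [0:LANES-1],
    input  logic [$clog2(DEPTH)-1:0] rd_addr,
    output logic [31:0]              rd_data
);
    localparam int AW = $clog2(DEPTH);

    logic [31:0]   mem [0:DEPTH-1];
    logic [AW-1:0] wr_ptr;                                 // role of wr_start_addr

    always_ff @(posedge clk, negedge rstb) begin
        if (!rstb) begin
            wr_ptr <= '0;
            for (int i = 0; i < DEPTH; i++) mem[i] <= '0;
        end else if (enqueue) begin
            // All LANES write paths exist in hardware; wr_size only enables them.
            for (int i = 0; i < LANES; i++)
                if (i < wr_size)
                    mem[wr_ptr + AW'(i)] <= input_DW[i];   // AW-bit sum wraps at DEPTH
            // Keep track of how many DWs have been written so far.
            wr_ptr <= wr_ptr + AW'(wr_size);
        end
    end

    assign rd_data = mem[rd_addr];                         // placeholder read side
endmodule
```

Every entry of mem ends up with a write-enable decoded from wr_ptr and wr_size plus a LANES-to-1 data mux, which is where the area and timing cost of the register-based approach comes from.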

FvM · Mar 25, 2014

The second code in post #1 is perfectly synthesizable; I don't understand the reservations. Of course, MAX_SIZE*8 bit assignments will be implemented in hardware. That's the price of being flexible.

ads-ee · Mar 25, 2014

FvM, what about the if (i <= size) in the for loop? That performs selective assignment of data_o depending on the OP's size variable (not a constant). I'm not entirely sure what synthesis tools will make of that. It's indeterminate for synthesis, since it could be size==0 or size==MAX_SIZE. I've always gone by the assumption that for every iteration of a loop I want to generate logic for that iteration and not have logic generated depending on another signal variable (non-constant).

Just noticed the OP could have written data_o as

Code:

data_o[8*i +: 8]
//instead of..
data_o[8*i+7 -: 8]
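
Both forms select the same eight bits: `+:` counts up from the base index and `-:` counts down from it. A small simulation-only check, with the module name tb_partsel and the test pattern chosen here purely for illustration:

```systemverilog
// Sketch only: module name and data pattern are assumptions for illustration.
module tb_partsel;
    logic [31:0] data = 32'hDEADBEEF;

    initial begin
        for (int i = 0; i < 4; i++) begin
            // data[8*i +: 8]   selects bits [8*i+7 : 8*i]
            // data[8*i+7 -: 8] selects the same bits, starting from the top
            assert (data[8*i +: 8] == data[8*i+7 -: 8])
                else $error("part-select mismatch at byte %0d", i);
        end
        $display("part-select check done");
    end
endmodule
```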

FvM · Mar 25, 2014

The construct will simply synthesize a number of conditional assignments in the RTL, which will be mapped to combinational logic: in front of registers for sequential always blocks, or purely combinational otherwise.

Of course there's no such thing as conditional logic generation. The MAX_SIZE constant controls the number of generated parallel logic paths, and size is decoded into individual enable terms.
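
As a sketch of what that means in practice, the second snippet from post #1 can be wrapped in a complete module. The module name mem_flatten, the clock clk, and passing mem in as an array port are assumptions made only to keep the example self-contained; the loop body is the OP's. Because the loop bound is the constant MAX_SIZE, the tool unrolls it into MAX_SIZE parallel byte assignments, and each comparison against size becomes the enable term for one byte lane.

```systemverilog
// Sketch only: module name, ports and clocking are assumptions for illustration.
module mem_flatten #(
    parameter int MAX_SIZE = 256
) (
    input  logic                  clk,
    input  logic [7:0]            size,                 // run-time value, as in post #1
    input  logic [7:0]            mem [0:MAX_SIZE-1],
    output logic [MAX_SIZE*8-1:0] data_o
);
    always_ff @(posedge clk) begin
        // Constant loop bound: unrolled into MAX_SIZE conditional assignments.
        for (int i = 0; i < MAX_SIZE; i++) begin
            // In each unrolled copy 'i' is a constant, so this comparison is
            // just a decoded enable for byte lane i. Lanes above 'size' hold
            // their previous value.
            if (i <= size)
                data_o[8*i +: 8] <= mem[i];
        end
    end
endmodule
```

No logic appears or disappears at run time: all MAX_SIZE byte registers and their enables are always present, and size only controls which of them load on a given clock edge.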

ads-ee · Mar 25, 2014

Interesting, I'll have to experiment with that in my spare time.

What you're saying makes some sense, but I've never had a reason to attempt something like what the OP is attempting.

layowblue · Mar 25, 2014

Thank you ads-ee and FvM both!
Yes, I could not afford the latency a RAM would introduce, so I ended up with a buffer formed by registers.
If it is synthesizable, the next question is: would the combinational logic for addressing the mem blow the timing budget of a 500MHz clock?
Given the mem(buffer) depth is 256, and technology is 28nm, could someone help estimate the timing? I don't have a license to run synthesis myself until other blocks are ready...

FvM · Mar 25, 2014

Being synthesizable is of course not a sufficient condition for a successful design implementation. 500 MHz doesn't sound particularly promising. wr_start_addr requires a large multiplexer tree with a corresponding propagation delay. Some pipelining will be necessary. It seems to me that a wide RAM implementation with additional logic could be better suited to achieve the speed.

layowblue · Mar 26, 2014

Thanks FvM for the input again.
Unfortunately, the latency budget for the block is only one cycle, which means that after valid input data is received, it may already be on the output bus in the next cycle. This by nature excludes a RAM solution, because some rd/wr contention will happen, and RAM access usually needs more than one cycle...

By the way, could someone tell me how many levels of pure combinational muxes would add up to around 1ns of delay in 28nm technology? A rough estimate would be very helpful.

layowblue · Apr 17, 2014

Just an update: with 30 levels of combinational logic, my design just barely meets timing at a 500MHz clock.
By one "level" I mean one NAND gate. Imagine 30 NAND gates chained between two flops...
This gives us some idea of how powerful 28nm can be in terms of short cell delays.
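
As a rough back-of-the-envelope reading of those numbers (an upper bound per gate only, since flop clock-to-Q, setup time, and wire delay come out of the same 2ns budget):

```latex
T_{\text{clk}} = \frac{1}{500\ \text{MHz}} = 2\ \text{ns}, \qquad
t_{\text{NAND}} \lesssim \frac{2\ \text{ns}}{30} \approx 67\ \text{ps}, \qquad
\frac{1\ \text{ns}}{67\ \text{ps}} \approx 15\ \text{levels}
```

So, by this estimate, roughly 15 NAND-equivalent levels fit into 1ns for this particular design and library; other cells, loads, and corners will shift that figure.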
