Is there a way to make for loop synthesizable with max loop number variable?

Status
Not open for further replies.

layowblue (joined Mar 21, 2014)

Hi All

First time posting question here...

I want to create a circuit as:
"
Code:
reg [7:0]       mem [0:255];
reg [256*8-1:0] data_o;

for (int i = 0; i <= size; i++)      // size[7:0] is a register output that may vary
    data_o[8*i+7 -: 8] <= mem[i];
"

I know it's not synthesizable, so I am wondering if the following code is:

"
Code:
for (int i = 0; i <= MAX_SIZE-1; i++)    // MAX_SIZE is a parameter, say 256
    if (i <= size)
        data_o[8*i+7 -: 8] <= mem[i];
"

Thanks a lot
Leo
 

You can't. Verilog isn't a software programming language with runtime loop definitions.

Its primary use is to model hardware, and I've yet to see hardware that magically appears and disappears based on an internal register value. If you need a variable amount of hardware, you should instead build for the largest size value and use only what you need for smaller size values.

Regards
 

Thanks for your reply!
I'm still confused. In the second code snippet, I was trying to "build for the largest size value" as you mentioned.
Do you think that code is synthesizable?

Thanks again

No, it's still trying to generate or not generate logic after you've already synthesized the hardware (that is, at runtime).
 

Thanks again for the reply.
If it does not work, I have a real problem implementing what I want to do.
Let me explain what I want to do:
The design requires a FIFO with data_width=32 (one Dword, DW) and a certain depth. Each time enqueue is '1', one or more DWs must be written into the FIFO. The number of DWs written in the same cycle is decided by an input signal "wr_size".
(Note that the FIFO is based on registers, not RAM.)

Since the number of valid DWs is neither controllable nor predictable, I think I need some logic like:
"

Code:
reg  [31:0] mem [0:127];
wire [31:0] input_DW [0:7];
wire [2:0]  wr_size;

mem[wr_start_addr][31:0]           <= input_DW[0][31:0];
mem[wr_start_addr+1][31:0]         <= input_DW[1][31:0];
//....
mem[wr_start_addr+wr_size-1][31:0] <= input_DW[wr_size-1][31:0];
"

So how can I turn this idea into something synthesizable?

Or can I generate an input_DW_valid[0:7] signal to indicate which DWs need to be written into the FIFO?
Something like:
"

Code:
always_comb
begin
    for (int i = 0; i < 8; i++)
        if (i < wr_size)
            input_DW_valid[i] = 1'b1;
        else
            input_DW_valid[i] = 1'b0;
end
//...
always_ff @(posedge clk, negedge rstb)
begin : buffer_write
    if (!rstb) begin
        for (int i = 0; i < 128; i = i + 1) mem[i] <= 32'b0;
    end
    else begin
        for (int i = 0; i < 8; i++)
            if (input_DW_valid[i] == 1'b1)
                mem[wr_start_addr+i][31:0] <= input_DW[i][31:0];
    end
end
"
Will it work?

Thanks a lot!


What you want to do is implement the worst case number of DWs and keep track of how many you have written. There is no way to create new registers to hold those DWs after you've implemented a design, so you need to have them all implemented up front. This isn't like software with dynamically linked libraries and memory allocation.

Question: why do you have to use registers? That seems rather resource intensive. Do you need access to all the DWs simultaneously?
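
As an illustration of "implement the worst case and keep track of how many you have written", here is a minimal SystemVerilog sketch. It is not the OP's actual design: the module name, the read port, the 4-bit wr_size (wide enough to express 0 to 8, unlike the 3-bit signal in the post) and holding wr_start_addr as an internal write pointer are all assumptions made for the example, and full/empty handling is left out.

```systemverilog
// Sketch only: names, the read port and the pointer handling are assumptions.
module multi_dw_buffer #(
    parameter int DEPTH  = 128,  // assumed to be a power of two so the pointer wraps
    parameter int MAX_WR = 8     // worst-case number of DWs written per cycle
) (
    input  logic                        clk,
    input  logic                        rstb,
    input  logic                        enqueue,
    input  logic [$clog2(MAX_WR+1)-1:0] wr_size,            // 0..MAX_WR valid DWs
    input  logic [31:0]                 input_DW [0:MAX_WR-1],
    input  logic [$clog2(DEPTH)-1:0]    rd_addr,
    output logic [31:0]                 rd_data
);
    logic [31:0]              mem [0:DEPTH-1];
    logic [$clog2(DEPTH)-1:0] wr_start_addr;  // tracks how much has been written

    always_ff @(posedge clk or negedge rstb) begin
        if (!rstb) begin
            wr_start_addr <= '0;
            for (int i = 0; i < DEPTH; i++) mem[i] <= '0;
        end else if (enqueue) begin
            // Constant loop bound: all MAX_WR write paths always exist in hardware.
            for (int i = 0; i < MAX_WR; i++)
                if (i < wr_size)                       // run-time enable per lane
                    mem[(wr_start_addr + i) % DEPTH] <= input_DW[i];
            // Truncation to the pointer width wraps the address at DEPTH.
            wr_start_addr <= wr_start_addr + wr_size;
        end
    end

    assign rd_data = mem[rd_addr];  // simple asynchronous read of one entry
endmodule
```

The amount of hardware is fixed by DEPTH and MAX_WR at elaboration time; wr_size only selects which of the always-present write paths are enabled in a given cycle.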
 

The second code snippet in post #1 is synthesizable as written; I don't understand the reservations. Of course, MAX_SIZE*8 bit assignments will be implemented in hardware; that's the price of being flexible.
 

FvM, what about the if (i <= size) in the for loop? That performs selective assignment of data_o depending on the OP's size variable (not a constant). I'm not entirely sure what synthesis tools will make of that. It's indeterminate for synthesis, since it could be size==0 or size==MAX_SIZE. I've always gone by the assumption that for every iteration of a loop I want to generate logic for that iteration and not have logic generated depending on another signal variable (non-constant).

I just noticed the OP could have written data_o as

Code:
data_o[8*i +: 8]
// instead of
data_o[8*i+7 -: 8]

 

The construct will simply synthesize to a number of conditional assignments in the RTL, which are mapped to combinational logic: in a sequential always block it sits in front of the registers, otherwise it is implemented purely combinationally.

Of course there's no such thing as conditional logic generation: the MAX_SIZE constant controls the number of parallel logic paths generated, and size is decoded into individual enable terms.
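
To make the unrolling concrete, here is a minimal sketch of the second snippet from post #1 wrapped in a module. The module name, the port list and the clk input are assumptions added for illustration; only the loop itself comes from the thread.

```systemverilog
// Sketch only: module name, ports and clk are assumptions for illustration.
module var_copy #(
    parameter int MAX_SIZE = 256              // compile-time loop bound
) (
    input  logic                  clk,
    input  logic [7:0]            size,       // run-time value: copy entries 0..size
    input  logic [7:0]            mem [0:MAX_SIZE-1],
    output logic [MAX_SIZE*8-1:0] data_o
);
    always_ff @(posedge clk) begin
        // The bound is a constant, so synthesis unrolls MAX_SIZE assignments.
        for (int i = 0; i < MAX_SIZE; i++) begin
            // After unrolling, i is a constant in each copy, so each compare
            // reduces to one enable term for the 8 registers of byte lane i.
            if (i <= size)
                data_o[8*i +: 8] <= mem[i];
        end
        // Equivalent unrolled form of the first iterations:
        //   if (size >= 0) data_o[ 7: 0] <= mem[0];
        //   if (size >= 1) data_o[15: 8] <= mem[1];
        //   if (size >= 2) data_o[23:16] <= mem[2];
        //   ...
    end
endmodule
```

Note that with an 8-bit size this sketch only makes sense for MAX_SIZE up to 256; a larger MAX_SIZE would need a wider size signal.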
 
Interesting, I'll have to experiment with that in my spare time.

What you're saying makes some sense, but I've never had a reason to attempt something like what the OP is attempting.
 

Thank you ads-ee and FvM both!
Yes, I could not afford the latency a RAM would introduce, so I ended up with a buffer built from registers.
If it is synthesizable, the next question is: would the combinational logic for addressing the mem blow a 500MHz clock timing budget?
Given that the mem (buffer) depth is 256 and the technology is 28nm, could someone help estimate the timing? I don't have a license to run synthesis myself until other blocks are ready...
 

Being synthesizable is of course not a sufficient condition for a successful design implementation. 500 MHz doesn't sound particularly promising. The start address requires a large multiplexer tree with a corresponding propagation delay. Some pipelining will be necessary. It seems to me that a wide RAM implementation with additional logic could be better suited to achieving the speed.
 

Thanks FvM for the input again.
Unfortunately, the latency budget for the block is only one cycle, which means that valid input data may have to appear on the output bus in the cycle after it is received. This by nature excludes a RAM solution, because some rd/wr contention will happen, and RAM access usually needs more than one cycle...

By the way, could someone tell me how many levels of pure combinational muxes would add up to around 1ns of delay in 28nm technology? A rough estimate would be very helpful.
 

Just an update: with 30 levels of combinational logic, my design just barely meets timing at a 500MHz clock.
By one "level" I mean one NAND gate. Imagine 30 NAND gates chained between two flops...
This gives us some idea of how powerful 28nm can be in terms of short cell delays.
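
As a rough cross-check of those figures, the arithmetic can be written out. This is only a back-of-the-envelope bound derived from the numbers reported in this post: it treats the whole clock period as available to the gate chain, so the real average delay per level is somewhat lower once flop clock-to-Q, setup time and clock uncertainty are subtracted.

```latex
T_{\mathrm{clk}} = \frac{1}{500\,\mathrm{MHz}} = 2\,\mathrm{ns},
\qquad
t_{\mathrm{level}} \lesssim \frac{T_{\mathrm{clk}}}{30} \approx 67\,\mathrm{ps},
\qquad
\frac{1\,\mathrm{ns}}{67\,\mathrm{ps}} \approx 15\ \text{levels}
```

By the same figure, about 15 NAND-equivalent levels would fit in 1ns. A multiplexer level may well be slower than a single NAND, and fanout and wire load matter, so the number of mux levels that fit could be smaller.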
 
