Use of DSP blocks in FPGA

knightnoor · Feb 11, 2011

Hi,
I am a final-year engineering student, and we are planning to build our project on an FPGA. To do that I need to study FPGAs in depth, and I have a few queries about DSP blocks. I hope I can get some help on this forum.
I have already gone through a few Xilinx PDFs on DSP blocks, and I still have two doubts:
Q.1. When we already have CLB slices that can accommodate all the arithmetic computation and logic, why do we have DSP blocks specifically for arithmetic operations?
Q.2. How do we know whether the design or logic we load into the FPGA is placed in a DSP slice rather than an ordinary slice?

permute · Feb 11, 2011

1.) Several reasons, the largest being power. An 18x18 multiplication would require a large number of LUTs, making it large in terms of area. It would likely also be slow, since it would need a lot of general-purpose routing. Together, these two factors mean more energy per operation. By using dedicated hardware that can only do a limited (though increasing) number of basic operations, the operation can be made small and fast and use less power than a fully reconfigurable implementation. Being smaller, more of them can be packed into the device (routing permitting). It also makes timing consistent: a multiply takes the same amount of time because it is implemented the same way every time, whereas with LUTs the tools could choose different locations for each LUT and use different routing.
2.) Reports. The tools will usually tell you the number of DSP slices used in a design. You can also instantiate the DSP slices manually; this can be useful for Xilinx parts with DSP48 slices in more advanced configurations.

FvM · Feb 11, 2011

The general answer is that you don't need to care. The design compiler will know when it's advantageous to use DSP blocks.

If you are interested in learning about it, set up a test design involving multiply operations and make the compiler implement it with and without DSP blocks. (I'm not familiar with Xilinx, but there are HDL synthesis attributes as well as general synthesis options to control it.) Compare the logic cell (CLB) utilisation and the timing analyzer results of both designs.
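
A rough sketch of such a test design is given here: the same 18x18 multiplier written twice, with a synthesis attribute steering the mapping. The `use_dsp48` attribute with values "yes"/"no" is the spelling used by Xilinx XST; other tools use different names (Altera Quartus has `multstyle`, for example), so treat the attribute as an assumption to check against your tool's documentation. The module and port names are made up for illustration.

```verilog
// Hypothetical test modules: two identical registered 18x18 signed
// multipliers. Only the synthesis attribute differs.
// Assumption: XST-style "use_dsp48" attribute; verify for your tool.

(* use_dsp48 = "yes" *)            // ask the tool to map into a DSP slice
module mult_dsp (
    input                    clk,
    input  signed [17:0]     a,
    input  signed [17:0]     b,
    output reg signed [35:0] p
);
    always @(posedge clk)
        p <= a * b;
endmodule

(* use_dsp48 = "no" *)             // ask the tool to build it from LUTs
module mult_lut (
    input                    clk,
    input  signed [17:0]     a,
    input  signed [17:0]     b,
    output reg signed [35:0] p
);
    always @(posedge clk)
        p <= a * b;
endmodule
```

Synthesize each module as the top level in turn, then compare the slice/LUT count, the DSP slice count, and the reported minimum period.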

permute · Feb 11, 2011

The main issue I've found is when the tools decide to use a DSP slice and it ends up being a bad idea. This happens because the DSP slices in some devices are all located on one half of the die. As the tools map more and more things to DSP slices, the entire design has to be placed near the DSP slices for the routing delays to work.

Time-sharing the DSP tiles is one area where the tools don't work as well as a human. For example, if the opmode/alumode inputs have to change each cycle, the tools are more likely to use multiple DSP slices and then mux the outputs.

The other issue is that some operations need a good understanding of how the DSP slices can be used. This is probably true of Altera as well. Xilinx's DSP48s have multiple input registers and other intermediate registers that can be used if the design is coded correctly. These features are used when the code allows, but it's very easy to change the code and accidentally break this.
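
To illustrate the kind of coding style meant here, below is a minimal sketch of a fully registered multiply-add. It assumes that the tool can absorb the input, multiplier, and output registers into a DSP48 slice when they use synchronous resets to zero; whether that actually happens depends on the tool version and device, so check the synthesis report. The module name, port names, and widths are hypothetical.

```verilog
// Sketch: p = a * b + c with a 3-cycle latency.
// All registers use a synchronous reset to zero. An asynchronous reset
// or a non-zero reset value is a typical way to accidentally keep the
// registers out of the DSP slice.
module mult_add_pipe (
    input                    clk,
    input                    rst,   // synchronous, active high
    input  signed [17:0]     a,
    input  signed [17:0]     b,
    input  signed [35:0]     c,
    output reg signed [36:0] p
);
    reg signed [17:0] a_r, b_r;     // stage 1: input registers
    reg signed [35:0] c_r, c_rr;    // c delayed twice to line up with m_r
    reg signed [35:0] m_r;          // stage 2: multiplier output register

    always @(posedge clk) begin
        if (rst) begin
            a_r  <= 0;
            b_r  <= 0;
            c_r  <= 0;
            c_rr <= 0;
            m_r  <= 0;
            p    <= 0;
        end else begin
            a_r  <= a;
            b_r  <= b;
            c_r  <= c;
            c_rr <= c_r;
            m_r  <= a_r * b_r;
            p    <= m_r + c_rr;     // stage 3: adder output register
        end
    end
endmodule
```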

Jaffry · Sep 7, 2012

Hey!

Please check reply 11 of the thread linked below. It gives further evidence, with a practical implementation that I tried myself for learning.
It should be useful to you too.

https://www.edaboard.com/threads/263747/

I have also posted a reply there:

That was indeed a great help. I implemented a very simple circuit:

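// 4x4 unsigned multiplier with a registered 8-bit output
module mult(
    input clk,
    input rst,              // synchronous reset, active high
    input [3:0] in1,
    input [3:0] in2,
    output reg [7:0] out1
    );

always @(posedge clk)
    if (rst) out1 <= 1;     // note: the reset value here is 1, not 0
    else     out1 <= in1 * in2;
endmodule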

First I ran the flow through PAR and looked at the report.
It showed a minimum period of 2.40 ns, 16 slices used, and no DSP48E slices.

When I then set the synthesis option for using DSP slices to 'YES', the design used zero (0) slices and 1 DSP48E slice.
Conclusion: 16 slices were saved, while the minimum clock period was the same in both cases.

Good learning
Thank you.

FvM · Sep 7, 2012

I agree that a 4x4 multiply is a case that usually won't need a DSP block. A 16x16 multiply is a different case.
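
To repeat the experiment above at both widths, a parameterized variant of the multiplier could be used. This is only a sketch: the module name and the `WIDTH` parameter are made up, and the reset value is zero here rather than the 1 used in the posted code.

```verilog
// Hypothetical width-parameterized version of the test multiplier.
// Set WIDTH to 4 or 16 and compare the resource and timing reports.
module mult_n #(
    parameter WIDTH = 16
) (
    input                    clk,
    input                    rst,    // synchronous, active high
    input      [WIDTH-1:0]   in1,
    input      [WIDTH-1:0]   in2,
    output reg [2*WIDTH-1:0] out1
);
    always @(posedge clk)
        if (rst) out1 <= {2*WIDTH{1'b0}};
        else     out1 <= in1 * in2;  // full-width unsigned product
endmodule
```

For example, `mult_n #(.WIDTH(4)) u4 (...)` gives the 4x4 case and the default gives 16x16.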
