Hello,
What do you think is the fastest multiplication algorithm in the world?
Hello,
what datatype do you mean?
Regards
unsigned 8 bit
Hi,
I vote for a big lookup table.
Klaus
I am looking for a fast ASIC implementation. Booth looks good.
Hello,
I agree with @KlausST. If you are looking for theoretical information, see the links to these algorithms:
https://en.wikipedia.org/wiki/Karatsuba_algorithm
https://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication
https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm
https://en.wikipedia.org/wiki/F%C3%BCrer%27s_algorithm
If you want a fast multiplication operation on an FPGA, just use the "hardware multipliers" or "DSP blocks" in the FPGA fabric.
Regards
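As a rough illustration of the first linked algorithm, here is a minimal Karatsuba sketch in Python. It assumes non-negative integers, and the base-case threshold of 16 is an arbitrary choice made for this example. Karatsuba only pays off for very wide operands, so it is not a candidate for an 8 bit hardware multiplier.

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply two non-negative integers with three recursive products
    instead of four (Karatsuba's trick)."""
    if x < 16 or y < 16:          # arbitrary small base case for this sketch
        return x * y

    # Split both operands at the same bit position.
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    z0 = karatsuba(x_lo, y_lo)                        # low  * low
    z2 = karatsuba(x_hi, y_hi)                        # high * high
    # The middle term reuses z0 and z2, which saves the fourth product.
    z1 = karatsuba(x_lo + x_hi, y_lo + y_hi) - z0 - z2

    return (z2 << (2 * half)) + (z1 << half) + z0
```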
unsigned 8 bit
24 bit multiplier
32 bit floating point
I want a 4 step pipelined multiplier.
four or six stage pipelined multiplier.
Hi,
Do you recognize that you jump from one requirement to the other?
I recommend taking some time to find a clear requirement.
Klaus
I am so sorry. You are right.
Hi,
I think there is a big difference between 8 x 8 bit integer multiplication and 32 bit x 32 bit floating point multiplication.
One is the lower end, the other is the higher end (or at least a good middle range).
The one has a dynamic range of 1:256, the other about 1:1.4 × 10^76
(give or take a zero).
So it's something totally different.
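The two dynamic ranges can be checked with a few lines of Python. This is only a sketch: it assumes IEEE 754 single precision and counts only the span of normal exponents (-126 to +127), ignoring the mantissa and denormals.

```python
# Unsigned 8 bit integer: 256 distinct codes.
INT8_RANGE = 2 ** 8

# IEEE 754 single precision (assumed format): normal exponents -126 .. +127.
E_MIN, E_MAX = -126, 127
FLOAT32_RANGE = 2 ** (E_MAX - E_MIN)   # 2**253

print(f"8 bit integer : 1:{INT8_RANGE}")
print(f"32 bit float  : 1:{FLOAT32_RANGE:.3e}")
print(f"decimal digits: {len(str(FLOAT32_RANGE))}")
```

Under these assumptions the floating point ratio is 2^253, a 77-digit number of roughly 1.4 × 10^76.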
*********************************
If I understand you right, then you don't need "fast" code, but code with high "throughput".
(Pipelined code is not fast from input to output, but it has high throughput, maybe one calculation per clock cycle.)
Klaus
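To make the latency versus throughput point concrete, here is a small software model of a pipelined 8 x 8 bit multiplier. It is a sketch, not a hardware description: the 4 stage depth, the 2 multiplier bits per stage, and the names `stage` and `run_pipeline` are assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple

STAGES = 4           # assumed pipeline depth
BITS_PER_STAGE = 2   # 4 stages x 2 bits cover an 8 bit multiplier

State = Tuple[int, int, int]   # (a, b, partial sum)

def stage(idx: int, state: State) -> State:
    """One pipeline stage: add the partial products for its two bits of b."""
    a, b, acc = state
    for bit in range(idx * BITS_PER_STAGE, (idx + 1) * BITS_PER_STAGE):
        if (b >> bit) & 1:
            acc += a << bit
    return (a, b, acc)

def run_pipeline(pairs: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Feed one operand pair per clock; return (clock cycle, product) pairs."""
    regs: List[Optional[State]] = [None] * STAGES   # output register of each stage
    pending = list(pairs)
    results = []
    cycle = 0
    while pending or any(r is not None for r in regs):
        cycle += 1
        new_input = pending.pop(0) if pending else None
        # Update from the last stage backwards so data moves one stage per clock.
        for i in range(STAGES - 1, 0, -1):
            regs[i] = stage(i, regs[i - 1]) if regs[i - 1] is not None else None
        regs[0] = stage(0, (new_input[0], new_input[1], 0)) if new_input else None
        if regs[-1] is not None:
            results.append((cycle, regs[-1][2]))
    return results

print(run_pipeline([(3, 5), (7, 9), (255, 255)]))
```

In this model the first product appears after 4 clock cycles (the latency), and each following product appears one cycle later (the throughput).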
Quote: "I vote for a big lookup table..."
For 8 bit unsigned int, you will need a table 256 x 256 in size (65,536 entries). The products will need 16 bit storage.
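A minimal Python sketch of such a table, assuming the two 8 bit operands are concatenated into one 16 bit address. The names `TABLE` and `lut_multiply` are made up for this example.

```python
from array import array

N = 256  # number of values of an unsigned 8 bit operand

# 'H' = unsigned 16 bit entries; the largest product, 255 * 255 = 65025, fits.
TABLE = array("H", (a * b for a in range(N) for b in range(N)))

def lut_multiply(a: int, b: int) -> int:
    """Multiply two unsigned 8 bit values with a single table lookup."""
    if not (0 <= a < N and 0 <= b < N):
        raise ValueError("operands must be unsigned 8 bit values")
    # Concatenating the operands gives the 16 bit table address.
    return TABLE[(a << 8) | b]

print(len(TABLE), "entries,", len(TABLE) * TABLE.itemsize, "bytes")
```

With 2 byte entries the table takes 128 KiB, which is the price paid for the single lookup.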
Quote: "For the slowest method (school book way) it is just 8 shifts and 8 additions."
...and every stage of that shift and add can be pipelined to run at very high clock frequencies. Depending on how much pipelining is done, the latency of the first output increases, but all successive outputs arrive on every clock cycle.
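The school book method mentioned above can be sketched as follows for unsigned 8 bit operands. This is a plain software model. In hardware, each loop iteration would correspond to one shift-and-add stage that can get its own pipeline register.

```python
def shift_add_multiply(a: int, b: int) -> int:
    """School book multiplication of two unsigned 8 bit values:
    one shifted copy of a is added for every set bit of b."""
    if not (0 <= a < 256 and 0 <= b < 256):
        raise ValueError("operands must be unsigned 8 bit values")
    product = 0
    for i in range(8):              # 8 steps, one per bit of b
        if (b >> i) & 1:            # bit i of the multiplier
            product += a << i       # add a shifted left by i positions
    return product
```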
Quote: "That is how the original Pentium multiplication error was introduced..."
The bug was actually associated with the floating point division processor (NDP). Divisions take considerably longer and use different lookup tables (if my memory is right). There was no bug in the multiplication lookup table.