Is there a recommended way to do normalization in Verilog?
I want to normalize 4 input values.
The divisor is the sum of the 4 inputs,
and the dividend is each individual input.
I'm using Core Generator for the normalization, with the LUT-based divider.
With 16-bit input data it consumes about 5000 registers and takes about 16 clocks, which doesn't look good.
Is there a recommended technique for normalization (or division), such as converting it to a multiplication?
The divisor is variable: it cannot be assumed to be a constant or a power of 2.
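For checking any hardware version against a known-good answer, a small software reference model of the normalization described above can help. The following is only a sketch in Python, not Verilog; the function name `normalize_reference`, the `frac_bits` parameter, and the choice to return all zeros when the sum is zero are assumptions made for illustration. Note that the sum of four 16-bit values needs 18 bits.

```python
def normalize_reference(inputs, frac_bits=16):
    """Divide each input by the sum of all inputs.

    Returns unsigned fixed-point fractions with `frac_bits` fractional bits,
    i.e. floor(x / sum * 2**frac_bits) for each x.
    """
    total = sum(inputs)  # four 16-bit inputs -> up to 18 bits
    if total == 0:
        # Assumption: an all-zero input vector maps to an all-zero output.
        return [0] * len(inputs)
    return [(x << frac_bits) // total for x in inputs]


if __name__ == "__main__":
    print(normalize_reference([100, 200, 300, 400]))
```

A result of exactly 1.0 is represented as `2**frac_bits`, so the output needs `frac_bits + 1` bits if one input can carry the whole sum.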
Parallel dividers are resource-hungry by nature. The only alternative is a sequential divider, but that can't be pipelined to deliver a new result every clock cycle, so it's only applicable if the data rate is considerably lower than the clock frequency.
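To illustrate the trade-off, here is a bit-level Python model of one common sequential scheme, restoring division, which produces one quotient bit per iteration. This is a sketch under assumptions: the function name `sequential_divide` and the `width` parameter are hypothetical, and each loop iteration stands in for one clock cycle of a single shared subtract-and-compare stage.

```python
def sequential_divide(dividend, divisor, width=16):
    """Restoring division, one quotient bit per iteration (one 'clock' each).

    `dividend` must fit in `width` bits. Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder = 0
    quotient = 0
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit (MSB first) into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        # Trial subtraction: keep it only if the result stays non-negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


if __name__ == "__main__":
    print(sequential_divide(1000, 7))
```

The hardware equivalent needs roughly one subtractor and a few registers, but it is busy for `width` cycles per division, which is why it only suits low data rates.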
I'm using Core Generator for the normalization, with the LUT-based divider.
With 16-bit input data it consumes about 5000 registers and takes about 16 clocks, which doesn't look good.
What version of the Core Generator divider IP is this, and for which family? I just ran it in ISE 14.7 for V7, V6, and Spartan-6 with the 3.0/4.0 divider IP, and in every case the defaults were a single-clock-cycle divider with a 16-bit dividend and divisor using a radix-2 implementation.
Is there a recommended technique for normalization (or division), such as converting it to a multiplication?
The divisor is variable: it cannot be assumed to be a constant or a power of 2.
Using a multiplier means you need a lookup table with precomputed 1/N values for all possible 16-bit N, scaled by some power of 2 so you don't lose accuracy and can remove the scaling afterwards with a simple shift.
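The reciprocal-table idea can be modelled in a few lines of Python. This is a sketch only: it uses an 8-bit divisor so the table stays small (the 16-bit case has the same structure with 2^16 entries), and the names `build_reciprocal_table` and `normalize_with_table`, the constants, and the choice of floor rounding are all assumptions for illustration.

```python
WIDTH = 8                 # divisor width for this demo (assumption)
SCALE_BITS = 2 * WIDTH    # power-of-2 scaling applied to each 1/N entry
FRAC_BITS = WIDTH         # fractional bits kept in the result


def build_reciprocal_table(width=WIDTH, scale_bits=SCALE_BITS):
    """ROM contents: table[n] = floor(2**scale_bits / n); entry 0 is unused."""
    table = [0] * (1 << width)
    for n in range(1, 1 << width):
        table[n] = (1 << scale_bits) // n
    return table


def normalize_with_table(x, total, table,
                         scale_bits=SCALE_BITS, frac_bits=FRAC_BITS):
    """Approximate x / total as a fixed-point fraction: multiply, then shift."""
    return (x * table[total]) >> (scale_bits - frac_bits)


if __name__ == "__main__":
    rom = build_reciprocal_table()
    print(normalize_with_table(3, 6, rom), (3 << FRAC_BITS) // 6)
```

Because every table entry is rounded down, the result can come out one LSB below the exact quotient but never above it. Since all four inputs share the same divisor, only one lookup is needed per vector, followed by four multiplies.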
It seems I mistyped. I'm using ISE 13.1 for V5, and with a 16-bit dividend and an 18-bit divisor the divider requires 1 clock per division as well.
What I meant to say is that the latency is 34 clocks, and in any case it will consume thousands of registers.
And if I multiply by 1/N, it will need a reciprocal LUT with 2^16 entries, which seems to have no advantage in terms of resource usage (am I right?)
A straightforward LUT like that would consume 1 Mbit (2^16 entries of 16 bits each), so that's not going to fit in a BRAM-based ROM.
Dumb question: is it an absolute requirement that it is exactly normalized? As in, should the largest number in the vector result in a precise 1.00000 after normalization? Because if you could get away with "almost normalized, and guaranteed to be <= 1.000", then you can make a cheaper implementation. Take for example the upper 6 bits of the sum, use a 6-bit-input 1/N lookup table, and then multiply by the resulting reciprocal. Say you do 6-bit input and 8-bit output: that takes up only 2 slices' worth of LUT6s. Slap on a cheap multiplier and you're done.
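A literal reading of that suggestion can be modelled as follows. This is a rough Python sketch, not a verified design: the 18-bit sum width, the name `approx_normalize`, the 8 fractional output bits, and the rounding choices are all assumptions. To keep the "guaranteed <= 1.000" property, the table stores the reciprocal of `top + 1` rounded down, because truncating the sum to its upper bits would otherwise make the reciprocal too large.

```python
SUM_BITS = 18      # sum of four 16-bit inputs (assumption)
INDEX_BITS = 6     # upper bits of the sum used as the ROM address
RECIP_BITS = 8     # ROM output width
FRAC_BITS = 8      # fractional bits in the result; 256 represents 1.0
DROP = SUM_BITS - INDEX_BITS  # low bits of the sum that are discarded

# 64-entry ROM. The true sum is below (top + 1) << DROP, so the reciprocal of
# (top + 1), rounded down and saturated to 8 bits, never overestimates 1/sum.
RECIP_ROM = [min((1 << RECIP_BITS) - 1, (1 << RECIP_BITS) // (top + 1))
             for top in range(1 << INDEX_BITS)]


def approx_normalize(inputs):
    """Approximate x / sum(inputs) for 16-bit inputs; never exceeds 1.0."""
    total = sum(inputs)
    recip = RECIP_ROM[total >> DROP]
    # x * recip carries RECIP_BITS + DROP fractional bits; keep FRAC_BITS.
    return [(x * recip) >> (RECIP_BITS + DROP - FRAC_BITS) for x in inputs]


if __name__ == "__main__":
    print(approx_normalize([65535, 65535, 65535, 65535]))
```

The result is always at or below the exact quotient, but the accuracy is coarse, and it gets worse when the sum is small relative to full scale because the upper 6 bits are then mostly zero. Shifting the sum so that its leading one sits in the top bit before the lookup is a common refinement that is not shown here.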