1. Solutions for leading one detector

I need leading one detector which can detect the leading one and tells the position of first '1' i.e.

i/p 00000000111100000000
o/p 00000000100000000000 = 9

one way in my mind was to use a priority decoder(using casex or series of if..else)
but that uses much resource for larger inputs

can anyone suggests a better way using some thing else that counts less LUTs??

you can use 2 dffs serialy connected, and use a counter at the same time; of cource the counter bits, u need to watch the i/p sequence firstly;
after reset and the stop signal is unenable, the initial 2DFFS are:00,then begin to capture the reading in bit from i/p sequence serialy,
u just need to dectect and judge the posedge edge 10 (that 2DFFS are: 10) is ok;
but note the following:
1 : the sample frequency of the detector must be 2 times of the frequency of the i/p sequence if it is asynchronous from the i/p's;
2 : or that the the sample frequency of the detector is synchoronous with the i/p's(that the same period and phase,the clocks of i/p&o/p is the same one).
after detect the juge edge, stop signal is enabled, and the detec operation stops.
the resources used are decided by the number of the counter bits.

How wide is the input going to be?
How many clock cycles are you allowed to take to decode?

You can do a shift register + counter if you have plenty of time.
You can do a (pipelined) priority decoder if you are in a hurry.
You can split the input into smaller parts, decode for each part, and then combine.
You can <fill_in/> depending on a lot of other constraints that you did not specify about the signal.

1 members found this post helpful.

Originally Posted by sunjianhuigou
you can use 2 dffs serialy connected, and use a counter at the same time; of cource the counter bits, u need to watch the i/p sequence firstly;
after reset and the stop signal is unenable, the initial 2DFFS are:00,then begin to capture the reading in bit from i/p sequence serialy,
u just need to dectect and judge the posedge edge 10 (that 2DFFS are: 10) is ok;
but note the following:
1 : the sample frequency of the detector must be 2 times of the frequency of the i/p sequence if it is asynchronous from the i/p's;
2 : or that the the sample frequency of the detector is synchoronous with the i/p's(that the same period and phase,the clocks of i/p&o/p is the same one).
after detect the juge edge, stop signal is enabled, and the detec operation stops.
the resources used are decided by the number of the counter bits.
I have only a cycle to decode it completely

---------- Post added at 11:55 ---------- Previous post was at 11:51 ----------

Originally Posted by mrflibble
How wide is the input going to be?
How many clock cycles are you allowed to take to decode?

You can do a shift register + counter if you have plenty of time.
You can do a (pipelined) priority decoder if you are in a hurry.
You can split the input into smaller parts, decode for each part, and then combine.
You can <fill_in/> depending on a lot of other constraints that you did not specify about the signal.
i have only a clock cycle and the input is 66-bits wide.....
splitting the decoder my help in increase fmax but this will no decrease the LUTs....
I believe that synthesis tools are smart enough to chose the best possible implementation of a decoder so thinking at this point is not much useful....
I think i need something else (other than decoding) to solve this problem

A combinational priority encoder involves one logic cell per bit and can't hardly be beaten by any sequential solution, as long as the bit carry delay is acceptable. A recent compiler should be able to synthesize the logic from any equivalent behavioral description. In so far, I would simply let the tool try and check the gate level implementation.

•

may be you can try this way:
-store the input vector in a 66-bit wide q_out register,
-use q_out[i] to clear asynchronously all registers 'below', q_out[i-1:0];
quartus shows usage of 66 luts, not too much;
good luck
J.A

•

there is a way to use addition as well. this can be shown for the trailing case, and then modified.

000011110000 -- input
111100001111 -- invert
000000010000 -- xnor with input.

example 2:
1010101011 -- input
0101010100 -- invert
0000000001 -- xnor with input

clearly you would need to write the code to have a bit-reverse for the input/output. the bitreverse doesn't actually infer logic if done correctly.
this has complexity of 1 adder + 1 LUT, so it's not terrible. the use of the adder at least allows the FPGA to make use of the carry chains.

7 members found this post helpful.

Originally Posted by permute
000000010000 -- xnor with input.
I think you mean "and with input".

Interesting method. What is then the best method to get the bit position as a bit vector index?

•

Originally Posted by permute
there is a way to use addition as well. this can be shown for the trailing case, and then modified.
Hey, that's a pretty good method! With carry chains being so fast it even has pretty good timings for a large number of bits.

@permute; thanks bro. Thats really a new way. i haven't seen this any where. Is this your own invention??

Basically, many synthesis tools should be able to infer a fast carry logic from this construct:
Code:
```carry:='0';
FOR I IN NBIT-1 DOWNTO 0 LOOP
dataout(I) <= datain(I) AND NOT carry;
carry:=carry OR datain(I);
END LOOP;```
I tried with Quartus and Cyclone III, but apparently the tool isn't prepared to use the carry chain respectively arithmetic mode for non-arithmetic problems.

By instantiating lcell_comb low-level primitives, it finally work, using one logic cell per bit and the carry chain.

The inverted adder suggested by permute is recognized without problems, but needs 2*nbit-1 logic cells: The second logic cell increases the delay by only 1 ns.

Code:
```FOR I IN NBIT-1 DOWNTO 0 LOOP
sum(I):=NOT datain(nbit-1-I);
END LOOP;
dataout<=datain XNOR std_logic_vector(unsigned(sum)+to_unsigned(1,nbit));```

2 members found this post helpful.

Well, I came up with it independently. I have to imagine someone saw a problem of "bit N is the leftmost only when bit N+1, N+2, N+.. are all 0s, build up and repeat", and thought, "a carry is a 1 when bit N-1, N-2, N-3, N-... are all 1's". From there the connection to addition is pretty clear. I originally used it, or at least some variants, for a pipelined processor.

The second question about "tells the position of first 1" (priority encoder) is still pending. There are basically two methods
- based on the output of the leading bit detector, which is already "one hot" encoded.
- based on the input directly
The first solution ends up in a wide or expression for each position bit and one additional for the all zero case, like below:
Code:
```FOR J IN 0 TO NBITA-1 LOOP
FOR I IN 0 TO NBIT-1 LOOP
END LOOP;
END LOOP;
zero <= NOT OR_REDUCE(dataout);```
A consideration about priority encoder designs can be found in the Arbitration chapter of the Altera Advanced Synthesis Cookbook. It also mentions a priority masking method using an adder similar to Permute's suggestion.

http://www.altera.com/literature/man...x_cookbook.pdf

Having the leading bit (or priority mask) result altready calculated exposes the zero bit simply as inverted carry output of the lowest bit, as a supplement to the previously suggested code.
Code:
`zero <= NOT carry;`
The synthesis of wide or should be hopefully managed with acceptable results by the synthesis tool for the respective logic famlily.

Alternatively, the bit position can be determined from the unencoded input bit vector directly. Altera has a MegaFunction altpriority_encoder for this purpose, which is slightly faster than the cascaded solution and considerably faster than the synthesis of a behavioral description. It also has options for clocked and pipelined operation. The structure of synthesized logic looks however rather arbitrary and doesn't suggest a particular scheme at first sight.

1 members found this post helpful.

HI,
FvM,Very Good,thanks!
I get some ideas from you, i have a more clearly understanding about the first one-bit-detector and the inferring process.
You tell me that the detector is not so easy to realize even if permute had introduced the skill to me, otherwise, why Altera Co. need to explian it with xxx.pdf !

Originally Posted by permute
000011110000 -- input
111100001111 -- invert
000000010000 -- xnor with input.
I still think that "xnor" is incorrect. You get the one-hot output with the "and" operator. The "xnor" operator gives the result "000000011111" which can also be useful.

1 members found this post helpful.

Originally Posted by std_match
I still think that "xnor" is incorrect. You get the one-hot output with the "and" operator. The "xnor" operator gives the result "000000011111" which can also be useful.
Hi,std_match:
Yeah, it's a tailing one-bit detector, if want to realize the heading one-bit one, needing to cahnge it,
but this kind of method is a must maybe.
0000111 10000 -- input <-A ,original input sequence
............ -- invert
1111000 10000 -- add 1 <-B
0000000 10000 -- xnor with input. <-C ,last result
A XNOR B = C . TAILING ONE-BIT DETECTOR
If want to realize the hedaing '1'-bit ,needing to think it again, TAILING <-> HEADING .
If want to realize the detector in one cycle, use adder is a good way, so the talking now turned into another question about the "Carry xxx", of course needing to consider the bit reverse logic realization and some other combinational circuit realization problems.

You get the one-hot output with the "and" operator.
True indeed, I didn't check it thoroughly. The logic cell effort and speed isn't affected of cause. Altera suggested to add "-1" and perform datain AND not sum, by the way.

Yeah, it's a tailing one-bit detector, if want to realize the heading one-bit one.
This is done by the bit reversal, as permute pointed out. My previous VHDL code is wrong however, it misses to revers the sum again. Here's a corrected version.

Code:
```FOR I IN NBIT-1 DOWNTO 0 LOOP
sum(I):=NOT datain(nbit-1-I);
END LOOP;
sum :=std_logic_vector(unsigned(sum)+to_unsigned(1,nbit));
FOR I IN NBIT-1 DOWNTO 0 LOOP
dataout(I)<=datain(I) AND sum(nbit-1-I);
END LOOP;```

1 members found this post helpful.

@std_match -- yeah, you are correct. IIRC, the time that I did this, I was wanting to take a onehot vector, and convert it into a 00001111 type vector. the intent was to keep a small LRU table sorted. if item 5 was just used, then items 4,3,2,1 would need to be shifted, and item 5 moved to the top of the list. So my operation was a handful of parallel compares, and then this add logic.

the "AND" operator is the correct one in this case.

the general structure is very useful for several applications. you can also do things with bit-reverse, shifting, inverting, negating, ect... the original vector, as well as using a second vector. The only trick is to document what you did, as it isn't usually that obvious.

1 members found this post helpful.

through the talking , i learn a lot ,Good! 罡罡的！