A circuit optimized for both time and area would be a purely combinational circuit, as follows.
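Suppose your bus is n bits wide. Then you need n NOT gates, n n-input AND gates and one n-input OR gate (or, equivalently, n n-input NAND gates feeding one n-input NAND gate, since NAND-NAND is the same two-level logic as AND-OR; NAND outputs fed straight into an OR would give a constant 1).
The logic is simple: each AND gate detects one bit pattern containing exactly one logic 1 (bit i taken directly, every other bit taken through its NOT gate). There are n such patterns with a single 1, and therefore n AND gates. The outputs of all the AND gates go to the OR gate, whose output is 1 exactly when the bus holds a single 1.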
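The two-level circuit above can be modelled in a few lines of Python, used here only as a behavioural model of the gates. This is a sketch under that assumption; the function names are hypothetical and not from the post. It builds the n product terms from the bits and their complements, ORs them, shows the equivalent NAND-NAND form, and checks both against a simple count of ones.

```python
from itertools import product


def one_hot_sop(bits):
    """AND-OR model: n NOT gates, n n-input AND gates, one n-input OR gate."""
    n = len(bits)
    inverted = [1 - b for b in bits]  # the n NOT gates, shared by all terms
    terms = []
    for i in range(n):
        # Term i matches only the pattern with its single 1 at position i:
        # bit i enters directly, every other bit enters through its NOT gate.
        inputs = [bits[j] if j == i else inverted[j] for j in range(n)]
        terms.append(int(all(inputs)))
    return int(any(terms))  # the final n-input OR


def nand(*xs):
    return 1 - int(all(xs))


def one_hot_nand_nand(bits):
    """Same logic built only from NAND gates (NAND-NAND == AND-OR)."""
    n = len(bits)
    inverted = [1 - b for b in bits]
    terms = [nand(*[bits[j] if j == i else inverted[j] for j in range(n)])
             for i in range(n)]
    return nand(*terms)  # final gate is a NAND, not an OR


# Exhaustive check against a popcount for a 4-bit bus.
for pattern in product((0, 1), repeat=4):
    assert one_hot_sop(list(pattern)) == int(sum(pattern) == 1)
    assert one_hot_nand_nand(list(pattern)) == one_hot_sop(list(pattern))
```

The delay is two gate levels regardless of n, but the gate-input count grows as n × n, which is the cost the reply below is trying to reduce.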
Depending on how wide your bus is, I would try to use FAs (full adders).
This is not true addition logic: as soon as one of the FAs produces a carry, you can immediately pass that to the output, since a carry means you have more than one "1".
In some cases this would be faster than the fully parallel approach, and it would certainly use less area (and power).
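The post does not say how the FAs are wired, so the Python sketch below assumes one possible arrangement: a tree of FAs taking three bits each, where each level's sum outputs feed the next level and every carry output goes into a single OR (the "bypass" to the output). If no FA ever produces a carry, each FA saw at most one 1, so the sum bits carry that single 1 (if any) down to the final bit. The function names are hypothetical.

```python
from itertools import product


def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def fa_tree_detect(bits):
    """Assumed FA-tree wiring. Returns (more_than_one, exactly_one).

    Any carry out of any FA means at least two inputs were 1.
    """
    if not bits:
        raise ValueError("bus must be at least 1 bit wide")
    level = list(bits)
    carries = []
    while len(level) > 1:
        # Pad with constant 0s so every FA gets three inputs
        # (an FA with a tied-off 0 input could be a half adder in hardware).
        level = level + [0] * (-len(level) % 3)
        next_level = []
        for k in range(0, len(level), 3):
            s, c = full_adder(level[k], level[k + 1], level[k + 2])
            next_level.append(s)
            carries.append(c)
        level = next_level
    more_than_one = int(any(carries))  # the OR over all carry outputs
    exactly_one = int(level[0] == 1 and not more_than_one)
    return more_than_one, exactly_one


# Exhaustive check against a popcount for a 5-bit bus.
for pattern in product((0, 1), repeat=5):
    many, one = fa_tree_detect(list(pattern))
    assert many == int(sum(pattern) > 1)
    assert one == int(sum(pattern) == 1)
```

Under this wiring the number of FAs grows roughly linearly with n, compared with n gates of n inputs each in the two-level circuit. The carry OR and the sum path get longer as the tree gets deeper, which is why it is only faster in some cases.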