Using a counter does not sound too simple at all. However it can be done for example by having a gated counter and a sift register controlling the gated counter. Also, a sequential control logic would be required to steer the process.. That would then count the bits during 8 clock cycles. However, I am pretty sure it will be quite complicated machinery, with maybe two counters, a 8 bit sift register, and a bunch of logic gates.
One very easy way is by a look-up table: It is essentially a ROM chip (usually EPROM or similar ), with each location containing number of one-bits for address of that location. For 8 bits in one needs thus 256 locations, and each location programmed with right contents. If that ROM chip is wider than required four bits, you just utilize what you need - four lines as that is enough to show a value 8 (binary 1000), which is the highest required.
I give below an example of such table, As I am bit too lazy for typing the whole table of 256 lines here, I give just few lines from the beginning and from the end:
Address / Contents
(input) / (output)
00000000 0000
00000001 0001
00000010 0001
00000011 0010
00000100 0001
247 values skipped .... you should be able to figure them out
11111100 0110
11111101 0111
11111110 0111
11111111 1000
Of course, there are many other solutions, but hardly too many requiring just one chip. Another example of single-chip solution would be using a programmable logic chip (PAL, PLD, or similar), but writing the equations would take a bit more effort than generating this simple table. The brute force approach would be actually to write a logic equation, in relevant language such as VHDL, for each output using the table above, and then letting the compiler the minimization work and fitting the result to the chip at hand. In other words, the same table, but converted to logic gates by a compiler.
- Ted