You can simply XOR each bit with a one-clock-delayed copy of itself. The XOR output is '1' for every bit that differs from its value at the previous clock edge, so a change shows up almost immediately (after just the gate delay of the XOR).
Then you can add the resulting bits together in one clock cycle to get the number of bits that changed.
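-- needs: use ieee.std_logic_1164.all; use ieee.numeric_std.all;
signal a      : std_logic_vector(3 downto 0);
signal dlyd_a : std_logic_vector(3 downto 0);
signal c      : std_logic_vector(3 downto 0);
signal res    : unsigned(2 downto 0);  -- 0 to 4 changed bits

-- c(i) is '1' when a(i) differs from its value at the previous clock edge
c <= a xor dlyd_a;

process (clk)
  variable cnt : unsigned(2 downto 0);
begin
  if rising_edge(clk) then
    dlyd_a <= a;             -- keep a copy of a for the next comparison
    cnt := (others => '0');
    for i in c'range loop    -- count the '1' bits in c
      if c(i) = '1' then
        cnt := cnt + 1;
      end if;
    end loop;
    res <= cnt;
  end if;
end process;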
I'm not sure about syntax errors, but you can easily fix those yourself...
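
For completeness, here is the same idea wrapped in a self-contained entity. Treat it as a minimal sketch rather than a tested design: the entity name change_counter is made up for illustration, the result is declared as a 3-bit unsigned (enough to hold 0 to 4) instead of a 4-bit std_logic_vector, and it assumes that a is already synchronous to clk.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name; signal names follow the snippet above.
entity change_counter is
  port (
    clk : in  std_logic;
    a   : in  std_logic_vector(3 downto 0);
    res : out unsigned(2 downto 0)  -- number of bits of a that changed (0 to 4)
  );
end entity change_counter;

architecture rtl of change_counter is
  -- Copy of a taken at the previous rising edge (assumed to start at all zeros).
  signal dlyd_a : std_logic_vector(3 downto 0) := (others => '0');
  signal c      : std_logic_vector(3 downto 0);
begin
  -- c(i) is '1' when a(i) differs from its value at the previous clock edge.
  c <= a xor dlyd_a;

  process (clk)
    variable cnt : unsigned(2 downto 0);
  begin
    if rising_edge(clk) then
      dlyd_a <= a;
      -- Count the '1' bits in c.
      cnt := (others => '0');
      for i in c'range loop
        if c(i) = '1' then
          cnt := cnt + 1;
        end if;
      end loop;
      res <= cnt;
    end if;
  end process;
end architecture rtl;
```

Since res is registered, it holds the number of bits that changed between the two most recent clock edges. If a comes from another clock domain or from an asynchronous input, it would normally need to be synchronized first.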