First, find the number of sink pins, i.e. the registers/flops in the design. Say there are 1000 of them, and the fanout limit for a buffer is 20. To drive 1000 flops you need at least 1000/20 = 50 buffers, and you need 3 buffers to drive those 50 buffers (50/20 = 2.5, rounded up).
So, you need 50 buffers at the 3rd level, 3 buffers at the 2nd level and 1 driving cell at the 1st level.
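The same arithmetic can be sketched in a few lines of Python. This is only a rough estimate under the assumption that every buffer has the same fanout limit; the function name `clock_tree_levels` and its parameters are made up for illustration and are not part of any CTS tool.

```python
def clock_tree_levels(num_sinks, max_fanout):
    """Estimate cells per level, listed from the root (1st level) down to the leaf level.

    Assumes one uniform fanout limit for every buffer (a simplification).
    """
    if num_sinks < 1 or max_fanout < 2:
        raise ValueError("need at least 1 sink and a fanout limit of at least 2")

    counts = []
    loads = num_sinks
    while True:
        # Integer ceiling division: cells needed to drive 'loads' pins.
        cells = (loads + max_fanout - 1) // max_fanout
        counts.append(cells)
        if cells == 1:
            break  # a single driving cell is the root
        loads = cells  # these cells become the loads of the next level up

    counts.reverse()  # root first, leaf level last
    return counts


levels = clock_tree_levels(1000, 20)
print(levels)       # [1, 3, 50]
print(len(levels))  # 3
```

For 1000 flops and a fanout of 20 this gives 1 driving cell, 3 buffers and 50 buffers, i.e. 3 levels. A real tool will also balance for skew, load and placement, so treat the result as a lower bound, not a final answer.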
The number of levels will generally be 3 for a design with more than 200K gates. We can calculate it using the method described before, but the calculation should be based on the number of flops clocked by the functional/main clock.
In general, it doesn't matter how many levels we have as long as the skew and insertion delay targets are met. We can have 3 BUFX16 or 5 BUFX4; the size of the former will still be bigger when compared with the latter. Deciding on the optimum buffers/inverters to be used in CTS (the general idea is to remove the smallest and largest buffers), along with realistic skew and network latency targets, will help us keep utilization and timing under control.