Hi,
The filter order you can achieve depends purely on the device architecture (e.g. the number of embedded multipliers and logic cells) and on whether you are meeting your speed constraints. There are many ways to implement it and plenty of literature online, so search around. Xilinx also provides some free cores you can use: run the CoreGen utility and it will generate the filters for you.
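As a rough illustration of how the multiplier count and clock speed bound the order, here is a minimal Python sketch. It assumes a time-multiplexed FIR where each embedded multiplier does one multiply-accumulate per clock. It also ignores the logic cells needed for adders and coefficient storage, and it assumes the clock rate actually meets timing. The function name max_fir_taps is hypothetical, not part of any Xilinx tool.

```python
import math


def max_fir_taps(multipliers: int, clock_hz: float, sample_hz: float,
                 symmetric: bool = False) -> int:
    """Rough upper bound on FIR taps for a time-multiplexed MAC design.

    Hypothetical helper: assumes one multiply-accumulate per multiplier
    per clock cycle and ignores adder, routing and memory limits.
    """
    if sample_hz <= 0 or clock_hz < sample_hz:
        raise ValueError("clock rate must be at least the sample rate")
    # Number of clock cycles (and so MACs per multiplier) per input sample.
    macs_per_multiplier = math.floor(clock_hz / sample_hz)
    multiplies_per_sample = multipliers * macs_per_multiplier
    # Symmetric (linear-phase) coefficients: a pre-adder sums the two
    # samples that share a coefficient, so one multiply covers two taps.
    return 2 * multiplies_per_sample if symmetric else multiplies_per_sample


if __name__ == "__main__":
    print(max_fir_taps(4, 100e6, 1e6))                  # 4 * 100 = 400 taps
    print(max_fir_taps(4, 100e6, 1e6, symmetric=True))  # 800 taps
```

For example, 4 multipliers clocked at 100 MHz with a 1 MHz sample rate give roughly 400 taps, or 800 with symmetric coefficients. If you can't close timing at 100 MHz, the bound drops in proportion.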