The easiest option is to use the divider core provided by the vendor: you get control over the pipeline length, and hence over the achievable clock speed.
You could just use the division operator "/" if you don't care about the clock speed.
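I am not sure if this will be your answer, but there are some algorithms for division that you can implement yourself. Note that Booth's algorithm, which is often mentioned in this context, is strictly a multiplication algorithm; the shift-and-subtract schemes usually taught alongside it (restoring and non-restoring division) are the division counterparts. Whichever you pick, it is harder than using a ready-made divider, because you need to build a datapath for it (2 registers, 1 flop, 1 control unit, some muxes etc.). If you implement this, you will learn a lot though.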
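To give a feel for what such a datapath does on each clock cycle, here is a minimal software sketch of unsigned restoring (shift-and-subtract) division. It is only a bit-level model in Python, not HDL and not any vendor's implementation; the function name `restoring_divide` and the `width` parameter are made up for illustration.

```python
def restoring_divide(dividend, divisor, width=8):
    """Bit-level model of unsigned restoring division.

    Hypothetical helper for illustration: returns (quotient, remainder)
    for operands that fit in `width` bits.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    limit = 1 << width
    if not (0 <= dividend < limit and 0 < divisor < limit):
        raise ValueError("operands must fit in `width` unsigned bits")

    mask = limit - 1
    remainder = 0        # models the remainder/accumulator register
    quotient = dividend  # models the register that starts as the dividend

    # One loop iteration corresponds to one step of the control unit.
    for _ in range(width):
        # Shift the {remainder, quotient} pair left by one bit.
        msb = (quotient >> (width - 1)) & 1
        remainder = (remainder << 1) | msb
        quotient = (quotient << 1) & mask

        # Trial subtraction: keep the result only if it is non-negative,
        # otherwise "restore" by leaving the remainder unchanged.
        trial = remainder - divisor
        if trial >= 0:
            remainder = trial
            quotient |= 1  # this quotient bit is 1

    return quotient, remainder


if __name__ == "__main__":
    print(restoring_divide(13, 3))
```

In hardware the loop becomes a small state machine that takes `width` clock cycles per division, which is the trade-off against a pipelined vendor core or a combinational "/".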