1.) Several reasons, the largest being power. An 18x18 multiplication built from LUTs would require a large number of them, making it large in terms of area. It would also likely be slow, since it would need a lot of general-purpose routing. Together, these two factors mean more energy per operation. By using dedicated hardware that can only do a limited (though growing) set of basic operations, the operation can be made small, fast, and lower power than a fully reconfigurable implementation. Because the dedicated blocks are smaller, you can also pack more of them into the device (routing permitting). Dedicated hardware also makes timing consistent: a multiply takes the same amount of time because it is implemented the same way every time. With LUTs, the tools could choose different locations for each LUT and use different routing, so the delay can vary.
2.) Reports. The tools will usually tell you the number of DSP slices used in a design. You can also instantiate the DSP slices manually, which can be useful on Xilinx parts when you want to use the DSP48 slices in their more advanced configurations.
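
To give a rough feel for the area argument in point 1, here is a small Python sketch of a textbook unsigned shift-and-add (array) multiplier. It is only an illustration: the function names are made up for this example, and the counts are gate-level figures for the textbook structure, not LUT counts for any particular FPGA or what a synthesis tool would actually produce.

```python
def partial_products(a, b, width=18):
    """Return the shifted partial products of an unsigned width x width multiply."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    # One partial product per bit of b: either a shifted copy of a, or zero.
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def shift_add_multiply(a, b, width=18):
    """Multiply by summing the partial products, as an array multiplier does."""
    return sum(partial_products(a, b, width))


def array_multiplier_cost(width=18):
    """Gate-level counts for a textbook unsigned array multiplier (not LUTs)."""
    and_gates = width * width          # one AND gate per partial-product bit
    adder_rows = width - 1             # rows needed to sum `width` partial products
    adder_cells = adder_rows * width   # full and half adder cells in total
    return {"and_gates": and_gates, "adder_rows": adder_rows, "adder_cells": adder_cells}


if __name__ == "__main__":
    print(shift_add_multiply(1234, 5678))
    print(array_multiplier_cost(18))
```

Even in this simplified model, an 18x18 multiply needs hundreds of partial-product bits and adder cells, all of which would have to be mapped onto LUTs, carry logic, and general-purpose routing if no dedicated multiplier were available.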