Some initial latency is expected. Are you using the LUT-based or the Mult (DSP) block implementation? The LUT-based option can add more latency. But how does it actually affect you? Since the Mult is a pipelined stage, you may not see much impact: once the initial latency has passed, you will get output continuously. Also, it will likely pick a 35x35 Mult in your case, which adds a few more latency cycles.
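A minimal behavioral sketch of why initial latency is usually harmless in a pipeline. It assumes a fully pipelined multiplier that accepts one new operand pair every clock. The function `simulate_pipelined_mult` and its parameters are hypothetical and only model the timing; this is not the Xilinx IP itself.

```python
def simulate_pipelined_mult(pairs, latency):
    """Model a fully pipelined multiplier (hypothetical, timing only).

    One operand pair enters per cycle; its product leaves the last
    pipeline register `latency` cycles later.
    Returns a list of (cycle, product) for every valid output.
    """
    if latency < 1:
        raise ValueError("latency must be >= 1 cycle")
    stages = [None] * latency  # stages[-1] is the output register; None = bubble
    outputs = []
    for cycle in range(len(pairs) + latency):
        # Sample the output register before this cycle's clock edge.
        if stages[-1] is not None:
            outputs.append((cycle, stages[-1]))
        # Feed a new product while inputs remain, otherwise a bubble.
        new = pairs[cycle][0] * pairs[cycle][1] if cycle < len(pairs) else None
        # Clock edge: every stage shifts one step toward the output.
        stages = [new] + stages[:-1]
    return outputs


if __name__ == "__main__":
    result = simulate_pipelined_mult([(1, 2), (3, 4), (5, 6)], latency=4)
    print(result)  # [(4, 2), (5, 12), (6, 30)]
```

The first product only appears at cycle 4 (the latency), but after that a new product arrives on every consecutive cycle, which is the "continuous output" behavior described above.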
I wouldn't consider this a mistake. If 1008 cycles really bothers you, consider tweaking the IP configuration and the number of pipeline stages.
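To judge whether 1008 cycles actually matters, here is a rough back-of-the-envelope check. It assumes one sample per clock and a pipeline that is only filled once. `pipeline_efficiency` is a hypothetical helper, and the 1008-cycle figure is simply the one reported in this thread.

```python
def pipeline_efficiency(n_outputs, latency):
    """Fraction of clock cycles that deliver a useful output.

    Assumes one input per cycle: the first output appears at cycle
    `latency` and the last at cycle latency + n_outputs - 1, so the
    whole run takes latency + n_outputs cycles.
    """
    if n_outputs < 1 or latency < 0:
        raise ValueError("need n_outputs >= 1 and latency >= 0")
    total_cycles = latency + n_outputs
    return n_outputs / total_cycles


if __name__ == "__main__":
    print(f"{pipeline_efficiency(100, 1008):.1%}")        # 9.0%
    print(f"{pipeline_efficiency(1_000_000, 1008):.1%}")  # 99.9%
```

For a long continuous stream, the startup cost is negligible. It only dominates for short bursts where the pipeline drains between bursts, and that is the case where tweaking the IP configuration and pipeline stages pays off.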
The block will have some initial latency while outputs are flushed through the LUTs/DSP slices. A DSP slice should add very little latency, or none at all, depending on the configuration.
However, 1008 cycles does seem odd, and I'm still surprised by it.