One correction: a CRC need not be 32 bits. Its width depends on the degree (the highest power) of the generator polynomial used to compute the checksum; a polynomial of degree n gives an n-bit CRC. So, assuming you have a polynomial of degree 32, a simple question comes to mind:
Are you going to send these 2048 bits serially, in parallel, or in a staggered fashion (say, 32 bits per clock until all 2048 bits are generated)?
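
To make both points concrete, here is a rough software model, not your actual design. It assumes an MSB-first CRC with a zero initial value and no final XOR, and the helper names (`crc_step`, `crc_serial`, `crc_chunked`, `bytes_to_bits`, `pack_words`) are made up for illustration. The polynomial is passed without its top x^width term, so the register width equals the polynomial degree. The serial version consumes one bit per "clock"; the chunked version consumes a whole word per "clock", which in hardware would be the inner loop unrolled into combinational logic.

```python
def crc_step(reg, bit, poly, width):
    """Advance the CRC register by one message bit (MSB-first, non-reflected)."""
    top = (reg >> (width - 1)) & 1          # bit about to be shifted out
    reg = (reg << 1) & ((1 << width) - 1)   # keep the register 'width' bits wide
    if top ^ bit:                           # feedback is active
        reg ^= poly                         # poly omits the implicit x^width term
    return reg


def crc_serial(bits, poly, width):
    """Bit-serial CRC: one message bit per 'clock'. Assumes init = 0, no final XOR."""
    reg = 0
    for b in bits:
        reg = crc_step(reg, b, poly, width)
    return reg


def crc_chunked(words, chunk, poly, width):
    """Word-at-a-time CRC: 'chunk' bits per 'clock', each word fed MSB first."""
    reg = 0
    for w in words:
        for i in range(chunk - 1, -1, -1):
            reg = crc_step(reg, (w >> i) & 1, poly, width)
    return reg


def bytes_to_bits(data):
    """Expand bytes into a list of bits, MSB of each byte first."""
    return [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]


def pack_words(bits, chunk):
    """Group a bit list into integers of 'chunk' bits, first bit as the MSB."""
    words = []
    for start in range(0, len(bits), chunk):
        w = 0
        for b in bits[start:start + chunk]:
            w = (w << 1) | b
        words.append(w)
    return words


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    message = [rng.getrandbits(1) for _ in range(2048)]  # 2048-bit test message

    poly32 = 0x04C11DB7  # a common degree-32 polynomial, used here as an example
    serial = crc_serial(message, poly32, 32)
    staggered = crc_chunked(pack_words(message, 32), 32, poly32, 32)
    print(hex(serial), hex(staggered), serial == staggered)

    # The same code yields an 8-bit CRC with a degree-8 polynomial.
    print(hex(crc_serial(message, 0x07, 8)))
```

Both paths give the same checksum, so the choice between serial and 32-bits-per-clock changes the latency and the amount of logic, not the result.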