You can refer to a FIFO design.
I think it will help you.
Or simply use a RAM to store the data and a counter to track how many words are in the RAM; you can make it work easily.
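To make the RAM-plus-counter idea concrete, here is a rough behavioral sketch in Python (not HDL). The class name `CounterFifo`, the method names, and the choice to refuse a write when full are all my own assumptions for illustration; real hardware would do the same thing with registers and a RAM block.

```python
class CounterFifo:
    """Behavioral model (hypothetical): a RAM array plus an occupancy counter."""

    def __init__(self, depth):
        self.depth = depth
        self.ram = [None] * depth  # stands in for the RAM
        self.wr_addr = 0           # next location to write
        self.rd_addr = 0           # next location to read
        self.count = 0             # number of words currently stored

    def is_empty(self):
        return self.count == 0

    def is_full(self):
        return self.count == self.depth

    def write(self, word):
        """Store one word; returns False (and stores nothing) when full."""
        if self.is_full():
            return False
        self.ram[self.wr_addr] = word
        self.wr_addr = (self.wr_addr + 1) % self.depth  # wrap around
        self.count += 1
        return True

    def read(self):
        """Return the oldest word, or None when empty."""
        if self.is_empty():
            return None
        word = self.ram[self.rd_addr]
        self.rd_addr = (self.rd_addr + 1) % self.depth  # wrap around
        self.count -= 1
        return word
```

The counter alone decides full and empty, so the address logic stays trivial. In hardware the counter has to handle a read and a write in the same cycle (count stays unchanged), which this sequential model does not show.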
Asynchronous or synchronous? The second is much easier: you just need two simple pointers to generate the read and write addresses, and a controller circuit for each of them. You also need a dual-port memory as the storage element; the memory can be LUT-based or it can use block memories.
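For the synchronous, two-pointer version, a behavioral sketch in Python might look like the following. One thing the description above leaves open is how to tell full from empty when both pointers land on the same address; this sketch assumes the common trick of giving each pointer one extra wrap bit and a power-of-two depth. The names `PointerFifo`, `push`, and `pop` are hypothetical, and the list `mem` stands in for the dual-port memory.

```python
class PointerFifo:
    """Behavioral model (hypothetical) of a synchronous two-pointer FIFO."""

    def __init__(self, addr_bits):
        self.depth = 1 << addr_bits                   # power-of-two depth
        self.addr_mask = self.depth - 1               # low bits address memory
        self.ptr_mask = (1 << (addr_bits + 1)) - 1    # pointer has one extra wrap bit
        self.mem = [0] * self.depth                   # stands in for dual-port memory
        self.wr_ptr = 0
        self.rd_ptr = 0

    def empty(self):
        # Pointers identical, wrap bit included: nothing stored.
        return self.wr_ptr == self.rd_ptr

    def full(self):
        # Same address bits but different wrap bit: writer is one lap ahead.
        return (self.wr_ptr ^ self.rd_ptr) == self.depth

    def push(self, word):
        """Write-side control: store and advance unless full."""
        if self.full():
            return False
        self.mem[self.wr_ptr & self.addr_mask] = word
        self.wr_ptr = (self.wr_ptr + 1) & self.ptr_mask
        return True

    def pop(self):
        """Read-side control: fetch and advance unless empty."""
        if self.empty():
            return None
        word = self.mem[self.rd_ptr & self.addr_mask]
        self.rd_ptr = (self.rd_ptr + 1) & self.ptr_mask
        return word
```

Each side only ever modifies its own pointer, which matches the idea of a separate controller per port. In the asynchronous case the pointers would also have to cross clock domains safely, which is where most of the extra difficulty comes from.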