You can use a LogiCORE asynchronous FIFO that is 192 bits wide on both the write and read sides, and add a wrapper that converts the 192-bit output to a 128-bit output.
The first time you "read" the wrapper FIFO, it reads one 192-bit word from the LogiCORE FIFO and presents 128 of those 192 bits. The second read outputs the remaining 64 bits of that first word, reads another 192-bit word, and returns only 64 bits of it (64 bits + 64 bits = 128 bits). The third read returns the remaining 128 bits of the second word, with no LogiCORE read.
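You will have to keep continuous track of this two-writes-to-three-reads pattern (2 x 192 bits = 3 x 128 bits = 384 bits) to generate the correct empty flag: the wrapper is empty only when it holds fewer than 128 leftover bits and the underlying FIFO is empty as well.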
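
The read sequence and the empty flag described above can be checked with a small behavioral model before writing any HDL. This is only a sketch in Python, not RTL: the class name `WidthConvertingFifo`, its methods, and the choice to return the low-order bits of each 192-bit word first are assumptions made for illustration, and a plain `deque` stands in for the 192-bit FIFO core.

```python
from collections import deque

WORD_BITS = 192                      # width of the underlying FIFO
OUT_BITS = 128                       # width presented by the wrapper
OUT_MASK = (1 << OUT_BITS) - 1


class WidthConvertingFifo:
    """Behavioral model of a 192-bit FIFO with a 128-bit read wrapper."""

    def __init__(self):
        self.fifo = deque()          # stand-in for the 192-bit FIFO core
        self.held = 0                # leftover bits not yet returned
        self.held_bits = 0           # number of leftover bits: 0, 64 or 128

    def write(self, word):
        assert 0 <= word < (1 << WORD_BITS)
        self.fifo.append(word)

    @property
    def empty(self):
        # Empty when a full 128-bit output cannot be formed: fewer than
        # 128 leftover bits and nothing left in the underlying FIFO.
        return self.held_bits < OUT_BITS and not self.fifo

    def read(self):
        if self.empty:
            raise IndexError("read from empty wrapper FIFO")
        if self.held_bits < OUT_BITS:
            # Reads 1 and 2 of every three pop a new 192-bit word and
            # place it above the leftover bits.
            self.held |= self.fifo.popleft() << self.held_bits
            self.held_bits += WORD_BITS
        # Read 3 skips the pop because 128 bits are already held.
        out = self.held & OUT_MASK
        self.held >>= OUT_BITS
        self.held_bits -= OUT_BITS
        return out
```

Writing two 192-bit words and reading three times returns all 384 bits in order. After a single write and a single read, `empty` is true even though 64 bits are still held, which is the case the empty flag has to handle.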