Hi, I only have experience with CPLDs, and I have successfully chained them for download. As you said, the chain runs from the dongle's TDI to the first FPGA's TDI, then that FPGA's TDO to the next FPGA's TDI, and so on, with the last device's TDO going back to the dongle. The rest of the connections (TCK, TMS, PWR, GND) are all in parallel.
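To make the chain wiring concrete, here is a rough sketch that lists the connections for any number of devices. The device names (U1, U2, U3) and the function name are made up for illustration, and the cable pins are named after the device pin they connect to, as in the description above. It is just a way to write out a wiring list, not anything from a vendor tool.

```python
def jtag_chain_nets(devices):
    """Return (source_pin, destination_pin) pairs for a JTAG daisy chain.

    `devices` is a list of hypothetical reference designators, in chain order.
    """
    if not devices:
        raise ValueError("need at least one device in the chain")

    nets = []

    # Serial data path: cable -> first device -> ... -> last device -> cable.
    nets.append(("cable.TDI", f"{devices[0]}.TDI"))
    for upstream, downstream in zip(devices, devices[1:]):
        nets.append((f"{upstream}.TDO", f"{downstream}.TDI"))
    nets.append((f"{devices[-1]}.TDO", "cable.TDO"))

    # Shared signals: every device is wired to the same cable pin.
    for signal in ("TCK", "TMS", "PWR", "GND"):
        for dev in devices:
            nets.append((f"cable.{signal}", f"{dev}.{signal}"))

    return nets


if __name__ == "__main__":
    for src, dst in jtag_chain_nets(["U1", "U2", "U3"]):
        print(f"{src} -> {dst}")
```

Only the TDI/TDO lines depend on the order of the devices; the other four signals come out the same whichever way round you list them.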
But that is for CPLDs. An FPGA uses an on-board memory to hold the downloaded design, which is then loaded into the FPGA on power-up. I guess that doesn't really matter: you just JTAG your design into those memory chips instead, or into the FPGA itself for temporary operation (I could be wrong, I have no experience with FPGAs).
Btw, what FPGAs are you using? Xilinx, Altera, Lattice?
Anyway, you mentioned 'level shifting': is it on your boards, or in the 'dongle' (download cable)? If your level shifting or buffer circuitry is inside the dongle, you could easily have 3, 4 or more ribbon cables coming out of the dongle, one for each board, going directly to the FPGA/memory. And as I said above, the wiring would be simple, since 4 of the wires are the same connection and TDI/TDO just loop from one board to the next. But make sure you label each ribbon cable with its place in the chain (1, 2, 3, etc.).
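For the multi-cable idea, a small sketch like the one below could print what each labelled ribbon cable carries on the dongle side. It assumes the dongle itself loops TDO from one cable into TDI of the next; the names cable1, cable2 and so on, and the function name, are hypothetical labels, not part of any real download cable.

```python
def cable_pinouts(board_count):
    """Map each labelled ribbon cable to what its wires connect to in the dongle."""
    if board_count < 1:
        raise ValueError("need at least one board")

    pinouts = {}
    for position in range(1, board_count + 1):
        # Data into this board comes from the dongle for cable 1,
        # otherwise from the previous cable's return line.
        if position == 1:
            tdi_source = "dongle.TDI"
        else:
            tdi_source = f"cable{position - 1}.TDO"

        # Data out of the last board returns to the dongle,
        # otherwise it feeds the next cable in the chain.
        if position == board_count:
            tdo_dest = "dongle.TDO"
        else:
            tdo_dest = f"cable{position + 1}.TDI"

        pinouts[f"cable{position}"] = {
            "TDI": tdi_source,
            "TDO": tdo_dest,
            # The remaining four wires are common to every cable.
            "TCK": "dongle.TCK",
            "TMS": "dongle.TMS",
            "PWR": "dongle.PWR",
            "GND": "dongle.GND",
        }
    return pinouts


if __name__ == "__main__":
    for label, wires in cable_pinouts(3).items():
        print(label, wires)
```

The cable number is the board's position in the chain, which is why the labels matter: plug two cables into the wrong boards and the devices end up in a different order than the download software expects.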
As I said, I'm no expert,
BuriedCode.