In essence, "link training" is the process of finding the safest sampling point at the receiver, so that you avoid timing violations and get a robust link.
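A minimal sketch of the usual tap-sweep approach, in Python for readability. `pattern_ok(tap)` is a hypothetical stand-in for "set the receive delay to this tap, capture the known pattern, and report whether it matched"; the number of delay taps is also an assumption. The sketch finds the widest contiguous run of passing taps (the data eye) and picks its center, which is the sampling point with the most margin on both sides.

```python
from typing import Callable, Optional, Tuple


def find_best_tap(pattern_ok: Callable[[int], bool],
                  num_taps: int) -> Optional[Tuple[int, int, int]]:
    """Sweep every delay tap and return (center_tap, window_start, window_end).

    pattern_ok is a hypothetical callback: True if the known training
    pattern was received without error at that delay setting.
    Returns None if no tap passes.
    """
    best_start, best_len = -1, 0
    run_start = None  # start of the current run of passing taps
    for tap in range(num_taps):
        if pattern_ok(tap):
            if run_start is None:
                run_start = tap
            run_len = tap - run_start + 1
            if run_len > best_len:  # keep the widest passing window seen so far
                best_start, best_len = run_start, run_len
        else:
            run_start = None  # a failing tap ends the current window
    if best_len == 0:
        return None
    best_end = best_start + best_len - 1
    # Sample in the middle of the widest window for maximum timing margin.
    return (best_start + best_end) // 2, best_start, best_end
```

For example, if taps 10 to 40 of a 64-tap range pass, `find_best_tap(lambda t: 10 <= t <= 40, 64)` picks tap 25. This simple version does not handle an eye that wraps around the end of the delay range.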
In the UltraScale family, the recommendation is to use the "bitslice" component.
Yes.
The "bitslice" is an IOSERDES, IODELAY and FIFO aggregated into one primitive.
Before UltraScale, the 7 series had these as separate primitives.
The new approach is what Xilinx calls "Native Mode".
The old approach (7 series) is called "Component Mode".
You can still use "Component Mode" with UltraScale; however, it's inadvisable unless you have a good reason to do so.
There is no specific protocol. I have just 4 I/O pins on the TX end and 4 I/O pins on the RX end, and at each end I use 2 pins in one direction and the remaining 2 in the other direction. At the moment I send a known pattern from each end, verify it at the other end, and then send the data; it works at 100 MHz.
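One common choice for the "known pattern" is a pseudo-random bit sequence. The sketch below assumes PRBS7 (polynomial x^7 + x^6 + 1), which the setup above does not specify; `prbs7` and `prbs7_errors` are hypothetical helper names. The checker is self-synchronizing: every PRBS7 bit equals the XOR of the bits 7 and 6 positions earlier, so the receiver needs no shared seed or alignment.

```python
from typing import List


def prbs7(seed: int, n: int) -> List[int]:
    """Generate n bits of PRBS7 (x^7 + x^6 + 1) from a nonzero 7-bit seed."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be nonzero")  # all-zero state never leaves zero
    bits = []
    for _ in range(n):
        # Feedback from the two oldest register bits (taps 7 and 6).
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def prbs7_errors(rx: List[int]) -> int:
    """Count mismatches against the PRBS7 recurrence rx[k] == rx[k-7] ^ rx[k-6].

    A single flipped bit is reported up to 3 times, because it also feeds
    the predictions for the two later bits that depend on it.
    """
    return sum(1 for k in range(7, len(rx)) if rx[k] != (rx[k - 7] ^ rx[k - 6]))
```

With 2 lanes per direction, each lane would be checked independently, and a tap sweep like the one sketched earlier could use "zero errors over N bits" as its pass criterion.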