I've run some experiments on "high speed" transfers over the parallel port.
**broken link removed**
(Sorry, it's in French.) To sum up, I reached 1.1 MB/s using a driver (the explanation is in the text; the table reports values that were measured directly but that are not accurate).
I would distinguish two things:
- Getting time-critical behaviour, even at relatively low speed.
Here you will have OS-related problems. Under Windows you need a real-time layer (such as RTX). Under Linux it's the same (for example with RTAI).
- Getting the highest possible transfer speed. Here the OS is not the final obstacle. If you want to take the OS out of the picture, a driver under Windows will do the job: you are then at the lowest level. It's the same with Linux.
If you want to reach 2 MB/s you need to use every trick. The major one is to send 32 bits of data at a time (the hardware splits them into four 8-bit cycles), which removes part of the I/O slowness. For that you need 32-bit port I/O, which requires a library or an asm statement.
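To make the 32-bit trick concrete, here is a minimal sketch in C. It assumes the port is in a mode where the controller accepts a 32-bit write to the data register and splits it into four byte cycles, low byte first. The names `port_write32_fn`, `pack_le32`, `send_buffer32` and `capture_write32` are made up for this illustration; the real port write is left to a callback that you would supply yourself (for example a wrapper around `outl()` from `<sys/io.h>` on Linux/x86, or around `WRITE_PORT_ULONG()` inside a Windows driver).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: performs one 32-bit write to an I/O port.
 * The caller supplies the real implementation (e.g. a wrapper around
 * outl() on Linux/x86 after ioperm()/iopl()). */
typedef void (*port_write32_fn)(uint16_t port, uint32_t value);

/* Pack four bytes into one 32-bit word with b[0] as the low byte.
 * Assumption: the port hardware emits the low byte first. */
uint32_t pack_le32(const uint8_t b[4])
{
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Send len bytes using one 32-bit port write per group of four bytes.
 * A trailing group of 1..3 bytes is zero-padded (an assumption; a real
 * protocol might send the tail with 8-bit writes instead).
 * Returns the number of 32-bit writes issued. */
size_t send_buffer32(port_write32_fn write32, uint16_t data_port,
                     const uint8_t *buf, size_t len)
{
    assert(write32 != NULL && (buf != NULL || len == 0));
    size_t writes = 0;
    for (size_t i = 0; i < len; i += 4) {
        uint8_t group[4] = {0, 0, 0, 0};
        for (size_t j = 0; j < 4 && i + j < len; j++)
            group[j] = buf[i + j];
        write32(data_port, pack_le32(group));
        writes++;
    }
    return writes;
}

/* Stand-in for the real port write: records the values so the packing
 * can be checked without any hardware attached. */
static uint32_t captured[16];
static size_t captured_count;

void capture_write32(uint16_t port, uint32_t value)
{
    (void)port;
    if (captured_count < 16)
        captured[captured_count++] = value;
}
```

Passing the write routine as a callback keeps the packing logic independent of the OS, so the same loop can run inside a driver or be checked on a desk with the recording stand-in.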
Be careful under Windows: any solution that makes one driver call per I/O is catastrophic for performance.
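A rough way to see why one driver call per I/O hurts so much is a simple cost model: each call has a fixed overhead, and each byte has its own port I/O time. The sketch below is only that model; the function name `throughput_bytes_per_s` and the two cost parameters are hypothetical, and the figures you feed it are illustrative, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated throughput in bytes per second when each driver call costs
 * call_ns of fixed overhead and each byte costs io_ns of port I/O time.
 * Both costs are illustrative parameters, not measured values. */
uint64_t throughput_bytes_per_s(uint64_t bytes_per_call,
                                uint64_t call_ns, uint64_t io_ns)
{
    uint64_t total_ns = call_ns + bytes_per_call * io_ns;
    assert(total_ns > 0);  /* avoid dividing by zero */
    return bytes_per_call * UINT64_C(1000000000) / total_ns;
}
```

With made-up costs of 10 µs per call and 1 µs per byte, `throughput_bytes_per_s(1, 10000, 1000)` comes out at roughly 0.09 MB/s, while `throughput_bytes_per_s(4096, 10000, 1000)` gets close to 1 MB/s: the fixed call overhead is paid once per buffer instead of once per byte.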
The last problem is the bridge between the I/O chipset and the PCI bus, a bridge that is often included in the motherboard chipset. With an old PC (one with no PCI bus) you don't have this level, and perhaps get better performance.
I haven't tried DMA, which is perhaps the ultimate method that makes 2 MB/s possible.