Here are some ways:
1. Plenty of 8051-family microcontrollers have four 8-bit ports, for example the Atmel 8252. PICs in 40-pin packages will also work. The best way to connect them is software-emulated (bit-banged) SPI over the parallel port.
2. You can also use a simple CPLD from Xilinx or Altera. They have enough pins to connect to the parallel port.
3. You can take a cheap PCI network card and rework it into a simple PCI I/O card. After that you'll have plenty of I/O resources.
Any of these solutions costs about $5-7.
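For option 1, here is a rough sketch of what software SPI over the parallel port could look like on the PC side. It is only an illustration: the pin mapping (D0 = MOSI, D1 = SCK, D2 = chip select, status bit 6 = MISO), the class names, and the `write_data` / `read_status` callbacks are all my own assumptions, and the loopback class just stands in for real hardware so the logic can be tried without a port.

```python
# Bit-banged SPI (mode 0, MSB first) over a PC parallel port: a sketch.
# The pin mapping below is an assumption, pick whatever suits your wiring.
MOSI = 0x01  # data register bit 0 (D0)
SCK = 0x02   # data register bit 1 (D1)
CS_N = 0x04  # data register bit 2 (D2), active-low chip select
MISO = 0x40  # status register bit 6, used here as the input line


class BitBangSPI:
    def __init__(self, write_data, read_status):
        # write_data(byte) drives the 8 data lines,
        # read_status() returns the status register as an int.
        self.write_data = write_data
        self.read_status = read_status
        self.out = CS_N  # idle state: chip deselected, clock low
        self.write_data(self.out)

    def _set(self, mask, high):
        # Change one output line and push the new byte to the port.
        self.out = (self.out | mask) if high else (self.out & ~mask & 0xFF)
        self.write_data(self.out)

    def transfer(self, byte):
        """Shift one byte out on MOSI and return the byte read on MISO."""
        received = 0
        self._set(CS_N, False)  # select the slave
        for bit in range(7, -1, -1):
            self._set(MOSI, (byte >> bit) & 1)  # set data while clock is low
            self._set(SCK, True)                # rising edge: slave samples
            level = 1 if self.read_status() & MISO else 0
            received = (received << 1) | level
            self._set(SCK, False)               # falling edge: next bit
        self._set(CS_N, True)  # deselect
        return received


class LoopbackPort:
    """Stand-in for real hardware: wires MOSI straight back to MISO."""

    def __init__(self):
        self.data = 0

    def write_data(self, value):
        self.data = value

    def read_status(self):
        return MISO if self.data & MOSI else 0


port = LoopbackPort()
spi = BitBangSPI(port.write_data, port.read_status)
echoed = spi.transfer(0xA5)  # with the loopback, the same byte comes back
```

On real hardware you would replace `LoopbackPort` with two small functions that write the port's data register and read its status register, using whatever port I/O your OS gives you. The microcontroller side then needs a matching SPI slave routine.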
Hope this helps!
Ace-X.