Agreed, there is a world of difference between a single-bit image mapped at 128x96 resolution and a typical 8x8x8-bit (24 bits per pixel) image at 720x576 resolution. That is 810 times more data to process in real time, to be precise, and that is just for one picture with no frame store. As a guess, I would say you need to buffer at least four frames to be able to 'slide' them, then another buffer to merge them.
An Arduino UNO clocked at maybe 4 GHz with around 2 GB of memory might just do it if it had enough hardware support around it.
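For anyone who wants to check the numbers, here is a rough Python sketch of the arithmetic. It counts raw, uncompressed bits per frame only, and it assumes 25 frames per second, which is my assumption and not something stated in this thread. The names `frame_bits`, `BUFFER_COUNT` and `FRAMES_PER_SECOND` are just illustrative.

```python
# Back-of-envelope data sizes for the two picture formats discussed above.
# Assumptions: uncompressed frames, 1 bit/pixel at 128x96,
# 24 bits/pixel (8x8x8) at 720x576.

def frame_bits(width, height, bits_per_pixel):
    """Raw size of one uncompressed frame, in bits."""
    return width * height * bits_per_pixel

small_bits = frame_bits(128, 96, 1)    # single-bit picture
large_bits = frame_bits(720, 576, 24)  # 8x8x8 picture

# How many times more data one large frame holds than one small frame.
ratio = large_bits / small_bits

# Four frames to 'slide' plus one buffer to merge them (the guess above).
BUFFER_COUNT = 4 + 1
buffer_bytes = BUFFER_COUNT * large_bits // 8

# Assumption: 25 frames per second; not stated in the thread.
FRAMES_PER_SECOND = 25
bytes_per_second = FRAMES_PER_SECOND * large_bits // 8

# SRAM on the ATmega328P used by a stock Arduino UNO.
UNO_SRAM_BYTES = 2048

print(f"small frame: {small_bits} bits")
print(f"large frame: {large_bits} bits")
print(f"ratio:       {ratio:.0f}x")
print(f"buffers:     {buffer_bytes} bytes for {BUFFER_COUNT} frames")
print(f"throughput:  {bytes_per_second} bytes/s at {FRAMES_PER_SECOND} fps")
print(f"UNO SRAM:    {UNO_SRAM_BYTES} bytes")
```

On these assumptions the five buffers alone come to roughly 6 MB, against the 2 KB of SRAM on a stock UNO, before any processing is done.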
Brian.