Image rotation algorithm - Forward vs Backward mapping

shaiko · Feb 17, 2017

Hello,

I'm designing an image rotation block on a Cyclone V FPGA with DDR3.
The input is a simple 30 FPS parallel video interface: 1024 x 1024 images with HSYNC, VSYNC, 16-bit data and a ~40 MHz pixel clock.

Using the 2D rotation formula, I want to map destination pixels to source pixels (backward mapping), i.e.:

1. I have one DDR3 address area that acts as an input buffer (into which I write the incoming image) and another DDR3 address area into which I write the transformed image.
2. I raster-scan over the destination pixels; for each one I calculate the source address, fetch the pixel and write it to an on-chip line buffer.
3. Once the line buffer is full, I write it to the destination address in an efficient, byte-aligned fashion.
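The backward-mapping loop in steps 1 to 3 can be sketched in software as below. This is a minimal model, not the poster's RTL: Python lists stand in for the two DDR3 areas, the rotation is about the frame centre, sampling is nearest-neighbour, and the sign convention (which makes a positive angle turn the picture clockwise on screen) and the `fill` value for pixels with no source are assumptions. In hardware, every `src[...]` lookup is one DDR3 read.

```python
import math

def rotate_backward(src, width, height, angle_deg, fill=0):
    """Rotate a row-major image about its centre using backward mapping:
    for every destination pixel, compute where it came from in the source."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    dst = [fill] * (width * height)
    for y in range(height):              # raster scan over destination pixels
        for x in range(width):
            dx, dy = x - cx, y - cy
            # Inverse rotation: destination coordinate rotated back by theta
            sx = cos_t * dx + sin_t * dy + cx
            sy = -sin_t * dx + cos_t * dy + cy
            ix, iy = round(sx), round(sy)  # nearest-neighbour source pixel
            if 0 <= ix < width and 0 <= iy < height:
                dst[y * width + x] = src[iy * width + ix]  # one DDR3 read in hardware
    return dst

if __name__ == "__main__":
    print(rotate_backward(list(range(9)), 3, 3, 90))
```

For a 90-degree turn, consecutive destination pixels in a line come from consecutive source lines, which is exactly the access pattern the next paragraph complains about.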

The problem I see with my algorithm is that step 2 is terribly inefficient in terms of memory access.
I fetch 128 bits (the DDR3 controller's data bus width) even though I may use only a single pixel of the fetched data at that time.
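A rough traffic estimate for the worst case (one full 128-bit word fetched per destination pixel) makes the inefficiency concrete. This is a back-of-envelope sketch using the frame size, frame rate and bus width from the post; it ignores blanking, DDR3 refresh and controller efficiency, and the function name is hypothetical.

```python
def bandwidth_budget(width=1024, height=1024, fps=30, pixel_bits=16, word_bits=128):
    """Rough DDR3 traffic in bytes/s for the naive backward-mapping scheme,
    assuming every destination pixel costs one full-word read."""
    pixel_rate = width * height * fps        # active pixels per second
    pixel_bytes = pixel_bits // 8
    word_bytes = word_bits // 8
    return {
        "input_write": pixel_rate * pixel_bytes,      # buffering the incoming frame
        "output_write": pixel_rate * pixel_bytes,     # writing the rotated frame back
        "worst_case_read": pixel_rate * word_bytes,   # one 128-bit word per output pixel
        "useful_fraction": pixel_bits / word_bits,    # share of each fetched word used
    }

if __name__ == "__main__":
    for name, value in bandwidth_budget().items():
        print(name, value)
```

With 8 pixels per word and only one used, the read stream is eight times the useful data, so whether this matters depends on how much DDR3 bandwidth is left over for it.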

Do you think my algorithm is good?

TrickyDicky · Feb 17, 2017

With DDR, you're kind of stuck, as random access is pretty poor. But if you have plenty of spare bandwidth and it works, what's the problem?

vGoodtimes · Feb 18, 2017

You should work on blocks of data rather than lines, and also use a cache.
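A small word cache between the rotation logic and DDR3 lets several pixels taken from the same 128-bit word cost only one DDR3 read. The sketch below only counts hits and misses; the LRU policy, the 64-word capacity and the class name are hypothetical modelling choices, not something specified in the thread.

```python
from collections import OrderedDict

class WordCache:
    """Tiny LRU cache of 128-bit DDR words; counts how many pixel reads
    would actually reach DDR3. Capacity and policy are hypothetical."""

    def __init__(self, capacity_words=64, pixels_per_word=8):
        self.capacity = capacity_words
        self.ppw = pixels_per_word
        self.lines = OrderedDict()   # word address -> placeholder (data omitted)
        self.hits = 0
        self.misses = 0

    def read_pixel(self, pixel_addr):
        word = pixel_addr // self.ppw              # word containing this pixel
        if word in self.lines:
            self.lines.move_to_end(word)           # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1                       # would be one DDR3 read
            self.lines[word] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)     # evict least recently used
        return word
```

Reading 16 consecutive pixels costs 2 misses and 14 hits; the cache only pays off if the access order revisits words before they are evicted, which is why it goes together with block processing.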

If needed, you can also store the data in block order rather than line order.
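Block ordering changes only the address function. The sketch below compares the usual line (raster) order with a layout that stores each 32 x 32 block contiguously; the 32-pixel block size and the function names are illustrative assumptions, and it assumes the width is a multiple of the block size.

```python
def linear_address(x, y, width=1024):
    """Pixel index in the usual line (raster) ordering."""
    return y * width + x

def tiled_address(x, y, width=1024, tile=32):
    """Pixel index when the frame is stored as tile x tile blocks:
    blocks in raster order, pixels in raster order inside each block."""
    tiles_per_row = width // tile
    tile_index = (y // tile) * tiles_per_row + (x // tile)
    offset = (y % tile) * tile + (x % tile)
    return tile_index * tile * tile + offset

if __name__ == "__main__":
    # Vertical neighbours are 1024 pixels apart in line order, 32 in block order
    print(linear_address(0, 1), tiled_address(0, 1))
```

With 16-bit pixels, one 32 x 32 block is 2 KiB of contiguous memory, so vertically adjacent pixels no longer sit a whole line apart.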

Pixels on different lines might not be in the same row/bank, and are in any case less likely to share a row than pixels on the same line. Row operations are much slower than column accesses.
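To see why line order hurts a 90-degree rotation, the sketch below splits byte addresses into bank, row and column and counts how many distinct DDR3 pages a set of pixels touches. The 2 KiB page size and the row-bank-column bit ordering are hypothetical; the real mapping depends on the DDR3 devices and on how the memory controller is configured. DDR3 devices do have 8 banks.

```python
ROW_BYTES = 2048   # hypothetical DDR3 page (row) size in bytes
BANKS = 8          # DDR3 devices have 8 banks

def decode(addr):
    """Split a byte address into (bank, row, column) under a hypothetical
    row-bank-column mapping: column bits lowest, then bank, then row."""
    column = addr % ROW_BYTES
    bank = (addr // ROW_BYTES) % BANKS
    row = addr // (ROW_BYTES * BANKS)
    return bank, row, column

def pages_touched(pixels, width=1024, bytes_per_pixel=2):
    """Distinct (bank, row) pages hit by (x, y) pixels stored in line order;
    a lower bound on the number of row activations needed."""
    pages = set()
    for x, y in pixels:
        bank, row, _ = decode((y * width + x) * bytes_per_pixel)
        pages.add((bank, row))
    return len(pages)

if __name__ == "__main__":
    print(pages_touched([(x, 0) for x in range(1024)]))   # one image line
    print(pages_touched([(0, y) for y in range(1024)]))   # one image column
```

Under these assumptions an image line fits in one page while an image column touches 1024 pages, although consecutive lines fall in different banks, which is what makes bank interleaving useful.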

You should also use the usual latency-hiding technique of allowing multiple accesses to be queued, rather than waiting for each to return before starting the next.
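A toy timing model shows the effect of queuing. It assumes one read request can be issued per cycle, every read returns a fixed `latency` cycles after issue, and at most `max_outstanding` reads may be in flight; real DDR3 latency varies with row hits and misses, so the numbers are illustrative only.

```python
from collections import deque

def completion_time(n_requests, latency, max_outstanding):
    """Cycle at which the last read returns, assuming one issue per cycle,
    a fixed read latency and at most max_outstanding (>= 1) reads in flight."""
    in_flight = deque()      # return cycles of outstanding reads, oldest first
    cycle = 0
    last_return = 0
    for _ in range(n_requests):
        if len(in_flight) == max_outstanding:
            # Stall until the oldest read returns and frees a slot
            cycle = max(cycle, in_flight.popleft())
        last_return = cycle + latency
        in_flight.append(last_return)
        cycle += 1           # next issue slot
    return last_return

if __name__ == "__main__":
    print(completion_time(100, 20, 1))    # wait for each read
    print(completion_time(100, 20, 32))   # keep up to 32 reads queued
```

With one read at a time, 100 reads take 100 x 20 cycles; with enough reads queued, the latency is paid roughly once.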

shaiko · Feb 18, 2017

Quoting vGoodtimes: "You should work on blocks of data vs lines"

Please elaborate.
What exactly do you mean by "blocks of data"?

vGoodtimes · Feb 18, 2017

For example, generate the first 32 pixels of each of the first 32 lines, instead of the first 1024 pixels of the first line. There is then a better chance that groups of pixels read from the input will actually be used in the output.
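The block-by-block traversal of the destination frame can be written as a generator. This is a sketch; the 32-pixel block size follows the example above, and producing output this way assumes 32 destination lines are buffered on chip before being written back (32 x 1024 x 16 bits = 64 KiB).

```python
def tile_order(width, height, tile=32):
    """Yield destination (x, y) coordinates block by block: every pixel of one
    tile x tile block, row by row, before moving on to the next block."""
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    yield x, y

if __name__ == "__main__":
    print(list(tile_order(4, 4, 2))[:6])
```

Feeding these coordinates into the same backward-mapping formula, instead of a plain raster scan, is the only change to the address generator.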

For a 90-degree rotation, generating 32 output lines with the basic (line) approach means 1024*32 reads, possibly incurring an equal number of row operations. With the block approach, it means 1024*4 reads and possibly 1024 row operations. That is the biggest difference between the two.
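The counts above can be checked with a small simulation. Its assumptions are hypothetical: every input line lives in its own DDR3 row, a fetch returns one 128-bit word (8 pixels), in line mode pixels are fetched in order with no reuse, and in block mode the words needed by a block are fetched once each, row by row.

```python
def count_fetches_90deg(out_lines=32, size=1024, block=None, ppw=8):
    """Count word fetches and row switches needed to produce the first
    out_lines lines of a clockwise 90-degree rotation of a size x size frame."""
    def source(x, y):
        # Backward map for a clockwise 90-degree turn: dst(x, y) <- src(y, size-1-x)
        return y, size - 1 - x

    if block is None:
        # Line approach: each destination line is one group, fetched in pixel order
        groups = [[(x, y) for x in range(size)] for y in range(out_lines)]
    else:
        # Block approach: one group per block x block destination tile
        groups = [[(x, y) for y in range(by, by + block) for x in range(bx, bx + block)]
                  for by in range(0, out_lines, block) for bx in range(0, size, block)]

    fetches = row_ops = 0
    prev_row = None
    for group in groups:
        words = []
        for x, y in group:
            sx, sy = source(x, y)
            words.append((sy, sx // ppw))   # (DDR row = input line, word in row)
        if block is not None:
            words = sorted(set(words))      # fetch each word once, row by row
        for row, _ in words:
            fetches += 1
            if row != prev_row:             # switching rows costs a row operation
                row_ops += 1
                prev_row = row
    return fetches, row_ops

if __name__ == "__main__":
    print(count_fetches_90deg())            # line approach
    print(count_fetches_90deg(block=32))    # 32 x 32 block approach
```

Under these assumptions the line approach needs 32768 fetches and 32768 row switches, while 32 x 32 blocks need 4096 fetches and 1024 row switches, matching the figures above.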

The block size can be chosen based on how efficient the 45-degree rotation needs to be: the average ratio of used pixels to read pixels increases with the block size.
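The trend for the 45-degree case can be estimated by mapping one destination block back to the source and comparing the distinct pixels used with the pixels delivered by the 128-bit words touched. This sketch assumes nearest-neighbour sampling, a tile near the frame centre (so every source pixel is in range) and that each word is fetched once per tile; it makes no claim about exact figures for a particular design.

```python
import math

def word_utilisation_45deg(block, size=1024, ppw=8, angle_deg=45.0):
    """Fraction of fetched pixels actually used when one block x block
    destination tile is produced by backward mapping (nearest neighbour),
    assuming each distinct word of ppw pixels is fetched once per tile."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    centre = (size - 1) / 2.0
    x0 = y0 = size // 2 - block // 2        # tile near the centre: sources in range
    used = set()
    for y in range(y0, y0 + block):
        for x in range(x0, x0 + block):
            dx, dy = x - centre, y - centre
            sx = round(c * dx + s * dy + centre)
            sy = round(-s * dx + c * dy + centre)
            used.add((sx, sy))
    words = {(sy, sx // ppw) for sx, sy in used}   # (input line, word in line)
    return len(used) / (len(words) * ppw)

if __name__ == "__main__":
    for b in (8, 16, 32, 64):
        print(b, round(word_utilisation_45deg(b), 2))
```

Partially used words occur only along the edges of the rotated footprint, so their share shrinks as the block grows; the cost is a larger on-chip buffer.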

That said, the pixel rate is fairly low compared to DDR3 bandwidth. The row behavior might still be significant, though interleaving banks could help in that case.
