Welcome to EDAboard.com

Welcome to our site! EDAboard.com is an international Electronics Discussion Forum focused on EDA software, circuits, schematics, books, theory, papers, asic, pld, 8051, DSP, Network, RF, Analog Design, PCB, Service Manuals... and a whole lot more!

Image rotation algorithm - Forward vs Backward mapping

Status
Not open for further replies.

shaiko

Advanced Member level 5, joined Aug 20, 2011
Hello,

I'm designing an image rotation block on a Cyclone V FPGA with DDR3.
The input is a simple 30 FPS parallel video interface: a 1024 x 1024 image with HSYNC, VSYNC, 16-bit data and a ~40 MHz pixel clock.

Using the 2D rotation formula, I want to map destination pixels to source pixels (backward mapping), i.e.:

1. I have a DDR3 address area that acts as an input buffer (where I write the incoming image) and another DDR3 address area where I write the transformed image.
2. I raster scan over the destination pixels; for each one I calculate the source address, fetch the pixel and write it to an on-chip line buffer.
3. Once the line buffer is full, I write it to the destination address in an efficient, byte-aligned fashion.
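For reference, the backward mapping in steps 1 to 3 can be modeled in software roughly as follows. This is a sketch under stated assumptions, not the poster's implementation: rotation is about the image centre, sampling is nearest-neighbour, destination pixels whose source falls outside the image get a fill value of 0, and `rotate_backward` is a hypothetical name. The list indexing `src[...]` stands in for the per-pixel DDR3 fetch.

```python
import math

def rotate_backward(src, width, height, angle_deg, fill=0):
    """Rotate a row-major image using backward (destination-to-source) mapping."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    cx, cy = (width - 1) / 2, (height - 1) / 2   # assumption: rotate about the centre
    dst = [fill] * (width * height)
    for yd in range(height):                     # raster scan over destination pixels
        for xd in range(width):
            dx, dy = xd - cx, yd - cy
            # inverse rotation: rotate the destination coordinate back by -theta
            xs = c * dx + s * dy + cx
            ys = -s * dx + c * dy + cy
            xi, yi = math.floor(xs + 0.5), math.floor(ys + 0.5)  # nearest neighbour
            if 0 <= xi < width and 0 <= yi < height:
                dst[yd * width + xd] = src[yi * width + xi]     # the per-pixel "fetch"
    return dst

img = list(range(16))                        # 4 x 4 test image
print(rotate_backward(img, 4, 4, 90)[:4])    # first destination line: [12, 8, 4, 0]
```

Note that for 90 degrees each destination line reads one source column, which is exactly the access pattern that makes step 2 expensive.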

The problem I see with my algorithm is that step 2 is very inefficient in terms of memory bandwidth.
I fetch 128 bits (the DDR3 controller's data bus width) even though, at that point, I may use only a single 16-bit pixel of the fetched data.
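A quick way to quantify the waste is to count how many distinct 128-bit words one output line touches. This sketch assumes the input image is stored line by line starting at word 0, with 8 pixels (16 bits each) per word; `word_index` and `words_touched` are illustrative helpers.

```python
PIXEL_BITS = 16
WORD_BITS = 128                          # DDR3 controller data-bus width from the post
WIDTH = 1024
PIX_PER_WORD = WORD_BITS // PIXEL_BITS   # 8 pixels per fetched word

def word_index(x, y):
    # assumption: line-ordered input buffer starting at word 0
    return (y * WIDTH + x) // PIX_PER_WORD

def words_touched(src_coords):
    # number of distinct 128-bit words needed to produce these pixels
    return len({word_index(x, y) for x, y in src_coords})

line_0 = [(x, 0) for x in range(WIDTH)]           # 0 degrees: source is one input line
line_90 = [(WIDTH - 1, y) for y in range(WIDTH)]  # 90 degrees: source is one input column
print(words_touched(line_0), words_touched(line_90))  # 128 1024
```

At 0 degrees every fetched word is fully used; at 90 degrees each of the 1024 words contributes one pixel, so only 1/8 of the fetched data is used.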

Do you think my algorithm is good?
 

With DDR you're kind of stuck, as random-access performance is pretty poor. But if you have plenty of spare bandwidth and it works, what's the problem?
 
You should work on blocks of data rather than lines, and also use a cache.
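A minimal model of such a cache, assuming a fully associative LRU cache of 128-bit words (a software sketch, not RTL; `WordCache` is a hypothetical name). It replays eight consecutive output lines of a 90-degree rotation, which all read the same source words because 8 pixels share a word.

```python
from collections import OrderedDict

class WordCache:
    """Fully associative LRU cache of 128-bit words, keyed by word index."""

    def __init__(self, capacity_words):
        self.capacity = capacity_words
        self.lines = OrderedDict()
        self.hits = self.misses = 0

    def access(self, word):
        if word in self.lines:
            self.lines.move_to_end(word)      # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1                  # this would be a DDR3 read
            self.lines[word] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used

def replay_90deg(capacity_words, width=1024, n_lines=8):
    cache = WordCache(capacity_words)
    for k in range(n_lines):                 # output line k reads source column width-1-k
        x = width - 1 - k
        for y in range(width):
            cache.access((y * width + x) // 8)  # 8 pixels per 128-bit word
    return cache.hits, cache.misses

print(replay_90deg(1024))  # (7168, 1024): lines 2..8 hit
print(replay_90deg(512))   # (0, 8192): too small, LRU thrashes
```

The second case shows why a line-oriented scan needs a cache big enough for a whole output line's worth of words; working in blocks shrinks that requirement.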

If needed, you can also write the data in block order rather than line order.
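One way to picture a block-ordered layout is an address function that stores each small tile contiguously. This is a sketch, assuming a hypothetical 8 x 8 tile of 16-bit pixels (128 bytes per tile) and an image starting at address 0; `linear_address` and `tiled_address` are illustrative names.

```python
WIDTH = 1024
BYTES_PER_PIXEL = 2
TILE = 8  # hypothetical tile edge: 8 x 8 pixels x 2 bytes = 128 bytes per tile

def linear_address(x, y):
    # conventional line-ordered layout
    return (y * WIDTH + x) * BYTES_PER_PIXEL

def tiled_address(x, y):
    # block-ordered layout: each TILE x TILE tile is stored contiguously
    tiles_per_line = WIDTH // TILE
    tile = (y // TILE) * tiles_per_line + x // TILE
    offset = (y % TILE) * TILE + x % TILE
    return (tile * TILE * TILE + offset) * BYTES_PER_PIXEL

# distance in bytes between vertically adjacent pixels
print(linear_address(5, 1) - linear_address(5, 0))  # 2048
print(tiled_address(5, 1) - tiled_address(5, 0))    # 16
```

Vertical neighbours end up 16 bytes apart instead of a full line apart, so both horizontal and vertical neighbourhoods tend to stay within the same DRAM row.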

Pixels on different lines may fall in different DRAM rows or banks, or at least are in different rows far more often than neighboring pixels on the same line. Row operations (activate/precharge) are much slower than column accesses.
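The effect can be illustrated with a simple open-row model. This sketch assumes a line-ordered image at address 0, a single bank, and a hypothetical DRAM row (page) size of 2048 bytes; the real value depends on the memory device and bus width. `dram_row` and `row_activations` are illustrative names.

```python
ROW_BYTES = 2048       # hypothetical DRAM row size
WIDTH = 1024
BYTES_PER_PIXEL = 2

def dram_row(x, y):
    # assumption: line-ordered image starting at address 0, single bank
    return (y * WIDTH + x) * BYTES_PER_PIXEL // ROW_BYTES

def row_activations(coords):
    # open-row model: count an activation whenever the accessed row changes
    count, open_row = 0, None
    for x, y in coords:
        r = dram_row(x, y)
        if r != open_row:
            count, open_row = count + 1, r
    return count

horizontal = [(x, 0) for x in range(32)]  # 32 pixels along one line
vertical = [(0, y) for y in range(32)]    # 32 pixels down one column
print(row_activations(horizontal), row_activations(vertical))  # 1 32
```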

You should also use the usual latency-hiding technique: allow multiple read requests to be queued (outstanding) rather than waiting for each one to return before issuing the next.
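The benefit of keeping several reads in flight can be seen in a toy timing model. It assumes a fixed, hypothetical read latency in controller cycles, at most one new request per cycle, and a limit on outstanding requests; `total_cycles` is an illustrative name, and real DDR3 controller behavior is more complex.

```python
from collections import deque

def total_cycles(n_reads, latency, max_outstanding):
    """Cycle at which the last of n_reads completes in a simple queued-read model."""
    in_flight = deque()  # completion times of issued reads, oldest first
    t = 0
    for _ in range(n_reads):
        if len(in_flight) == max_outstanding:
            t = max(t, in_flight.popleft())  # stall until the oldest read returns
        in_flight.append(t + latency)
        t += 1                               # at most one new request per cycle
    return in_flight[-1] if in_flight else 0

print(total_cycles(100, 20, 1))   # 2000: wait for each read before the next
print(total_cycles(100, 20, 32))  # 119: latency overlapped with new requests
```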
 
Quote: "You should work on blocks of data vs lines"
Please elaborate. What exactly do you mean by "blocks of data"?
 

For example, generate the first 32 pixels of each of the first 32 lines instead of all 1024 pixels of the first line. There is then a better chance that groups of pixels read from the input will actually be used in the output.
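Generating output in block order instead of raster order only changes the loop nesting. A minimal sketch; `block_order` is a hypothetical helper, and partial blocks at the image edge are handled by clipping.

```python
def block_order(width, height, block):
    """Yield destination coordinates block by block instead of line by line."""
    for by in range(0, height, block):
        for bx in range(0, width, block):
            for y in range(by, min(by + block, height)):
                for x in range(bx, min(bx + block, width)):
                    yield x, y

print(list(block_order(4, 4, 2))[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

For the 1024 x 1024 case with 32 x 32 blocks, the first 1024 coordinates yielded are the first 32 pixels of the first 32 lines.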

For the 90-degree rotation, generating 32 output lines with the basic approach means 1024*32 reads, possibly incurring an equal number of row operations. With the block approach, each 32-pixel segment of an input line (32 x 16 bits = 512 bits) takes four 128-bit reads, so it means 1024*4 reads and possibly 1024 row operations. That is the biggest difference between the two.
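The counts above can be reproduced with a small model. It assumes one of the two 90-degree directions, that each image line occupies its own DRAM row (open-row model, single bank), and that in the block approach the source footprint of each 32 x 32 output block is fetched line by line into an on-chip buffer before the output is generated. Function names are hypothetical.

```python
N = 1024          # image is N x N pixels
PIX_PER_WORD = 8  # 128-bit word / 16-bit pixel
BLOCK = 32        # output blocks are BLOCK pixels wide
LINES = 32        # number of output lines generated

def src_of(xd, yd):
    # backward map for a 90-degree rotation (one of the two directions)
    return N - 1 - yd, xd

def count_line_approach():
    reads = row_ops = 0
    open_row = None
    for yd in range(LINES):
        for xd in range(N):
            _, ys = src_of(xd, yd)
            reads += 1                  # one 128-bit read per output pixel
            if ys != open_row:          # assumption: one DRAM row per image line
                row_ops, open_row = row_ops + 1, ys
    return reads, row_ops

def count_block_approach():
    reads = row_ops = 0
    open_row = None
    for bx in range(0, N, BLOCK):
        coords = [src_of(xd, yd) for yd in range(LINES) for xd in range(bx, bx + BLOCK)]
        xs_lo = min(x for x, _ in coords)
        xs_hi = max(x for x, _ in coords)
        for ys in sorted({y for _, y in coords}):  # fetch the footprint line by line
            reads += xs_hi // PIX_PER_WORD - xs_lo // PIX_PER_WORD + 1
            if ys != open_row:
                row_ops, open_row = row_ops + 1, ys
    return reads, row_ops

print(count_line_approach())   # (32768, 32768)
print(count_block_approach())  # (4096, 1024)
```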

The block size can be chosen based on how efficient the 45-degree rotation needs to be: the average ratio of used pixels to fetched pixels increases as the block size increases.
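To explore that trade-off, the ratio of used to fetched pixels can be estimated numerically. This sketch assumes rotation about the image centre, nearest-neighbour sampling, 8 pixels per 128-bit word, and that for each block the whole words spanning the used pixels on each source line are fetched. It only measures a central 128 x 128 output region so every source pixel lies inside the image; `used_vs_fetched` is a hypothetical name.

```python
import math

N = 1024
PIX_PER_WORD = 8   # 128-bit word / 16-bit pixel
REGION = 128       # central output region used for the measurement

def source_pixel(xd, yd, theta):
    # backward map about the image centre, nearest neighbour
    c, s = math.cos(theta), math.sin(theta)
    cx = cy = (N - 1) / 2
    dx, dy = xd - cx, yd - cy
    return (math.floor(c * dx + s * dy + cx + 0.5),
            math.floor(-s * dx + c * dy + cy + 0.5))

def used_vs_fetched(block, angle_deg=45.0):
    theta = math.radians(angle_deg)
    x0 = y0 = (N - REGION) // 2
    used = fetched = 0
    for by in range(y0, y0 + REGION, block):
        for bx in range(x0, x0 + REGION, block):
            src = {source_pixel(x, y, theta)
                   for y in range(by, by + block) for x in range(bx, bx + block)}
            used += len(src)                 # distinct source pixels this block needs
            spans = {}                       # per source line: (min x, max x)
            for xs, ys in src:
                lo, hi = spans.get(ys, (xs, xs))
                spans[ys] = (min(lo, xs), max(hi, xs))
            for lo, hi in spans.values():    # whole words covering each span
                fetched += (hi // PIX_PER_WORD - lo // PIX_PER_WORD + 1) * PIX_PER_WORD
    return used / fetched

for b in (1, 8, 32, 64):
    print(b, round(used_vs_fetched(b), 3))
```

A block size of 1 corresponds to the per-pixel approach and gives exactly 1/8; larger blocks amortize the partially used words at the ends of each source-line span.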


That said, the required pixel rate is fairly low compared to what DDR3 can deliver. The row behavior might still be significant, although interleaving accesses across banks could help in that case.
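For a rough sense of the rates involved, the average traffic can be worked out from the numbers in the first post. This assumes 1024 x 1024 active pixels at 30 FPS, 16-bit pixels, and, for the naive case, one full 128-bit (16-byte) read per output pixel plus writing the input and output images once; it does not include reading the rotated image back out or any controller overhead. Compare the result against your own DDR3 configuration.

```python
WIDTH = HEIGHT = 1024
FPS = 30
BYTES_PER_PIXEL = 2   # 16-bit pixels
WORD_BYTES = 16       # 128-bit controller word

pixel_rate = WIDTH * HEIGHT * FPS        # average active pixels per second
stream = pixel_rate * BYTES_PER_PIXEL    # one image stream in bytes per second
# input write + output write + naive source reads (one full word per output pixel)
naive = stream + stream + pixel_rate * WORD_BYTES
print(pixel_rate, stream, naive)  # 31457280 62914560 629145600
```

The naive read term alone is 8 times the size of the image stream, which is the overhead the block approach reduces.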
 
