[SOLVED] Suggestion needed regarding LARGE size of ROM and RAMs in FPGA design

syedshan · May 23, 2013

Hi everyone,

To cut a long story short: I consider myself relatively new to FPGA design, and I really have no idea what sizes of ROM and RAM are usually considered too large for an FPGA design.

For example, in my case I have a very large ROM (dual-port ROM) along with a very large RAM.
The ROM holds 100*100*4 words of 27 bits each = 132 KB.
By my calculation, I will need about 500 KB in the future.
The same applies to the RAM.

Both exist in the FPGA at the same time (of course I will try to use the BRAMs fully first and then move on to distributed RAM). Does this kind of thing happen in FPGA designs, or is it just bad design on my part? Be frank; I will learn from this mistake, although I now have to stick with it, since my project timeline is almost at its end.

Note that I am using a Xilinx Virtex-6, ISE, and CORE Generator.

Thanks in advance

TrickyDicky · May 23, 2013

Those are getting on the large side for FPGAs, and will waste a lot of memory space (IIRC a BRAM is 1K x 18 bits, so 100x100x4 words of 27 bits will need 80 BRAMs!). That amount of RAM is usually only available on the larger chips. Why so big? What's wrong with external RAM?
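The 80-BRAM figure can be checked with a quick calculation. This is a rough sketch that assumes, as in the post, every block RAM is used as a 1K x 18 primitive (on Virtex-6 each 36 Kb block can be split into two such 18 Kb halves); `bram_count` is a hypothetical helper, and the synthesis tools may pack or cascade the memory differently.

```python
import math

def bram_count(depth: int, width: int, prim_depth: int = 1024, prim_width: int = 18) -> int:
    """Block-RAM primitives needed for a depth x width memory when every
    primitive is used as prim_depth x prim_width (1K x 18 assumed, as in the post)."""
    side_by_side = math.ceil(width / prim_width)  # primitives in parallel to cover the word width
    stacked = math.ceil(depth / prim_depth)        # primitives cascaded to cover the depth
    return side_by_side * stacked

DEPTH = 100 * 100 * 4   # 40,000 words, as in the first post
WIDTH = 27              # bits per word

blocks = bram_count(DEPTH, WIDTH)
used_bits = DEPTH * WIDTH
allocated_bits = blocks * 1024 * 18

print(blocks)                                   # 80
print(round(used_bits / 8 / 1024, 1))           # 131.8 (KiB actually stored)
print(round(100 * used_bits / allocated_bits))  # 73 (% of the allocated bits that hold data)
```

A 27-bit word needs two 18-bit-wide primitives side by side, so roughly a quarter of every allocated block is unused, which is the "wasted memory space" mentioned above.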

syedshan · May 24, 2013

Thanks for the reply.

Actually, I do have external DDR3, and I am already using it to store a large amount of data.
It is a movie of 2000 or 500 frames (depending on the application), and each frame has 200x200 points. (It is an ultrasonic wave movie, so by analogy with an ordinary movie you can also call the points pixels.)

So the data set is X*Y * 2000 frames.

What I proposed is to store all the addresses of frame 0 in the ROM and use them as a reference for storing and retrieving those signals in DDR3.

e.g.
if X=0, Y=0, then add_XY = 0 is stored in the ROM for frame 0,
whereas for X=0, Y=0, the frame-1 address = "frame-0 address" + 1;
whereas for X=0, Y=0, the frame-2 address = "frame-0 address" + 2;
whereas for X=0, Y=0, the frame-3 address = "frame-0 address" + 3;
...
...
if X=0, Y=1, then add_XY = 0 is stored in the ROM for frame 0,
whereas for X=0, Y=1, the frame-1 address = "frame-0 address" + 1;
whereas for X=0, Y=1, the frame-2 address = "frame-0 address" + 2;
whereas for X=0, Y=1, the frame-3 address = "frame-0 address" + 3;
...

and so on for every point of the 200x200 grid.
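The ROM-based scheme can be modelled in a few lines. This is a sketch, not the poster's code: it assumes the frames of each point are stored back to back in DDR3, so the ROM entry for point (X, Y) is its frame-0 ("boundary") address; under that assumed layout the frame-0 address of (0,1) is 2000. `build_boundary_rom` and `ddr3_address` are hypothetical names.

```python
NX, NY, NF = 200, 200, 2000   # scan points in X and Y, frames per point

def build_boundary_rom(nx: int, ny: int, nf: int) -> list[int]:
    """Hypothetical ROM contents: the frame-0 DDR3 address of every (x, y)
    point, assuming the nf frames of a point are stored back to back.
    The ROM itself is indexed by x * ny + y."""
    return [(x * ny + y) * nf for x in range(nx) for y in range(ny)]

def ddr3_address(rom: list[int], ny: int, x: int, y: int, frame: int) -> int:
    """Look up the frame-0 address of (x, y) and add the frame number."""
    return rom[x * ny + y] + frame

rom = build_boundary_rom(NX, NY, NF)
print(len(rom))                                       # 40000 entries
print(ddr3_address(rom, NY, 0, 1, 3))                 # 2003
print(ddr3_address(rom, NY, NX - 1, NY - 1, NF - 1))  # 79999999
print(max(rom).bit_length())                          # 27 bits per ROM word
```

Under this layout the ROM comes out at 40,000 entries of 27 bits, which happens to match the size in the first post, and every entry equals (x*NY + y)*NF, so its contents are computable rather than arbitrary.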

Frankly speaking, when I started designing this I had little idea about real hardware designs for algorithm implementations, and whatever I proposed to my project co-supervisor, he would say yes. I was a little surprised, and also happy, that it was simpler than I had thought. But now, as you see, the resource utilization is at its maximum. He has now told me to use the other DDR3 for this same purpose, which I don't want to do because of the time limitation; he also takes that limitation into account, unless we run into really big problems.

TrickyDicky · May 24, 2013

Why do you need the addresses in a ROM? Surely you would store the data in some logical way, so that you can use counters for the pixel addresses?
It also seems a bit odd to store an entire video in memory. Why not just stream it from a PC over some data link and process it in real time? (Obviously, if it takes too long to process a frame, you can't do that.) Either way, it would still make sense to send it from some other source.
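Counter-based addressing can be sketched as follows. This assumes the same hypothetical layout as the ROM model (each point's frames stored back to back); `scan_addresses` is an illustrative name. Two counters step through the points and a running base address replaces the ROM lookup, so the hardware equivalent needs only an adder.

```python
def scan_addresses(nx: int, ny: int, nf: int, frame: int):
    """Yield (x, y, address) of `frame` for every point, generated with two
    counters and one running base address instead of a ROM. Assumes the nf
    frames of a point are stored back to back."""
    base = 0                     # frame-0 address of the current point
    for x in range(nx):          # outer counter
        for y in range(ny):      # inner counter
            yield x, y, base + frame
            base += nf           # next point starts nf words later: an adder, no multiplier

print(list(scan_addresses(2, 3, 10, 4)))
# [(0, 0, 4), (0, 1, 14), (0, 2, 24), (1, 0, 34), (1, 1, 44), (1, 2, 54)]
```

Each generated address equals (x*ny + y)*nf + frame, i.e. exactly what a ROM filled with frame-0 addresses would give for this layout.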

syedshan · May 24, 2013

No, no, you got it wrong...

The video is not stored in ROM, the video is stored in DDR3 memory.
The reason I store the BOUNDARY ADDRESS (a name I gave it just to make the discussion easier) is that I need it as a reference for the following frames that are stored in DDR3.

In my project the data comes in one of two ways. It is an ultrasonic movie generated by a PZT transducer, etc.,
and captured using the third-party 4DSP FPGA board (with an ADC and PCIe communication).

1. (What I am doing now) Through PCIe, using DMA transactions. I already have the post-processing data in our archive from previous experiments, and my plan is to bring the whole data set into a FIFO and then transfer it from the FIFO to DDR3. Meanwhile, since the addressing is generated by me (i.e. by the designer, so we can control it), I just hard-code the DDR3 addresses using the ROM and then reference them.

2. Later, in the second phase, I will take data directly from the ADC at run time; I think I will need this method even then!

But any suggestions are welcome. Note that I only ask for suggestions because they will help me learn and do the project well, not because I want someone else to do the task for me :)

Waiting for suggestions

Ice-Tea · May 24, 2013

Two sides to this story: if you have it in your device, why not use it? On the other hand, an external SPI ROM or some such would probably let you move to a smaller, cheaper FPGA. So if you plan to produce this thing in volume, that might be the better option. In addition, SPI flash devices exist in a wide range of densities, so that would scale well.

TrickyDicky · May 24, 2013

Still, unless the address references are completely random, it would be more efficient to generate the addresses rather than store them. Usually frame data is stored the same way from frame to frame, with just the frame address changing.

syedshan · May 24, 2013

Quote (TrickyDicky): "Still, unless the address references are completely random, it would be more efficient to create the addresses rather than store them. Usually, frame data would be stored the same frame to frame, with just the frame address changing."

Pardon me, but can you give me an example of what you mean, or point me to some reference or document? I actually looked for that before I started coding but could not find anything.
Anyway, if you can, please share it here. It might help me and also be a good reference for other beginners. If you find some time, could you summarize your approach in algorithmic form? But please do answer one question:

Quote (TrickyDicky): "Still, unless the address references are completely random, it would be more efficient to create the addresses rather than store them."

Let's say I saved the data at some point. Unless I have kept track of the address where the data starts, or where the first frame of each signal is, how can I retrieve it? OK, while writing I can generate the address, but for reading there must be something to read, and that something must have a reference address. Right?

Quote (Ice-Tea): "Two sides to this story: if you have it in your device, why not use it? On the other hand, an external SPI ROM or some such would probably let you move to a smaller, cheaper FPGA. So if you plan to produce this thing in volume, that might be the better option. In addition, SPI flash devices exist in a wide range of densities, so that would scale well."

Frankly speaking, I had to finish this project as early as possible: I was given this task as soon as I started my research, and I had very little time to plan the best options. Since this is my first major project, I made a few mistakes, and they helped me learn; for example, in these forums I learned simple ways to divide without actually using a divider, and other tricks.

TrickyDicky · May 24, 2013

Quote (syedshan): "Let's say I saved the data at some point. Unless I have kept track of the address where the data starts, or where the first frame of each signal is, how can I retrieve it? OK, while writing I can generate the address, but for reading there must be something to read, and that something must have a reference address. Right?"

Why would you "save the data" to some random address? Surely you know where you saved it, and it should follow some scheme?
Say the top N bits are the frame number, the next M bits are the row address, and the next P bits are the column address. Keep to this scheme and you will always know where all the data is. So the PC says "Process frame 6", and you just set the address to the start of frame 6?
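The N/M/P bit-field scheme can be sketched as plain bit packing. The field widths here are assumptions chosen for this data set (8 bits cover 200 rows or columns, 11 bits cover 2000 frames), and `pack`, `unpack` and `frame_start` are hypothetical names. Note that this layout is frame-major (frame number in the top bits), as suggested above.

```python
# Hypothetical field widths: 8 bits cover 200 rows/columns, 11 bits cover 2000 frames.
FRAME_BITS, ROW_BITS, COL_BITS = 11, 8, 8

def pack(frame: int, row: int, col: int) -> int:
    """Concatenate frame | row | column into one address (frame in the top bits)."""
    for value, bits in ((frame, FRAME_BITS), (row, ROW_BITS), (col, COL_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (frame << (ROW_BITS + COL_BITS)) | (row << COL_BITS) | col

def unpack(addr: int) -> tuple[int, int, int]:
    """Recover (frame, row, col) by masking and shifting; no arithmetic needed."""
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    frame = addr >> (ROW_BITS + COL_BITS)
    return frame, row, col

def frame_start(frame: int) -> int:
    """'Process frame 6': the start address is just the frame number shifted up."""
    return pack(frame, 0, 0)

print(frame_start(6))           # 393216 (= 6 << 16)
print(unpack(pack(6, 10, 20)))  # (6, 10, 20)
```

In hardware the packing is just wiring the three counters side by side into the address bus, which is why this scheme is cheap; the price is the unused codes in each field.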

syedshan · May 24, 2013

Wow, that is great, because initially I actually implemented this same thing. I wish I could share the image in which I designed that data mapping, using a sort of counters and then splitting the address into three fields as you said. But I had to abandon it because of the memory space it wastes. I am not in my lab now, but I will share it with you tomorrow to show what I mean by memory loss.
For now, let me try to explain how it is done.

Let's suppose the memory address is 12 bits wide and I have a 10x10 scan area with 10 frames per scan point. I assigned 4 bits each to X, Y and frame, respectively.
Hence the address
0000_0000_0000 means (0,0) frame 0
0000_0000_0001 means (0,0) frame 1
..
..
0000_0000_1001 means (0,0) frame 9

0000_0001_0000 means (0,1) frame 0
...
1001_1001_1001 means (9,9) frame 9

So in each 4-bit field the codes 1010 to 1111 (i.e. 6 values) are never used, and the corresponding memory locations are lost. When I increase the address space, this loss grows even more.
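The waste in this 4-4-4 example can be counted directly. A small sketch (the helper name `pack444` is hypothetical) that enumerates every address the 10x10x10 data set actually uses:

```python
FIELD_BITS = 4   # 4 bits each for X, Y and frame, as in the example
N = 10           # 10 x 10 scan points, 10 frames per point

def pack444(x: int, y: int, frame: int) -> int:
    """X in the top nibble, Y in the middle nibble, frame in the bottom nibble."""
    return (x << (2 * FIELD_BITS)) | (y << FIELD_BITS) | frame

used = {pack444(x, y, f) for x in range(N) for y in range(N) for f in range(N)}
total = 1 << (3 * FIELD_BITS)

print(format(pack444(0, 1, 0), "012b"))     # 000000010000
print(len(used), total, total - len(used))  # 1000 4096 3096
```

Only 1000 of the 4096 locations hold data; because each field wastes 6 of its 16 codes, the unused fraction compounds across the three fields.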

Any comments or suggestions?

- - - Updated - - -

Actually, if you work it out, only 10*10*10 = 1000 of the 2^12 = 4096 locations are used, so 3096 locations are lost.
And when I calculated it for 2000 frames of a 200x200 scan area (keeping in mind the 28-bit address space of the DDR3 MIG), the loss was really high...
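One way to quantify the loss for the real data set is to compare the address bits needed by the bit-field layout against a dense linear layout. This sketch assumes one address per sample (how MIG addresses map to samples depends on the data width and memory configuration); the 28-bit limit is the figure from the post, and `bits_for` and `compare_layouts` are hypothetical helpers.

```python
def bits_for(n: int) -> int:
    """Address bits needed to give n items distinct addresses (n >= 1)."""
    return max(1, (n - 1).bit_length())

def compare_layouts(nx: int, ny: int, nf: int):
    cells = nx * ny * nf
    field_bits = bits_for(nx) + bits_for(ny) + bits_for(nf)  # one power-of-2 field per dimension
    dense_bits = bits_for(cells)                             # single linear (multiplied) address
    utilisation = cells / (1 << field_bits)                  # fraction of the field layout holding data
    return cells, field_bits, dense_bits, round(utilisation, 3)

for frames in (2000, 500):
    print(frames, compare_layouts(200, 200, frames))
# 2000 (80000000, 27, 27, 0.596)
# 500 (20000000, 25, 25, 0.596)
```

Under these assumptions about 40% of the bit-field address range is unused, yet in both cases the two layouts need the same number of address bits and both fit within 28 bits; whether the holes actually hurt depends on whether that address space is needed for anything else.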

syedshan · May 26, 2013

Hi...

Any suggestions?

std_match · May 26, 2013

It is simple if you can afford to do some arithmetic operations on the addresses.

linear_address = row_address * row_size + column_address

You can still lose up to almost 50% of the memory, if the total number of "cells" is just above a power of 2.

This can also be used for more dimensions, but you will need one additional multiplier and adder for each extra dimension.
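std_match's formula generalizes to any number of dimensions by nesting it: ((i0 * d1 + i1) * d2 + i2) and so on. A Python sketch with hypothetical function names; in hardware each multiplication by a constant dimension size would become a multiplier or shift-and-add logic, and the division-based inverse is only meant for checking in software.

```python
from math import prod

def linear_address(index: tuple[int, ...], dims: tuple[int, ...]) -> int:
    """Row-major address ((i0 * d1 + i1) * d2 + i2) ...; for 2-D this is
    row_address * row_size + column_address. Each extra dimension adds
    one multiply and one add."""
    if len(index) != len(dims):
        raise ValueError("index and dims differ in length")
    addr = 0
    for i, d in zip(index, dims):
        if not 0 <= i < d:
            raise ValueError(f"index {i} out of range for size {d}")
        addr = addr * d + i
    return addr

def delinearize(addr: int, dims: tuple[int, ...]) -> tuple[int, ...]:
    """Inverse of linear_address (uses division, so intended for software checks)."""
    out = []
    for d in reversed(dims):
        addr, i = divmod(addr, d)
        out.append(i)
    return tuple(reversed(out))

dims = (200, 200, 2000)                   # X, Y, frame
a = linear_address((1, 2, 3), dims)
print(a, delinearize(a, dims))            # 404003 (1, 2, 3)
print(prod(dims))                         # 80000000 addresses, 0 .. 79999999 with no holes

# Worst case for the power-of-2 address range: 1025 cells need 11 bits (2048 locations).
cells = 1025
print(1 - cells / (1 << (cells - 1).bit_length()))  # just under 0.5
```

The dense address removes the per-field holes, but the total still has to fit in a power-of-2 address range, which is where the "up to almost 50%" loss comes from.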

syedshan · May 27, 2013

Thank you, std_match, for your advice; it is the same approach I have come back to, although I had to spend two nights on it...

But although I have implemented it, there is one drawback in this particular design. I am sharing it so that others might benefit from it.

Since DDR3 transmits data in bursts of 8, I have to add extra logic alongside the address when reading: each time I read frame 0 of signal (0,0), for example, the burst brings back frames 0 to 7.
But I only need frame 0, so I discard the other 7 frames, and I have to repeat the same addresses for the corresponding frames across the XY plane 8 times, which is a recurring step. I think a proper cache would be the best solution for this, but there is no time now to implement something as complex as a cache, so I will stick with this.
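The cost of discarding 7 of every 8 words can be modelled abstractly. This sketch assumes one address per sample, a burst that returns the 8 consecutive words starting at a multiple of 8, and the point-major layout (each point's frames stored back to back); in a real MIG design the user-interface data width and address granularity depend on the memory configuration. Keeping the whole burst is shown only as a simple alternative to discarding, not as the cache mentioned above; `burst_split` and `bursts_needed` are hypothetical names.

```python
BURST = 8   # DDR3 burst length (BL8), in words

def burst_split(addr: int) -> tuple[int, int]:
    """Split a word address into the burst-aligned start address and the
    position of the wanted word inside the 8 words the burst returns."""
    return addr & ~(BURST - 1), addr & (BURST - 1)

def bursts_needed(points: int, frames: int, keep_whole_burst: bool) -> int:
    """Bursts to read `frames` frames of every point when a point's frames
    are contiguous. Discarding 7 of 8 words costs one burst per (point, frame);
    keeping all 8 words (e.g. in a small buffer) costs one burst per 8 frames."""
    if keep_whole_burst:
        return points * -(-frames // BURST)   # ceil(frames / BURST)
    return points * frames

print(burst_split(2003))                      # (2000, 3): the wanted word is word 3 of the burst
print(bursts_needed(200 * 200, 2000, False))  # 80000000
print(bursts_needed(200 * 200, 2000, True))   # 10000000
```

Under these assumptions, buffering each burst cuts the number of DDR3 reads by a factor of 8, at the cost of storing 8 words per point somewhere on chip.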

The first simulation now works fine on its own, and I am ready to integrate it with the MIG and DDR3... I hope it works smoothly (of course some bugs might occur).

And many thanks to TrickyDicky as well. He has many times made my complicated coding methods much easier.
For example, around 6 months ago I was writing huge amounts of logic to implement fixed-point arithmetic, when in fact it was the most basic mathematics...

Thanks, man!

Welcome to EDAboard.com
