
In need of any computational project for FPGA you could share


Alexium

Greetings.
I'm doing my final year project. It's about accelerating computations with an FPGA (a reconfigurable coprocessor for a PC). Right now I'm creating a framework for such a coprocessor, and then I must implement a number of computational problems and compare their performance on the PC with their performance on the FPGA.
Obviously, I don't have time to create a number of complex projects myself. So, if you have some project in mind that you can share with me, I would appreciate it greatly. It could be anything that may be considered a purely computational task and can also be done on a PC: DSP projects, N-body simulation, cryptographic projects, RNGs...
P.S. I realize no one is going to give me their intellectual property, but if someone is considering it, I assure you the projects I'm asking for are for my eyes only; the final year project report will only contain a brief description and performance charts.
 

I think you can download a lot of cores from opencores.org.

Maybe for this purpose you could download some arithmetic cores.
 

Of course. But arithmetic cores do not solve any particular task; that's the problem.
 

You could also look into RBMs (Restricted Boltzmann Machines). Those are pretty computation-intensive and can benefit a lot from the inherent parallelism of FPGAs.

There are already some comparisons out there for RBMs on CPU vs GPU vs FPGA, for example.
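
For a sense of where the parallelism comes from, here is a minimal sketch (my own, not from any published comparison) of one half-step of RBM Gibbs sampling: computing the activation probabilities of the hidden units given the visible units. Every hidden unit is independent, which is exactly what maps well onto FPGA or GPU parallelism. All names and sizes are assumptions.

Code:
#include <cmath>
#include <cstddef>
#include <vector>

// p(h_j = 1 | v) = sigmoid(b_h[j] + sum_i W[j][i] * v[i]) for a binary RBM.
std::vector<float> hidden_probs(const std::vector<float>& v,    // visible units, size n_v
                                const std::vector<float>& W,    // weights, n_h x n_v, row-major
                                const std::vector<float>& b_h,  // hidden biases, size n_h
                                std::size_t n_v, std::size_t n_h)
{
    std::vector<float> p(n_h);
    for (std::size_t j = 0; j < n_h; ++j) {       // each hidden unit is independent
        float a = b_h[j];
        for (std::size_t i = 0; i < n_v; ++i)
            a += W[j * n_v + i] * v[i];           // dense matrix-vector product
        p[j] = 1.0f / (1.0f + std::exp(-a));      // logistic sigmoid
    }
    return p;
}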
 
Thanks! Never heard of that domain, though I did study neural networks...
 

Right now I'm creating a framework for such a coprocessor ...

Would this framework be adaptable to the OpenCL spec? That would be awesomely useful! Taking the RBMs as an example, being able to choose where to run what (CPU/GPU/FPGA) would be a very powerful way to partition the job at a fairly high level of abstraction. Read: make it easy enough to dispatch the right part of the job to the right tool.

I'm thinking of a general-purpose PC here with a modern graphics card and a PCIe FPGA board. You can put together a pretty high-performance CPU/GPU/FPGA system for ~$1000 these days.
 
Would this framework be adaptable to the OpenCL spec? That would be awesomely useful!
I hear you. To be honest, I wasn't thinking about OpenCL, not yet.
First things first, I have to make something that even works :) Currently I'm working on it. It would be a simple infrastructure of a software API (written in C or C++), hardware modules and, probably, a special PC software tool. The way I currently see it:
a) you have an HDL project (a bunch of modules with a single top-level interface module);
b) you want to implement it on the FPGA with the ability to access that module from the PC;
c) you take my software tool and let it analyze your top-level interface; maybe you also specify some settings;
d) this tool creates a project and automatically implements it on the FPGA (I have a Xilinx FPGA board, so I will initially support Xilinx ISE for this step);
e) the tool also creates a C/C++ header file containing functions to access your hardware module (essentially, a set() function to send data to the module's inputs and a get() function to read results back);
f) you include this header in your program, make API calls where necessary, write data processing and so on; compile the program, link it with my API library, run it and get a performance gain over a pure-CPU program (roughly along the lines of the sketch below).
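
For illustration only, here is a hedged sketch of what the generated header and a user program might look like. None of these names (FpgaModule, fpga_open, fpga_set, fpga_get, my_module.bit) come from the thread; they are assumptions standing in for the set()/get() API described above, stubbed in plain C++ so the sketch compiles without real hardware.

Code:
#include <cstddef>
#include <cstdio>
#include <vector>

// Stand-in for the auto-generated header: the tool would expose open/set/get
// calls bound to the top-level interface of the user's HDL module.
struct FpgaModule {
    std::vector<float> regs;                       // stands in for the module's I/O registers
};

FpgaModule fpga_open(const char* /*bitstream*/) {  // step (d): configure the FPGA
    return FpgaModule{ std::vector<float>(256) };
}

void fpga_set(FpgaModule& m, const float* in, std::size_t n) {   // drive module inputs
    m.regs.assign(in, in + n);
}

void fpga_get(const FpgaModule& m, float* out, std::size_t n) {  // read results back
    for (std::size_t i = 0; i < n; ++i) out[i] = m.regs[i];
}

int main() {
    FpgaModule mod = fpga_open("my_module.bit");   // hypothetical bitstream name
    float in[256], out[256];
    for (int i = 0; i < 256; ++i) in[i] = float(i);
    fpga_set(mod, in, 256);                        // step (e): set() pushes inputs
    fpga_get(mod, out, 256);                       // step (e): get() reads outputs
    std::printf("out[0] = %f\n", out[0]);
    return 0;
}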

This would only be the initial release, to actually be able to run programs, test the system and do some performance-related research. I hope I'll have the chance to evolve the system much further, but only if I finish my final year project in time; otherwise I'm doomed forever. And I need a month or so to finish the first working prototype, while the deadline for the project is about 2 months away. And it won't be enough to get the framework working - I need to research the domain, that is, I need some computational problems to actually implement for the FPGA, implement for the PC and compare performance. That's where this thread comes from.


Now, as for OpenCL. The way I see it:
1) OpenCL is good for large homogeneous arrays of fixed ALUs with large, fast memory access (read: for GPUs). Trying to directly recreate such a structure would be like playing an away match - no chance against GPUs.
BTW, memory access and overall data bandwidth are my pet peeve, since my board doesn't support PCI and even has some problems with Ethernet. Currently I'm using RS232, which is indeed a dead end.
2) To enable OpenCL for the FPGA, I'm going to need an OpenCL (read: C) to HDL compiler (translator), which is a whole separate story. I'm actually considering writing a simple C/C++ to HDL translator (for arithmetic operations) to enable rapid and easy acceleration of existing programs. My supervisor is working on it, and there is such a prospect, but no chance we'll achieve that in the next half year (to say the least).
3) The FPGA's strong point is total reconfigurability. The most effective way to utilize the FPGA's resources is to use an HDL project that was initially created for the FPGA, without any fixed architectures in mind. That's why I decided to design the system the way I described above.
Bottom line is: currently I have no idea how to support OpenCL, but if I meet the current deadline I will indeed give it a thought.

I'm thinking of a general-purpose PC here with a modern graphics card and a PCIe FPGA board. You can put together a pretty high-performance CPU/GPU/FPGA system for ~$1000 these days.
I'm not sure about $1000; prices for good FPGAs climb to $500 and higher. It's more like $1200-1300, which might also be acceptable.

P.S. With the current concept of the system, it's no match for GPUs, but I was hoping for problems that don't fit GPUs well. Not quite sure what those problems are, though...
 

Any input or advice on the matter will be appreciated.
 

I hear you. To be honest, I wasn't thinking about OpenCL, not yet.
First things first, I have to make something that even works :) Currently I'm working on it. It would be a simple infrastructure of a software API (written in C or C++), hardware modules and, probably, a special PC software tool. The way I currently see it:
a) you have an HDL project (a bunch of modules with a single top-level interface module);
b) you want to implement it on the FPGA with the ability to access that module from the PC;
c) you take my software tool and let it analyze your top-level interface; maybe you also specify some settings;
d) this tool creates a project and automatically implements it on the FPGA (I have a Xilinx FPGA board, so I will initially support Xilinx ISE for this step);
e) the tool also creates a C/C++ header file containing functions to access your hardware module (essentially, a set() function to send data to the module's inputs and a get() function to read results back);
f) you include this header in your program, make API calls where necessary, write data processing and so on; compile the program, link it with my API library, run it and get a performance gain over a pure-CPU program.

That would be a neat thing! Of course, the API you are now describing sounds like a subset of OpenCL, though. That is, if I understand things correctly... If that is the case, why reinvent the wheel? I'd suggest at least reading the OpenCL spec. At the very least you can grab some inspiration, and maybe you could even use that inspiration to implement the specific OpenCL functions that provide you with the functionality you need. That way you get all you want without expending extra effort, AND you can decide later to add the rest to meet the full spec should you choose to.

The main drawback would be if you have to implement too much surrounding fluff just to get the wrappers working, so to speak. Anyway, personally I would check it out and see if it makes sense for the project.


This would only be the initial release, to actually be able to run programs, test the system and do some performance-related research. I hope I'll have the chance to evolve the system much further, but only if I finish my final year project in time; otherwise I'm doomed forever. And I need a month or so to finish the first working prototype, while the deadline for the project is about 2 months away. And it won't be enough to get the framework working - I need to research the domain, that is, I need some computational problems to actually implement for the FPGA, implement for the PC and compare performance. That's where this thread comes from.

Understood. In which case you only have so much time to research any cool specific problems (like RBMs). Doing a good RBM implementation is a whole project on its own, so your 2-month timetable would be a serious challenge if you had to start researching RBMs today.

Did a bit of thinking about what other problems fit the description... Basically any binary combinatorial logic task that benefits from large-scale parallel operation, really... Mmmmh, a genetic algorithm implementation of traveling salesman? The benefit of traveling salesman is that the problem domain is well understood and you can read a lot about it. That always helps with the timetable. :p
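
A rough sketch (my own, only to make the suggestion concrete) of the part of a genetic-algorithm TSP solver that an FPGA could chew through: scoring the population. Each tour's length is computed independently of the others, so the whole population can be evaluated in parallel. City counts, names and types are assumptions.

Code:
#include <cmath>
#include <cstddef>
#include <vector>

struct City { float x, y; };

// Total length of one tour (a permutation of city indices).
float tour_length(const std::vector<City>& cities, const std::vector<int>& tour) {
    float len = 0.0f;
    for (std::size_t i = 0; i < tour.size(); ++i) {
        const City& a = cities[tour[i]];
        const City& b = cities[tour[(i + 1) % tour.size()]];   // wrap around to the start
        len += std::sqrt((a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y));
    }
    return len;
}

// Score every individual; each call is independent, hence parallelizable.
std::vector<float> evaluate(const std::vector<City>& cities,
                            const std::vector<std::vector<int>>& population) {
    std::vector<float> fitness(population.size());
    for (std::size_t i = 0; i < population.size(); ++i)
        fitness[i] = tour_length(cities, population[i]);
    return fitness;
}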

Now, as for OpenCL. The way I see it:
1) OpenCL is good for large homogeneous arrays of fixed ALUs with large, fast memory access (read: for GPUs). Trying to directly recreate such a structure would be like playing an away match - no chance against GPUs.
BTW, memory access and overall data bandwidth are my pet peeve, since my board doesn't support PCI and even has some problems with Ethernet. Currently I'm using RS232, which is indeed a dead end.
2) To enable OpenCL for the FPGA, I'm going to need an OpenCL (read: C) to HDL compiler (translator), which is a whole separate story. I'm actually considering writing a simple C/C++ to HDL translator (for arithmetic operations) to enable rapid and easy acceleration of existing programs. My supervisor is working on it, and there is such a prospect, but no chance we'll achieve that in the next half year (to say the least).
3) The FPGA's strong point is total reconfigurability. The most effective way to utilize the FPGA's resources is to use an HDL project that was initially created for the FPGA, without any fixed architectures in mind. That's why I decided to design the system the way I described above.
Bottom line is: currently I have no idea how to support OpenCL, but if I meet the current deadline I will indeed give it a thought.


Memory access and overall bandwidth happen to be a pet peeve of mine as well. Yes, a pet peeve. I have several. :p All the boards I have are max 1 Gbit links (the Atlys has gigabit Ethernet). Now 1 Gbit is nice enough for a lot of things, but for some other stuff I'd really like the bandwidth of PCIe x16. Yeah, not going to happen anytime soon, I'm afraid. ;)


I'm not sure about $1000; prices for good FPGAs climb to $500 and higher. It's more like $1200-1300, which might also be acceptable.

I was thinking of a 500/500 split, but I will admit I was thinking in euros. But hey, $$ == euro for computer stuff, right?


P.S. With the current concept of the system, it's no match for GPUs, but I was hoping for problems that don't fit GPUs well. Not quite sure what those problems are, though...

Well, the thing is that GPUs are really good at floating point. So anything that is parallel and binary combinatorial and definitely not floating point would show the difference between FPGA and GPU implementations. What you could also do is hop by the math department and ask if they know of any cool combinatorial things like that.
 
All valid arguments as for OpenCL; I'll look into it. I did read the OpenCL spec, and I've written several small programs, so I am a bit familiar with it.
Mmmmh, a genetic algorithm implementation of traveling salesman? The benefit of traveling salesman is that the problem domain is well understood and you can read a lot about it. That always helps with the timetable. :p
What you could also do is hop by the math department and ask if they know of any cool combinatorial things like that.
Good thinking, thanks!
I was also thinking about NP-complete problems that require brute-force search. Like finding a password by its hash - that's a very practical task; I figure I could even get money from certain people for it :)
I also like simple (algorithm-wise) tasks like searching for prime numbers, generating random numbers and so on, but those are meaningless on their own.
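
To make the brute-force idea concrete, here is a toy sketch (my own). The hash is a deliberately simple placeholder (FNV-1a), nothing like a real password hash; the point is only that each candidate is tested independently, so the search space can be split across as many parallel hardware units as fit in the FPGA.

Code:
#include <cstdint>
#include <cstdio>
#include <string>

// Placeholder hash (FNV-1a, 32-bit) standing in for whatever hash the real task would use.
std::uint32_t fnv1a(const std::string& s) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : s) { h ^= c; h *= 16777619u; }
    return h;
}

int main() {
    const std::uint32_t target = fnv1a("abz");      // pretend this is the leaked hash
    std::string cand = "aaa";
    for (char c0 = 'a'; c0 <= 'z'; ++c0)            // enumerate every 3-letter candidate
        for (char c1 = 'a'; c1 <= 'z'; ++c1)
            for (char c2 = 'a'; c2 <= 'z'; ++c2) {
                cand[0] = c0; cand[1] = c1; cand[2] = c2;
                if (fnv1a(cand) == target) {        // independent check per candidate
                    std::printf("match: %s\n", cand.c_str());
                    return 0;
                }
            }
    return 1;
}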

Memory access and overall bandwidth happen to be a pet peeve of mine as well. Yes, a pet peeve. I have several. :p All the boards I have are max 1 Gbit links (the Atlys has gigabit Ethernet). Now 1 Gbit is nice enough for a lot of things, but for some other stuff I'd really like the bandwidth of PCIe x16.
You are lucky to at least have that! My board has a 10/100 PHY, and it seems to malfunction (either that, or it's my brain malfunctioning :) )

But hey, $$ == euro for computer stuff, right?
I don't know; in your country, possibly, but in mine the euro has significantly more buying power than the dollar.

Well, the thing is that GPUs are really good at floating point.
Oh, about that! I've found out that FPULib (the most popular VHDL floating-point library) also has an LNS subset (logarithmic number system). Because those numbers are logarithmic, multiplication and division are essentially addition and subtraction. Addition and subtraction themselves in LNS are tricky and require 2 to 3 additional digits to produce the same or smaller error than IEEE 754 FP, but LNS is fast and small. I'd like to try it out. Still, it doesn't seem to be popular, and I didn't manage to figure out why... What do you think about it?
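
To spell out why LNS is attractive, here is a minimal sketch (my own C++, not FPULib code) of sign-free LNS arithmetic where a positive value x is stored as log2(x). Multiplication and division collapse to addition and subtraction of the stored logs; addition needs the correction term log2(1 + 2^(b - a)), which is the tricky part a hardware implementation approximates with tables or interpolation.

Code:
#include <algorithm>
#include <cmath>
#include <cstdio>

struct Lns { double log2mag; };                    // positive values only, for brevity

Lns    to_lns(double x) { return Lns{ std::log2(x) }; }
double from_lns(Lns v)  { return std::exp2(v.log2mag); }

Lns lns_mul(Lns a, Lns b) { return Lns{ a.log2mag + b.log2mag }; }   // multiply = add logs
Lns lns_div(Lns a, Lns b) { return Lns{ a.log2mag - b.log2mag }; }   // divide   = subtract logs

// log2(2^a + 2^b) = a + log2(1 + 2^(b - a)) for a >= b: the correction term is
// what makes LNS addition harder than LNS multiplication.
Lns lns_add(Lns a, Lns b) {
    double hi = std::max(a.log2mag, b.log2mag);
    double lo = std::min(a.log2mag, b.log2mag);
    return Lns{ hi + std::log2(1.0 + std::exp2(lo - hi)) };
}

int main() {
    Lns x = to_lns(3.0), y = to_lns(4.0);
    std::printf("3 * 4 = %g\n", from_lns(lns_mul(x, y)));   // 12
    std::printf("3 + 4 = %g\n", from_lns(lns_add(x, y)));   // 7
    return 0;
}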


P.S. Thank you for your time and effort in helping with my problem, I really appreciate it.
 

All right, I have finished writing and testing the basic functionality (never thought it would take a month!).
I have almost no idea what problems I can actually accelerate with my coprocessor. I couldn't find an HDL implementation of any genetic algorithm, and couldn't think of any other good tasks.
Suggestions?..
 
