Continue to Site

Welcome to EDAboard.com

Welcome to our site! EDAboard.com is an international Electronics Discussion Forum focused on EDA software, circuits, schematics, books, theory, papers, asic, pld, 8051, DSP, Network, RF, Analog Design, PCB, Service Manuals... and a whole lot more! To participate you need to register. Registration is free. Click here to register now.

implementing a graph traversal algorithm on FPGA

Status
Not open for further replies.

doost4

Junior Member level 3
Joined
Apr 2, 2017
Messages
27
Helped
0
Reputation
0
Reaction score
0
Trophy points
1
Activity points
253
Hello everybody,

I'm going to implement a graph traversal algorithm on FPGA so that it takes a design as its input and traverse the graph from the beginning point to the end on the FPGA.

In this design every node has some information to be processed during the traversal.

How should I start to implement this algorithm? Since the graph for a design is large and it doesn't fit on the FPGA, it seems that I also have to work with external memory, right ?

Is there any similar project over the net for graph traversing algorithms like Depth-First search?

Thanks
 
Last edited:

first step would be to have the algorithm working in something like C or Matlab so you have some form of reference model.
Then architect the algorithm so that it works on an FPGA.
Your post is rather vague, and lacking in detail. What have you done so far and what are you having trouble with?
 

first step would be to have the algorithm working in something like C or Matlab so you have some form of reference model.
Then architect the algorithm so that it works on an FPGA.
Your post is rather vague, and lacking in detail. What have you done so far and what are you having trouble with?

As you mentioned, I have the algorithm written in C language. In this algorithm we have a large graph needed to be traversed, so it takes long time to perform. Because of this matter, I want to accelerate the algorithm by running on FPGA.
In fact, I want to implement a platform on FPGA for running my C code instead of running it on CPU and as you said, I don't know how to design the architecture of this platform. So I asked here for help
 

We arnt here to complete the work for you, but we can help with specific problems.
If you have no digital logic experience, then you're really going to struggle.
You might have to have a look at Xilinx HLS or Altera's OpenCL compilers, as they may be able to convert your C to an FPGA implementation. While this may not produce the most optimal solution, it should give you something that will work on an FPGA - but note you will still need an understand of how an FPGA works.

This may be something a little too advanced as a first foray into the world of FPGAs.
 

We arnt here to complete the work for you, but we can help with specific problems.
If you have no digital logic experience, then you're really going to struggle.
You might have to have a look at Xilinx HLS or Altera's OpenCL compilers, as they may be able to convert your C to an FPGA implementation. While this may not produce the most optimal solution, it should give you something that will work on an FPGA - but note you will still need an understand of how an FPGA works.

This may be something a little too advanced as a first foray into the world of FPGAs.

I didn't ask someone to complete the work for me ! All I want is to get help and know how to start to implement this framework.
I think you didn't get what I meant, I want to run a C program on FPGA instead of CPU, not converting the program.
By converting the C program to HDL there is no need for implementing a framework on FPGA and definitely it is not optimal.
 

I didn't ask someone to complete the work for me ! All I want is to get help and know how to start to implement this framework.
I think you didn't get what I meant, I want to run a C program on FPGA instead of CPU, not converting the program.
By converting the C program to HDL there is no need for implementing a framework on FPGA and definitely it is not optimal.

Any C program run on an FPGA is either run on a processor (far slower then your PC processor) or you have to use a C to gates conversion a.k.a. Xilinx HLS or Altera's OpenCL compilers already mentioned by Tricky.

If you go the route of C program to HDL you are still going to have to know how FPGAs work and as Tricky says this may be more than you can handle as a first time FPGA user. Don't be misled by the marketing hype about jumping straight from being a C software coder and becoming a full fledged FPGA expert using HLS.
 

Any C program run on an FPGA is either run on a processor (far slower then your PC processor) or you have to use a C to gates conversion a.k.a. Xilinx HLS or Altera's OpenCL compilers already mentioned by Tricky.

If you go the route of C program to HDL you are still going to have to know how FPGAs work and as Tricky says this may be more than you can handle as a first time FPGA user. Don't be misled by the marketing hype about jumping straight from being a C software coder and becoming a full fledged FPGA expert using HLS.

Well, You are right. But I think we deviated from the problem, so I re-explain it.

You know, some algorithms are very time consuming. It takes months or even years to run on a PC and it's because of the serial nature of the CPU. So, we have to accelerate these algorithms by running them on GPU, FPGA, ASIC, etc. But there are challenges. For example to run an algorithm on GPU, the algorithm should have the capability to be parallelized, and there is a shared memory bottleneck.

The algorithm I want to accelerate, graph traversal as I said, doesn't have a parallel nature. For this reason and other reasons, the best choice is to accelerate by running on FPGA.

By running on FPGA, I mean running the ALGORITHM on FPGA, not making compiler and interpreter for the C language! As my friends mentioned before, we can convert C program to HDL or RTL by some tools, but I don't think this really helps because there might be no negligible speedup.

I want to build a best optimal unit on FPGA for running this algorithm by means of VHDL coding and I asked for help for this reason.
 

If the algorithm cannot be parrellelised, then it will be slow, or even slower on an FPGA. FPGAs work best with large data sets that can be parrallelised or efficiently pipelined.
If you do not understand VHDL, or FPGA fabric, then you need to start by getting a good book/training course on digital logic design, otherwise you will go nowhere. writing VHDL is nothing at all like software, but requires a good understanding of the underlying hardware.

So - my advice, learn VHDL and FPGA archutecture first (this could be 12+months study) before attempting to convert an algorithm to VHDL.
 
  • Like
Reactions: doost4

    doost4

    Points: 2
    Helpful Answer Positive Rating
If the algorithm cannot be parrellelised, then it will be slow, or even slower on an FPGA. FPGAs work best with large data sets that can be parrallelised or efficiently pipelined.
If you do not understand VHDL, or FPGA fabric, then you need to start by getting a good book/training course on digital logic design, otherwise you will go nowhere. writing VHDL is nothing at all like software, but requires a good understanding of the underlying hardware.

So - my advice, learn VHDL and FPGA archutecture first (this could be 12+months study) before attempting to convert an algorithm to VHDL.

That's what I was talking about!

I've done some works with Xilinx FPGAs and somehow I'm familiar with VHDL. In this particular case, I don't understand how to convert the algorithm steps to digital blocks, it's really confusing. For instance, you can code the graph nodes easily with linked list in C and you traverse the nodes one by one. But what is the equal block for describing a node in VHDL? :thinker:
 

There is no "equal" block, and without knowing your system we cannot really comment in what you need. Linked lists are a software concept, and are a few layers up the abstraction. Underneath they are just lists of memory addresses and some data.
 

This means nothing to most people, including me. Why not post a link to a paper?
 

That's what I was talking about!

I've done some works with Xilinx FPGAs and somehow I'm familiar with VHDL. In this particular case, I don't understand how to convert the algorithm steps to digital blocks, it's really confusing. For instance, you can code the graph nodes easily with linked list in C and you traverse the nodes one by one. But what is the equal block for describing a node in VHDL? :thinker:

Algorithms that have things like linked lists and such, which are easily implemented in software are typically not something a novice at FPGAs should take on.

I've converted software algorithms into hardware and they end up looking like some sort of microcode machine (basically a dedicated hand made processor for the specific task).

Other algorithms end up being a huge very complex pipelined design that runs at some multiple of the clock (due to things like read modify write requirements of the data being manipulated).

In either case they can potentially run faster than a software implementation running on a multi-GHz processor, but they are hardly something a novice can tackle. The last two designs that I implemented were really software algorithms both had >20 stage pipelines with one having 3 phases of the clock used for various operations read/calc/write to RAM for each piece of processed data along with that pipeline (i.e. >60 clock cycles before the operation was completed on the first input data). Tracking all the stages and the keeping the pipeline was a headache. It would have been better to do those designs with simulink or HLS, but in both cases they were not available, so I had to code it all in Verilog, not something I would recommend doing. If your design requires information based on previous data then you might have an even more clock cycles than my 3 clocks per data, basically you need to store stuff for later use by the pipeline, it can get very convoluted very quickly.

My recommendation if it can be done using a higher level of abstraction, i.e. HLS, or simulink then do it. Take the hit on performance. If you go the route I used to "optimize" the design you will be in way over your head as a novice at FPGAs.
 
  • Like
Reactions: doost4

    doost4

    Points: 2
    Helpful Answer Positive Rating
Algorithms that have things like linked lists and such, which are easily implemented in software are typically not something a novice at FPGAs should take on.

I've converted software algorithms into hardware and they end up looking like some sort of microcode machine (basically a dedicated hand made processor for the specific task).

Other algorithms end up being a huge very complex pipelined design that runs at some multiple of the clock (due to things like read modify write requirements of the data being manipulated).

In either case they can potentially run faster than a software implementation running on a multi-GHz processor, but they are hardly something a novice can tackle. The last two designs that I implemented were really software algorithms both had >20 stage pipelines with one having 3 phases of the clock used for various operations read/calc/write to RAM for each piece of processed data along with that pipeline (i.e. >60 clock cycles before the operation was completed on the first input data). Tracking all the stages and the keeping the pipeline was a headache. It would have been better to do those designs with simulink or HLS, but in both cases they were not available, so I had to code it all in Verilog, not something I would recommend doing. If your design requires information based on previous data then you might have an even more clock cycles than my 3 clocks per data, basically you need to store stuff for later use by the pipeline, it can get very convoluted very quickly.

My recommendation if it can be done using a higher level of abstraction, i.e. HLS, or simulink then do it. Take the hit on performance. If you go the route I used to "optimize" the design you will be in way over your head as a novice at FPGAs.

Thank you for your complete explanation dear @ads-ee .
Now I've got the concept. It seems that it's kind of designing a simple specific-task processor, and of course it has the complexities as you mentioned. Beside the challenges you mentioned, I have to deal with feedback loops and it makes it even more complex.

So your suggestion is to convert the algorithm by means of HLS or any similar way, but do you think it really takes effect on performance?
 

So your suggestion is to convert the algorithm by means of HLS or any similar way, but do you think it really takes effect on performance?

If your algorithm cant be parallelized in any meaningful way, then pipelining the entire algorithm can improve the performance. What you are doing is not making the design parallel as making parallel copies of the design that work on each input in parallel.
 
  • Like
Reactions: doost4

    doost4

    Points: 2
    Helpful Answer Positive Rating
Status
Not open for further replies.

Part and Inventory Search

Welcome to EDABoard.com

Sponsor

Back
Top