Welcome to EDAboard.com

Look up table for a given function

Status
Not open for further replies.

ctzof

Full Member level 3
Joined
Mar 1, 2012
Messages
157
Helped
12
Reputation
24
Reaction score
11
Trophy points
1,298
Location
Munich
Activity points
2,516
Hello,

I am rather new to FPGA design and I am having some difficulty with a design. Here is the problem:

I want to build a look-up table to generate the outputs of a specific function. I have a temperature sensor, and depending on the measured temperature I want to produce an offset that will be added to the output signal. The offset is straightforward and follows a specific curve. The temperature value is a 10-bit unsigned integer, so there are 1024 possible input values and therefore 1024 offsets. Each offset in the look-up table is also 10 bits wide, which means I need 1024 x 10 = 10,240 bits (about 10 kbit) of storage. The problem is that I want to use the same look-up table many times in my project and I am rather limited in terms of resources. My idea is to store fewer than 1024 points and somehow compute the output value when the input falls between two stored points (I am not really sure, but I think this is called linear interpolation). Is there a way to do that in an FPGA? Any ideas?
My computations are made with unsigned numbers.
 

ads-ee

Super Moderator
Staff member
Joined
Sep 10, 2013
Messages
7,807
Helped
1,810
Reputation
3,630
Reaction score
1,769
Trophy points
1,393
Location
USA
Activity points
58,934
If the offset is linear (which I'm assuming at this point), then just calculate all of the values, not just some of them. If the offset isn't a linear function, then yes, you use interpolation, and that means you use math.
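
For the linear case, "just calculate" could look like the sketch below: one multiply, one shift and one subtract per sample, with no table at all. The gain, the start value and the Q8 scaling are made-up numbers for illustration, not values from this thread.

```python
IN_BITS = 10      # temperature reading width (from the thread)
FRAC_BITS = 8     # assumed: gain stored as an unsigned fraction with 8 fractional bits
GAIN_Q8 = 230     # hypothetical slope magnitude, 230 / 256 = about 0.9 LSB per input step
START = 1023      # hypothetical offset at the lowest temperature reading


def linear_offset(x):
    """Falling straight line in unsigned integer arithmetic: START - gain * x."""
    product = x * GAIN_Q8                   # full-width product, nothing discarded yet
    return START - (product >> FRAC_BITS)   # scale once, then subtract


for x in (0, 512, 2**IN_BITS - 1):
    print(x, linear_offset(x))
```

With these example constants the result stays non-negative over the whole 10-bit input range, so unsigned arithmetic is enough.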
 

ctzof

Thanks for the answer. Unfortunately it is not linear. It is similar to the function y = e^-x. Can you give me some guidance or a rough idea of how to do this?
 

ads-ee

Do what is described here. Just remember to scale only after you've finished everything else; you don't want to lose any precision until the calculation is complete (i.e. don't throw any bits away until the end result).
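
The linked description is not reproduced in this thread, so the following is only one possible reading of the advice: a small software model of integer linear interpolation in which the full-width weighted sum is kept and the single right shift comes last. The curve, the segment count and all names are assumptions made for the sketch.

```python
import math

IN_BITS = 10          # width of the temperature reading (from the thread)
OUT_BITS = 10         # width of the offset value (from the thread)
SEG_BITS = 5          # assumed: 2**5 = 32 segments, so 33 stored points
FRAC_BITS = IN_BITS - SEG_BITS   # low input bits give the position inside a segment


def ideal(x):
    """Assumed stand-in curve, shaped like e^-x and scaled to the 10-bit output range."""
    return (2**OUT_BITS - 1) * math.exp(-4.0 * x / 2**IN_BITS)


# Breakpoint table: one entry per segment boundary, including the final endpoint.
TABLE = [round(ideal(i << FRAC_BITS)) for i in range(2**SEG_BITS + 1)]


def interpolate(x):
    """Integer-only linear interpolation; the single right shift is the last step."""
    idx = x >> FRAC_BITS                 # upper bits: table address
    frac = x & ((1 << FRAC_BITS) - 1)    # lower bits: distance into the segment
    y0, y1 = TABLE[idx], TABLE[idx + 1]
    # Keep all bits of the weighted sum, add half an LSB for rounding, then scale once.
    acc = y0 * ((1 << FRAC_BITS) - frac) + y1 * frac + (1 << (FRAC_BITS - 1))
    return acc >> FRAC_BITS


worst = max(abs(interpolate(x) - ideal(x)) for x in range(2**IN_BITS))
print(len(TABLE), "entries, worst-case error", round(worst, 2), "LSB")
```

In this model the table holds 33 entries of 10 bits (330 bits) instead of the 10,240 bits of the full table. Whether the resulting error is acceptable depends on the real curve and the accuracy requirement.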
 

vGoodtimes

Advanced Member level 4
Joined
Feb 16, 2015
Messages
1,089
Helped
307
Reputation
614
Reaction score
302
Trophy points
83
Activity points
8,730
For LUTs in resource-constrained designs, you might want to either move away from the LUT, run the LUT at a higher clock rate, or time-share the LUT.

For a Xilinx BRAM, you get 2 reads per cycle per BRAM. Thus a 10 kbit LUT (one BRAM18) that has to serve 10 reads per cycle would use 5 BRAMs, i.e. the table is duplicated 5 times. The BRAMs _can_ run at up to about 500-600 MHz. If your normal design runs at 100 MHz, a single BRAM clocked at 500 MHz could accept 10 input addresses per 100 MHz cycle and return 10 output values per 100 MHz cycle. You would need a small amount of logic running at 500 MHz, plus appropriate pipelining to ensure the high-speed logic works out.

If many things use the LUT, but only infrequently, you might look into some form of arbitration to provide access to a reasonable number of LUTs, but with variable latency. This is a similar idea (serialize access to the LUT), but it doesn't use a high-speed clock. Routing and arbitration logic might become an issue.

(If you can tolerate 512-1024 cycles of latency, you can also cycle through the entire LUT and broadcast the result.)

Edit: linear interpolation might help, but it is hard to say. You should try to shrink the LUT by at least a factor of two in order to make up for the double reads.
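
The read-bandwidth arithmetic above can be checked with a few lines. This is only a back-of-the-envelope helper; the function names are invented here, and the clock figures are the ones quoted in this post.

```python
def reads_per_system_cycle(ram_clock_hz, system_clock_hz, ports=2):
    """Lookups one dual-port RAM can serve during a single system clock period."""
    return (ram_clock_hz // system_clock_hz) * ports


def copies_needed(reads_required, ram_clock_hz, system_clock_hz, ports=2):
    """Number of identical table copies needed to serve all reads every system cycle."""
    per_copy = reads_per_system_cycle(ram_clock_hz, system_clock_hz, ports)
    return -(-reads_required // per_copy)   # ceiling division


MHZ = 1_000_000
print(copies_needed(10, 100 * MHZ, 100 * MHZ))   # RAM clocked at the system rate
print(copies_needed(10, 500 * MHZ, 100 * MHZ))   # RAM clocked five times faster
```

The first call corresponds to the five duplicated BRAMs, the second to the single BRAM running at 500 MHz.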
 

ctzof

Hi, thanks for the answer. In your opinion, is there a different approach to my problem other than a LUT?
 

K-J

Advanced Member level 2
Joined
Jan 26, 2012
Messages
658
Helped
308
Reputation
620
Reaction score
301
Trophy points
1,343
Activity points
7,053
'Yes' is the short answer. But if you want more detailed answers, you're going to have to define what you're doing. So far, you haven't adequately defined function, performance or constraints. Without that info, you're only going to get speculative responses.
- Function: Exactly what function are you trying to implement, and over what input domain?
- Performance: How quickly do you need things? One per clock? Multiple clock cycles? Etc.
- Constraints: Is the FPGA or the FPGA family or maybe even just the supplier chosen? If so, which one? Are there resources that are likely limited because of other stuff that you have going on in your design? For example, maybe the rest of your design is pretty much locked in and you have only one spare LUT.

Kevin Jennings
 

ctzof

The function looks like the attached image. For larger values of temperature I want less offset. I have chosen my FPGA: it is a Microsemi ProASIC3/E A3P250. As for the frequency requirements, I am not sure at the moment. The main clock is going to be 10 MHz, so the operating frequency is not very high. The thing is that I need 5 of these LUTs, and some of them are 14x14 bits, which translates to 229 kbit (2^14 x 14 = 229,376 bits). I think that is quite a lot of space.

[Attached image: Untitled.png]
 

andre_teprom

Super Moderator
Staff member
Joined
Nov 7, 2006
Messages
9,240
Helped
1,151
Reputation
2,321
Reaction score
1,127
Trophy points
1,403
Location
Brazil
Activity points
53,784
K-J asked: "How quickly do you need things? One per clock? Multiple clock cycles?"

In most applications temperature changes quite slowly, so I guess speed should not be a problem.
 

ads-ee

You've given a graph, but is this graph based on a relationship (e.g. a mathematical equation) or is it created based off of empirical data?

If it's measured data, you may have to use a piecewise-linear or curve-fit algorithm to reduce the required table size and calculate the intermediate values between table points. If it is derived from an equation, then just compute the offsets. Either way, it doesn't seem like you need extremely high-speed results, so time-sharing the resource is probably feasible.

And if you need performance you could always pipeline the algorithm(s) and stuff in 5 inputs (1/clock) and get 5 outputs after some amount of latency.
 

FvM

Super Moderator
Staff member
Joined
Jan 22, 2008
Messages
48,300
Helped
14,233
Reputation
28,727
Reaction score
12,925
Trophy points
1,393
Location
Bochum, Germany
Activity points
279,672
Designing an approximation function, e.g. table interpolation, starts with a specification of the ideal function and acceptable error amount. Having this, you can figure out how many linear segments are necessary.
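
As a rough illustration of that step, the sketch below searches for the smallest power-of-two segment count that keeps the worst-case interpolation error within a given budget. The ideal curve and the 0.5 LSB tolerance are assumptions chosen for the example, since the real calibration data is not given in this thread.

```python
import math

N_IN = 1024   # number of input codes for a 10-bit reading (from the thread)


def ideal(x):
    """Assumed stand-in for the calibration curve, shaped like e^-x, in output LSBs."""
    return 1023 * math.exp(-4.0 * x / N_IN)


def max_error(f, n_in, segments):
    """Worst-case |interpolated - ideal| over all input codes, equal-width segments."""
    step = n_in // segments
    worst = 0.0
    for x in range(n_in):
        x0 = (x // step) * step                 # left breakpoint of the segment
        y0, y1 = f(x0), f(x0 + step)
        y = y0 + (y1 - y0) * (x - x0) / step    # ideal (unrounded) interpolation
        worst = max(worst, abs(y - f(x)))
    return worst


def segments_needed(f, n_in, tolerance):
    """Smallest power-of-two segment count whose worst-case error meets the tolerance."""
    segments = 1
    while max_error(f, n_in, segments) > tolerance:
        segments *= 2
        if segments > n_in:
            raise ValueError("tolerance cannot be met even with one segment per code")
    return segments


n = segments_needed(ideal, N_IN, tolerance=0.5)   # 0.5 LSB is an assumed error budget
print(n, "segments, worst-case error", round(max_error(ideal, N_IN, n), 3), "LSB")
```

The model ignores rounding of the stored table entries and of the final result, so a real fixed-point implementation needs some extra margin.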

You mentioned that the data is obtained from a temperature measurement, so we would expect a rather low data rate (e.g. < 1 kS/s). "use the same look up table many times" should be possible by sharing the function block in a sequential multiplex scheme.
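
A sequential multiplex scheme of this kind can be modelled in software as a single table that serves one channel per clock in round-robin order. The class, the table contents and the channel count below are hypothetical, and the model assumes all channels use the same table.

```python
class SharedLut:
    """Cycle-level model: one ROM, one read per clock, channels served round-robin."""

    def __init__(self, table, n_channels):
        self.table = table
        self.n_channels = n_channels
        self.slot = 0                         # channel served in the current clock
        self.outputs = [None] * n_channels    # per-channel result registers

    def clock(self, inputs):
        """Advance one clock: look up the current channel's input and latch the result."""
        ch = self.slot
        self.outputs[ch] = self.table[inputs[ch]]
        self.slot = (self.slot + 1) % self.n_channels
        return ch


# Hypothetical 16-entry table and five channels, purely for illustration.
table = [15 - i for i in range(16)]
lut = SharedLut(table, n_channels=5)
inputs = [0, 3, 7, 11, 15]
for _ in range(5):                 # five clocks serve every channel once
    lut.clock(inputs)
print(lut.outputs)
```

With a 10 MHz clock and five channels, each channel would be refreshed at 2 MHz in this scheme, far above the sample rate expected from a temperature measurement.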
 

HaydenDekker

Newbie level 1
Joined
May 1, 2015
Messages
1
Helped
0
Reputation
0
Reaction score
0
Trophy points
1
Activity points
6
If you're only receiving 10 bits from the ADC, it's probably best to store every value and steer away from the interpolation method. As FvM mentioned, the temperature data rate doesn't have to be fast, so any FPGA should be able to handle the problem.

The best solution would be to implement an exponential function of the input value. I don't know how to do that, but I'd like to find out.
 

ctzof

Thanks for all the answers. As I said, I am rather new to Verilog and I don't have much experience with coding. Is there a reference on how I can share the LUT block many times in the design? Also, some LUTs in my design are 14x14 bit = 229 kbit and I have only 36 kbit of RAM in my FPGA, so I will probably have to stick with interpolation.

To answer ads-ee's question:
The data in the plot are actually measured data or, to put it more accurately, precomputed offset data points, so the curve doesn't follow any specific equation.
 

sreevenkjan

Full Member level 5
Joined
Nov 4, 2013
Messages
268
Helped
27
Reputation
54
Reaction score
26
Trophy points
28
Location
Germany
Activity points
1,834
What exactly do you mean by share? Do you want to reuse the remaining address lines in the LUT block, or reuse the LUT block itself?
What is the length of your LUT, and what is the width of the data written into it?
 

ads-ee

Let me repeat what has been stated before...

To share a memory, you either need a multi-port memory (FPGAs support dual-port memories) or you share it virtually, using time-division multiplexing to split the memory bandwidth between the users.

Which way you go depends on how often you have to access the memory in a given amount of time.

FYI, your real question isn't about not knowing how to code this in Verilog, it's not understanding how to architect a design to do what you want within the context of the resources available in an FPGA. To help you with that will require a detailed specification on the data rates and clock frequencies of the design along with quantity of LUTs required.
 

ctzof

I understand what you are saying. The architecture and data rates are not yet specified, but the clock frequency is going to be low (6-10 MHz), so I may return later with exact specifications. The final FPGA has been chosen: it is a Microsemi ProASIC3 A3P250.
https://www.microsemi.com/products/fpga-soc/fpga/proasic3-e#product-tables

As I said in a previous post, what really concerns me is that some of the tables are 14x14 bit (229 kbit) while the available memory of this FPGA is 36 kbit, so not even a single table fits. That is why I want to use the interpolation approach.
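
The size figures can be checked quickly with the sketch below. The helper names are made up, 36 kbit is taken to mean 36 x 1024 bits, and the 256-segment interpolation table is only an assumed example, not a recommendation.

```python
def lut_bits(addr_bits, data_bits):
    """Storage for a full direct look-up table: one word per possible input value."""
    return (2 ** addr_bits) * data_bits


def interp_bits(seg_bits, data_bits):
    """Storage for an interpolation table with 2**seg_bits segments (one extra endpoint)."""
    return (2 ** seg_bits + 1) * data_bits


RAM_BITS = 36 * 1024   # 36 kbit as quoted in the thread, assumed to mean 36 * 1024 bits

for name, bits in [
    ("full 10x10 table", lut_bits(10, 10)),
    ("full 14x14 table", lut_bits(14, 14)),
    ("14-bit data, 256 segments", interp_bits(8, 14)),   # segment count is an assumption
]:
    print(f"{name}: {bits} bits, fits in RAM: {bits <= RAM_BITS}")
```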
 

FvM

The problem of sharing the non-linear function block between multiple channels is independent of using a direct look-up table or linear interpolation. There are of course several relations:

- Using interpolation can reduce the table size by a large factor, which allows a separate instance for each data channel.
- Table interpolation may need to access two successive entries to calculate the segment slope. This can be done either sequentially in two clock cycles or by using both ports of a dual-port ROM. Alternatively, a separate slope table can be stored.
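
The two table layouts in the last point can be compared in a short model: one variant reads two neighbouring entries, the other reads a base value and a precomputed slope from the same address. The breakpoints, segment count and names are assumptions for the sketch.

```python
import math

SEG_BITS = 4                      # assumed: 16 segments
FRAC_BITS = 6                     # remaining bits of a 10-bit input
MASK = (1 << FRAC_BITS) - 1

# Hypothetical breakpoints on a falling, e^-x-like curve (17 points for 16 segments).
POINTS = [round(1023 * math.exp(-4 * i / 16)) for i in range(2**SEG_BITS + 1)]

BASE = POINTS[:-1]                                               # value at segment start
# Signed change per segment; negative here because the assumed curve is falling.
SLOPE = [POINTS[i + 1] - POINTS[i] for i in range(2**SEG_BITS)]


def interp_two_reads(x):
    """Variant 1: fetch two neighbouring entries and form the slope on the fly."""
    idx, frac = x >> FRAC_BITS, x & MASK
    y0, y1 = POINTS[idx], POINTS[idx + 1]
    return y0 + (((y1 - y0) * frac) >> FRAC_BITS)


def interp_slope_table(x):
    """Variant 2: one address reads both the base value and a precomputed slope."""
    idx, frac = x >> FRAC_BITS, x & MASK
    return BASE[idx] + ((SLOPE[idx] * frac) >> FRAC_BITS)


same = all(interp_two_reads(x) == interp_slope_table(x) for x in range(1024))
print("variants agree on every input:", same)
```

Both variants compute the same result; the slope table trades extra storage for a single read per lookup. For a falling curve the slope is negative, so hardware working with unsigned numbers would need either a signed multiplier or a stored magnitude combined with a subtraction.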
 

andre_teprom

A 14-bit variable for the temperature value means that you are working with a maximum resolution of 1/16,384 (0.006%), which is surely unreachable for practical meters, so you should optimize the use of the available resources by proper scaling. Another point is that if you use 16k words (2^14 entries) to store the entire table, the nonlinear shape means that many addresses can be expected to hold almost the same value. You should consider performing this task with an algebraic expression instead of a LUT.
 

FvM

andre_teprom wrote: "You should consider to perform this task by a algebraic expression, instead of LUT."

I tend to disagree. Polynomial interpolation is an option if the function is explicitly defined that way, e.g. Pt100 or thermocouple linearisation. But the calculation is rather inconvenient with integer or fixed-point arithmetic. Piecewise linear interpolation, in contrast, is simple and straightforward, and it can be fitted much more easily to arbitrary calibration functions.
 

ctzof

One of these tables (the 10x10 bit one) produces the temperature offset. The other tables (14x14 bit) are for different purposes.
 
