Random number generation with normal distribution.

mrflibble · Jul 16, 2014

I'll try to keep it short and simple (loads of math crap on request).

I need 16-bits random numbers of reasonable quality (*), and the budget is only 1 flipflop and a xor gate. Okay, maybe a bit more but that's really pushing it.

*) Doesn't have to be crypto grade, but no dodgy spectral content.

The distribution has to be normal. Also, not uniform.

So a LFSR would be nice and simple, except it generates a uniform distribution...

Broadly two methods I could come up with that seem reasonable:
- run several independent LFSRs in parallel, and sum them. Wave central limit theorem wand et presto, normal distribution.
- use a lookup table to map LFSR output onto the wanted distribution

I did a few checks, and for lets say 4 summed LFSRs the distribution was not good enough. At about 8 the distribution got to be "normal enough". So that would mean running 8 LFSRs, and sum them. And obviously those LFSRs would each have a different seed.

A random number is required at 1/16th clock rate, so summing the LFSRs can be done by a single accumulator. For full rate something else would be needed, as in pipelined adders.

This is for synthesizable hardware. The simulation side was almost too easy in systemverilog. Oh and for today I think I'll stick with determinism, so no toasty warm ring oscillators.

Anyways, I was wondering if someone had this same problem (no doubt) and how they solved it.

PS: I realize I didn't give hard specs for the "non-dodgy spectral content" and "distribution is normal enough". That is 100% totally on purpose, since I'm looking for several solutions for several small projects in parallel all with somewhat different requirements. But they all want random numbers for breakfast.

barry · Jul 16, 2014

Ok, probably not the answer you're looking for, but how about a noise source(diode) and an ADC?

First you state that you want "maybe a bit more" than a single FF and gate, and then you talk about 8 LFSRs, which is way more than "a bit more".

There's a bunch of stuff on the net (where I'm sure you've already looked) but they all seem to use memory blocks and lots of resources.

mrflibble · Jul 16, 2014

barry said:
Ok, probably not the answer you're looking for, but how about a noise source(diode) and an ADC?

First you state that you want "maybe a bit more" than a single FF and gate, and then you talk about 8 LFSRs, which is way more than "a bit more".

There's a bunch of stuff on the net (where I'm sure you've already looked) but they all seem to use memory blocks and lots of resources.

*grin* You guessed right. Sampling analog noise sources is indeed not what I'm going for this time.

For other projects I use bjt transistor avalanche noise, but not for this one. The "a bit more than 1 ff + xor" was my failed attempt at humor.

A single LFSR in this case is a 33-bit shift register with 2 taps. The implementation of this isn't too bad resource wise, since you can use SRLs for a lot of the registers. But the moment you are starting to use lookup tables to remap the distribution it's going to get a bit more expensive in terms of BRAMs, as you mention. Which is why I'm hoping there's some clever way I didn't manage to find while looking for solutions.

Anyways, any and all suggestions are welcome, even if it's async logic or sampled analog noise. Good ideas tend to find a way into future projects. But for now I'm going for boring deterministic pseudo-random noise. I only need so many moving parts, and in this case I'd like to be able to reproduce everything including the noise sources.

TrickyDicky · Jul 16, 2014

put two reg's in opposite corners of the die. Put a single not gate between them to make it oscillate. Then sample the output of one of them though a double reg to avoid metastability. The oscilation should vary with temperature. Im note sure how normal the distrubution will be though.

std_match · Jul 16, 2014

You can make a parallel version of one LFSR so it can generate a new random word in one clock cycle. It is the same principle as for updating a CRC word in one clock cycle. You then have 16 random numbers you can add to generate one with the wanted distribution.
If you have high statistical requirements it may be a good idea to only use 16 bits of a longer LFSR. If the LFSR is only 16 bits the randomness is limited since all of the added 16-bit words will always be different from each other.

mrflibble · Jul 16, 2014

std_match said:
You can make a parallel version of one LFSR so it can generate a new random word each clock cycle. It is the same principle as for updating a CRC word in one clock cycle. You then have 16 random numbers you can add to generate one with the wanted distribution.

Yeah I thought of that. At least, I thought of what I think you mean there. You mean the leap forward technique right? If yes, then ... yes, but no. What that does do is get you a generator of random 16-bit words. Unfortunately, if I'm right (and I'd love to be wrong here) you cannot grab N consecutive words generated by this single, highly correlated with itself random number generator, add them and then hope for the best. Well, you can hope for the best, but AFAIK you will not get a proper normal distribution as per central limit theorem. Maybe I can get away with this for some of my modules requirements, but definitely not for all of them.

I did look at using the leap ahead method in another way as well:
- keep N words with the LFSR seeds in it
- use leap ahead LFSR's that do 16 steps
- either use a MUX or a circular buffer (aka just more shift registers) to select the seed to operate on
- operate feedback XOR/XNOR logic, and store output in buffer. also send 16-bit result to accumulator

That would then get you N uncorrelated 16-bit words that have been added by the accumulator. The idea was to save some resources by doing the logic operations on one seed at a time, thus saving you some XORs and routing. But after looking at it a bit closer that will not save you any resources. You do save on the XOR logic, but that hardly matters. What you save there you spend elsewhere because you can no longer use SRLs at several places (because you now need to access more taps for each step). And thus you will have flip-flops for all the bits, where in the single step version you could use SRLs.

If you have high statistical requirements it may be a good idea to only use 16 bits of a longer LFSR. If the LFSR is only 16 bits the randomness is limited since all of the added 16-bit words will always be different from each other.

Indeed. Currently I'm looking at using a maximal length 33-bit LFSR. So that's a 2^33-1 cycle repeat. Also, if you want to use a 16 step leap ahead then you need a longer LFSR. Otherwise you will get overlapping operators, which is a nogo. But 16 step fits just fine in a 33-bit LFSR.

std_match · Jul 17, 2014

Yes, I meant "leap ahead". The LFSR length and the "leap ahead" step size can be chosen independently.
A 33-bit LFSR sounds good. I am not sure that a "leap ahead" step of size 16 is the best choice. Maybe a longer step will give a lower correlation.
Simulations are needed. One step size to test is to have it identical to the LFSR length.

You can't use SRL together with "leap ahead" since the input to one register is not the adjacent bit in the LFSR.

mrflibble · Jul 17, 2014

std_match said:
Yes, I meant "leap ahead". The LFSR length and the "leap ahead" step size can be chosen independently.
A 33-bit LFSR sounds good. I am not sure that a "leap ahead" step of size 16 is the best choice. Maybe a longer step will give a lower correlation.
Simulations are needed. One step size to test is to have it identical to the LFSR length.

Just to be clear, since there are several things at play here.

Forgetting the adding of several results, and we only want a 16-bit number every clock cycle. For that we'd use a 33 length LFSR, with 16 steps leap ahead. That should give you 16-bit random numbers with uniform distribution. So for applications that are okay with uniform distribution ==> job done.

Now the other flavor:
- Same as above, only N times the identical 33-bit implementation, each instantiated with a different seed. The seeds are chosen such that the sequences have low correlation.
- Add those, and if all is well we have a normal distribution.

The problem with the above is that we need N pieces of hardware. Now what you describe is the middle road between the above two. Use only one LFSR, but use a LARGE leap ahead step. And indeed, 16 will not be enough there. The 16 was purely to get a 16-bit result. If you want a) the 16-bit result and b) sufficiently low correlation then I think you need a step size of quite a bit more. I'd think that 2^16 would be a starting point, give or take a few to end up on a nice prime number just to be sure. Okay, so 2^16+1 then.

Buuut, taking the transition matrix to the power of that ... you can be sure that everything XORs pretty much with everything. As in, every input will more or less depend on every other one. And using some more hand waving math and laziness, I'd guess that you get at least 33/2 (lets call it 16) XOR inputs on average for each of the taps. That's a bit much.... Can of course be done, but it's not exactly pretty. Rough guess ... about 2 or 3 pipeline stages to XOR it all together and register it. And 4 LUT6's for every input *33 = 132 LUT6's. Plus extra registers to get it pipelined.

Mmh, it does have some interesting properties, but I think for this one it is cheaper to do the N instantiations with seperate seeds. Unless of course I am missing some handy properties for the transition matrix to a large power... Could very well be.

If I am understanding it correctly, one disadvantage of the large step leap ahead is that you will have to recalculate the transition matrix (should be no biggy), and then generate the taps (also no biggy), and then translate that into hardware (might be painful, a bit depending).

The N instantiations with seperate seeds has no such problem. You "simply" pick the new appropriate seed, and you've got a new starting point along the cyclical number line.

You can't use SRL together with "leap ahead" since the input to one register is not the adjacent bit in the LFSR.

Indeed. Well, with for example a 2 step leap ahead 33-bit LFSR you still can use SRLs and get a resource advantage, but for 16-step it's a nogo.

jpanhalt · Jul 17, 2014

Have you looked at Roman Black's approach?

https://www.romanblack.com/HardRNG/HardRNG1.htm

John

mrflibble · Jul 17, 2014

jpanhalt said:
Have you looked at Roman Black's approach?

https://www.romanblack.com/HardRNG/HardRNG1.htm

John

No. But now I have...

Firstly, this RNG is SLOW. If produces 100 or 120 bits per second, which is 12.5 bytes or 15 bytes per second, depending on if your AC mains frequency is 50Hz or 60Hz.

That is far too slow. The lowest requirement I have is 16-bit random words at about a 1 MHz rate, so anything below that is a nogo. It's also not reproducable, which is definitely required.

mrflibble · Jul 17, 2014

As a followup, I did some mucking about in matlab and it's indeed roughly as expected.

When you take a 33-bit LFSR (taps at 20 and 33), then do the transition matrix to the power of (2^16+1), you get on average a little over half of all the bits in your XOR terms.

% transition matrix for 33-bit LFSR with feedback from taps (20,33)
T =

0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0.0
1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0
0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1
1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0

The first column is the XOR inputs to the 1st shift register. You can see the taps at 33 and 20 there. And the diagonal is ye olde shift action. And all the dots are there in an attempt to make it a bit more readable, nothing more.

% transition matrix for leap forward by N=2^16+1 steps
L = leappower(T, 2^16+1)

1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1.0.1
1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1.0
0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1
1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1
0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0
1.0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1
1.1.0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1
0.1.1.0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0
0.0.1.1.0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0
0.0.0.1.1.0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0
1.0.0.0.1.1.0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1
0.1.0.0.0.1.1.0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1
1.0.1.0.0.0.1.1.0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0
1.1.0.1.0.0.0.1.1.0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0
1.1.1.0.1.0.0.0.1.1.0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0
0.1.1.1.0.1.0.0.0.1.1.0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0
0.0.1.1.1.0.1.0.0.0.1.1.0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1
1.0.0.1.1.1.0.1.0.0.0.1.1.0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0.1
0.1.0.0.1.1.1.0.1.0.0.0.1.1.0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1.0
1.0.1.0.0.1.1.1.0.1.0.0.0.1.1.0.1.0.1.1.1.1.0.0.0.1.1.0.1.1.0.0.1
1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1.0.1.1.0.0.0.0.0.0.0.0.0.0.0.1
0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1.0.1.1.0.0.0.0.0.0.0.0.0.0.0
0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1.0.1.1.0.0.0.0.0.0.0.0.0.0
1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1.0.1.1.0.0.0.0.0.0.0.0.0
1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1.0.1.1.0.0.0.0.0.0.0.0
0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1.0.1.1.0.0.0.0.0.0.0
1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1.0.1.1.0.0.0.0.0.0
1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1.0.1.1.0.0.0.0.0
0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1.0.1.1.0.0.0.0
0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1.0.1.1.0.0.0
0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1.0.1.1.0.0
1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1.0.1.1.0
1.1.0.0.0.1.1.0.1.1.0.0.1.0.1.1.0.0.0.0.1.1.0.0.0.1.1.0.1.1.0.1.1

And as a quick check to see how many XOR terms there are for each register input:

s = sum(L,1)

18.17.18.18.18.18.17.16.16.17.16.16.16.17.17.16.17.17.17.17.17.16.15.15.15.15.14.13.13.12.11.11.12

So the 1st register has 18 XORed taps as input, and the last tap (numero 33) has 12 XORed taps as input. XORing that many terms will take 2 logic levels when using LUT6. And depending on how fast things need to go it would need extra registers to pipeline it.

While this did give me some interesting other ideas, I think a leap ahead with a large step size will cost a bit too much in terms of resources. Definitely a lot more expensive than the single step version.

Anyways, the question then becomes: what leap ahead step size is "good enough" to get a "low enough" correlation. Or put another way, what's the relation (if any) between leap ahead step size and correlation. Does anyone know of any good reading material on that subject? All the leap ahead related material I could find didn't really touch on that...

- - - Updated - - -

Okay, this one had some useful hints: **broken link removed**

As well as an amusing error in the conclusion where "method we'd like to be better for the purpose of this paper" is clearly better because "result that is not actually the closest to pi in monte carlo test" is the value closest to pi. See table 4. But it had enough info in it that I'll have to revisit my stance on leap ahead with a relatively low number of steps.

std_match · Jul 18, 2014

You should also look the other way, to increase the LFSR length and keep a low leap ahead step size.
That will save a lot of XOR gates, but I don't know how much it will reduce the correlation.

ads-ee · Jul 18, 2014

Did you take a look at the reference papers? This one seems to have a possible solution you could try with multiple SRL based LFSRs.
https://pdf.aminer.org/000/226/522/an_analysis_of_random_number_generators_for_a_hardware_implementation.pdf

The 32CA type looks like it might be a nice uncorrelated option.

Welcome to EDAboard.com

Random number generation with normal distribution.

mrflibble

Advanced Member level 5

barry

Advanced Member level 7

mrflibble

Advanced Member level 5

TrickyDicky

Advanced Member level 7

std_match

Advanced Member level 4

mrflibble

Advanced Member level 5

std_match

Advanced Member level 4

mrflibble

Advanced Member level 5

jpanhalt

Advanced Member level 3

mrflibble

Advanced Member level 5

mrflibble

Advanced Member level 5

std_match

Advanced Member level 4

ads-ee

Super Moderator

Similar threads

Part and Inventory Search

Welcome to EDABoard.com

Sponsor

Connect with us

Online statistics

Forum statistics