BTW, I don't think an ordinary machine can handle a single stream of 1e7 bits; that would be extraordinary for an ordinary machine.
You may try this trick; it just crossed my mind and I've never tried it:
When you reach the SNR where a BER of 1e-6 is expected, start an inner loop over 10 separate bit streams, each 1e6 bits long. Create 10 AWGN arrays of length 1e6 and add each one to one of the 10 bit streams. At the receiver, demodulate each of the ten arrays separately, count the errors in each, sum the errors over all 10, and divide by 1e7 to get the BER!
This may sound confusing, so consider this example (the numbers are hypothetical):
SNR      BER     Stream size
-1 dB    1e-1    100
0 dB     1e-2    1000
2 dB     1e-3    10000
3 dB     1e-4    1e5
4 dB     1e-5    1e6
Here comes the problem:
5 dB     1e-6    1e6, 1e6, 1e6, ..., 1e6 (ten separate streams are sent)
So instead of a main loop with 6 iterations for this example, you'll need 15 iterations, with the error counts of the last ten accumulated. That way you avoid having a single 1e7-bit stream.
I based this technique on the assumption that the random processes are ergodic. The alternative is simply to extrapolate!
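Here is a rough sketch of the chunked counting idea, untested against any real simulator. It assumes BPSK with unit-energy symbols and treats the SNR as Eb/N0 in dB; neither assumption comes from the original question. The function name ber_in_chunks and its parameters are made up for illustration.

```python
import math
import random


def ber_in_chunks(snr_db, n_chunks=10, chunk_len=10**6, seed=0):
    """Estimate BER by summing errors over several separate bit streams.

    Assumptions (hypothetical, for illustration only): BPSK with
    symbols +1/-1 and snr_db interpreted as Eb/N0 in dB.
    """
    rng = random.Random(seed)
    ebn0 = 10 ** (snr_db / 10)
    sigma = math.sqrt(1 / (2 * ebn0))  # noise standard deviation per sample

    total_errors = 0
    for _ in range(n_chunks):
        # One fresh bit stream and one fresh AWGN array per pass, so only
        # chunk_len samples are held in memory at any time.
        bits = [rng.getrandbits(1) for _ in range(chunk_len)]
        noise = [rng.gauss(0.0, sigma) for _ in range(chunk_len)]

        # Modulate (0 -> -1, 1 -> +1), add noise, then hard-decide on sign.
        received = [(2 * b - 1) + n for b, n in zip(bits, noise)]
        decided = [1 if r > 0 else 0 for r in received]

        # Only the running error count is carried between passes.
        total_errors += sum(b != d for b, d in zip(bits, decided))

    # Divide by the total number of bits sent over all chunks.
    return total_errors / (n_chunks * chunk_len)


if __name__ == "__main__":
    # Small sizes so this finishes quickly; the defaults (ten streams of
    # 1e6 bits) match the example above but are slow in pure Python.
    print(ber_in_chunks(0.0, n_chunks=10, chunk_len=2000, seed=1))
```

Only the error count survives from one pass to the next, so memory use is set by chunk_len and not by the total of 1e7 bits.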
Regards!