Continue to Site

# ASM02 - How Fast Dividing, with Rounding, a 32-Bit dividend by an 8-Bit Divisor Could be?

Status
Not open for further replies.

#### KerimF

As far as you know, is it possible to divide (by an AVR code) a 32-bit dividend by an 8-bit register, with rounding, in less than 140 cycles?
You can assume that there is not less than 1200 bytes free for this fast code (in ATmega8, for example).

Since I expect no one around here knows (or will know) how to do it (likely due to lack of interest), I will post the code next week.
ASM01 – Fast AVR Code for Dividing 16-Bit Register by 8-Bit Register (0 to 255) with Rounding | Forum for Electronics (edaboard.com)

Have fun,
Kerim

I also had to work on this division code because I did it in the past for another range than 0 to 255 (the possible values of an 8-bit register, as a divisor).

The best speed I was able to achieve is 142 cycles (instead of 140), and its size is less than 1300 bytes (instead of 1200). I failed in my own exam

Now, I let you gain 2 cycles and 100 bytes. Be happy.

Have fun,
Kerim

ldi r16, low(0x12345678) ; Load low byte of dividend
ldi r17, high(0x12345678) ; Load high byte of dividend
ldi r18, hh(0x12345678) ; Load high byte of high word of dividend
ldi r19, hhh(0x12345678) ; Load high byte of highest word of dividend
ldi r20, 0x12 ; Load divisor into register r20
clr r21 ; Clear remainder register
clr r22 ; Clear carry register

divs r16, r20 ; Divide low word of dividend by divisor
mov r23, r0 ; Move quotient into register r23
mov r24, r1 ; Move remainder into register r24
eor r25, r25 ; Clear carry flag

divs r17, r20 ; Divide next word of dividend by divisor

divs r18, r20 ; Divide next word of dividend by divisor

divs r19, r20 ; Divide highest word of dividend by divisor

; At this point, r23:r24 contains the 32-bit quotient, and r25 contains any remainder.

Credits: openAI
--- Updated ---

The div cmd uses 4 cycles

ldi r16, low(0xABCD) ; Load the low byte of the dividend
ldi r17, high(0xABCD) ; Load the high byte of the dividend
ldi r18, hh(0xABCD) ; Load the high byte of the high word of the dividend
ldi r19, 0x12 ; Load the divisor
div r19 ; Divide the 32-bit dividend by the 8-bit divisor
--- Updated ---

ldi r16, 0 ; clear the high 8 bits of the 16-bit divisor
ldi r17, 8-bit-divisor ; load the 8-bit divisor into the low 8 bits of the 16-bit divisor

movw r20, r22 ; copy the 32-bit dividend into the 16-bit registers r22:r23 and r24:r25
movw r22, r24

clr r26 ; clear the high 8 bits of the quotient
clr r27

div r20, r16 ; perform the hardware division operation, dividing the 32-bit dividend in r22:r23 and r24:r25 by the 8-bit divisor in r17

; the quotient is now in r26:r27, and the remainder is in r22:r23

Last edited:

divs r16, r20 ; Divide low word of dividend by divisor

Thank you.

So DIVS is an instruction of the MCU family of... ?
I just have ATmega8

You are right, AVR has no divide instruction. DIVS looks like Motorola 68k processor.

You are right, AVR has no divide instruction. DIVS looks like Motorola 68k processor.

It seems to me that Motorola 68k processor a CPU which needs an external memory for its instructions.
I am familiar with Z80 (in my early years of programming and by which I produced my first outdoor moving message signs using 25W/220V bulbs driven by triacs), then with AT89 family (or C51?), before ending up with AVR MCUs (when I was able downloading AVR studio. But its last version, I had the privilege to download, is 6.2).

For instance, I started writing code in machine language on paper, yes. bit by bit. since I couldn't get an assembler for Z80 (this was around year 1979). Manually, I had to enter these bits (using switches and push buttons) in order to save them in a set of SRAM ICs before saving them on an EPROM (which could be erased by ultraviolet). It took me about 3 months to debug my first code (to let it work) which can be written and debugged now in a few hours at most. Naturally, after I got an assembler for Z80 then for MCUs, I wrote all my programs in their assembly language. But I had to learn C in order to write PC programs (though I have a C compiler for DOS only) when necessary.

AVR32 code

AVR32 code

I see. And I wonder for how long I will need to live before I will have the privilege (which almost all engineers around the world have) to work with AVR32 family and the like
Fortunately, AVR8 gave me the opportunity, even by using its old ATmega8 (and ATmega32 when real necessary), to produce products which can usually compete with imported similar ones, besides the ones for special applications.

Anyway, you did well (I say this based on trust since I personally cannot check your work ). And it is not your fault that I wasn't very clear that my question is for AVR8 (I just said 'AVR code' which certainly includes AVR32).

That's OK, as I didn't check my openAI answer either, which I supplied since I suspected no one else would.

div r20, r16 ; perform the hardware division operation, dividing the 32-bit dividend in r22:r23 and r24:r25 by the 8-bit divisor in r17

; the quotient is now in r26:r27, and the remainder is in r22:r23

I guess now, I have to check the code you got, even not closely, since it is not yours

If the divisor, r17, happens to be 2 and the 32-bit dividend, r22:r25, is 2^32-1, the quotient will be 2147483648 (after rounding) which also needs four 8-bit registers, not just two, as r26:r27 which are mentioned on the comment line above.

This is where human feedback is needed to verify openAI output and give reasons why the result is bad so it learns just like humans from mistakes

This is where human feedback is needed to verify openAI output and give reasons why the result is bad so it learns just like humans from mistakes

So, I wonder how old this openAI is?
I mean I started learning from my mistakes since I perceived my special existence, as every human is special. This happened since age 14. Before this, I was usually guided by some adults (my father died when I was 9) who used to tell me what is supposed to be good or bad for me.
Till now, I am in the leaning stage

I think it is time, as I promised, to post what I did best for dividing (by an AVR8 code) a 32-bit dividend by an 8-bit register, with rounding:
For N=0 or 1, please see notes at the end of the code.

Kerim

Code:
;=============================================================
; *R* Division of 32-bit dividend {A} by 8-bit divisor {N} ***
;     and a Kd table for the {N} = 2 to 255
;=============================================================

; {R} = {A}/{N} = r31,r30,r29,r28 / {N}
;     = r31,r30,r29,r28 * {Kd} /256/256/256/256 then rounding
; {A} dividend in r19,r18,r17,r16 [0 to 2^32-1]
; {N} constant divisor in r10 [N=0 to 255, see Kd32b2B]
; {R} result in r31,r30,r29,r28 = {A}/{N} rounded
; {Kd} = round(256*256*256*256/{N},0) , the division constant
; {Kr} = INT[ (1-{N})/2 ] , the rounding constant
; not used registers: r25:r24, r9:r2
; 114 [code] + 512 [table] = 626 words, *2 = 1254 bytes
; 131, 140 or 142 cycles = 124c|133c|135c code + 4c RET + 3c RCALL

D32var8:
MOV   r26, r10              ;1abc
CPI   r26, 1                ;1abc, Note_2
BREQ  DO_same               ;1abc, Note_2

TST   r26                   ;1abc, Note_1 and Note_2
BREQ  DOoverF               ;1abc, Note_1 and Note_2

CLR   r27                   ;1abc
LSL   r26                   ;1abc
ROL   r27                   ;1abc
LSL   r26                   ;1abc
ROL   r27                   ;1abc
; r27:r26 = r26 * 4 [table index]

LDI   ZH, high(Kd32b2B*2)   ;1abc, address in bytes, 0x??00
MOV   ZL, r26               ;1abc
LPM   r20, Z+               ;3abc
LPM   r21, Z+               ;3abc
LPM   r22, Z+               ;3abc
LPM   r23, Z+               ;3abc,  [25abc]
; r23:r20 = reg_Kd

; multiplicand in r19,r18,r17,r16 = {A}
; multiplier   in r23,r22,r21,r20 = {Kd}
; mul. result  in r31:r26, r14:r13
; valid result in r31,r30,r29,r28

MUL   r16, r20              ;2abc
MOV   r13, r0               ;1abc
MOV   r14, r1               ;1abc
MUL   r17, r21              ;2abc
MOVW  r27:r26, r1:r0        ;1abc
MUL   r18, r22              ;2abc
MOVW  r29:r28, r1:r0        ;1abc
MUL   r19, r23              ;2abc
MOVW  r31:r30, r1:r0        ;1abc

MUL   r17, r20              ;2abc
MUL   r18, r20              ;2abc
MUL   r19, r20              ;2abc

MUL   r16, r21              ;2abc
MUL   r18, r21              ;2abc
MUL   r19, r21              ;2abc

MUL   r16, r22              ;2abc
MUL   r17, r22              ;2abc
MUL   r19, r22              ;2abc

MUL   r16, r23              ;2abc
MUL   r17, r23              ;2abc
MUL   r18, r23              ;2abc
ADC   r31, r15              ;1abc, +77abc [102abc]
; {B} = r31:r28 = r19:r16 * r23:r20 /256/256/256
; {B} = {A}*{Kd}/256/256/256
; {R} = {B} or {B}+1

; for rounding
; multiplicand in r31,r28 = {B}
; multiplier   in r10 = {N}
; valid result in r23,r20
MUL   r28, r10              ;2abc
MOVW  r21:r20, r1:r0        ;1abc
CLR   r22                   ;1abc
CLR   r23                   ;1abc
MUL   r29, r10              ;2abc
MUL   r30, r10              ;2abc
MUL   r31, r10              ;2abc
; {C} = r23:r20 = {B}*{N} = r31:r28 * r10

; the following conditions were deduced empirically
; if( Carry_1=0, {R}={B}, if( Zero_2=1 OR Carry_2=0, {R}={B}, {R}={B}+1 ) )

SUB   r20, r16              ;1abc
SBC   r21, r17              ;1abc
SBC   r22, r18              ;1abc
SBC   r23, r19              ;1abc
; {D1} = r23:r20 = {C} - {A} = r23:r20 - r19:r16

BRCC  DIV_ret               ;2a|1bc, +22a [124a]
; if Carry_1=0, {R}={B}

MOV   r26, r10              ;1bc
DEC   r26                   ;1bc
LSR   r26                   ;1bc
NEG   r26                   ;1bc
; r26 = reg_Kr

; r26 is negative always
SUB   r20, r26              ;1bc
SBCI  r21, 0xFF             ;1bc
SBCI  r22, 0xFF             ;1bc
SBCI  r23, 0xFF             ;1bc
; {D2} = r23:r20 = {D1} + {-Kr} = r23:r20 + {-Kr}

BRPL  DIV_ret               ;2b|1c, +31b [134b]
; if {D1} positive, {R}={B}

SBCI  r30, byte3(-1)        ;1c
SBCI  r31, byte4(-1)        ;1c, +33c [135c]
; {R}={B}+1

DIV_ret:
RET                         ;4
; {R} = r31:r28 = {B} or {B}+1 [ {A}/{N} rounded ]

DOoverF:                        ; Note 2
SER   r28                   ; Note 2
SER   r29                   ; Note 2
SER   r30                   ; Note 2
SER   r31                   ; Note 2
RET                         ; Note 2
; {R} = r31:r28 = 0xFFFFFFFF

DO_same:                        ; Note 2
MOVW  r29:r28, r17:r16      ; Note 2
MOVW  r31:r30, r19:r19      ; Note 2
RET                         ; Note 2
; {R} = r31:r28 = r19:r16

;====================================

.org 0x??00

; read Kd_L, Kd_H, Kd_U then Kd_Q
;  x [usable index] = 1 < x < 256, 1024-word table
; 256 entries * 2 words = 1024 bytes [512 words]
Kd32b2B:
.dw   0,0,0,0,0,32768,21845,21845
.dw   0,16384,13107,13107,43691,10922,18725,9362
.dw   0,8192,50972,7281,39322,6553,53620,5957
.dw   21845,5461,15124,5041,9362,4681,4369,4369
.dw   0,4096,3855,3855,58254,3640,17246,3449
.dw   52429,3276,49932,3120,59578,2978,25645,2849
.dw   43691,2730,28836,2621,40330,2520,16991,2427
.dw   37449,2340,56497,2259,34953,2184,4228,2114
.dw   0,2048,61564,1985,34696,1927,29959,1872
.dw   29127,1820,15941,1771,41391,1724,26887,1680
.dw   26214,1638,28772,1598,24966,1560,6096,1524
.dw   29789,1489,23302,1456,45590,1424,25099,1394
.dw   21845,1365,30762,1337,47186,1310,1285,1285
.dw   20165,1260,34623,1236,41263,1213,36938,1191
.dw   18725,1170,49439,1149,61016,1129,51096,1110
.dw   17476,1092,23636,1074,2114,1057,16644,1040
.dw   0,1024,16132,1008,63550,992,9781,978
.dw   50116,963,52239,949,14980,936,2769,923
.dw   14564,910,49376,897,40739,885,53303,873
.dw   20696,862,7660,851,13443,840,37331,829
.dw   13107,819,5664,809,14386,799,38690,789
.dw   12483,780,771,771,3048,762,18832,753
.dw   47663,744,23564,736,11651,728,11523,720
.dw   22795,712,45100,704,12549,697,55878,689
.dw   43691,682,41213,675,48149,668,64212,661
.dw   23593,655,57101,648,33411,642,17816,636
.dw   10082,630,9986,624,17311,618,31849,612
.dw   53400,606,16234,601,51237,595,27159,590
.dw   9362,585,63216,579,57488,574,57558,569
.dw   63276,564,8962,560,25548,555,47362,550
.dw   8738,546,40621,541,11818,537,53281,532
.dw   33825,528,18874,524,8322,520,2064,516
.dw   0,512,2032,508,8066,504,18010,500
.dw   31775,496,49275,492,4891,489,29613,485
.dw   57826,481,23918,478,58887,474,31589,471
.dw   7490,468,52057,464,34153,461,19248,458
.dw   7282,455,63728,451,57456,448,53945,445
.dw   53137,442,54980,439,59419,436,868,434
.dw   10348,431,22274,428,36598,425,53274,422
.dw   6722,420,27968,417,51433,414,11541,412
.dw   39322,409,3664,407,35600,404,4021,402
.dw   39961,399,12313,397,52113,394,28255,392
.dw   6242,390,51576,387,33154,385,16480,383
.dw   1524,381,53793,378,42184,376,32206,374
.dw   23831,372,17032,370,11782,368,8055,366
.dw   5825,364,5069,362,5761,360,7879,358
.dw   11398,356,16295,354,22550,352,30140,350
.dw   39043,348,49239,346,60707,344,7892,343
.dw   21845,341,37013,339,53375,337,5377,336
.dw   24074,334,43912,332,64874,330,21406,329
.dw   44564,327,3260,326,28550,324,54882,322
.dw   16705,321,45076,319,8908,318,39258,316
.dw   5041,315,37315,313,4993,312,39135,310
.dw   8656,309,44614,307,15925,306,53648,304
.dw   26700,303,604,302,40885,300,16459,299
.dw   58387,297,35585,296,13580,295,57895,293
.dw   37449,292,17768,291,64376,289,46193,288
.dw   28744,287,12020,286,61547,284,46244,283
.dw   31638,282,17720,281,4481,280,57449,278
.dw   45542,277,34289,276,23681,275,13710,274
.dw   4369,273,61185,271,53079,270,45579,269
.dw   38677,268,32367,267,26641,266,21492,265
.dw   16913,264,12897,263,9437,262,6527,261
.dw   4161,260,2331,259,1032,258,257,257

/*
Note_1:
If N=0, the returned result [r31:r28] is 2^32-1 = 4294967295
4294967295 is the highest possible number to replace infinity.
But, this could be replaced by setting an overflow flag, for example.

Note_2:
In case reg_N [r10] won’t be 0 or 1 for sure,
"CPI r26, 1" and " TST r26" at the beginning of the routine could be removed,
with their the codes at "DOoverF:" and "DO_same:"
*/

Status
Not open for further replies.