Continue to Site

Welcome to EDAboard.com

Welcome to our site! EDAboard.com is an international Electronics Discussion Forum focused on EDA software, circuits, schematics, books, theory, papers, asic, pld, 8051, DSP, Network, RF, Analog Design, PCB, Service Manuals... and a whole lot more! To participate you need to register. Registration is free. Click here to register now.

ASM02 - How Fast Dividing, with Rounding, a 32-Bit dividend by an 8-Bit Divisor Could be?

Status
Not open for further replies.

KerimF

Advanced Member level 5
Joined
May 17, 2011
Messages
1,556
Helped
376
Reputation
760
Reaction score
379
Trophy points
1,373
Location
Aleppo city - Syria
Activity points
13,106
As far as you know, is it possible to divide (by an AVR code) a 32-bit dividend by an 8-bit register, with rounding, in less than 140 cycles?
You can assume that there is not less than 1200 bytes free for this fast code (in ATmega8, for example).

Since I expect no one around here knows (or will know) how to do it (likely due to lack of interest), I will post the code next week.
Hint: You may like reading:
ASM01 – Fast AVR Code for Dividing 16-Bit Register by 8-Bit Register (0 to 255) with Rounding | Forum for Electronics (edaboard.com)

Have fun,
Kerim
 
I also had to work on this division code because I did it in the past for another range than 0 to 255 (the possible values of an 8-bit register, as a divisor).

The best speed I was able to achieve is 142 cycles (instead of 140), and its size is less than 1300 bytes (instead of 1200). I failed in my own exam :(

Now, I let you gain 2 cycles and 100 bytes. Be happy.

Have fun,
Kerim
 

ldi r16, low(0x12345678) ; Load low byte of dividend
ldi r17, high(0x12345678) ; Load high byte of dividend
ldi r18, hh(0x12345678) ; Load high byte of high word of dividend
ldi r19, hhh(0x12345678) ; Load high byte of highest word of dividend
ldi r20, 0x12 ; Load divisor into register r20
clr r21 ; Clear remainder register
clr r22 ; Clear carry register

divs r16, r20 ; Divide low word of dividend by divisor
mov r23, r0 ; Move quotient into register r23
mov r24, r1 ; Move remainder into register r24
eor r25, r25 ; Clear carry flag

divs r17, r20 ; Divide next word of dividend by divisor
add r23, r0 ; Add quotient to register r23
adc r24, r1 ; Add remainder to register r24
adc r25, r25 ; Add carry flag to register r25

divs r18, r20 ; Divide next word of dividend by divisor
add r23, r0 ; Add quotient to register r23
adc r24, r1 ; Add remainder to register r24
adc r25, r25 ; Add carry flag to register r25

divs r19, r20 ; Divide highest word of dividend by divisor
add r23, r0 ; Add quotient to register r23
adc r24, r1 ; Add remainder to register r24
adc r25, r25 ; Add carry flag to register r25

; At this point, r23:r24 contains the 32-bit quotient, and r25 contains any remainder.

Credits: openAI
--- Updated ---

The div cmd uses 4 cycles

ldi r16, low(0xABCD) ; Load the low byte of the dividend
ldi r17, high(0xABCD) ; Load the high byte of the dividend
ldi r18, hh(0xABCD) ; Load the high byte of the high word of the dividend
ldi r19, 0x12 ; Load the divisor
div r19 ; Divide the 32-bit dividend by the 8-bit divisor
--- Updated ---

ldi r16, 0 ; clear the high 8 bits of the 16-bit divisor
ldi r17, 8-bit-divisor ; load the 8-bit divisor into the low 8 bits of the 16-bit divisor

movw r20, r22 ; copy the 32-bit dividend into the 16-bit registers r22:r23 and r24:r25
movw r22, r24

clr r26 ; clear the high 8 bits of the quotient
clr r27

div r20, r16 ; perform the hardware division operation, dividing the 32-bit dividend in r22:r23 and r24:r25 by the 8-bit divisor in r17

; the quotient is now in r26:r27, and the remainder is in r22:r23
 
Last edited:

You are right, AVR has no divide instruction. DIVS looks like Motorola 68k processor.
 
You are right, AVR has no divide instruction. DIVS looks like Motorola 68k processor.

It seems to me that Motorola 68k processor a CPU which needs an external memory for its instructions.
I am familiar with Z80 (in my early years of programming and by which I produced my first outdoor moving message signs using 25W/220V bulbs driven by triacs), then with AT89 family (or C51?), before ending up with AVR MCUs (when I was able downloading AVR studio. But its last version, I had the privilege to download, is 6.2).

For instance, I started writing code in machine language on paper, yes. bit by bit. since I couldn't get an assembler for Z80 (this was around year 1979). Manually, I had to enter these bits (using switches and push buttons) in order to save them in a set of SRAM ICs before saving them on an EPROM (which could be erased by ultraviolet). It took me about 3 months to debug my first code (to let it work) which can be written and debugged now in a few hours at most. Naturally, after I got an assembler for Z80 then for MCUs, I wrote all my programs in their assembly language. But I had to learn C in order to write PC programs (though I have a C compiler for DOS only) when necessary.
 

AVR32 code

I see. And I wonder for how long I will need to live before I will have the privilege (which almost all engineers around the world have) to work with AVR32 family and the like :)
Fortunately, AVR8 gave me the opportunity, even by using its old ATmega8 (and ATmega32 when real necessary), to produce products which can usually compete with imported similar ones, besides the ones for special applications.

Anyway, you did well (I say this based on trust since I personally cannot check your work :) ). And it is not your fault that I wasn't very clear that my question is for AVR8 (I just said 'AVR code' which certainly includes AVR32).
 

div r20, r16 ; perform the hardware division operation, dividing the 32-bit dividend in r22:r23 and r24:r25 by the 8-bit divisor in r17

; the quotient is now in r26:r27, and the remainder is in r22:r23

I guess now, I have to check the code you got, even not closely, since it is not yours :)

For example, could you please help me understand the last line (comment)?

If the divisor, r17, happens to be 2 and the 32-bit dividend, r22:r25, is 2^32-1, the quotient will be 2147483648 (after rounding) which also needs four 8-bit registers, not just two, as r26:r27 which are mentioned on the comment line above.
 

This is where human feedback is needed to verify openAI output and give reasons why the result is bad so it learns just like humans from mistakes ;)

So, I wonder how old this openAI is?
I mean I started learning from my mistakes since I perceived my special existence, as every human is special. This happened since age 14. Before this, I was usually guided by some adults (my father died when I was 9) who used to tell me what is supposed to be good or bad for me.
Till now, I am in the leaning stage :)
 

I think it is time, as I promised, to post what I did best for dividing (by an AVR8 code) a 32-bit dividend by an 8-bit register, with rounding:
For N=0 or 1, please see notes at the end of the code.

Kerim

Code:
;=============================================================
; *R* Division of 32-bit dividend {A} by 8-bit divisor {N} ***
;     and a Kd table for the {N} = 2 to 255
;=============================================================

; {R} = {A}/{N} = r31,r30,r29,r28 / {N}
;     = r31,r30,r29,r28 * {Kd} /256/256/256/256 then rounding
; {A} dividend in r19,r18,r17,r16 [0 to 2^32-1]
; {N} constant divisor in r10 [N=0 to 255, see Kd32b2B]
; {R} result in r31,r30,r29,r28 = {A}/{N} rounded
; {Kd} = round(256*256*256*256/{N},0) , the division constant
; {Kr} = INT[ (1-{N})/2 ] , the rounding constant
; not used registers: r25:r24, r9:r2
; 114 [code] + 512 [table] = 626 words, *2 = 1254 bytes
; 131, 140 or 142 cycles = 124c|133c|135c code + 4c RET + 3c RCALL

D32var8:
    MOV   r26, r10              ;1abc
    CPI   r26, 1                ;1abc, Note_2
    BREQ  DO_same               ;1abc, Note_2

    TST   r26                   ;1abc, Note_1 and Note_2
    BREQ  DOoverF               ;1abc, Note_1 and Note_2

    CLR   r27                   ;1abc
    LSL   r26                   ;1abc
    ROL   r27                   ;1abc
    LSL   r26                   ;1abc
    ROL   r27                   ;1abc
; r27:r26 = r26 * 4 [table index]

    LDI   ZH, high(Kd32b2B*2)   ;1abc, address in bytes, 0x??00
    MOV   ZL, r26               ;1abc
    ADD   ZH, r27               ;1abc
    LPM   r20, Z+               ;3abc
    LPM   r21, Z+               ;3abc
    LPM   r22, Z+               ;3abc
    LPM   r23, Z+               ;3abc,  [25abc]
; r23:r20 = reg_Kd

; multiplicand in r19,r18,r17,r16 = {A}
; multiplier   in r23,r22,r21,r20 = {Kd}
; mul. result  in r31:r26, r14:r13
; valid result in r31,r30,r29,r28

    MUL   r16, r20              ;2abc
    MOV   r13, r0               ;1abc
    MOV   r14, r1               ;1abc
    MUL   r17, r21              ;2abc
    MOVW  r27:r26, r1:r0        ;1abc
    MUL   r18, r22              ;2abc
    MOVW  r29:r28, r1:r0        ;1abc
    MUL   r19, r23              ;2abc
    MOVW  r31:r30, r1:r0        ;1abc

    MUL   r17, r20              ;2abc
    ADD   r14, r0               ;1abc
    ADC   r26, r1               ;1abc
    MUL   r18, r20              ;2abc
    ADD   r26, r0               ;1abc
    ADC   r27, r1               ;1abc
    MUL   r19, r20              ;2abc
    ADD   r27, r0               ;1abc
    ADC   r28, r1               ;1abc

    MUL   r16, r21              ;2abc
    ADD   r14, r0               ;1abc
    ADC   r26, r1               ;1abc
    ADC   r27, r15              ;1abc
    ADC   r28, r15              ;1abc
    ADC   r29, r15              ;1abc
    MUL   r18, r21              ;2abc
    ADD   r27, r0               ;1abc
    ADC   r28, r1               ;1abc
    ADC   r29, r15              ;1abc
    MUL   r19, r21              ;2abc
    ADD   r28, r0               ;1abc
    ADC   r29, r1               ;1abc

    MUL   r16, r22              ;2abc
    ADD   r26, r0               ;1abc
    ADC   r27, r1               ;1abc
    ADC   r28, r15              ;1abc
    ADC   r29, r15              ;1abc
    ADC   r30, r15              ;1abc
    MUL   r17, r22              ;2abc
    ADD   r27, r0               ;1abc
    ADC   r28, r1               ;1abc
    ADC   r29, r15              ;1abc
    ADC   r30, r15              ;1abc
    MUL   r19, r22              ;2abc
    ADD   r29, r0               ;1abc
    ADC   r30, r1               ;1abc

    MUL   r16, r23              ;2abc
    ADD   r27, r0               ;1abc
    ADC   r28, r1               ;1abc
    ADC   r29, r15              ;1abc
    ADC   r30, r15              ;1abc
    ADC   r31, r15              ;1abc
    MUL   r17, r23              ;2abc
    ADD   r28, r0               ;1abc
    ADC   r29, r1               ;1abc
    ADC   r30, r15              ;1abc
    ADC   r31, r15              ;1abc
    MUL   r18, r23              ;2abc
    ADD   r29, r0               ;1abc
    ADC   r30, r1               ;1abc
    ADC   r31, r15              ;1abc, +77abc [102abc]
; {B} = r31:r28 = r19:r16 * r23:r20 /256/256/256
; {B} = {A}*{Kd}/256/256/256
; {R} = {B} or {B}+1

; for rounding
; multiplicand in r31,r28 = {B}
; multiplier   in r10 = {N}
; valid result in r23,r20
    MUL   r28, r10              ;2abc
    MOVW  r21:r20, r1:r0        ;1abc
    CLR   r22                   ;1abc
    CLR   r23                   ;1abc
    MUL   r29, r10              ;2abc
    ADD   r21, r0               ;1abc
    ADC   r22, r1               ;1abc
    MUL   r30, r10              ;2abc
    ADD   r22, r0               ;1abc
    ADC   r23, r1               ;1abc
    MUL   r31, r10              ;2abc
    ADD   r23, r0               ;1abc
; {C} = r23:r20 = {B}*{N} = r31:r28 * r10

; the following conditions were deduced empirically
; if( Carry_1=0, {R}={B}, if( Zero_2=1 OR Carry_2=0, {R}={B}, {R}={B}+1 ) )

    SUB   r20, r16              ;1abc
    SBC   r21, r17              ;1abc
    SBC   r22, r18              ;1abc
    SBC   r23, r19              ;1abc
; {D1} = r23:r20 = {C} - {A} = r23:r20 - r19:r16

    BRCC  DIV_ret               ;2a|1bc, +22a [124a]
; if Carry_1=0, {R}={B}

    MOV   r26, r10              ;1bc
    DEC   r26                   ;1bc
    LSR   r26                   ;1bc
    NEG   r26                   ;1bc
; r26 = reg_Kr

; r26 is negative always
    SUB   r20, r26              ;1bc
    SBCI  r21, 0xFF             ;1bc
    SBCI  r22, 0xFF             ;1bc
    SBCI  r23, 0xFF             ;1bc
; {D2} = r23:r20 = {D1} + {-Kr} = r23:r20 + {-Kr}

    BRPL  DIV_ret               ;2b|1c, +31b [134b]
; if {D1} positive, {R}={B}

    ADIW  r29:r28, 1            ;1c
    SBCI  r30, byte3(-1)        ;1c
    SBCI  r31, byte4(-1)        ;1c, +33c [135c]
; {R}={B}+1

DIV_ret:
    RET                         ;4
; {R} = r31:r28 = {B} or {B}+1 [ {A}/{N} rounded ]

DOoverF:                        ; Note 2
    SER   r28                   ; Note 2
    SER   r29                   ; Note 2
    SER   r30                   ; Note 2
    SER   r31                   ; Note 2
    RET                         ; Note 2
; {R} = r31:r28 = 0xFFFFFFFF

DO_same:                        ; Note 2
    MOVW  r29:r28, r17:r16      ; Note 2
    MOVW  r31:r30, r19:r19      ; Note 2
    RET                         ; Note 2
; {R} = r31:r28 = r19:r16

;====================================

    .org 0x??00

; read Kd_L, Kd_H, Kd_U then Kd_Q
;  x [usable index] = 1 < x < 256, 1024-word table
; 256 entries * 2 words = 1024 bytes [512 words]
Kd32b2B:
    .dw   0,0,0,0,0,32768,21845,21845
    .dw   0,16384,13107,13107,43691,10922,18725,9362
    .dw   0,8192,50972,7281,39322,6553,53620,5957
    .dw   21845,5461,15124,5041,9362,4681,4369,4369
    .dw   0,4096,3855,3855,58254,3640,17246,3449
    .dw   52429,3276,49932,3120,59578,2978,25645,2849
    .dw   43691,2730,28836,2621,40330,2520,16991,2427
    .dw   37449,2340,56497,2259,34953,2184,4228,2114
    .dw   0,2048,61564,1985,34696,1927,29959,1872
    .dw   29127,1820,15941,1771,41391,1724,26887,1680
    .dw   26214,1638,28772,1598,24966,1560,6096,1524
    .dw   29789,1489,23302,1456,45590,1424,25099,1394
    .dw   21845,1365,30762,1337,47186,1310,1285,1285
    .dw   20165,1260,34623,1236,41263,1213,36938,1191
    .dw   18725,1170,49439,1149,61016,1129,51096,1110
    .dw   17476,1092,23636,1074,2114,1057,16644,1040
    .dw   0,1024,16132,1008,63550,992,9781,978
    .dw   50116,963,52239,949,14980,936,2769,923
    .dw   14564,910,49376,897,40739,885,53303,873
    .dw   20696,862,7660,851,13443,840,37331,829
    .dw   13107,819,5664,809,14386,799,38690,789
    .dw   12483,780,771,771,3048,762,18832,753
    .dw   47663,744,23564,736,11651,728,11523,720
    .dw   22795,712,45100,704,12549,697,55878,689
    .dw   43691,682,41213,675,48149,668,64212,661
    .dw   23593,655,57101,648,33411,642,17816,636
    .dw   10082,630,9986,624,17311,618,31849,612
    .dw   53400,606,16234,601,51237,595,27159,590
    .dw   9362,585,63216,579,57488,574,57558,569
    .dw   63276,564,8962,560,25548,555,47362,550
    .dw   8738,546,40621,541,11818,537,53281,532
    .dw   33825,528,18874,524,8322,520,2064,516
    .dw   0,512,2032,508,8066,504,18010,500
    .dw   31775,496,49275,492,4891,489,29613,485
    .dw   57826,481,23918,478,58887,474,31589,471
    .dw   7490,468,52057,464,34153,461,19248,458
    .dw   7282,455,63728,451,57456,448,53945,445
    .dw   53137,442,54980,439,59419,436,868,434
    .dw   10348,431,22274,428,36598,425,53274,422
    .dw   6722,420,27968,417,51433,414,11541,412
    .dw   39322,409,3664,407,35600,404,4021,402
    .dw   39961,399,12313,397,52113,394,28255,392
    .dw   6242,390,51576,387,33154,385,16480,383
    .dw   1524,381,53793,378,42184,376,32206,374
    .dw   23831,372,17032,370,11782,368,8055,366
    .dw   5825,364,5069,362,5761,360,7879,358
    .dw   11398,356,16295,354,22550,352,30140,350
    .dw   39043,348,49239,346,60707,344,7892,343
    .dw   21845,341,37013,339,53375,337,5377,336
    .dw   24074,334,43912,332,64874,330,21406,329
    .dw   44564,327,3260,326,28550,324,54882,322
    .dw   16705,321,45076,319,8908,318,39258,316
    .dw   5041,315,37315,313,4993,312,39135,310
    .dw   8656,309,44614,307,15925,306,53648,304
    .dw   26700,303,604,302,40885,300,16459,299
    .dw   58387,297,35585,296,13580,295,57895,293
    .dw   37449,292,17768,291,64376,289,46193,288
    .dw   28744,287,12020,286,61547,284,46244,283
    .dw   31638,282,17720,281,4481,280,57449,278
    .dw   45542,277,34289,276,23681,275,13710,274
    .dw   4369,273,61185,271,53079,270,45579,269
    .dw   38677,268,32367,267,26641,266,21492,265
    .dw   16913,264,12897,263,9437,262,6527,261
    .dw   4161,260,2331,259,1032,258,257,257

/*
Note_1:
If N=0, the returned result [r31:r28] is 2^32-1 = 4294967295
4294967295 is the highest possible number to replace infinity.
But, this could be replaced by setting an overflow flag, for example.

Note_2:
In case reg_N [r10] won’t be 0 or 1 for sure,
"CPI r26, 1" and " TST r26" at the beginning of the routine could be removed,
with their the codes at "DOoverF:" and "DO_same:"
*/
 

Status
Not open for further replies.

Part and Inventory Search

Welcome to EDABoard.com

Sponsor

Back
Top