# Assembly Language Programming

## August 3, 2011

### Example PDP-11 Programs ( CRC16, TEA, MD4 )

This post contains example PDP-11 assembly language programs for the CRC-16, TEA and MD4 algorithms.

## CRC-16

The program implements the Direct Table method to calculate the CRC-16 of the input message. First the function CALC_T is called to generate the table. Then the table is used to compute the CRC.

The code uses the IBM polynomial of 0x1005, an initial XOR value of 0xFFFF and a final XOR value of 0x0000.

The input message used in the code below is: abcdef123456 (with a length of 12). The message is stored in memory beginning from the location labelled as START, while the length is stored at M_LEN. The result is labelled as CRC, and can be seen in the watch view once the code has been compiled and run. For the input message given here, the CRC is 0xCEBA (or BA CE in little endian).

 ``` ; Calc the CRC-16 of a message: ; generate the table JSR r5 CALC_T ; init the CRC mov INIT CRC ; the start of the message mov #START r0 ; use r3 as the byte counter clr r3 loop: ; has the last byte been processed? cmp r3 M_LEN beq done ; get the upper CRC byte mov CRC r1 clrb r1 swab r1 ; get the next message byte clr r2 movb (r0)+ r2 ; calc the table index xor r2 r1 ; shift the upper CRC byte out (discard it) swab CRC clrb CRC ; get the table element asl r1 add #TABLE r1 mov (r1) r2 ; update the CRC xor r2 CRC ; increment r3 inc r3 br loop done: ; xor in the FINAL value mov FINAL r2 xor r2 CRC halt ; the subroutine to generate the table CALC_T: ; r1 holds the index into the table ; start at the end of the table mov #TABLE r1 add #510d r1 ; put the Poly into r2 mov POLY r2 ; outer loop outer: ; set the index byte N as the upper byte in r3 mov N r3 swab r3 ; r4 is the bit counter mov #10 r4 ; inner loop inner: ; shift out the msb of N into the C bit ; xor in POLY if the C bit = 1 asl r3 bcc next xor r2 r3 next: ; decrement the bit counter sob r4 inner ; add r3 (the result) to the table mov r3 (r1) ; decrement the table pointer sub #2 r1 ; decrement the index byte N dec N ; continue as long as N >= 0 bpl outer ; return RTS r5 ; the data N: .word 255d POLY: .word 100005 INIT: .word 177777 FINAL: .word 0 TABLE: .blkw 256d ; the message START: .ascii "abcdef123456" M_LEN: .word 12d ; the CRC CRC: .word 0 .end 0 ```

## TEA – Tiny Encryption Algorithm

The code below implements the Tiny Encryption Algorithm (the original version). TEA encrypts data in 64 bit blocks and has a 128 bit key. The code below uses the 64 bit hexadecimal number input message: 1234567890abcdef, with the key: 11223344556677889900aabbccddeeff. The 64 bit result of this encryption is: 8df5 d71f 9c3b e43f.

The program first encrypts the input and then reverses it to recover the original 64 bit input. The input message is stored at V_0 and V_1, the cipher text at CTX and the recovered plain text at PTX. Once the program has been compiled and run, the state of the CTX and PTX memory locations should be:

 ```CTX_0H 0224 8df5 CTX_0L 0226 d71f CTX_1H 0228 9c3b CTX_1L 022a e43f PTX_0H 022c 1234 PTX_0L 022e 5678 PTX_1H 0230 90ab PTX_1L 0232 cdef ```

The PDP-11 assembly language code for TEA:

 ``` ; V[0], V[1] is the 64 bit input vector mov V_0H Y_HIGH mov V_0L Y_LOW mov V_1H Z_HIGH mov V_1L Z_LOW ; encode V[0], V[1] jsr r7 ENCODE ; CTX is the cipher text mov Y_HIGH CTX_0H mov Y_LOW CTX_0L mov Z_HIGH CTX_1H mov Z_LOW CTX_1L ; reverse the encryption jsr r7 DECODE ; PTX is the recovered plain text ; should be the same as V[0], V[1] mov Y_HIGH PTX_0H mov Y_LOW PTX_0L mov Z_HIGH PTX_1H mov Z_LOW PTX_1L halt ; encode subroutine ENCODE: mov N r4 clr SUM_L clr SUM_H loop: ; sum += delta add DEL_L SUM_L adc SUM_H add DEL_H SUM_H ; Y1 = ((z<<4) + key[0]) mov Z_HIGH r2 mov Z_LOW r3 ashc #4 r2 add KEY_0L r3 adc r2 add KEY_0H r2 ; Y2 = (z + sum) mov Z_HIGH r0 mov Z_LOW r1 add SUM_L r1 adc r0 add SUM_H r0 ; Y4 = Y1 xor Y2 xor r0 r2 xor r1 r3 ; Y3 = ((z>>5) + key[1]) mov Z_HIGH r0 mov Z_LOW r1 ; right shift = sign bit is replicated ashc #-5 r0 ; use BIC to clear unwanted replication bic #174000 r0 add KEY_1L r1 adc r0 add KEY_1H r0 ; Y4 = Y4 xor Y3 xor r0 r2 xor r1 r3 ; Y += Y4 add r3 Y_LOW adc Y_HIGH add r2 Y_HIGH ; Z1 = ((y<<4) + key[2]) mov Y_HIGH r2 mov Y_LOW r3 ashc #4 r2 add KEY_2L r3 adc r2 add KEY_2H r2 ; Z2 = (y + sum) mov Y_HIGH r0 mov Y_LOW r1 add SUM_L r1 adc r0 add SUM_H r0 ; Z4 = Z1 xor Z2 xor r0 r2 xor r1 r3 ; Z3 = ((y>>5) + key[3]) mov Y_HIGH r0 mov Y_LOW r1 ; right shift = sign bit is replicated ashc #-5 r0 ; use BIC to clear unwanted replication bic #174000 r0 add KEY_3L r1 adc r0 add KEY_3H r0 ; Z4 = Z4 xor Z3 xor r0 r2 xor r1 r3 ; Z += Z4 add r3 Z_LOW adc Z_HIGH add r2 Z_HIGH dec r4 bne loop rts r7 ; decode subroutine DECODE: mov N r4 ; sum = delta << 5 mov DEL_H r0 mov DEL_L r1 ashc #5 r0 mov r0 SUM_H mov r1 SUM_L d_loop: ; Z1 = ((y<<4) + key[2]) mov Y_HIGH r2 mov Y_LOW r3 ashc #4 r2 add KEY_2L r3 adc r2 add KEY_2H r2 ; Z2 = (y + sum) mov Y_HIGH r0 mov Y_LOW r1 add SUM_L r1 adc r0 add SUM_H r0 ; Z4 = Z1 xor Z2 xor r0 r2 xor r1 r3 ; Z3 = ((y>>5) + key[3]) mov Y_HIGH r0 mov Y_LOW r1 ; right shift = sign bit is replicated ashc #-5 r0 ; use BIC to clear unwanted replication bic #174000 r0 add KEY_3L r1 adc r0 add KEY_3H r0 ; Z4 = Z4 xor Z3 xor r0 r2 xor r1 r3 ; Z -= Z4 sub r3 Z_LOW sbc Z_HIGH sub r2 Z_HIGH ; Y1 = ((z<<4) + key[0]) mov Z_HIGH r2 mov Z_LOW r3 ashc #4 r2 add KEY_0L r3 adc r2 add KEY_0H r2 ; Y2 = (z + sum) mov Z_HIGH r0 mov Z_LOW r1 add SUM_L r1 adc r0 add SUM_H r0 ; Y4 = Y1 xor Y2 xor r0 r2 xor r1 r3 ; Y3 = ((z>>5) + key[1]) mov Z_HIGH r0 mov Z_LOW r1 ; right shift = sign bit is replicated ashc #-5 r0 ; use BIC to clear unwanted replication bic #174000 r0 add KEY_1L r1 adc r0 add KEY_1H r0 ; Y4 = Y4 xor Y3 xor r0 r2 xor r1 r3 ; Y -= Y4 sub r3 Y_LOW sbc Y_HIGH sub r2 Y_HIGH ; sum -= delta sub DEL_L SUM_L sbc SUM_H sub DEL_H SUM_H dec r4 bne d_loop rts r7 ; data Y_HIGH: .word Y_LOW: .word Z_HIGH: .word Z_LOW: .word SUM_H: .word SUM_L: .word DEL_H: .hex 9e37 DEL_L: .hex 79b9 KEY_0H: .hex 1122 KEY_0L: .hex 3344 KEY_1H: .hex 5566 KEY_1L: .hex 7788 KEY_2H: .hex 9900 KEY_2L: .hex aabb KEY_3H: .hex ccdd KEY_3L: .hex eeff N: .word 32d V_0H: .hex 1234 V_0L: .hex 5678 V_1H: .hex 90ab V_1L: .hex cdef CTX_0H: .word CTX_0L: .word CTX_1H: .word CTX_1L: .word PTX_0H: .word PTX_0L: .word PTX_1H: .word PTX_1L: .word .end 0 ```

## MD4 Message Digest

The MD4 Message Digest Algorithm is specified in RFC 1320:

The code below uses the test input string of: abcdefghijklmnopqrstuvwxyz with a bit length of 208. The resulting message digest is stored in the memory locations labelled A_L to D_H. The result is best viewed in little endian byte order (as it is given in RFC 1320). This can be done by entering A_L into the input text box next to the Jump button, and then pressing [Jump]. The top line of the display should be:

d7 9e 1c 30 8a a5 bb cd ee a8 ed 63 df 41 2d a9

Note the use of the function pointer F_ptr in the code below. F_ptr is used to select either the F(x,y,z), G(x,y,z) or H(x,y,z) functions. This, together with a couple of other tricks, allows the the same code to be called for each of the 3 rounds of MD4.

 ``` main: ; calc number of bytes in message = BitLen/8 mov BitLen r0 asr r0 asr r0 asr r0 mov r0 nBytes ; calc r0 mod 64 mov r0 r1 ash #-5 r1 ash #5 r1 ; test the quotient Q tst r1 beq PAD ; subtract Q from r0 sub r1 r0 PAD: ; handle the case where r0 = 64 cmp #64d r0 bne PAD_1 ; 64 mod 64 = 0 clr r0 PAD_1: ; r0 is now r0 mod 64 ; calc padded length ; nBytes = nBytes + 64 - (nBytes mod 64) add #64d nBytes sub r0 nBytes ; is there enough room to insert the message length (in bits)? cmp #55d r0 bpl PAD_2 ; add extra padding add #64d nBytes PAD_2: ; calc the pos of the first padding bit mov BitLen r0 asr r0 asr r0 asr r0 add #INPUT r0 ; set the first padding bit bisb #200 (r0) ; append the message bit length mov nBytes r0 add #INPUT r0 sub #8d r0 ; insert bit length (in little endian) mov BitLen r1 mov r1 (r0) ; process the input buffer ; the current pos within the buffer mov #INPUT X_ptr NEXT: ; save the current state ABCD mov A_H AA_H mov A_L AA_L mov B_H BB_H mov B_L BB_L mov C_H CC_H mov C_L CC_L mov D_H DD_H mov D_L DD_L ; the 3 rounds of MD4 jsr r7 RND_1 jsr r7 RND_2 jsr r7 RND_3 ; update the state ABCD add AA_L A_L adc A_H add AA_H A_H add BB_L B_L adc B_H add BB_H B_H add CC_L C_L adc C_H add CC_H C_H add DD_L D_L adc D_H add DD_H D_H ; get the next 64 byte block add #64d X_ptr ; has the entire buffer been processed? mov #INPUT r0 add nBytes r0 cmp r0 X_ptr bne NEXT halt ; Round 1 of MD4 RND_1: mov #F_md4 F_ptr mov #XR_1 XR clr T_H clr T_L clr xIndex mov #4 Count loop_1: clr sIndex jsr r7 Next_A jsr r7 Next_D jsr r7 Next_C jsr r7 Next_B dec Count bne loop_1 rts r7 ; Round 2 of MD4 RND_2: mov #G_md4 F_ptr mov #XR_2 XR mov ggT_H T_H mov ggT_L T_L clr xIndex mov #4 Count loop_2: mov #4 sIndex jsr r7 Next_A jsr r7 Next_D jsr r7 Next_C jsr r7 Next_B dec Count bne loop_2 rts r7 RND_3: ; Round 3 of MD4 mov #H_md4 F_ptr mov #XR_3 XR mov hhT_H T_H mov hhT_L T_L clr xIndex mov #4 Count loop_3: mov #8d sIndex jsr r7 Next_A jsr r7 Next_D jsr r7 Next_C jsr r7 Next_B dec Count bne loop_3 rts r7 ; core logic to iterate A, B, C and D Next_A: mov B_H TMP_XH mov B_L TMP_XL mov C_H TMP_YH mov C_L TMP_YL mov D_H TMP_ZH mov D_L TMP_ZL jsr r7 @F_ptr add A_L TEMP_L adc TEMP_H add A_H TEMP_H jsr r7 FFGGHH mov TEMP_H A_H mov TEMP_L A_L rts r7 Next_D: mov A_H TMP_XH mov A_L TMP_XL mov B_H TMP_YH mov B_L TMP_YL mov C_H TMP_ZH mov C_L TMP_ZL jsr r7 @F_ptr add D_L TEMP_L adc TEMP_H add D_H TEMP_H jsr r7 FFGGHH mov TEMP_H D_H mov TEMP_L D_L rts r7 Next_C: mov D_H TMP_XH mov D_L TMP_XL mov A_H TMP_YH mov A_L TMP_YL mov B_H TMP_ZH mov B_L TMP_ZL jsr r7 @F_ptr add C_L TEMP_L adc TEMP_H add C_H TEMP_H jsr r7 FFGGHH mov TEMP_H C_H mov TEMP_L C_L rts r7 Next_B: mov C_H TMP_XH mov C_L TMP_XL mov D_H TMP_YH mov D_L TMP_YL mov A_H TMP_ZH mov A_L TMP_ZL jsr r7 @F_ptr add B_L TEMP_L adc TEMP_H add B_H TEMP_H jsr r7 FFGGHH mov TEMP_H B_H mov TEMP_L B_L rts r7 ; the F function F_md4: ; F(X,Y,Z) = (X & Y) | ((~X) & Z) ; where X & Y = ~(~X | ~Y) mov TMP_XH r0 mov TMP_XL r1 com r0 com r1 mov TMP_YH r2 mov TMP_YL r3 com r2 com r3 bis r0 r2 bis r1 r3 com r2 com r3 ; X & Y is now in r2,r3 mov TMP_XH r0 mov TMP_XL r1 mov TMP_ZH r4 mov TMP_ZL r5 bic r0 r4 bic r1 r5 ; ~X & Z is now in r4,r5 bis r2 r4 bis r3 r5 ; final result is now in r4,r5 mov r4 TEMP_H mov r5 TEMP_L rts r7 ; the G function G_md4: ; G(X,Y,Z) = (X & Y) | (X & Z) | (Y & Z) ; where X & Y = ~(~X | ~Y) mov TMP_XH r0 mov TMP_XL r1 com r0 com r1 mov TMP_YH r2 mov TMP_YL r3 com r2 com r3 bis r0 r2 bis r1 r3 com r2 com r3 ; X & Y is now in r2,r3 mov TMP_XH r0 mov TMP_XL r1 com r0 com r1 mov TMP_ZH r4 mov TMP_ZL r5 com r4 com r5 bis r0 r4 bis r1 r5 com r4 com r5 ; X & Z is now in r4,r5 bis r2 r4 bis r3 r5 ; X & Y | X & Z is now in r4,r5 mov TMP_YH r0 mov TMP_YL r1 com r0 com r1 mov TMP_ZH r2 mov TMP_ZL r3 com r2 com r3 bis r0 r2 bis r1 r3 com r2 com r3 ; Y & Z is now in r2,r3 bis r2 r4 bis r3 r5 ; final result is now in r4,r5 mov r4 TEMP_H mov r5 TEMP_L rts r7 ; the H function H_md4: ; H(X,Y,Z) = X ^ Y ^ Z mov TMP_XH r0 mov TMP_XL r1 mov TMP_YH r2 mov TMP_YL r3 xor r0 r2 xor r1 r3 ; X ^ Y is now in r2,r3 mov TMP_ZH r0 mov TMP_ZL r1 xor r0 r2 xor r1 r3 ; X ^ Y ^ Z is now in r2,r3 mov r2 TEMP_H mov r3 TEMP_L rts r7 ; core logic for the FF, GG and HH functions FFGGHH: ; get the Shift from the S array clr Shift mov #S_ARR r0 add sIndex r0 bisb (r0) Shift inc sIndex ; X is the current 16 word message block ; get X[xIndex] mov XR r0 add xIndex r0 clr r1 bisb (r0) r1 asl r1 asl r1 add X_ptr r1 inc xIndex ; Mwrd = X[xIndex] mov (r1) Mwrd_L inc r1 inc r1 mov (r1) Mwrd_H ; TEMP already stores the result form the F, G or H function ; add Mwrd add Mwrd_L TEMP_L adc TEMP_H add Mwrd_H TEMP_H ; add T add T_L TEMP_L adc TEMP_H add T_H TEMP_H mov TEMP_H r0 mov TEMP_L r1 mov TEMP_H r2 mov TEMP_L r3 ; do the left rotation ; RotL(tmp,s) = (tmp << s) | (tmp >>> 32-s) ; left shift mov Shift r4 ashc r4 r0 mov r0 TEMP_H mov r1 TEMP_L ; right shift - sign bit is replicated sub #32d r4 ashc r4 r2 ; clear unwanted replication mov Shift r4 mov #-1 r0 mov #-1 r1 ashc r4 r0 bic r0 r2 bic r1 r3 ; result of left rotation in TEMP add r3 TEMP_L adc TEMP_H add r2 TEMP_H rts r7 ; data section ; storage for the X, Y, Z values used ; by the F, G, H functions TMP_XL: .word TMP_XH: .word TMP_YL: .word TMP_YH: .word TMP_ZL: .word TMP_ZH: .word ; pointer to the F, G or H function F_ptr: .word ; temp storage for the various subroutines TEMP_L: .word TEMP_H: .word ; the current 64 bit message word Mwrd_L: .word Mwrd_H: .word ; the value of the left rotation Shift: .word ; the Message Digest - A B C D A_L: .hex 2301 A_H: .hex 6745 B_L: .hex ab89 B_H: .hex efcd C_L: .hex dcfe C_H: .hex 98ba D_L: .hex 5476 D_H: .hex 1032 AA_L: .word AA_H: .word BB_L: .word BB_H: .word CC_L: .word CC_H: .word DD_L: .word DD_H: .word ; the T value T_L: .word T_H: .word ; the T values for the GG and HH functions ggT_L: .hex 7999 ggT_H: .hex 5a82 hhT_L: .hex eba1 hhT_H: .hex 6ed9 ; the byte array of values for the left rotation S_ARR: .byte 3,7,11d,19d,3,5,9d,13d,3,9d,11d,15d ; the indexes for each round into the current ; 16 element array of words from the message XR_1: .byte 0,1,2,3,4,5,6,7 .byte 8d,9d,10d,11d,12d,13d,14d,15d XR_2: .byte 0,4,8d,12d,1,5,9d,13d .byte 2,6,10d,14d,3,7,11d,15d XR_3: .byte 0,8d,4,12d,2,10d,6,14d .byte 1,9d,5,13d,3,11d,7,15d ; pointer to XR_1, XR_2 or XR_3 XR: .word sIndex: .word xIndex: .word Count: .word ; the pointer to the 512 bit block within the ; message that is currently being processed X_ptr: .word BitLen: .word 208d nBytes: .word ; the input message INPUT: .ascii "abcdefghijklmnopqrstuvwxyz" PADDING: .blkw 37 .end main ```