Assembly Language Programming

August 3, 2011

Example PDP-11 Programs ( CRC16, TEA, MD4 )

This post contains example PDP-11 assembly language programs for the CRC-16, TEA and MD4 algorithms.

CRC-16

The program implements the Direct Table method to calculate the CRC-16 of the input message. First the function CALC_T is called to generate the table. Then the table is used to compute the CRC.

The code uses the IBM polynomial of 0x1005, an initial XOR value of 0xFFFF and a final XOR value of 0x0000.

The input message used in the code below is: abcdef123456 (with a length of 12). The message is stored in memory beginning from the location labelled as START, while the length is stored at M_LEN. The result is labelled as CRC, and can be seen in the watch view once the code has been compiled and run. For the input message given here, the CRC is 0xCEBA (or BA CE in little endian).

	; Calc the CRC-16 of a message:

	; generate the table

	JSR 	r5 CALC_T

	; init the CRC 

	mov 	INIT CRC

	; the start of the message

	mov 	#START r0

	; use r3 as the byte counter

	clr 	r3

  loop:

	; has the last byte been processed?

	cmp 	r3 M_LEN
	beq 	done

	; get the upper CRC byte

	mov 	CRC r1
	clrb 	r1
	swab 	r1

	; get the next message byte

	clr 	r2
	movb 	(r0)+ r2

	; calc the table index

	xor 	r2 r1

	; shift the upper CRC byte out (discard it)

	swab 	CRC
	clrb 	CRC

	; get the table element

	asl 	r1
	add 	#TABLE r1
	mov 	(r1) r2

	; update the CRC

	xor 	r2 CRC

	; increment r3

	inc 	r3

	br 	loop

  done:

	; xor in the FINAL value

	mov 	FINAL r2 

	xor 	r2 CRC

	halt

	; the subroutine to generate the table

  CALC_T:

	; r1 holds the index into the table
	; start at the end of the table

	mov 	#TABLE r1
	add 	#510d r1

	; put the Poly into r2

	mov 	POLY r2

	; outer loop

  outer:

	; set the index byte N as the upper byte in r3

	mov 	N r3
	swab 	r3

	; r4 is the bit counter

	mov 	#10 r4

	; inner loop

  inner:

	; shift out the msb of N into the C bit
	; xor in POLY if the C bit = 1

	asl 	r3
	bcc 	next
	xor 	r2 r3

  next:

	; decrement the bit counter

	sob 	r4 inner

	; add r3 (the result) to the table

	mov 	r3 (r1)

	; decrement the table pointer

	sub 	#2 r1

	; decrement the index byte N

	dec 	N

	; continue as long as N >= 0

	bpl 	outer

	; return

	RTS 	r5

	; the data

  N: .word 255d
  POLY: .word 100005
  INIT: .word 177777
  FINAL: .word 0
  TABLE: .blkw 256d

	; the message

  START: .ascii "abcdef123456"

  M_LEN: .word 12d

	; the CRC

  CRC: .word 0

  .end 0

TEA – Tiny Encryption Algorithm

The code below implements the Tiny Encryption Algorithm (the original version). TEA encrypts data in 64 bit blocks and has a 128 bit key. The code below uses the 64 bit hexadecimal number input message: 1234567890abcdef, with the key: 11223344556677889900aabbccddeeff. The 64 bit result of this encryption is: 8df5 d71f 9c3b e43f.

The program first encrypts the input and then reverses it to recover the original 64 bit input. The input message is stored at V_0 and V_1, the cipher text at CTX and the recovered plain text at PTX. Once the program has been compiled and run, the state of the CTX and PTX memory locations should be:

CTX_0H		0224		8df5
CTX_0L		0226		d71f
CTX_1H		0228		9c3b
CTX_1L		022a		e43f
PTX_0H		022c		1234
PTX_0L		022e		5678
PTX_1H		0230		90ab
PTX_1L		0232		cdef

The PDP-11 assembly language code for TEA:

	; V[0], V[1] is the 64 bit input vector

	mov 	V_0H Y_HIGH
	mov 	V_0L Y_LOW
	mov 	V_1H Z_HIGH
	mov 	V_1L Z_LOW

	; encode V[0], V[1]

	jsr 	r7 ENCODE

	; CTX is the cipher text

	mov 	Y_HIGH CTX_0H
	mov 	Y_LOW CTX_0L
	mov 	Z_HIGH CTX_1H
	mov 	Z_LOW CTX_1L

	; reverse the encryption

	jsr r7 DECODE

	; PTX is the recovered plain text

	; should be the same as V[0], V[1]

	mov 	Y_HIGH PTX_0H
	mov 	Y_LOW PTX_0L
	mov 	Z_HIGH PTX_1H
	mov 	Z_LOW PTX_1L

	halt

	; encode subroutine

  ENCODE:

	mov 	N r4

	clr 	SUM_L
	clr 	SUM_H

  loop:

	; sum += delta

	add 	DEL_L SUM_L
	adc 	SUM_H
	add 	DEL_H SUM_H

	; Y1 = ((z<<4) + key[0])

	mov 	Z_HIGH r2
	mov 	Z_LOW r3

	ashc 	#4 r2

	add 	KEY_0L r3
	adc 	r2
	add 	KEY_0H r2

	; Y2 = (z + sum)

	mov 	Z_HIGH r0
	mov 	Z_LOW r1

	add 	SUM_L r1
	adc 	r0
	add 	SUM_H r0

	; Y4 = Y1 xor Y2

	xor 	r0 r2
	xor 	r1 r3

	; Y3 = ((z>>5) + key[1])

	mov 	Z_HIGH r0
	mov 	Z_LOW r1

	; right shift = sign bit is replicated

	ashc 	#-5 r0

	; use BIC to clear unwanted replication

	bic 	#174000 r0

	add 	KEY_1L r1
	adc 	r0
	add 	KEY_1H r0

	; Y4 = Y4 xor Y3

	xor 	r0 r2
	xor 	r1 r3

	; Y += Y4

	add 	r3 Y_LOW
	adc 	Y_HIGH
	add 	r2 Y_HIGH

	; Z1 = ((y<<4) + key[2])

	mov 	Y_HIGH r2
	mov 	Y_LOW r3

	ashc 	#4 r2

	add 	KEY_2L r3
	adc 	r2
	add 	KEY_2H r2

	; Z2 = (y + sum)

	mov 	Y_HIGH r0
	mov 	Y_LOW r1

	add 	SUM_L r1
	adc 	r0
	add 	SUM_H r0

	; Z4 = Z1 xor Z2

	xor 	r0 r2
	xor 	r1 r3

	; Z3 = ((y>>5) + key[3])

	mov 	Y_HIGH r0
	mov 	Y_LOW r1

	; right shift = sign bit is replicated

	ashc 	#-5 r0

	; use BIC to clear unwanted replication

	bic 	#174000 r0

	add 	KEY_3L r1
	adc 	r0
	add 	KEY_3H r0

	; Z4 = Z4 xor Z3

	xor 	r0 r2
	xor 	r1 r3

	; Z += Z4

	add 	r3 Z_LOW
	adc 	Z_HIGH
	add 	r2 Z_HIGH

	dec 	r4 
	bne 	loop

	rts 	r7

	; decode subroutine

  DECODE:

	mov 	N r4

	; sum = delta << 5

	mov 	DEL_H r0
	mov 	DEL_L r1

	ashc 	#5 r0

	mov 	r0 SUM_H
	mov 	r1 SUM_L

  d_loop:

	; Z1 = ((y<<4) + key[2])

	mov 	Y_HIGH r2
	mov 	Y_LOW r3

	ashc 	#4 r2

	add 	KEY_2L r3
	adc 	r2
	add 	KEY_2H r2

	; Z2 = (y + sum)

	mov 	Y_HIGH r0
	mov 	Y_LOW r1

	add 	SUM_L r1
	adc 	r0
	add 	SUM_H r0

	; Z4 = Z1 xor Z2

	xor 	r0 r2
	xor 	r1 r3

	; Z3 = ((y>>5) + key[3])

	mov 	Y_HIGH r0
	mov 	Y_LOW r1

	; right shift = sign bit is replicated

	ashc 	#-5 r0

	; use BIC to clear unwanted replication

	bic 	#174000 r0

	add 	KEY_3L r1
	adc 	r0
	add 	KEY_3H r0

	; Z4 = Z4 xor Z3

	xor 	r0 r2
	xor 	r1 r3

	; Z -= Z4

	sub 	r3 Z_LOW
	sbc 	Z_HIGH
	sub 	r2 Z_HIGH

	; Y1 = ((z<<4) + key[0])

	mov 	Z_HIGH r2
	mov 	Z_LOW r3

	ashc 	#4 r2

	add 	KEY_0L r3
	adc 	r2
	add 	KEY_0H r2

	; Y2 = (z + sum)

	mov 	Z_HIGH r0
	mov 	Z_LOW r1

	add 	SUM_L r1
	adc 	r0
	add 	SUM_H r0

	; Y4 = Y1 xor Y2

	xor 	r0 r2
	xor 	r1 r3

	; Y3 = ((z>>5) + key[1])

	mov 	Z_HIGH r0
	mov 	Z_LOW r1

	; right shift = sign bit is replicated

	ashc 	#-5 r0

	; use BIC to clear unwanted replication

	bic 	#174000 r0

	add 	KEY_1L r1
	adc 	r0
	add 	KEY_1H r0

	; Y4 = Y4 xor Y3

	xor 	r0 r2
	xor 	r1 r3

	; Y -= Y4

	sub 	r3 Y_LOW
	sbc 	Y_HIGH
	sub 	r2 Y_HIGH

	; sum -= delta

	sub 	DEL_L SUM_L
	sbc 	SUM_H
	sub 	DEL_H SUM_H

	dec 	r4 
	bne 	d_loop

	rts 	r7

	; data

  Y_HIGH: .word
  Y_LOW: .word

  Z_HIGH: .word
  Z_LOW: .word

  SUM_H: .word
  SUM_L: .word

  DEL_H: .hex 9e37
  DEL_L: .hex 79b9

  KEY_0H: .hex 1122
  KEY_0L: .hex 3344
  KEY_1H: .hex 5566
  KEY_1L: .hex 7788
  KEY_2H: .hex 9900
  KEY_2L: .hex aabb
  KEY_3H: .hex ccdd
  KEY_3L: .hex eeff

  N: .word 32d

  V_0H: .hex 1234
  V_0L: .hex 5678
  V_1H: .hex 90ab
  V_1L: .hex cdef

  CTX_0H: .word
  CTX_0L: .word
  CTX_1H: .word
  CTX_1L: .word

  PTX_0H: .word
  PTX_0L: .word
  PTX_1H: .word
  PTX_1L: .word

  .end 0

MD4 Message Digest

The MD4 Message Digest Algorithm is specified in RFC 1320:

http://www.faqs.org/rfcs/rfc1320.html

The code below uses the test input string of: abcdefghijklmnopqrstuvwxyz with a bit length of 208. The resulting message digest is stored in the memory locations labelled A_L to D_H. The result is best viewed in little endian byte order (as it is given in RFC 1320). This can be done by entering A_L into the input text box next to the Jump button, and then pressing [Jump]. The top line of the display should be:

d7 9e 1c 30 8a a5 bb cd ee a8 ed 63 df 41 2d a9

Note the use of the function pointer F_ptr in the code below. F_ptr is used to select either the F(x,y,z), G(x,y,z) or H(x,y,z) functions. This, together with a couple of other tricks, allows the the same code to be called for each of the 3 rounds of MD4.

  main:

	; calc number of bytes in message = BitLen/8

	mov 	BitLen r0
	asr 	r0
	asr 	r0
	asr 	r0
	mov 	r0 nBytes

	; calc r0 mod 64

	mov 	r0 r1
	ash 	#-5 r1
	ash 	#5 r1

	; test the quotient Q

	tst 	r1
	beq 	PAD

	; subtract Q from r0

	sub 	r1 r0

  PAD:

	; handle the case where r0 = 64

	cmp 	#64d r0
	bne 	PAD_1

	; 64 mod 64 = 0

	clr 	r0

  PAD_1:

	; r0 is now r0 mod 64

	; calc padded length
	; nBytes = nBytes + 64 - (nBytes mod 64)
 
	add 	#64d nBytes
	sub 	r0 nBytes

	; is there enough room to insert the message length (in bits)?

	cmp 	#55d r0
	bpl 	PAD_2 

	; add extra padding

	add 	#64d nBytes

  PAD_2:

	; calc the pos of the first padding bit

	mov 	BitLen r0
	asr 	r0
	asr 	r0
	asr 	r0
	add 	#INPUT r0

	; set the first padding bit

	bisb 	#200 (r0)

	; append the message bit length

	mov 	nBytes r0
	add 	#INPUT r0
	sub 	#8d r0

	; insert bit length (in little endian)

	mov 	BitLen r1
	mov 	r1 (r0)

	; process the input buffer
	; the current pos within the buffer

	mov 	#INPUT X_ptr

  NEXT:

	; save the current state ABCD

	mov 	A_H AA_H
	mov 	A_L AA_L
	mov 	B_H BB_H
	mov 	B_L BB_L
	mov 	C_H CC_H
	mov 	C_L CC_L
	mov 	D_H DD_H
	mov 	D_L DD_L

	; the 3 rounds of MD4

	jsr 	r7 RND_1
	jsr 	r7 RND_2
	jsr 	r7 RND_3

	; update the state ABCD

	add 	AA_L A_L
	adc 	A_H
	add 	AA_H A_H
	add 	BB_L B_L
	adc 	B_H
	add 	BB_H B_H
	add 	CC_L C_L
	adc 	C_H
	add 	CC_H C_H
	add 	DD_L D_L
	adc 	D_H
	add 	DD_H D_H

	; get the next 64 byte block

	add 	#64d X_ptr

	; has the entire buffer been processed?

	mov 	#INPUT r0
	add 	nBytes r0

	cmp 	r0 X_ptr
	bne 	NEXT

	halt

	; Round 1 of MD4

  RND_1:

	mov 	#F_md4 F_ptr
	mov 	#XR_1 XR

	clr 	T_H
	clr 	T_L

	clr 	xIndex

	mov 	#4 Count

  loop_1:

	clr 	sIndex

	jsr 	r7 Next_A
	jsr 	r7 Next_D
	jsr 	r7 Next_C
	jsr 	r7 Next_B

	dec 	Count
	bne 	loop_1

	rts 	r7

	; Round 2 of MD4

  RND_2:

	mov 	#G_md4 F_ptr
	mov 	#XR_2 XR

	mov 	ggT_H T_H
	mov 	ggT_L T_L

	clr 	xIndex

	mov 	#4 Count

  loop_2:

	mov 	#4 sIndex

	jsr 	r7 Next_A
	jsr 	r7 Next_D
	jsr 	r7 Next_C
	jsr 	r7 Next_B

	dec 	Count
	bne 	loop_2

	rts 	r7

  RND_3:

	; Round 3 of MD4

	mov 	#H_md4 F_ptr
	mov 	#XR_3 XR

	mov 	hhT_H T_H
	mov 	hhT_L T_L

	clr 	xIndex

	mov 	#4 Count

  loop_3:

	mov 	#8d sIndex

	jsr 	r7 Next_A
	jsr 	r7 Next_D
	jsr 	r7 Next_C
	jsr 	r7 Next_B

	dec 	Count
	bne 	loop_3

	rts 	r7

	; core logic to iterate A, B, C and D

  Next_A:

	mov 	B_H TMP_XH
	mov 	B_L TMP_XL
	mov 	C_H TMP_YH
	mov 	C_L TMP_YL
	mov 	D_H TMP_ZH
	mov 	D_L TMP_ZL

	jsr 	r7 @F_ptr

	add 	A_L TEMP_L
	adc 	TEMP_H
	add 	A_H TEMP_H

	jsr 	r7 FFGGHH

	mov 	TEMP_H A_H
	mov 	TEMP_L A_L

	rts r7

  Next_D:

	mov 	A_H TMP_XH
	mov 	A_L TMP_XL
	mov 	B_H TMP_YH
	mov 	B_L TMP_YL
	mov 	C_H TMP_ZH
	mov 	C_L TMP_ZL

	jsr 	r7 @F_ptr

	add 	D_L TEMP_L
	adc 	TEMP_H
	add 	D_H TEMP_H

	jsr 	r7 FFGGHH

	mov 	TEMP_H D_H
	mov 	TEMP_L D_L

	rts 	r7

  Next_C:

	mov 	D_H TMP_XH
	mov 	D_L TMP_XL
	mov 	A_H TMP_YH
	mov 	A_L TMP_YL
	mov 	B_H TMP_ZH
	mov 	B_L TMP_ZL

	jsr 	r7 @F_ptr

	add 	C_L TEMP_L
	adc 	TEMP_H
	add 	C_H TEMP_H

	jsr 	r7 FFGGHH

	mov 	TEMP_H C_H
	mov 	TEMP_L C_L

	rts 	r7

  Next_B:

	mov 	C_H TMP_XH
	mov 	C_L TMP_XL
	mov 	D_H TMP_YH
	mov 	D_L TMP_YL
	mov 	A_H TMP_ZH
	mov 	A_L TMP_ZL

	jsr 	r7 @F_ptr

	add 	B_L TEMP_L
	adc 	TEMP_H
	add 	B_H TEMP_H

	jsr 	r7 FFGGHH

	mov 	TEMP_H B_H
	mov 	TEMP_L B_L

	rts 	r7

	; the F function

  F_md4:

	; F(X,Y,Z) = (X & Y) | ((~X) & Z)

	; where X & Y = ~(~X | ~Y)

	mov 	TMP_XH r0
	mov 	TMP_XL r1

	com 	r0
	com 	r1

	mov 	TMP_YH r2
	mov 	TMP_YL r3

	com 	r2
	com 	r3

	bis 	r0 r2
	bis 	r1 r3

	com 	r2
	com 	r3

	; X & Y is now in r2,r3

	mov 	TMP_XH r0
	mov 	TMP_XL r1

	mov 	TMP_ZH r4
	mov 	TMP_ZL r5

	bic 	r0 r4
	bic 	r1 r5

	; ~X & Z is now in r4,r5

	bis 	r2 r4
	bis 	r3 r5

	; final result is now in r4,r5

	mov 	r4 TEMP_H
	mov 	r5 TEMP_L

	rts 	r7

	; the G function

  G_md4:

	; G(X,Y,Z) = (X & Y) | (X & Z) | (Y & Z)

	; where X & Y = ~(~X | ~Y)

	mov 	TMP_XH r0
	mov 	TMP_XL r1

	com 	r0
	com 	r1

	mov 	TMP_YH r2
	mov 	TMP_YL r3

	com 	r2
	com 	r3

	bis 	r0 r2
	bis 	r1 r3

	com 	r2
	com 	r3

	; X & Y is now in r2,r3

	mov 	TMP_XH r0
	mov 	TMP_XL r1

	com 	r0
	com 	r1

	mov 	TMP_ZH r4
	mov 	TMP_ZL r5

	com 	r4
	com 	r5

	bis 	r0 r4
	bis 	r1 r5

	com 	r4
	com 	r5

	; X & Z is now in r4,r5

	bis 	r2 r4
	bis 	r3 r5

	; X & Y | X & Z is now in r4,r5

	mov 	TMP_YH r0
	mov 	TMP_YL r1

	com 	r0
	com 	r1

	mov 	TMP_ZH r2
	mov 	TMP_ZL r3

	com 	r2
	com 	r3

	bis 	r0 r2
	bis 	r1 r3

	com 	r2
	com 	r3

	; Y & Z is now in r2,r3

	bis 	r2 r4
	bis 	r3 r5

	; final result is now in r4,r5

	mov 	r4 TEMP_H
	mov 	r5 TEMP_L

	rts 	r7

	; the H function

  H_md4:

	; H(X,Y,Z) = X ^ Y ^ Z

	mov 	TMP_XH r0
	mov 	TMP_XL r1
	mov 	TMP_YH r2
	mov 	TMP_YL r3

	xor 	r0 r2
	xor 	r1 r3

	; X ^ Y is now in r2,r3

	mov 	TMP_ZH r0
	mov 	TMP_ZL r1

	xor 	r0 r2
	xor 	r1 r3

	; X ^ Y ^ Z is now in r2,r3

	mov 	r2 TEMP_H
	mov 	r3 TEMP_L

	rts 	r7

	; core logic for the FF, GG and HH functions

  FFGGHH:

	; get the Shift from the S array

	clr 	Shift
	mov 	#S_ARR r0
	add 	sIndex r0
	bisb 	(r0) Shift

	inc 	sIndex

	; X is the current 16 word message block

	; get X[xIndex]

	mov 	XR r0
	add 	xIndex r0
	clr 	r1
	bisb 	(r0) r1
	asl 	r1
	asl 	r1	
	add 	X_ptr r1

	inc 	xIndex

	; Mwrd = X[xIndex]

	mov 	(r1) Mwrd_L
	inc 	r1
	inc 	r1
	mov 	(r1) Mwrd_H

	; TEMP already stores the result form the F, G or H function

	; add Mwrd

	add 	Mwrd_L TEMP_L
	adc 	TEMP_H
	add 	Mwrd_H TEMP_H

	; add T

	add 	T_L TEMP_L
	adc 	TEMP_H
	add 	T_H TEMP_H

	mov 	TEMP_H r0
	mov 	TEMP_L r1

	mov 	TEMP_H r2
	mov 	TEMP_L r3

	; do the left rotation

	; RotL(tmp,s) = (tmp << s) | (tmp >>> 32-s)

	; left shift

	mov 	Shift r4
	ashc 	r4 r0

	mov 	r0 TEMP_H
	mov 	r1 TEMP_L

	; right shift - sign bit is replicated

	sub 	#32d r4
	ashc 	r4 r2

	; clear unwanted replication

	mov 	Shift r4
	mov 	#-1 r0
	mov 	#-1 r1
	ashc 	r4 r0
	bic 	r0 r2
	bic 	r1 r3

	; result of left rotation in TEMP

	add 	r3 TEMP_L
	adc 	TEMP_H
	add 	r2 TEMP_H

	rts 	r7

	; data section

	; storage for the X, Y, Z values used 
	; by the F, G, H functions

  TMP_XL: .word
  TMP_XH: .word
  TMP_YL: .word
  TMP_YH: .word
  TMP_ZL: .word
  TMP_ZH: .word

	; pointer to the F, G or H function

  F_ptr: .word

	; temp storage for the various subroutines

  TEMP_L: .word
  TEMP_H: .word

	; the current 64 bit message word

  Mwrd_L: .word
  Mwrd_H: .word

	; the value of the left rotation

  Shift: .word

	; the Message Digest - A B C D

  A_L: .hex 2301
  A_H: .hex 6745
  B_L: .hex ab89
  B_H: .hex efcd
  C_L: .hex dcfe
  C_H: .hex 98ba
  D_L: .hex 5476
  D_H: .hex 1032

  AA_L: .word
  AA_H: .word
  BB_L: .word
  BB_H: .word
  CC_L: .word
  CC_H: .word
  DD_L: .word
  DD_H: .word

	; the T value

  T_L: .word
  T_H: .word

	; the T values for the GG and HH functions

  ggT_L: .hex 7999
  ggT_H: .hex 5a82
  hhT_L: .hex eba1
  hhT_H: .hex 6ed9

	; the byte array of values for the left rotation

  S_ARR: .byte 3,7,11d,19d,3,5,9d,13d,3,9d,11d,15d

	; the indexes for each round into the current 
	; 16 element array of words from the message 

  XR_1: .byte 0,1,2,3,4,5,6,7
        .byte 8d,9d,10d,11d,12d,13d,14d,15d

  XR_2: .byte 0,4,8d,12d,1,5,9d,13d
        .byte 2,6,10d,14d,3,7,11d,15d

  XR_3: .byte 0,8d,4,12d,2,10d,6,14d
        .byte 1,9d,5,13d,3,11d,7,15d

	; pointer to XR_1, XR_2 or XR_3

  XR: .word

  sIndex: .word

  xIndex: .word

  Count: .word

	; the pointer to the 512 bit block within the
	; message that is currently being processed

  X_ptr: .word

  BitLen: .word 208d

  nBytes: .word

	; the input message

  INPUT: .ascii "abcdefghijklmnopqrstuvwxyz"

  PADDING: .blkw 37

  .end main
Advertisements

Blog at WordPress.com.

%d bloggers like this: