Assembly Language Programming

November 6, 2013

FASM – 6502 Assembly Language Simulator – Part 1

This post describes a simulator for the 6502 assembly language that I have written in the x86 assembly language using the Flat Assembler (FASM).

x86 Code

The actual x86 code for the simulator is given in Part 2.

References

This simulator is based partly on the LISA assembler for the Apple II computer as described in the book:

Using 6502 Assembly Language – Randy Hyde

A manual for the LISA assembler can be found here:

LISA Assembler Version 2.5 Manual Online

A detailed reference for the 6502 instruction set can be found here:

homepage.ntlworld.com/cyborgsystems/CS_Main/6502/6502.htm

My simulator in no way supports every feature of the LISA assembler as described in the above sources. The sections below describe what elements of the 6502 assembly language are supported by my simulator.

Memory

Total memory available is 65536 bytes, with absolute memory addresses being 16 bits in length. The memory is organised into 256 pages of 256 bytes each.

Registers

Accumulator An 8-bit register. Most operations are carried out through the accumulator.
Index Register X An 8-bit register. Used for counters and offsets.
Index Register Y An 8-bit register. Used for counters and offsets.
Stack Pointer An 8-bit register pointing to the stack. Incremented when a byte is pulled. Decremented when a byte is pushed.
Program Counter A 16-bit register pointing to the next instruction to be executed.
Processor Status An 8-bit register holding the processor status word (PSW).

Status Flags

Includes the carry flag (C), the zero flag (Z), the overflow flag (V) and the negative flag (N). These are bits within the PSW (processor status word).

Operand Expressions

An operand expression can contain one or two arguments. If there are two arguments they must be separated by a plus or minus sign. An argument can be a number, symbol or char constant as described in the table:

Hex Number If a dollar sign $ is detected, the number following it will be read as a hex number.
Decimal Number If an exclamation ! is detected, the number following it will be read as a decimal number. A decimal number does not need to be preceded by an exclamation. If a number is detected (without a $, % or !) the compiler will try to read it as a decimal number.
Binary Number If a percent sign % is detected, the number following it will be read as a binary number.
Symbol If a symbol is detected, the compiler will look it up in the symbol table to get it’s value.
* If an asterix is detected, the compiler will substitute the current value of the location counter.
Character Constant If a single character enclosed in quotes is detected, the compiler will substitute it’s ASCII value.
Some examples:

  lda $1A2B
  adc $F6
  adc %10110110
  adc !56
  adc 56
  lda SUM
  cmp "A"
  lda $C5+%1101
  sbc 123+$EE
  and A-B
  ora $12+XYZ
  lda #A+B
  lda #$FD+ASDF
  lda /SUM+%11010011
  bne *+10
  lda #"T"+"5"
  eor $123+99,x
  lda (%1101-$AC,x)
  sta (1234+ZXC),y

Symbols

A symbol can contain up to 14 characters, and is terminated by a white space character or CRLF. Symbols can contain letters, numbers or an underscore. The compiler stores any symbols it finds in the symbol table which starts at page 252 in memory. 16 bytes are allocated for each symbol in this table. The first 14 bytes are for the symbol, while the last 2 bytes are for the 16 bit value (address) of the symbol. Symbols are case insensitive (all lower case chars are converted to upper case by the compiler).

A symbol must always begin with a letter. The compiler detects a symbol by looking in the first column of a line for this first letter. If a line does not contain a symbol then the first column must be left blank.

Pseudo-Ops

EQU Label EQU operand Assign a 2 byte value to a symbol. The operand can only contain a decimal, binary or hex number.
EPZ Label EPZ operand Assign a 1 byte value to a symbol. The operand can only contain a decimal, binary or hex number.
DFS Label DFS expression Define storage. The expression gives the number of bytes to reserve in memory.
ORG Label ORG expression Set the value of the location counter, as given by the expression. The label is optional.
HEX Label HEX operand Place up to 4 bytes into memory, starting from the current position as given by the location counter. The operand can only contain hex digits (up to 8). No $ is required. The label is optional.
ADR Label ADR expression Place the 2 bytes as given by the expression into memory, first the low byte and then the high byte. The label is optional.
BYT Label BYT expression Place the low byte as given by the expression into memory. The label is optional.
HBY Label HBY expression Place the high byte as given by the expression into memory. The label is optional.
DBY Label DBY expression Place the 2 bytes as given by the expression into memory, first the high byte and then the low byte. The label is optional.
Some examples:

A equ $1234
B equ !68
C equ 70
D equ %11001101

XYZ epz $A5
ASDF epz 10
ZXC epz %1011

ARR dfs 100
KEY dfs 32,0
 
  org $800
  org 2000
  org %1111

  hex 1A2B3C4D
  hex ABCDEF
  hex 48

  adr ARR+7
  adr $3456

  byt $12
  byt A

  hby 4567
  hby %1110001110011

  dby 64532
  dby $9876
  dby D+100

Addressing Modes

Accumulator No operand is required. The instruction acts upon the accumulator.
Implied No operand is required. The register acted upon is implied by the instruction.
Relative Used only by the branch instructions. The operand is an address that must be within -128 to +127 bytes of the address of the instruction that follows the branch instruction.
Immediate The operand is an immediate value, preceded by a # or /. If # the lower byte of the operand is used. If / the upper byte of the operand is used.
Zero Page The operand is a 1 byte address value that points only to a zero page location.
Absolute The operand is a 2 byte address value that can point to any location in memory.
zp,x Zero page indexed with x: zp is a zero page base address. The target address is obtained by adding the contents of the x index register. If Base + Index is greater than $FF, wrapping occurs.
zp,y Zero page indexed with y: zp is a zero page base address. The target address is obtained by adding the contents of the y index register. If Base + Index is greater than $FF, wrapping occurs.
a,x Absolute address indexed with x: a is an absolute base address. The target address is obtained by adding the contents of the x index register. If Base + Index is greater than $FFFF, wrapping occurs.
a,y Absolute address indexed with y: a is an absolute base address. The target address is obtained by adding the contents of the y index register. If Base + Index is greater than $FFFF, wrapping occurs.
(zp,x) Indexed indirect addressing: a zero page address is calculated by adding the contents of the x index register to zp. If the result is greater than $FF, wrapping occurs. Then an absolute 2 byte target address is read from the location given by the zero page address.
(zp),y Indirect indexed addressing: a base address is read from the zero page location given by zp. Then the contents of the index register y is added to the base to give the final target address. If Base + Index is greater than $FFFF, wrapping occurs.
(a) Indirect addressing: the target address is read from the absolute memory location a. Used only by the JMP instruction.

Addressing Mode Examples

Accumulator:

  asl
  ror
  lsr
  ror

Implied:

  nop
  clc
  inx
  pha
  tax
  tsx
  plp

Relative:

  bne *+40
  beq EQ

Immediate:

  lda #12
  lda /$4567
  adc #$FE

Zero Page:

  sta 10
  sbc 40
  lsr 77
  eor A

Absolute:

  lda $3456
  ora $7890
  jmp $5678
  jsr CALC

Indirect:

  jmp ($3456)
  jmp (P_ADD)

Demonstration of the zp,x mode, showing how zp+x wraps back around when it becomes greater than $FF.

; ================================================

  ldx #0

LOOP

  txa
  sta $80,x
  inx
  cpx #0
  bne LOOP

; ================================================

Demonstration of the a,y mode: write the value of y to the absolute base address of $300 offset by y.

; ================================================

  ldy #0

LOOP

  tya
  sta $300,y
  iny
  cpy #0
  bne LOOP

; ================================================

Demonstration of the (zp,x) mode. First a table of 2 byte absolute addresses is created on the zero page (LOOP_1). Loop two then uses the sta instruction in the (zp,x) mode to write a table of values to the target addresses.

; ================================================

  ldx #0

LOOP_1

  ; create a table of 2 byte target addresses 
  ; (= $300 + x) on the zero page

  ; the first byte of each address is x
  ; the second byte is 3

  txa
  lsr
  sta 0,x
  inx
  lda #3
  sta 0,x
  inx
  cpx #0
  bne LOOP_1

LOOP_2

  ; 0+x is the zero page address that stores
  ; the 2 byte target address

  txa
  sta (0,x)
  inx
  inx
  cpx #0
  bne LOOP_2  
  
; ================================================

Demonstration of the (zp),y mode. This code creates exactly the same table of data at absolute address $300. The difference is that only one address needs to be stored on the zero page. The statement sta (0),y reads this address from location 0 and then adds y to it to get the target address.

; ================================================

  ; store the address $300 at zero page location 0

  lda #0
  sta 0
  lda #3
  sta 1

  ldy #0

LOOP

  ; store the value 2*y at $300+y

  tya
  asl
  sta (0),y
  iny
  cpy #$80
  bne LOOP  

; ================================================

Add & Subtract

ADC is add with carry, so before using ADC the carry bit should be cleared (CLC).

Before using the SBC instruction the carry bit should be set using the SEC instruction. Setting the carry to 1 is the same as setting the borrow bit to 0, and vice versa. If after the execution of a SBC instruction the carry is 0 (borrow = 1), it means that a bit needed to be borrowed to successfully complete the subtraction. If no borrow was required then the carry bit will be 1.

See the Big Number Subtraction example program below.

  ADC SBC
Immediate 69 E9
Zero Page 65 E5
Absolute 6D ED
zp,x 75 F5
a,x 7D FD
a,y 79 F9
(zp,x) 61 E1
(zp),y 71 F1
Flags CVZN CVZN

Logical

  AND ORA EOR BIT
Immediate 29 09 49
Zero Page 25 05 45 24
Absolute 2D 0D 4D 2C
zp,x 35 15 55
a,x 3D 1D 5D
a,y 39 19 59
(zp,x) 21 01 41
(zp),y 31 11 51
Flags ZN ZN ZN VZN

Note that XOR can be used in place of EOR.

Shift & Rotate

These are the only type of instructions that use the Accumulator mode.

  ASL LSR ROL ROR
Accumulator 0A 4A 2A 6A
Zero Page 06 46 26 66
Absolute 0E 4E 2E 6E
zp,x 16 56 36 76
a,x 1E 5E 3E 7E
Flags CZN CZN CZN CZN

Load & Store

  LDA STA LDX STX LDY STY
Immediate A9 A2 A0
Zero Page A5 85 A6 86 A4 84
Absolute AD 8D AE 8E AC 8C
zp,x B5 95 B4 94
zp,y B6 96
a,x BD 9D BC
a,y B9 99 BE
(zp,x) A1 81
(zp),y B1 91
Flags ZN ZN ZN

Increment & Decrement

  INC DEC
Zero Page E6 C6
Absolute EE CE
zp,x F6 D6
a,x FE DE
Flags ZN ZN

Compare

The CMP instruction compares a byte in memory M with the accumulator ACC.

– If ACC – M >= 0 the C flag is set.

– If ACC – M = 0 the Z flag is set.

– If ACC – M < 0 the N flag is set.

The CPX and CPY instructions work in the same way, but for the X and Y registers.

  CMP CPX CPY
Immediate C9 E0 C0
Zero Page C5 E4 C4
Absolute CD EC CC
zp,x D5
a,x DD
a,y D9
(zp,x) C1
(zp),y D1
Flags CZN CZN CZN

Branch

The branches are the only type of instruction to use relative mode. The operand is an address in memory, which is used to calculate a single byte offset that is relative to the address of the instruction after the branch instruction. The branch instructions all assemble to 2 bytes – the opcode byte plus the offset byte. Therefore the branch destination must be within -126 to +129 bytes of the address of the branch instruction.

  Opcode Description
BPL 10 Branch if N = 0
BMI 30 Branch if N = 1
BVC 50 Branch if V = 0
BVS 70 Branch if V = 1
BCC 90 Branch if C = 0
BCS B0 Branch if C = 1
BNE D0 Branch if Z = 0
BEQ F0 Branch if Z = 1

Note that BGE can be used in place of BCS, BLT in place of BCC, BTR in place of BEQ, and BFL in place of BNE.

Implied Mode Instructions

  Opcode Flags Description
NOP EA No operation
SEC 38 C Set C
CLC 18 C Clear C
CLV B8 V Clear V
INX E8 ZN Increment X
DEX CA ZN Decrement X
INY C8 ZN Increment Y
DEY 88 ZN Decrement Y
PHA 48 Push ACC to the stack
PLA 68 ZN Pull ACC from the stack
TAX AA ZN Copy ACC to X
TXA 8A ZN Copy X to ACC
TAY A8 ZN Copy ACC to Y
TYA 98 ZN Copy Y to ACC
TSX BA ZN Copy SP to X
TXS 9A Copy X to SP
PHP 08 Push PSW to the stack
PLP 28 CZVN Pull PSW from the stack

Jump Instructions

  JMP JSR RTS
Absolute 4C 20
Indirect 6C
Implied 60

BRK

The BRK instruction is assembled as a single zero byte. When the simulator encounters a zero byte, the program counter is advanced by one and then execution stops. Thus the brk instruction can be used to stop a program altogether, or as a breakpoint where execution of a program can be resumed by pressing the [Run Code] or [Step] buttons again.

The Simulator

The simulator has two main dialogs, the Edit dialog and the Run dialog.

The Edit Dialog

6502-Edit-Dialog

The 6502 code is typed into the text input area on the left. Press the [Compile Assembly] button to assemble the code. The output will then be displayed in the output text area on the right. The code is compiled in 2 passes. On the first pass all of the symbols are resolved and added to the symbol table. On the second pass the instructions are assembled. Apart from character constants everything is case insensitive. The compiler simply converts lower case to upper case.

The code is assembled starting from location $200 in the main memory of the 6502. By default execution starts from this address.

To view the symbol table press the [View Symbols] button. To return to the listing of the assembled code press [View Output].

Press the [Clear Assembly] button to clear both text areas.

To run and debug the code press the [Execute Code] button. This will open up the Run dialog.

The Run Dialog

6502-Run-Dialog

A listing of the assembled code is displayed on the top left. The last instruction executed is shown on the middle left. The state of the registers is shown on the bottom left. The top right displays the bytes in memory two pages at a time, and also displays the watch. The bottom right is the ASCII string input area.

To run the code, press the [Run Code] button.

To execute the code one instruction at a time, press the [Step] button.

To reset the program counter to it’s default value of $200, press the [Reset PC] button.

To reset the 6502 machine and clear the entire memory contents, press the [Reset Machine] button.

To view the watch, press the [View Watch] button. To stop viewing the watch and display the pages in memory, just select a page from the combo box.

To return to the Edit dialog, press [Edit Code].

The ASCII Input Area

A text string can be entered directly into memory via the ASCII input area on the bottom right of the Run dialog. Simply enter some text and press the [Load ASCII] button. Note that CRLFs will be filtered out.

Four pages in memory are set aside for this purpose (pages 248-251) starting from memory location $F800. After the [Load ASCII] button has been pressed, any ascii data stored within these 4 pages of memory will be displayed in the ASCII input area.

ASCII data can also be entered into the area of memory starting at $F800 via a program, where it will be displayed once the [Run Code] button has been pressed. The code here prints a Hello World message.

; ================================================

 ldy #0

LOOP

 lda MESSAGE,y
 cmp #0
 beq DONE
 sta $f800,y
 iny
 jmp LOOP

DONE

 jmp STOP

MESSAGE 

 byt 'H','E','L','L','O',' '
 byt 'W','O','R','L','D'

STOP
 
 brk

; ================================================

A string containing a hex number can be entered into memory via the ASCII input area and then converted to a numerical value stored elsewhere by using a program such as this one. Simply enter a hex number and press [Load ASCII]. Then press [Run Code] to run the program. The ascii characters stored at $F800 are converted to a number and stored at location $300.

; ================================================

  ; convert the string of ASCII hex digits at ASC
  ; to a hexadecimal value at HXD

  ; HXD = big endian byte order

; ================================================

Y_SRC epz 0
Y_DEST epz 1

ASC epz 2
HXD epz 4

; ================================================

  lda #0
  sta Y_SRC
  sta Y_DEST

  sta ASC
  sta HXD

  lda #$F8
  sta ASC+1

  lda #$03
  sta HXD+1

; ================================================

D1

  ; digit 1

  ldy Y_SRC
  lda (ASC),y
  cmp #0
  beq DONE
  jsr CHAR_TO_HEX
  cmp #$FF
  beq NEXT

  asl
  asl
  asl
  asl

  ldy Y_DEST
  sta (HXD),y

D2

  ; digit 2

  inc Y_SRC
  ldy Y_SRC
  cpy #0
  beq DONE
  lda (ASC),y
  cmp #0
  beq DONE
  jsr CHAR_TO_HEX
  cmp #$FF
  beq D2
  ldy Y_DEST
  ora (HXD),y
  sta (HXD),y
  inc Y_DEST

NEXT

  inc Y_SRC
  ldy Y_SRC
  cpy #0
  beq DONE
  jmp D1 

DONE

  jmp THE_END

; ================================================

CHAR_TO_HEX

  ; convert an ASCII char to a hex value

  ; 0-9 and A-F

  cmp #'0'
  bmi NOT_HEX
  cmp #'9'
  bmi ADD_09
  beq ADD_09
  cmp #'A'
  bmi NOT_HEX
  cmp #'F'
  bmi ADD_AF
  beq ADD_AF
  jmp NOT_HEX

ADD_09

  sec
  sbc #'0'
  jmp RET

ADD_AF

  sec
  sbc #'A'
  clc
  adc #10		
  jmp RET  

NOT_HEX

 lda #$FF

RET

  rts 

; ================================================

THE_END

; ================================================

The Watch

Add an item to the watch by pressing the [Add To Watch] button. A dialog box appears into which is entered either a symbol or hexadecimal address (prefixed by a $) and the number of bytes (up to 256) to include in the watch. So to watch how execution of a program affects the 100 bytes starting from location $600, just enter $600 as the address to watch and 100 as the number of bytes to include in the watch. Press the [View Watch] button to display the addresses currently being watched. Up to 12 addresses can be watched at any one time.

Memory Map

The memory map for the simulator. The memory is organised into 256 pages of 256 bytes each, giving a total of 65536 bytes,

Page 0 $00 – $FF: zero page memory.
Page 1 $100 – $1FF: the stack.
Page 2 Pages 2-247 contain assembled code and data. By default code is assembled and executed starting from the address $200.
Page 248 Pages 248-251 can be used to directly load ASCII strings.
Page 252 Pages 252-255 are used by the symbol table.

Example Programs

Below are some programs I have written and tested for the simulator.

Big Number Bitwise Rotation

Rotate left by one bit. Keep on pressing the [Run Code] button to repeat. The brk instruction here acts as a breakpoint.

; ================================================

  ldy #0
  lda #1

FILL

  ; fill the zero page with data
 
  sta 0,y
  iny
  cpy #0
  bne FILL

REPEAT

  ldx #$FF
  clc
  php

ROTL

  ; rotate the zero page bytes left by 1 bit

  ; the bytes are in big endian byte order

  plp
  rol 0,x
  php 
  dex
  cpx #$ff
  bne ROTL

  ; or the carry bit from byte $0 back into byte $FF

  lda #0
  plp
  adc #0
  ora $FF
  sta $FF

  brk

  jmp REPEAT

; ================================================

Rotate right by one bit. Keep on pressing the [Run Code] button to repeat. The brk instruction here acts as a breakpoint.

; ================================================

  ldy #0
  lda #1

FILL

  ; fill the zero page with data
 
  sta 0,y
  iny
  cpy #0
  bne FILL

REPEAT

  ldx #0
  clc
  php

ROTR

  ; rotate the zero page bytes right by 1 bit

  ; the bytes are in big endian byte order

  plp
  ror 0,x
  php 
  inx
  cpx #0
  bne ROTR

  ; or the carry bit from byte $FF back into byte $0

  lda #0
  plp
  ror
  ora 0
  sta 0

  brk

  jmp REPEAT

; ================================================

CRC-16

CRC-16 long division method. The first 2 bytes of MSG become the CRC.

; ================================================

  ; CRC-16

; ================================================

  org 0

  ; Polynomial = 80 05

P0 byt $80
P1 byt $05

  ; the initial xor

INIT byt $ff,$ff

  ; the final xor

FINAL byt 0,0

; ================================================

  org $200

  ; xor INIT into the first 2 bytes of MSG

  lda MSG
  eor INIT
  sta MSG
  lda MSG+1
  eor INIT+1
  sta MSG+1

CRC

  ; CRC-16 - long division method

  ; get the next bit from MSG

  jsr SHL

  ; if the bit is zero - do nothing

  bcc NEXT

  ; xor the polynomial into the first 2 bytes of MSG

  lda MSG
  eor P0
  sta MSG
  lda MSG+1
  eor P1
  sta MSG+1

NEXT

  ; decrement the bit and byte counters

  dec X8
  lda X8
  cmp #0
  bne CRC
  lda #8
  sta X8
  dec LEN
  lda LEN
  cmp #0
  bne CRC

  ; xor FINAL into the first 2 bytes of MSG

  lda MSG
  eor FINAL
  sta MSG
  lda MSG+1
  eor FINAL+1
  sta MSG+1

  ; the first 2 bytes of MSG now contain the CRC

  jmp STOP

; ================================================

SHL

  ; shift left the message bytes by 1 bit

  ; shl all 256 bytes starting from MSG

  ; MSG is in big endian byte order

  ; so start from the byte at MSG + $FF

  ldx #$ff
  clc
  php

ROT

  plp
  rol MSG,x
  php 
  dex
  cpx #$ff
  bne ROT

  ; need the carry bit - so pull PSW from the stack

  plp

  rts

; ================================================

  ; count the number of bits in each byte

X8 byt 8

  ; the length of the input message (max = 256 bytes)

LEN byt 16

; ================================================

STOP

  brk

; ================================================

  org $300

  ; the input message - can be up to 256 bytes

MSG 

  byt $1,$2,$3,$4,$5,$6,$7,$8
  byt $9,$0,$A,$B,$C,$D,$E,$F

; ================================================

CRC-16 table method. The 2 byte result is stored at zero page location 0.

; ================================================

  ; CRC-16 - Table Method

; ================================================

CRC epz 0
INIT epz 16
FINAL epz 32
BIT epz 48
BYTES epz 49
T_ENTRY epz 52

; ================================================

  ; for Y = $0 to Y = $FF

  ldy #0

FILL_TABLE

  ; generate a 256 element table

  ; the low byte for each entry is stored from $600-$6FF
  ; the high byte for each entry is stored from $700-$7FF

  ; load the value of Y into the lower byte

  sty BYTES

  ; load zero into the higher byte

  lda #0
  sta BYTES+1

  ; count the bits

  lda #8
  sta BIT

XOR_P

  ; get the next bit

  asl BYTES+1
  rol BYTES
  bcc NEXT

  ; xor the polynomial into BYTES, if the bit is set

  lda #$80
  eor BYTES
  sta BYTES
  lda #$05
  eor BYTES+1
  sta BYTES+1

NEXT

  ; for each bit in BYTES

  dec BIT
  lda BIT
  cmp #0
  bne XOR_P  

  ; store the lower byte at $600 + Y

  lda BYTES
  sta $600,y

  ; store the higher byte at $700 + Y

  lda BYTES+1
  sta $700,y

  ; get the next Y

  iny
  cpy #0
  bne FILL_TABLE

; ================================================

  ; set the value of the initial xor ($FFFF)

  lda #$ff
  sta INIT
  sta INIT+1

  ; set the value of the final xor ($0000)

  lda #0
  sta FINAL
  sta FINAL+1

; ================================================

  ; load the first 2 bytes of MSG into CRC

  lda MSG
  sta CRC
  lda MSG+1
  sta CRC+1

  ; xor INIT into the 2 bytes of CRC

  lda INIT
  eor CRC
  sta CRC
  lda INIT+1
  eor CRC+1
  sta CRC+1

  ldy #2

CALC_CRC

  ; the first byte of CRC is the index to the table

  ldx CRC

  ; get the table entry

  lda $600,x
  sta T_ENTRY
  lda $700,x
  sta T_ENTRY+1

  ; shift out the first CRC byte
  ; copy up the next CRC byte
  ; shift in the next MSG byte

  lda CRC+1
  sta CRC
  lda MSG,y
  sta CRC+1

  ; xor in the table entry

  lda CRC
  eor T_ENTRY
  sta CRC
  lda CRC+1
  eor T_ENTRY+1
  sta CRC+1

  iny
  cpy LEN
  bne CALC_CRC

PAD_1

  ; the first padding (zero) byte

  ldx CRC

  lda $600,x
  sta T_ENTRY
  lda $700,x
  sta T_ENTRY+1

  lda CRC+1
  sta CRC
  lda #0
  sta CRC+1
 
  lda CRC
  eor T_ENTRY
  sta CRC
  lda CRC+1
  eor T_ENTRY+1
  sta CRC+1

PAD_2

  ; the second padding (zero) byte

  ldx CRC

  lda $600,x
  sta T_ENTRY
  lda $700,x
  sta T_ENTRY+1

  lda CRC+1
  sta CRC
  lda #0
  sta CRC+1
 
  lda CRC
  eor T_ENTRY
  sta CRC
  lda CRC+1
  eor T_ENTRY+1
  sta CRC+1

  ; xor FINAL into the 2 bytes of CRC

  lda FINAL
  eor CRC
  sta CRC
  lda FINAL+1
  eor CRC+1
  sta CRC+1

  jmp STOP

; ================================================

  ; the length of the input message (bytes)

LEN byt 16

  org $300

  ; the input message - can be up to 256 bytes

MSG 

  byt $1,$2,$3,$4,$5,$6,$7,$8
  byt $9,$0,$A,$B,$C,$D,$E,$F

; ================================================

STOP

  brk

; ================================================

CRC-32

CRC-32 long division method. The first 4 bytes of MSG become the CRC.

; ================================================

  ; CRC-32

; ================================================

  org 0

 ; Polynomial = 04 C1 1D B7

P0 byt $04
P1 byt $C1
P2 byt $1D
P3 byt $B7

  ; the initial xor

INIT byt $ff,$ff,$ff,$ff

  ; the final xor

FINAL byt $ff,$ff,$ff,$ff

; ================================================

  org $200

  ; xor INIT into the first 4 bytes of MSG

  lda MSG
  eor INIT
  sta MSG
  lda MSG+1
  eor INIT+1
  sta MSG+1
  lda MSG+2
  eor INIT+2
  sta MSG+2
  lda MSG+3
  eor INIT+3
  sta MSG+3

CRC

  ; CRC-32 - long division method

  ; get the next bit from MSG

  jsr SHL

  ; if the bit is zero - do nothing

  bcc NEXT

  ; xor the polynomial into the first 4 bytes of MSG

  lda MSG
  eor P0
  sta MSG
  lda MSG+1
  eor P1
  sta MSG+1
  lda MSG+2
  eor P2
  sta MSG+2
  lda MSG+3
  eor P3
  sta MSG+3

NEXT

  ; decrement the bit and byte counters

  dec X8
  lda X8
  cmp #0
  bne CRC
  lda #8
  sta X8
  dec LEN
  lda LEN
  cmp #0
  bne CRC

  ; xor FINAL into the first 4 bytes of MSG

  lda MSG
  eor FINAL
  sta MSG
  lda MSG+1
  eor FINAL+1
  sta MSG+1
  lda MSG+2
  eor FINAL+2
  sta MSG+2
  lda MSG+3
  eor FINAL+3
  sta MSG+3

  ; the first 4 bytes of MSG now contain the CRC

  jmp STOP

; ================================================

SHL

  ; shift left the message bytes by 1 bit

  ; shl all 256 bytes starting from MSG

  ; MSG is in big endian byte order

  ; so start from the byte at MSG + $FF

  ldx #$ff
  clc
  php

ROT

  plp
  rol MSG,x
  php 
  dex
  cpx #$ff
  bne ROT

  ; need the carry bit - so pull PSW from the stack

  plp

  rts

; ================================================

  ; count the number of bits in each byte

X8 byt 8

  ; the length of the input message (max = 256 bytes)

LEN byt 4

; ================================================

STOP

  brk

; ================================================

  org $300

  ; the input message - can be up to 256 bytes

MSG 

  byt $31,$32,$33,$34

; ================================================

Random Numbers

Use the LCG to generate a 256 byte random number.

; ================================================

  ; Linear Congruential Generator

  ; X(n+1) = (aX(n) + c) mod m

  ; m = 1 00 00 00 00

  ; a = 00 03 43 FD

  ; c = 00 26 9E C3

  ; X(0) = 12 34 56 78

; ================================================

  ; use the LCG to randomly fill the 
  ; 256 bytes starting at $300

; ================================================

DWORD_1 epz 0
DWORD_2 epz 4
RESULT epz 8

RND equ $300

; ================================================

  ; set DWORD_1 equal to a = 00 03 43 FD

  lda #$FD
  sta DWORD_1
  lda #$43
  sta DWORD_1+1
  lda #$03
  sta DWORD_1+2
  lda #$00
  sta DWORD_1+3

  ; set DWORD_2 equal to X(0) = 12 34 56 78

  lda #$78
  sta DWORD_2
  lda #$56
  sta DWORD_2+1
  lda #$34
  sta DWORD_2+2
  lda #$12
  sta DWORD_2+3

  ; Y is the offset from RND = $300

  ldy #0

REPEAT

  ; calculate: aX(n)

  jsr MULT

  ; add c = 00 26 9E C3

  lda RESULT
  clc
  adc #$C3
  sta DWORD_2
  lda RESULT+1
  adc #$9E
  sta DWORD_2+1
  lda RESULT+2
  adc #$26
  sta DWORD_2+2
  lda RESULT+3
  adc #0
  sta DWORD_2+3

  ; store the result at RND + Y

  lda RESULT
  sta RND,y
  iny
  lda RESULT+1
  sta RND,y
  iny
  lda RESULT+2
  sta RND,y
  iny
  lda RESULT+3
  sta RND,y
  iny

  cpy #0
  bne REPEAT

  brk

; ================================================

MULT

  ; multiply two 4 byte values mod m = 1 00 00 00 00

  ; X counts the bits

  ldx #0

  ; initialise RESULT = 0

  lda #0
  sta RESULT
  sta RESULT+1
  sta RESULT+2
  sta RESULT+3

LOOP

  ; left shift RESULT

  asl RESULT
  rol RESULT+1
  rol RESULT+2
  rol RESULT+3

  ; left shift DWORD_2 to set the carry bit

  asl DWORD_2
  rol DWORD_2+1
  rol DWORD_2+2
  rol DWORD_2+3

  bcc NEXT

  ; if the carry bit is set, add DWORD_1 to RESULT

  lda DWORD_1
  clc
  adc RESULT
  sta RESULT
  lda DWORD_1+1
  adc RESULT+1
  sta RESULT+1
  lda DWORD_1+2
  adc RESULT+2
  sta RESULT+2
  lda DWORD_1+3
  adc RESULT+3
  sta RESULT+3

NEXT

  ; for each of the 32 bits in DWORD_2

  inx
  cpx #32
  bne LOOP

  rts

; ================================================

Big Number Subtraction

Two 256 byte random numbers are generated: RND_1 and RND_2. Then RND_1 is subtracted from RND_2. The subtraction is done in two different ways:

(1) SUBTRACT uses SBC to compute RND_2,x – RND_1,x for each of the 256 bytes.

(2) ADD_NOT uses ADC to compute RND_2,x + NOT(RND_1),x for each of the 256 bytes.

Note that for 6502 subtraction the carry bit must be initially set (SEC).

; ================================================

  ; use the linear congruential generator to 
  ; create two 256 byte random numbers, and 
  ; then subtract one from the other

; ================================================

DWORD_1 epz 0
DWORD_2 epz 4
RESULT epz 8
TEMP epz 12
COUNT epz 14

RND_1 equ $400
RND_2 equ $500
DIFF equ $600
INV_SUM equ $700

; ================================================

  ; use the LCG to randomly fill the 
  ; 256 bytes starting at $400

; ================================================

  ; set DWORD_1 equal to a = 00 03 43 FD

  lda #$FD
  sta DWORD_1
  lda #$43
  sta DWORD_1+1
  lda #$03
  sta DWORD_1+2
  lda #$00
  sta DWORD_1+3

  ; set DWORD_2 equal to X(0) = 12 34 56 78

  lda #$78
  sta DWORD_2
  lda #$56
  sta DWORD_2+1
  lda #$34
  sta DWORD_2+2
  lda #$12
  sta DWORD_2+3

  ldy #0

LOOP_1

  ; calculate: aX(n)

  jsr MULT

  ; add c = 00 26 9E C3

  lda RESULT
  clc
  adc #$C3
  sta DWORD_2
  lda RESULT+1
  adc #$9E
  sta DWORD_2+1
  lda RESULT+2
  adc #$26
  sta DWORD_2+2
  lda RESULT+3
  adc #0
  sta DWORD_2+3

  ; store the result at RND_1 + Y

  lda RESULT
  sta RND_1,y
  iny
  lda RESULT+1
  sta RND_1,y
  iny
  lda RESULT+2
  sta RND_1,y
  iny
  lda RESULT+3
  sta RND_1,y
  iny

  cpy #0
  bne LOOP_1

; ================================================

  ; use the LCG to randomly fill the 
  ; 256 bytes starting at $500

; ================================================

  ldy #0

LOOP_2

  ; calculate: aX(n)

  jsr MULT

  ; add c = 00 26 9E C3

  lda RESULT
  clc
  adc #$C3
  sta DWORD_2
  lda RESULT+1
  adc #$9E
  sta DWORD_2+1
  lda RESULT+2
  adc #$26
  sta DWORD_2+2
  lda RESULT+3
  adc #0
  sta DWORD_2+3

  ; store the result at RND_2 + Y

  lda RESULT
  sta RND_2,y
  iny
  lda RESULT+1
  sta RND_2,y
  iny
  lda RESULT+2
  sta RND_2,y
  iny
  lda RESULT+3
  sta RND_2,y
  iny

  cpy #0
  bne LOOP_2

; ================================================

  ; subtract RND_1 from RND_2

  jsr SUBTRACT

; ================================================

  ; add NOT(RND_1) to RND_2

  jsr ADD_NOT

; ================================================

STOP

  brk

; ================================================

MULT

  ; multiply two 4 byte values mod m = 1 00 00 00 00

  ldx #0

  ; initialise RESULT = 0

  lda #0
  sta RESULT
  sta RESULT+1
  sta RESULT+2
  sta RESULT+3

LOOP

  ; left shift RESULT

  asl RESULT
  rol RESULT+1
  rol RESULT+2
  rol RESULT+3

  ; left shift DWORD_2 to set the carry bit

  asl DWORD_2
  rol DWORD_2+1
  rol DWORD_2+2
  rol DWORD_2+3

  bcc NEXT

  ; if the carry bit is set, add DWORD_1 to RESULT

  lda DWORD_1
  clc
  adc RESULT
  sta RESULT
  lda DWORD_1+1
  adc RESULT+1
  sta RESULT+1
  lda DWORD_1+2
  adc RESULT+2
  sta RESULT+2
  lda DWORD_1+3
  adc RESULT+3
  sta RESULT+3

NEXT

  ; for each of the 32 bits in DWORD_2

  inx
  cpx #32
  bne LOOP

  rts

; ================================================

SUBTRACT

  ; DIFF = RND_2 - RND_1

  ; big endian byte order

  ldx #$FF

  sec
  php

S_LOOP

  lda RND_2,x
  plp
  sbc RND_1,x
  php
  sta DIFF,x

  dex
  cpx #$FF
  bne S_LOOP

  plp

  rts

; ================================================

ADD_NOT

  ; INV_SUM = RND_2 + NOT(RND_1)

  ; big endian byte order

  ldx #$FF

  sec
  php

I_LOOP

  lda RND_1,x
  eor #$FF
  plp
  adc RND_2,x
  php
  sta INV_SUM,x

  dex
  cpx #$FF
  bne I_LOOP

  plp

  rts

; ================================================

Big Number Multiply

Two 256 byte random numbers are generated: RND_1 and RND_2. They are then multiplied together and the 512 byte result stored at OUTPUT_1 and OUTPUT_2.

; ================================================

  ; use the linear congruential generator to 
  ; create two 256 byte random numbers, and 
  ; then multiply them together

; ================================================

DWORD_1 epz 0
DWORD_2 epz 4
RESULT epz 8
TEMP epz 12
COUNT epz 14

RND_1 equ $400
RND_2 equ $500

OUTPUT_1 equ $600
OUTPUT_2 equ $700

; ================================================

  ; use the LCG to randomly fill the 
  ; 256 bytes starting at $400

; ================================================

  ; set DWORD_1 equal to a = 00 03 43 FD

  lda #$FD
  sta DWORD_1
  lda #$43
  sta DWORD_1+1
  lda #$03
  sta DWORD_1+2
  lda #$00
  sta DWORD_1+3

  ; set DWORD_2 equal to X(0) = 12 34 56 78

  lda #$78
  sta DWORD_2
  lda #$56
  sta DWORD_2+1
  lda #$34
  sta DWORD_2+2
  lda #$12
  sta DWORD_2+3

  ldy #0

LOOP_1

  ; calculate: aX(n)

  jsr MULT

  ; add c = 00 26 9E C3

  lda RESULT
  clc
  adc #$C3
  sta DWORD_2
  lda RESULT+1
  adc #$9E
  sta DWORD_2+1
  lda RESULT+2
  adc #$26
  sta DWORD_2+2
  lda RESULT+3
  adc #0
  sta DWORD_2+3

  ; store the result at RND_1 + Y

  lda RESULT
  sta RND_1,y
  iny
  lda RESULT+1
  sta RND_1,y
  iny
  lda RESULT+2
  sta RND_1,y
  iny
  lda RESULT+3
  sta RND_1,y
  iny

  cpy #0
  bne LOOP_1

; ================================================

  ; use the LCG to randomly fill the 
  ; 256 bytes starting at $500

; ================================================

  ldy #0

LOOP_2

  ; calculate: aX(n)

  jsr MULT

  ; add c = 00 26 9E C3

  lda RESULT
  clc
  adc #$C3
  sta DWORD_2
  lda RESULT+1
  adc #$9E
  sta DWORD_2+1
  lda RESULT+2
  adc #$26
  sta DWORD_2+2
  lda RESULT+3
  adc #0
  sta DWORD_2+3

  ; store the result at RND_2 + Y

  lda RESULT
  sta RND_2,y
  iny
  lda RESULT+1
  sta RND_2,y
  iny
  lda RESULT+2
  sta RND_2,y
  iny
  lda RESULT+3
  sta RND_2,y
  iny

  cpy #0
  bne LOOP_2

; ================================================

  ; now multiply RND_1 by RND_2

  jsr BIG_MULT

; ================================================

STOP

  brk

; ================================================

MULT

  ; multiply two 4 byte values mod m = 1 00 00 00 00

  ldx #0

  ; initialise RESULT = 0

  lda #0
  sta RESULT
  sta RESULT+1
  sta RESULT+2
  sta RESULT+3

LOOP

  ; left shift RESULT

  asl RESULT
  rol RESULT+1
  rol RESULT+2
  rol RESULT+3

  ; left shift DWORD_2 to set the carry bit

  asl DWORD_2
  rol DWORD_2+1
  rol DWORD_2+2
  rol DWORD_2+3

  bcc NEXT

  ; if the carry bit is set, add DWORD_1 to RESULT

  lda DWORD_1
  clc
  adc RESULT
  sta RESULT
  lda DWORD_1+1
  adc RESULT+1
  sta RESULT+1
  lda DWORD_1+2
  adc RESULT+2
  sta RESULT+2
  lda DWORD_1+3
  adc RESULT+3
  sta RESULT+3

NEXT

  ; for each of the 32 bits in DWORD_2

  inx
  cpx #32
  bne LOOP

  rts

; ================================================

BIG_MULT

  ; multiply RND_1 by RND_2

  ; store the result in |OUTPUT_1|OUTPUT_2|
  
  ldx #0
  stx COUNT

LP_1

  ldy #0

  lda RND_2,x
  sta TEMP

LP_2

  jsr SHL

  asl TEMP
  bcc NEXT_BIT

  jsr ADD

NEXT_BIT

  iny
  cpy #8
  bne LP_2

  inc COUNT
  ldx COUNT
  cpx #0
  bne LP_1

  rts

; ================================================

ADD

  ; add |RND_1| to |OUTPUT_1|OUTPUT_2|

  ; big endian byte order

  ldx #$FF

  clc
  php

L1

  ; add RND_1 to OUTPUT_2

  lda RND_1,x
  plp
  adc OUTPUT_2,x
  php
  sta OUTPUT_2,x
  
  dex
  cpx #$FF
  bne L1

L2

  ; add the carry bit to OUTPUT_1

  plp
  bcc DONE
  lda #0
  adc OUTPUT_1,x
  php
  sta OUTPUT_1,x

  dex
  cpx #$FF
  bne L2

DONE

  rts

; ================================================

SHL

  ; rotate the bytes left by 1 bit

  ; the bytes are in big endian byte order

  ldx #$FF
  clc
  php

ROTL_2

  ; the 256 bytes starting at OUTPUT_2

  plp
  rol OUTPUT_2,x
  php 
  dex
  cpx #$FF
  bne ROTL_2

ROTL_1

  ; the 256 bytes starting at OUTPUT_1

  plp
  rol OUTPUT_1,x
  php 
  dex
  cpx #$FF
  bne ROTL_1

  ; get the carry bit from (OUTPUT_1 + 0)

  lda #0
  plp
  adc #0

  ; or the carry bit into (OUTPUT_2 + FF)

  ora OUTPUT_2+$FF
  sta OUTPUT_2+$FF

  rts

; ================================================

Serpent-1 Encryption

A 32 byte key is used to generate a 560 byte key schedule. Pointers are used to access the key schedule because it is bigger than 256 bytes. The 16 byte input plain text is stored at zero page location 32. The 32 byte key is stored at zero page location 80. The 16 byte output cipher text is written to zero page location 48 once the program has finished execution.

; ================================================

  ; Serpent-1 Encryption

; ================================================

  ; pointers to the sub-key array

PTR_KS_0 epz 0
PTR_KS_1 epz 2
PTR_KS_2 epz 4
PTR_KS_3 epz 6
PTR_KS_4 epz 8

INDEX epz 10
TEMP_W epz 12
ROUND epz 16

  ; the plain text (16 bytes)

P_TEXT epz 32

  ; the cipher text (16 bytes)

C_TEXT epz 48

  ; the key (32 bytes)

THE_KEY epz 80

; ================================================

  ; initialise the plain text

  org P_TEXT

  byt $00,$11,$22,$33
  byt $44,$55,$66,$77
  byt $88,$99,$AA,$BB
  byt $CC,$DD,$EE,$FF

; ================================================

  ; initialise the key

  org THE_KEY

  byt $12,$34,$56,$78
  byt $90,$AB,$CD,$EF
  byt $12,$34,$56,$78
  byt $90,$AB,$CD,$EF
  byt $12,$34,$56,$78
  byt $90,$AB,$CD,$EF
  byt $12,$34,$56,$78
  byt $90,$AB,$CD,$EF

; ================================================

  org $200

  ldy #0

COPY_K

  ; copy the 32 byte key to the first 
  ; 32 bytes of the key schedule

  lda THE_KEY,y
  sta KEY_SCHEDULE,y

  iny
  cpy #32
  bne COPY_K

  ; generate the key schedule

  jsr GENERATE_KS

  jsr SUB_KS

  ldy #0

COPY_P

  ; copy the 16 byte plain text to
  ; |WORD_0|WORD_1|WORD_2|WORD_3|

  lda P_TEXT,y
  sta WORD_0,y

  iny
  cpy #16
  bne COPY_P

  ; do the encryption

  jsr ENCRYPT

  ldy #0

COPY_C

  ; copy |WORD_0|WORD_1|WORD_2|WORD_3|
  ; back to the cipher text

  lda WORD_0,y
  sta C_TEXT,y

  iny
  cpy #16
  bne COPY_C

  jmp STOP

; ================================================

ENCRYPT

  ; encrypt the message in:

  ; |WORD_0|WORD_1|WORD_2|WORD_3|

  ; initialise the pointers to the sub-keys

  lda #KEY_SCHEDULE+32
  sta PTR_KS_0
  lda /KEY_SCHEDULE+32
  sta PTR_KS_0+1

  lda #KEY_SCHEDULE+36
  sta PTR_KS_1
  lda /KEY_SCHEDULE+36
  sta PTR_KS_1+1	

  lda #KEY_SCHEDULE+40
  sta PTR_KS_2
  lda /KEY_SCHEDULE+40
  sta PTR_KS_2+1	

  lda #KEY_SCHEDULE+44
  sta PTR_KS_3
  lda /KEY_SCHEDULE+44
  sta PTR_KS_3+1

  ; 32 rounds - ROUND = 0 to 31

  lda #0
  sta ROUND

LOOP

  ; xor the current sub-keys into the message

  jsr SUBKEY_XOR

  ; s-box index = ROUND mod 8 (SHL 4 bits)

  lda ROUND
  asl
  asl
  asl
  asl
  and #$70 
  sta S_INDEX

  ; apply the s-box

  jsr CALC_S

  ; skip the linear transform for the last round

  lda ROUND
  cmp #31
  beq LAST	

  ; apply the linear transform

  jsr LIN_TRANS

  ; next round

  inc ROUND

  jmp LOOP

LAST

  ; do the final xor of the sub-key

  jsr SUBKEY_XOR

  rts

; ================================================

SUBKEY_XOR

  ; xor the current 16 byte sub-key into the text

  ldy #0

  ; the first byte of each word

  lda (PTR_KS_0),y
  eor WORD_0
  sta WORD_0

  lda (PTR_KS_1),y
  eor WORD_1
  sta WORD_1

  lda (PTR_KS_2),y
  eor WORD_2
  sta WORD_2

  lda (PTR_KS_3),y
  eor WORD_3
  sta WORD_3

  iny

  ; the second byte of each word

  lda (PTR_KS_0),y
  eor WORD_0+1
  sta WORD_0+1

  lda (PTR_KS_1),y
  eor WORD_1+1
  sta WORD_1+1

  lda (PTR_KS_2),y
  eor WORD_2+1
  sta WORD_2+1

  lda (PTR_KS_3),y
  eor WORD_3+1
  sta WORD_3+1

  iny

  ; the third byte of each word

  lda (PTR_KS_0),y
  eor WORD_0+2
  sta WORD_0+2

  lda (PTR_KS_1),y
  eor WORD_1+2
  sta WORD_1+2

  lda (PTR_KS_2),y
  eor WORD_2+2
  sta WORD_2+2

  lda (PTR_KS_3),y
  eor WORD_3+2
  sta WORD_3+2

  iny

  ; the fourth byte of each word

  lda (PTR_KS_0),y
  eor WORD_0+3
  sta WORD_0+3

  lda (PTR_KS_1),y
  eor WORD_1+3
  sta WORD_1+3

  lda (PTR_KS_2),y
  eor WORD_2+3
  sta WORD_2+3

  lda (PTR_KS_3),y
  eor WORD_3+3
  sta WORD_3+3

  ; increment the pointers to the sub-key

  lda PTR_KS_0
  clc
  adc #16
  sta PTR_KS_0
  lda #0
  adc PTR_KS_0+1
  sta PTR_KS_0+1

  lda PTR_KS_1
  clc
  adc #16
  sta PTR_KS_1
  lda #0
  adc PTR_KS_1+1
  sta PTR_KS_1+1

  lda PTR_KS_2
  clc
  adc #16
  sta PTR_KS_2
  lda #0
  adc PTR_KS_2+1
  sta PTR_KS_2+1

  lda PTR_KS_3
  clc
  adc #16
  sta PTR_KS_3
  lda #0
  adc PTR_KS_3+1
  sta PTR_KS_3+1

  rts

; ================================================

LIN_TRANS

  ; the linear transformation

  ; LT: |WORD_0|WORD_1|WORD_2|WORD_3|

  ; w0 = ROL(w0,13)  

  ldx #0

ROL_13

  asl WORD_0
  rol WORD_0+1
  rol WORD_0+2
  rol WORD_0+3

  php
  pla
  and #1
  ora WORD_0
  sta WORD_0

  inx
  cpx #13
  bne ROL_13

  ; w2 = ROL(w2,3)

  ldx #0

ROL_3

  asl WORD_2
  rol WORD_2+1
  rol WORD_2+2
  rol WORD_2+3

  php
  pla
  and #1
  ora WORD_2
  sta WORD_2

  inx
  cpx #3
  bne ROL_3

  ; w1 = w1 xor w0 xor w2

  lda WORD_1
  eor WORD_0
  eor WORD_2
  sta WORD_1

  lda WORD_1+1
  eor WORD_0+1
  eor WORD_2+1
  sta WORD_1+1

  lda WORD_1+2
  eor WORD_0+2
  eor WORD_2+2
  sta WORD_1+2

  lda WORD_1+3
  eor WORD_0+3
  eor WORD_2+3
  sta WORD_1+3

  ; w3 = w3 xor w2 xor SHL(w0,3)

  lda WORD_0
  sta TEMP_W
  lda WORD_0+1
  sta TEMP_W+1
  lda WORD_0+2
  sta TEMP_W+2
  lda WORD_0+3
  sta TEMP_W+3

  ldx #0

SHL_3

  asl TEMP_W
  rol TEMP_W+1
  rol TEMP_W+2
  rol TEMP_W+3

  inx
  cpx #3
  bne SHL_3

  lda WORD_3
  eor WORD_2
  eor TEMP_W
  sta WORD_3

  lda WORD_3+1
  eor WORD_2+1
  eor TEMP_W+1
  sta WORD_3+1

  lda WORD_3+2
  eor WORD_2+2
  eor TEMP_W+2
  sta WORD_3+2

  lda WORD_3+3
  eor WORD_2+3
  eor TEMP_W+3
  sta WORD_3+3

  ; w1 = ROL(w1,1)

ROL_1

  asl WORD_1
  rol WORD_1+1
  rol WORD_1+2
  rol WORD_1+3

  php
  pla
  and #1
  ora WORD_1
  sta WORD_1

  ; w3 = ROL(w3,7)

  ldx #0

ROL_7

  asl WORD_3
  rol WORD_3+1
  rol WORD_3+2
  rol WORD_3+3

  php
  pla
  and #1
  ora WORD_3
  sta WORD_3

  inx
  cpx #7
  bne ROL_7

  ; w0 = w0 xor w1 xor w3

  lda WORD_0
  eor WORD_1
  eor WORD_3
  sta WORD_0

  lda WORD_0+1
  eor WORD_1+1
  eor WORD_3+1
  sta WORD_0+1

  lda WORD_0+2
  eor WORD_1+2
  eor WORD_3+2
  sta WORD_0+2

  lda WORD_0+3
  eor WORD_1+3
  eor WORD_3+3
  sta WORD_0+3

  ; w2 = w2 xor w3 xor SHL(w1,7)

  lda WORD_1
  sta TEMP_W
  lda WORD_1+1
  sta TEMP_W+1
  lda WORD_1+2
  sta TEMP_W+2
  lda WORD_1+3
  sta TEMP_W+3

  ldx #0

SHL_7

  asl TEMP_W
  rol TEMP_W+1
  rol TEMP_W+2
  rol TEMP_W+3

  inx
  cpx #7
  bne SHL_7

  lda WORD_2
  eor WORD_3
  eor TEMP_W
  sta WORD_2

  lda WORD_2+1
  eor WORD_3+1
  eor TEMP_W+1
  sta WORD_2+1

  lda WORD_2+2
  eor WORD_3+2
  eor TEMP_W+2
  sta WORD_2+2

  lda WORD_2+3
  eor WORD_3+3
  eor TEMP_W+3
  sta WORD_2+3

  ; w0 = ROL(w0,5)
 
  ldx #0

ROL_5

  asl WORD_0
  rol WORD_0+1
  rol WORD_0+2
  rol WORD_0+3

  php
  pla
  and #1
  ora WORD_0
  sta WORD_0

  inx
  cpx #5
  bne ROL_5

  ; w2 = ROL(w2,22)

  ldx #0

ROL_22

  asl WORD_2
  rol WORD_2+1
  rol WORD_2+2
  rol WORD_2+3

  php
  pla
  and #1
  ora WORD_2
  sta WORD_2

  inx
  cpx #22
  bne ROL_22

  rts

; ================================================

CALC_S

  ; apply the s-box to the 16 byte input block

  ; |WORD_0|WORD_1|WORD_2|WORD_3|

TO_BITSLICE

  ; break up the input block into 32 x 4-bit bitslices

  ldx #0

BYTE_1

  asl WORD_3
  rol BIT_SLICE,x
  asl WORD_2
  rol BIT_SLICE,x
  asl WORD_1
  rol BIT_SLICE,x
  asl WORD_0
  rol BIT_SLICE,x

  inx
  cpx #8
  bne BYTE_1

BYTE_2

  asl WORD_3+1
  rol BIT_SLICE,x
  asl WORD_2+1
  rol BIT_SLICE,x
  asl WORD_1+1
  rol BIT_SLICE,x
  asl WORD_0+1
  rol BIT_SLICE,x

  inx
  cpx #16
  bne BYTE_2
  
BYTE_3

  asl WORD_3+2
  rol BIT_SLICE,x
  asl WORD_2+2
  rol BIT_SLICE,x
  asl WORD_1+2
  rol BIT_SLICE,x
  asl WORD_0+2
  rol BIT_SLICE,x

  inx
  cpx #24
  bne BYTE_3
  
BYTE_4

  asl WORD_3+3
  rol BIT_SLICE,x
  asl WORD_2+3
  rol BIT_SLICE,x
  asl WORD_1+3
  rol BIT_SLICE,x
  asl WORD_0+3
  rol BIT_SLICE,x

  inx
  cpx #32
  bne BYTE_4

  ldx #0

SUB

  ; put each of the 32 bitslices through the s-box

  lda BIT_SLICE,x
  ora S_INDEX
  tay
  lda S_BOX,y
  sta BIT_SLICE,x

  inx
  cpx #32
  bne SUB

TO_WORD

  ; put the results back into the input block

  ldx #0

WD_1

  lsr BIT_SLICE,x
  rol WORD_0
  lsr BIT_SLICE,x
  rol WORD_1
  lsr BIT_SLICE,x
  rol WORD_2
  lsr BIT_SLICE,x
  rol WORD_3

  inx
  cpx #8
  bne WD_1

WD_2

  lsr BIT_SLICE,x
  rol WORD_0+1
  lsr BIT_SLICE,x
  rol WORD_1+1
  lsr BIT_SLICE,x
  rol WORD_2+1
  lsr BIT_SLICE,x
  rol WORD_3+1

  inx
  cpx #16
  bne WD_2

WD_3

  lsr BIT_SLICE,x
  rol WORD_0+2
  lsr BIT_SLICE,x
  rol WORD_1+2
  lsr BIT_SLICE,x
  rol WORD_2+2
  lsr BIT_SLICE,x
  rol WORD_3+2

  inx
  cpx #24
  bne WD_3

WD_4

  lsr BIT_SLICE,x
  rol WORD_0+3
  lsr BIT_SLICE,x
  rol WORD_1+3
  lsr BIT_SLICE,x
  rol WORD_2+3
  lsr BIT_SLICE,x
  rol WORD_3+3

  inx
  cpx #32
  bne WD_4

  rts

; ================================================

GENERATE_KS

  ; generate the key schedule from the 32 byte key

  ; initialise the pointers to the 4 byte words
  ; within the key schedule

  ; Key[i-8]

  lda #KEY_SCHEDULE
  sta PTR_KS_4
  lda /KEY_SCHEDULE
  sta PTR_KS_4+1

  ; Key[i-5]

  lda #KEY_SCHEDULE+12
  sta PTR_KS_3
  lda /KEY_SCHEDULE+12
  sta PTR_KS_3+1

  ; Key[i-3]

  lda #KEY_SCHEDULE+20
  sta PTR_KS_2
  lda /KEY_SCHEDULE+20
  sta PTR_KS_2+1	

  ; Key[i-1]

  lda #KEY_SCHEDULE+28
  sta PTR_KS_1
  lda /KEY_SCHEDULE+28
  sta PTR_KS_1+1	

  ; Key[i]

  lda #KEY_SCHEDULE+32
  sta PTR_KS_0
  lda /KEY_SCHEDULE+32
  sta PTR_KS_0+1

  ; FOR INDEX = 0 TO INDEX = 131

  lda #0
  sta INDEX

NEXT_KS

  ldy #0

  lda (PTR_KS_4),y
  eor (PTR_KS_3),y
  eor (PTR_KS_2),y
  eor (PTR_KS_1),y
  eor #$b9
  eor INDEX	
  sta (PTR_KS_0),y

  iny

  lda (PTR_KS_4),y
  eor (PTR_KS_3),y
  eor (PTR_KS_2),y
  eor (PTR_KS_1),y
  eor #$79
  sta (PTR_KS_0),y

  iny

  lda (PTR_KS_4),y
  eor (PTR_KS_3),y
  eor (PTR_KS_2),y
  eor (PTR_KS_1),y
  eor #$37
  sta (PTR_KS_0),y

  iny

  lda (PTR_KS_4),y
  eor (PTR_KS_3),y
  eor (PTR_KS_2),y
  eor (PTR_KS_1),y
  eor #$9e
  sta (PTR_KS_0),y

  ldx #0

ROL_11

  ldy #0

  ; ROTL Key[i] by 11 bits

  lda (PTR_KS_0),y
  asl 
  sta (PTR_KS_0),y
  iny
  lda (PTR_KS_0),y
  rol 
  sta (PTR_KS_0),y
  iny
  lda (PTR_KS_0),y
  rol 
  sta (PTR_KS_0),y
  iny
  lda (PTR_KS_0),y
  rol 
  sta (PTR_KS_0),y

  ldy #0

  php
  pla
  and #1
  ora (PTR_KS_0),y
  sta (PTR_KS_0),y

  inx
  cpx #11
  bne ROL_11

  ; increment the pointers to the 4 byte
  ; words within the key schedule

  ; Key[i-8]

  lda PTR_KS_4
  clc
  adc #4
  sta PTR_KS_4
  lda #0
  adc PTR_KS_4+1
  sta PTR_KS_4+1

  ; Key[i-5]

  lda PTR_KS_3
  clc
  adc #4
  sta PTR_KS_3
  lda #0
  adc PTR_KS_3+1
  sta PTR_KS_3+1  

  ; Key[i-3]

  lda PTR_KS_2
  clc
  adc #4
  sta PTR_KS_2
  lda #0
  adc PTR_KS_2+1
  sta PTR_KS_2+1 

  ; Key[i-1]

  lda PTR_KS_1
  clc
  adc #4
  sta PTR_KS_1
  lda #0
  adc PTR_KS_1+1
  sta PTR_KS_1+1

  ; Key[i]

  lda PTR_KS_0
  clc
  adc #4
  sta PTR_KS_0
  lda #0
  adc PTR_KS_0+1
  sta PTR_KS_0+1

  ; increment the index counter
  ; exit if index = 132

  inc INDEX
  lda INDEX
  eor #132
  beq DONE

  jmp NEXT_KS

DONE

  rts

; ================================================

SUB_KS

  ; apply the s-box to each 16 byte
  ; sub-key within the key schedule

  ; initialise the pointers to each
  ; 4 byte word within the sub-key

  lda #KEY_SCHEDULE+32
  sta PTR_KS_0
  lda /KEY_SCHEDULE+32
  sta PTR_KS_0+1

  lda #KEY_SCHEDULE+36
  sta PTR_KS_1
  lda /KEY_SCHEDULE+36
  sta PTR_KS_1+1

  lda #KEY_SCHEDULE+40
  sta PTR_KS_2
  lda /KEY_SCHEDULE+40
  sta PTR_KS_2+1

  lda #KEY_SCHEDULE+44
  sta PTR_KS_3
  lda /KEY_SCHEDULE+44
  sta PTR_KS_3+1

  ; the s-box index is initialised to 3

  lda #$30
  sta S_INDEX

  ; FOR INDEX = 0 TO INDEX = 32 (33 sub-keys)

  lda #0
  sta INDEX	

NEXT_BLOCK

  ; copy the 16 byte sub-key to

  ; |WORD_0|WORD_1|WORD_2|WORD_3|

  ldy #0

  lda (PTR_KS_0),y
  sta WORD_0
  lda (PTR_KS_1),y
  sta WORD_1
  lda (PTR_KS_2),y
  sta WORD_2
  lda (PTR_KS_3),y
  sta WORD_3

  iny

  lda (PTR_KS_0),y
  sta WORD_0,y
  lda (PTR_KS_1),y
  sta WORD_1,y
  lda (PTR_KS_2),y
  sta WORD_2,y
  lda (PTR_KS_3),y
  sta WORD_3,y

  iny

  lda (PTR_KS_0),y
  sta WORD_0,y
  lda (PTR_KS_1),y
  sta WORD_1,y
  lda (PTR_KS_2),y
  sta WORD_2,y
  lda (PTR_KS_3),y
  sta WORD_3,y

  iny

  lda (PTR_KS_0),y
  sta WORD_0,y
  lda (PTR_KS_1),y
  sta WORD_1,y
  lda (PTR_KS_2),y
  sta WORD_2,y
  lda (PTR_KS_3),y
  sta WORD_3,y

  ; apply the s-box

  jsr CALC_S

  ; the result is in: |WORD_0|WORD_1|WORD_2|WORD_3|

  ; copy back to the sub-key

  ldy #0

  lda WORD_0
  sta (PTR_KS_0),y
  lda WORD_1
  sta (PTR_KS_1),y
  lda WORD_2
  sta (PTR_KS_2),y
  lda WORD_3
  sta (PTR_KS_3),y

  iny

  lda WORD_0,y
  sta (PTR_KS_0),y
  lda WORD_1,y
  sta (PTR_KS_1),y
  lda WORD_2,y
  sta (PTR_KS_2),y
  lda WORD_3,y
  sta (PTR_KS_3),y

  iny

  lda WORD_0,y
  sta (PTR_KS_0),y
  lda WORD_1,y
  sta (PTR_KS_1),y
  lda WORD_2,y
  sta (PTR_KS_2),y
  lda WORD_3,y
  sta (PTR_KS_3),y

  iny

  lda WORD_0,y
  sta (PTR_KS_0),y
  lda WORD_1,y
  sta (PTR_KS_1),y
  lda WORD_2,y
  sta (PTR_KS_2),y
  lda WORD_3,y
  sta (PTR_KS_3),y

  ; update the s-box index

  lda S_INDEX

  cmp #0
  bne SKIP
  lda #$70
  jmp INC_BLOCK

SKIP

  sec
  sbc #$10

INC_BLOCK

  sta S_INDEX

  ; increment to the next 16 byte sub-key

  lda PTR_KS_0
  clc
  adc #16
  sta PTR_KS_0
  lda #0
  adc PTR_KS_0+1
  sta PTR_KS_0+1 

  lda PTR_KS_1
  clc
  adc #16
  sta PTR_KS_1
  lda #0
  adc PTR_KS_1+1
  sta PTR_KS_1+1 

  lda PTR_KS_2
  clc
  adc #16
  sta PTR_KS_2
  lda #0
  adc PTR_KS_2+1
  sta PTR_KS_2+1 

  lda PTR_KS_3
  clc
  adc #16
  sta PTR_KS_3
  lda #0
  adc PTR_KS_3+1
  sta PTR_KS_3+1 

  ; update the counter

  inc INDEX
  lda INDEX
  eor #33
  beq QUIT
  jmp NEXT_BLOCK

QUIT

  rts

; ================================================

STOP

  brk

; ================================================

  org $800

  ; the block containing the 16 byte input message

  ; the encryption, linear transform and s-box 
  ; calculations are directly applied to this block

WORD_0 byt 0,0,0,0
WORD_1 byt 0,0,0,0
WORD_2 byt 0,0,0,0
WORD_3 byt 0,0,0,0

  ; the array of the 32 x 4-bit bitslices

BIT_SLICE dfs 32,0

  ; the upper 4-bits of s_index is the s-box number (0-7)

  ; this is ORed into a bitslice to get an offset
  ; from the start of the s-box array

S_INDEX byt 0

  ; the 8 s-box arrays

S_BOX

  byt 3,8,15,1,10,6,5,11,14,13,4,2,7,0,9,12
  byt 15,12,2,7,9,0,5,10,1,11,14,8,6,13,3,4
  byt 8,6,7,9,3,12,10,15,13,1,14,4,0,11,5,2
  byt 0,15,11,8,12,9,6,3,13,1,2,4,10,7,5,14
  byt 1,15,8,3,12,0,11,6,2,5,4,10,9,14,7,13
  byt 15,5,2,11,4,10,9,12,0,3,14,8,13,6,7,1
  byt 7,2,12,5,8,4,6,11,14,9,1,15,13,3,10,0
  byt 1,13,15,0,14,8,2,11,7,4,12,10,9,3,5,6

; ================================================

  ; the 560 byte key schedule

  ; the first 32 bytes are the actual key

  ; the next 528 bytes are the 33 x 16 byte sub-keys

  org $900

KEY_SCHEDULE dfs 560

; ================================================

GOST 28147-89 ECB

The 32 byte key stored at KDS is used to encrypt the 8 byte input block stored at N1 and N2. Once the program has finished execution N1 and N2 store the result.

; ================================================

KDS epz 0
N1 epz 48
N2 epz 52
CM epz 80
P_XR epz 96
ROUND epz 98
TEMP epz 100

; ================================================

  org KDS

  hex 75713134 
  hex B60FEC45 
  hex A607BB83 
  hex AA3746AF 
  hex 4FF99DA6 
  hex D1B53B5B 
  hex 1B402A1B 
  hex AA030D1B

  org N1

  hex 11223344 
  hex 55667788 

; ================================================

  org $200

; ================================================

  lda #0
  sta ROUND

RND_1_24

  ; rounds 1 to 24

  ; set the pointer to the round subkey X(R)

  ; P_XR = KDS + 4(ROUND mod 8)

  lda ROUND
  and #7
  asl
  asl
  clc
  adc #KDS
  sta P_XR

  jsr CALC_ROUND

  ; update N1 and N2

  jsr N1_TO_N2
  jsr CM_TO_N1

  inc ROUND
  lda ROUND
  cmp #24
  bne RND_1_24

  ; for the last 8 rounds, set ROUND = 7 and count down

  lda #7
  sta ROUND
  
RND_25_32

  ; rounds 25 to 32

  ; set the pointer to the round subkey X(R)

  ; P_XR = KDS + 4(ROUND mod 8)

  lda ROUND
  asl
  asl
  clc
  adc #KDS
  sta P_XR

  jsr CALC_ROUND

  dec ROUND
  lda ROUND
  cmp #$FF
  beq CM_TO_N2

  ; update N1 and N2

  jsr N1_TO_N2
  jsr CM_TO_N1

  jmp RND_25_32

CM_TO_N2

  ; set N2 = CM

  lda CM
  sta N2
  lda CM+1
  sta N2+1
  lda CM+2
  sta N2+2
  lda CM+3
  sta N2+3

  jmp STOP

; ================================================

CALC_ROUND

  ; CM = N1
  ; CM = CM + X(R)
  ; CM = S_BOXES(CM)
  ; CM = ROTL(CM,11)
  ; CM = CM xor N2

  jsr N1_TO_CM
  jsr CALC_CM1
  jsr SUB_CM
  jsr ROTL_CM
  jsr CALC_CM2

  rts

; ================================================

N1_TO_CM

  ; set CM = N1

  lda N1
  sta CM
  lda N1+1
  sta CM+1
  lda N1+2
  sta CM+2
  lda N1+3
  sta CM+3

  rts

; ================================================

CALC_CM1

  ; calc CM = N1 + X(R)

  ; X(R) is the subkey for round R

  ; P_XR points to X(R)

  ldy #0

  lda CM
  clc
  adc (P_XR),y	
  sta CM

  iny
  lda CM+1
  adc (P_XR),y	
  sta CM+1

  iny
  lda CM+2
  adc (P_XR),y	
  sta CM+2

  iny
  lda CM+3
  adc (P_XR),y	
  sta CM+3

  rts

; ================================================

SUB_CM

  ; apply the S boxes

  ; apply K8 to the upper 4 bits of CM

  lda CM+3
  and #$F0
  lsr
  lsr
  lsr
  lsr
  tax
  lda K8,x
  asl
  asl
  asl
  asl
  pha

  ; apply K7 to the next 4 bits of CM
  
  lda CM+3
  and #$F
  tax
  lda K7,x

  ; recombine the nibbles and write back to CM

  sta CM+3
  pla
  ora CM+3
  sta CM+3

  ; apply K6 to the next 4 bits of CM

  lda CM+2
  and #$F0
  lsr
  lsr
  lsr
  lsr
  tax
  lda K6,x
  asl
  asl
  asl
  asl
  pha

  ; apply K5 to the next 4 bits of CM
  
  lda CM+2
  and #$F
  tax
  lda K5,x

  ; recombine the nibbles and write back to CM

  sta CM+2
  pla
  ora CM+2
  sta CM+2

  ; apply K4 to the next 4 bits of CM

  lda CM+1
  and #$F0
  lsr
  lsr
  lsr
  lsr
  tax
  lda K4,x
  asl
  asl
  asl
  asl
  pha

  ; apply K3 to the next 4 bits of CM
  
  lda CM+1
  and #$F
  tax
  lda K3,x

  ; recombine the nibbles and write back to CM

  sta CM+1
  pla
  ora CM+1
  sta CM+1

  ; apply K2 to the next 4 bits of CM

  lda CM
  and #$F0
  lsr
  lsr
  lsr
  lsr
  tax
  lda K2,x
  asl
  asl
  asl
  asl
  pha

  ; apply K1 to the lower 4 bits of CM
  
  lda CM
  and #$F
  tax
  lda K1,x

  ; recombine the nibbles and write back to CM

  sta CM
  pla
  ora CM
  sta CM

  rts

; ================================================

ROTL_CM

  ; ROTL CM by 11 bits

  ; copy up by 1 byte (8 bits)

  lda CM+3
  pha
  lda CM+2
  sta CM+3
  lda CM+1
  sta CM+2
  lda CM
  sta CM+1
  pla
  sta CM

  ; ROTL by 1 bit

  asl CM
  rol CM+1
  rol CM+2
  rol CM+3
  php
  pla
  and #1
  ora CM
  sta CM

  ; ROTL by 1 bit

  asl CM
  rol CM+1
  rol CM+2
  rol CM+3
  php
  pla
  and #1
  ora CM
  sta CM

  ; ROTL by 1 bit

  asl CM
  rol CM+1
  rol CM+2
  rol CM+3
  php
  pla
  and #1
  ora CM
  sta CM

  rts

; ================================================

CALC_CM2

  ; calc CM = CM xor N2

  lda CM
  eor N2
  sta CM
  lda CM+1
  eor N2+1
  sta CM+1
  lda CM+2
  eor N2+2
  sta CM+2
  lda CM+3
  eor N2+3
  sta CM+3

  rts

; ================================================

N1_TO_N2

  ; set N2 = N1

  lda N1
  sta N2
  lda N1+1
  sta N2+1
  lda N1+2
  sta N2+2
  lda N1+3
  sta N2+3

  rts

; ================================================

CM_TO_N1

  ; set N1 = CM

  lda CM
  sta N1
  lda CM+1
  sta N1+1
  lda CM+2
  sta N1+2
  lda CM+3
  sta N1+3

  rts

; ================================================

  ; the S boxes

K8 byt $1,$F,$D,$0,$5,$7,$A,$4,$9,$2,$3,$E,$6,$B,$8,$C
K7 byt $D,$B,$4,$1,$3,$F,$5,$9,$0,$A,$E,$7,$6,$8,$2,$C
K6 byt $4,$B,$A,$0,$7,$2,$1,$D,$3,$6,$8,$5,$9,$C,$F,$E
K5 byt $6,$C,$7,$1,$5,$F,$D,$8,$4,$A,$9,$E,$0,$3,$B,$2
K4 byt $7,$D,$A,$1,$0,$8,$9,$F,$E,$4,$6,$C,$B,$2,$5,$3
K3 byt $5,$8,$1,$D,$A,$3,$4,$2,$E,$F,$C,$7,$6,$0,$9,$B
K2 byt $E,$B,$4,$C,$6,$D,$F,$A,$2,$3,$8,$1,$0,$7,$5,$9
K1 byt $4,$A,$9,$2,$D,$8,$0,$E,$6,$B,$1,$C,$7,$F,$5,$3

; ================================================

STOP

  brk

; ================================================
Advertisements

Create a free website or blog at WordPress.com.

%d bloggers like this: