This post describes a simulator for the 6502 assembly language that I have written in the x86 assembly language using the Flat Assembler (FASM).
x86 Code
The actual x86 code for the simulator is given in Part 2.
References
This simulator is based partly on the LISA assembler for the Apple II computer as described in the book:
Using 6502 Assembly Language – Randy Hyde
A manual for the LISA assembler can be found here:
LISA Assembler Version 2.5 Manual Online
A detailed reference for the 6502 instruction set can be found here:
homepage.ntlworld.com/cyborgsystems/CS_Main/6502/6502.htm
My simulator in no way supports every feature of the LISA assembler as described in the above sources. The sections below describe what elements of the 6502 assembly language are supported by my simulator.
Memory
Total memory available is 65536 bytes, with absolute memory addresses being 16 bits in length. The memory is organised into 256 pages of 256 bytes each.
Registers
Accumulator | An 8-bit register. Most operations are carried out through the accumulator. |
---|---|
Index Register X | An 8-bit register. Used for counters and offsets. |
Index Register Y | An 8-bit register. Used for counters and offsets. |
Stack Pointer | An 8-bit register pointing to the stack. Incremented when a byte is pulled. Decremented when a byte is pushed. |
Program Counter | A 16-bit register pointing to the next instruction to be executed. |
Processor Status | An 8-bit register holding the processor status word (PSW). |
Status Flags
Includes the carry flag (C), the zero flag (Z), the overflow flag (V) and the negative flag (N). These are bits within the PSW (processor status word).
Operand Expressions
An operand expression can contain one or two arguments. If there are two arguments they must be separated by a plus or minus sign. An argument can be a number, symbol or char constant as described in the table:
Hex Number | If a dollar sign $ is detected, the number following it will be read as a hex number. |
---|---|
Decimal Number | If an exclamation ! is detected, the number following it will be read as a decimal number. A decimal number does not need to be preceded by an exclamation. If a number is detected (without a $, % or !) the compiler will try to read it as a decimal number. |
Binary Number | If a percent sign % is detected, the number following it will be read as a binary number. |
Symbol | If a symbol is detected, the compiler will look it up in the symbol table to get it’s value. |
* | If an asterix is detected, the compiler will substitute the current value of the location counter. |
Character Constant | If a single character enclosed in quotes is detected, the compiler will substitute it’s ASCII value. |
Some examples: lda $1A2B adc $F6 adc %10110110 adc !56 adc 56 lda SUM cmp "A" lda $C5+%1101 sbc 123+$EE and A-B ora $12+XYZ lda #A+B lda #$FD+ASDF lda /SUM+%11010011 bne *+10 lda #"T"+"5" eor $123+99,x lda (%1101-$AC,x) sta (1234+ZXC),y |
Symbols
A symbol can contain up to 14 characters, and is terminated by a white space character or CRLF. Symbols can contain letters, numbers or an underscore. The compiler stores any symbols it finds in the symbol table which starts at page 252 in memory. 16 bytes are allocated for each symbol in this table. The first 14 bytes are for the symbol, while the last 2 bytes are for the 16 bit value (address) of the symbol. Symbols are case insensitive (all lower case chars are converted to upper case by the compiler).
A symbol must always begin with a letter. The compiler detects a symbol by looking in the first column of a line for this first letter. If a line does not contain a symbol then the first column must be left blank.
Pseudo-Ops
EQU | Label EQU operand | Assign a 2 byte value to a symbol. The operand can only contain a decimal, binary or hex number. |
---|---|---|
EPZ | Label EPZ operand | Assign a 1 byte value to a symbol. The operand can only contain a decimal, binary or hex number. |
DFS | Label DFS expression | Define storage. The expression gives the number of bytes to reserve in memory. |
ORG | Label ORG expression | Set the value of the location counter, as given by the expression. The label is optional. |
HEX | Label HEX operand | Place up to 4 bytes into memory, starting from the current position as given by the location counter. The operand can only contain hex digits (up to 8). No $ is required. The label is optional. |
ADR | Label ADR expression | Place the 2 bytes as given by the expression into memory, first the low byte and then the high byte. The label is optional. |
BYT | Label BYT expression | Place the low byte as given by the expression into memory. The label is optional. |
HBY | Label HBY expression | Place the high byte as given by the expression into memory. The label is optional. |
DBY | Label DBY expression | Place the 2 bytes as given by the expression into memory, first the high byte and then the low byte. The label is optional. |
Some examples: A equ $1234 B equ !68 C equ 70 D equ %11001101 XYZ epz $A5 ASDF epz 10 ZXC epz %1011 ARR dfs 100 KEY dfs 32,0 org $800 org 2000 org %1111 hex 1A2B3C4D hex ABCDEF hex 48 adr ARR+7 adr $3456 byt $12 byt A hby 4567 hby %1110001110011 dby 64532 dby $9876 dby D+100 |
Addressing Modes
Accumulator | No operand is required. The instruction acts upon the accumulator. |
---|---|
Implied | No operand is required. The register acted upon is implied by the instruction. |
Relative | Used only by the branch instructions. The operand is an address that must be within -128 to +127 bytes of the address of the instruction that follows the branch instruction. |
Immediate | The operand is an immediate value, preceded by a # or /. If # the lower byte of the operand is used. If / the upper byte of the operand is used. |
Zero Page | The operand is a 1 byte address value that points only to a zero page location. |
Absolute | The operand is a 2 byte address value that can point to any location in memory. |
zp,x | Zero page indexed with x: zp is a zero page base address. The target address is obtained by adding the contents of the x index register. If Base + Index is greater than $FF, wrapping occurs. |
zp,y | Zero page indexed with y: zp is a zero page base address. The target address is obtained by adding the contents of the y index register. If Base + Index is greater than $FF, wrapping occurs. |
a,x | Absolute address indexed with x: a is an absolute base address. The target address is obtained by adding the contents of the x index register. If Base + Index is greater than $FFFF, wrapping occurs. |
a,y | Absolute address indexed with y: a is an absolute base address. The target address is obtained by adding the contents of the y index register. If Base + Index is greater than $FFFF, wrapping occurs. |
(zp,x) | Indexed indirect addressing: a zero page address is calculated by adding the contents of the x index register to zp. If the result is greater than $FF, wrapping occurs. Then an absolute 2 byte target address is read from the location given by the zero page address. |
(zp),y | Indirect indexed addressing: a base address is read from the zero page location given by zp. Then the contents of the index register y is added to the base to give the final target address. If Base + Index is greater than $FFFF, wrapping occurs. |
(a) | Indirect addressing: the target address is read from the absolute memory location a. Used only by the JMP instruction. |
Addressing Mode Examples
Accumulator: asl ror lsr ror Implied: nop clc inx pha tax tsx plp Relative: bne *+40 beq EQ Immediate: lda #12 lda /$4567 adc #$FE Zero Page: sta 10 sbc 40 lsr 77 eor A Absolute: lda $3456 ora $7890 jmp $5678 jsr CALC Indirect: jmp ($3456) jmp (P_ADD) |
Demonstration of the zp,x mode, showing how zp+x wraps back around when it becomes greater than $FF.
; ================================================ ldx #0 LOOP txa sta $80,x inx cpx #0 bne LOOP ; ================================================ |
Demonstration of the a,y mode: write the value of y to the absolute base address of $300 offset by y.
; ================================================ ldy #0 LOOP tya sta $300,y iny cpy #0 bne LOOP ; ================================================ |
Demonstration of the (zp,x) mode. First a table of 2 byte absolute addresses is created on the zero page (LOOP_1). Loop two then uses the sta instruction in the (zp,x) mode to write a table of values to the target addresses.
; ================================================ ldx #0 LOOP_1 ; create a table of 2 byte target addresses ; (= $300 + x) on the zero page ; the first byte of each address is x ; the second byte is 3 txa lsr sta 0,x inx lda #3 sta 0,x inx cpx #0 bne LOOP_1 LOOP_2 ; 0+x is the zero page address that stores ; the 2 byte target address txa sta (0,x) inx inx cpx #0 bne LOOP_2 ; ================================================ |
Demonstration of the (zp),y mode. This code creates exactly the same table of data at absolute address $300. The difference is that only one address needs to be stored on the zero page. The statement sta (0),y reads this address from location 0 and then adds y to it to get the target address.
; ================================================ ; store the address $300 at zero page location 0 lda #0 sta 0 lda #3 sta 1 ldy #0 LOOP ; store the value 2*y at $300+y tya asl sta (0),y iny cpy #$80 bne LOOP ; ================================================ |
Add & Subtract
ADC is add with carry, so before using ADC the carry bit should be cleared (CLC).
Before using the SBC instruction the carry bit should be set using the SEC instruction. Setting the carry to 1 is the same as setting the borrow bit to 0, and vice versa. If after the execution of a SBC instruction the carry is 0 (borrow = 1), it means that a bit needed to be borrowed to successfully complete the subtraction. If no borrow was required then the carry bit will be 1.
See the Big Number Subtraction example program below.
ADC | SBC | |
---|---|---|
Immediate | 69 | E9 |
Zero Page | 65 | E5 |
Absolute | 6D | ED |
zp,x | 75 | F5 |
a,x | 7D | FD |
a,y | 79 | F9 |
(zp,x) | 61 | E1 |
(zp),y | 71 | F1 |
Flags | CVZN | CVZN |
Logical
AND | ORA | EOR | BIT | |
---|---|---|---|---|
Immediate | 29 | 09 | 49 | – |
Zero Page | 25 | 05 | 45 | 24 |
Absolute | 2D | 0D | 4D | 2C |
zp,x | 35 | 15 | 55 | – |
a,x | 3D | 1D | 5D | – |
a,y | 39 | 19 | 59 | – |
(zp,x) | 21 | 01 | 41 | – |
(zp),y | 31 | 11 | 51 | – |
Flags | ZN | ZN | ZN | VZN |
Note that XOR can be used in place of EOR.
Shift & Rotate
These are the only type of instructions that use the Accumulator mode.
ASL | LSR | ROL | ROR | |
---|---|---|---|---|
Accumulator | 0A | 4A | 2A | 6A |
Zero Page | 06 | 46 | 26 | 66 |
Absolute | 0E | 4E | 2E | 6E |
zp,x | 16 | 56 | 36 | 76 |
a,x | 1E | 5E | 3E | 7E |
Flags | CZN | CZN | CZN | CZN |
Load & Store
LDA | STA | LDX | STX | LDY | STY | |
---|---|---|---|---|---|---|
Immediate | A9 | – | A2 | – | A0 | – |
Zero Page | A5 | 85 | A6 | 86 | A4 | 84 |
Absolute | AD | 8D | AE | 8E | AC | 8C |
zp,x | B5 | 95 | – | – | B4 | 94 |
zp,y | – | – | B6 | 96 | – | – |
a,x | BD | 9D | – | – | BC | – |
a,y | B9 | 99 | BE | – | – | – |
(zp,x) | A1 | 81 | – | – | – | – |
(zp),y | B1 | 91 | – | – | – | – |
Flags | ZN | – | ZN | – | ZN | – |
Increment & Decrement
INC | DEC | |
---|---|---|
Zero Page | E6 | C6 |
Absolute | EE | CE |
zp,x | F6 | D6 |
a,x | FE | DE |
Flags | ZN | ZN |
Compare
The CMP instruction compares a byte in memory M with the accumulator ACC.
– If ACC – M >= 0 the C flag is set.
– If ACC – M = 0 the Z flag is set.
– If ACC – M < 0 the N flag is set.
The CPX and CPY instructions work in the same way, but for the X and Y registers.
CMP | CPX | CPY | |
---|---|---|---|
Immediate | C9 | E0 | C0 |
Zero Page | C5 | E4 | C4 |
Absolute | CD | EC | CC |
zp,x | D5 | – | – |
a,x | DD | – | – |
a,y | D9 | – | – |
(zp,x) | C1 | – | – |
(zp),y | D1 | – | – |
Flags | CZN | CZN | CZN |
Branch
The branches are the only type of instruction to use relative mode. The operand is an address in memory, which is used to calculate a single byte offset that is relative to the address of the instruction after the branch instruction. The branch instructions all assemble to 2 bytes – the opcode byte plus the offset byte. Therefore the branch destination must be within -126 to +129 bytes of the address of the branch instruction.
Opcode | Description | |
---|---|---|
BPL | 10 | Branch if N = 0 |
BMI | 30 | Branch if N = 1 |
BVC | 50 | Branch if V = 0 |
BVS | 70 | Branch if V = 1 |
BCC | 90 | Branch if C = 0 |
BCS | B0 | Branch if C = 1 |
BNE | D0 | Branch if Z = 0 |
BEQ | F0 | Branch if Z = 1 |
Note that BGE can be used in place of BCS, BLT in place of BCC, BTR in place of BEQ, and BFL in place of BNE.
Implied Mode Instructions
Opcode | Flags | Description | |
---|---|---|---|
NOP | EA | – | No operation |
SEC | 38 | C | Set C |
CLC | 18 | C | Clear C |
CLV | B8 | V | Clear V |
INX | E8 | ZN | Increment X |
DEX | CA | ZN | Decrement X |
INY | C8 | ZN | Increment Y |
DEY | 88 | ZN | Decrement Y |
PHA | 48 | – | Push ACC to the stack |
PLA | 68 | ZN | Pull ACC from the stack |
TAX | AA | ZN | Copy ACC to X |
TXA | 8A | ZN | Copy X to ACC |
TAY | A8 | ZN | Copy ACC to Y |
TYA | 98 | ZN | Copy Y to ACC |
TSX | BA | ZN | Copy SP to X |
TXS | 9A | – | Copy X to SP |
PHP | 08 | – | Push PSW to the stack |
PLP | 28 | CZVN | Pull PSW from the stack |
Jump Instructions
JMP | JSR | RTS | |
---|---|---|---|
Absolute | 4C | 20 | – |
Indirect | 6C | – | – |
Implied | – | – | 60 |
BRK
The BRK instruction is assembled as a single zero byte. When the simulator encounters a zero byte, the program counter is advanced by one and then execution stops. Thus the brk instruction can be used to stop a program altogether, or as a breakpoint where execution of a program can be resumed by pressing the [Run Code] or [Step] buttons again.
The Simulator
The simulator has two main dialogs, the Edit dialog and the Run dialog.
The Edit Dialog
The 6502 code is typed into the text input area on the left. Press the [Compile Assembly] button to assemble the code. The output will then be displayed in the output text area on the right. The code is compiled in 2 passes. On the first pass all of the symbols are resolved and added to the symbol table. On the second pass the instructions are assembled. Apart from character constants everything is case insensitive. The compiler simply converts lower case to upper case.
The code is assembled starting from location $200 in the main memory of the 6502. By default execution starts from this address.
To view the symbol table press the [View Symbols] button. To return to the listing of the assembled code press [View Output].
Press the [Clear Assembly] button to clear both text areas.
To run and debug the code press the [Execute Code] button. This will open up the Run dialog.
The Run Dialog
A listing of the assembled code is displayed on the top left. The last instruction executed is shown on the middle left. The state of the registers is shown on the bottom left. The top right displays the bytes in memory two pages at a time, and also displays the watch. The bottom right is the ASCII string input area.
To run the code, press the [Run Code] button.
To execute the code one instruction at a time, press the [Step] button.
To reset the program counter to it’s default value of $200, press the [Reset PC] button.
To reset the 6502 machine and clear the entire memory contents, press the [Reset Machine] button.
To view the watch, press the [View Watch] button. To stop viewing the watch and display the pages in memory, just select a page from the combo box.
To return to the Edit dialog, press [Edit Code].
The ASCII Input Area
A text string can be entered directly into memory via the ASCII input area on the bottom right of the Run dialog. Simply enter some text and press the [Load ASCII] button. Note that CRLFs will be filtered out.
Four pages in memory are set aside for this purpose (pages 248-251) starting from memory location $F800. After the [Load ASCII] button has been pressed, any ascii data stored within these 4 pages of memory will be displayed in the ASCII input area.
ASCII data can also be entered into the area of memory starting at $F800 via a program, where it will be displayed once the [Run Code] button has been pressed. The code here prints a Hello World message.
; ================================================ ldy #0 LOOP lda MESSAGE,y cmp #0 beq DONE sta $f800,y iny jmp LOOP DONE jmp STOP MESSAGE byt 'H','E','L','L','O',' ' byt 'W','O','R','L','D' STOP brk ; ================================================ |
A string containing a hex number can be entered into memory via the ASCII input area and then converted to a numerical value stored elsewhere by using a program such as this one. Simply enter a hex number and press [Load ASCII]. Then press [Run Code] to run the program. The ascii characters stored at $F800 are converted to a number and stored at location $300.
; ================================================ ; convert the string of ASCII hex digits at ASC ; to a hexadecimal value at HXD ; HXD = big endian byte order ; ================================================ Y_SRC epz 0 Y_DEST epz 1 ASC epz 2 HXD epz 4 ; ================================================ lda #0 sta Y_SRC sta Y_DEST sta ASC sta HXD lda #$F8 sta ASC+1 lda #$03 sta HXD+1 ; ================================================ D1 ; digit 1 ldy Y_SRC lda (ASC),y cmp #0 beq DONE jsr CHAR_TO_HEX cmp #$FF beq NEXT asl asl asl asl ldy Y_DEST sta (HXD),y D2 ; digit 2 inc Y_SRC ldy Y_SRC cpy #0 beq DONE lda (ASC),y cmp #0 beq DONE jsr CHAR_TO_HEX cmp #$FF beq D2 ldy Y_DEST ora (HXD),y sta (HXD),y inc Y_DEST NEXT inc Y_SRC ldy Y_SRC cpy #0 beq DONE jmp D1 DONE jmp THE_END ; ================================================ CHAR_TO_HEX ; convert an ASCII char to a hex value ; 0-9 and A-F cmp #'0' bmi NOT_HEX cmp #'9' bmi ADD_09 beq ADD_09 cmp #'A' bmi NOT_HEX cmp #'F' bmi ADD_AF beq ADD_AF jmp NOT_HEX ADD_09 sec sbc #'0' jmp RET ADD_AF sec sbc #'A' clc adc #10 jmp RET NOT_HEX lda #$FF RET rts ; ================================================ THE_END ; ================================================ |
The Watch
Add an item to the watch by pressing the [Add To Watch] button. A dialog box appears into which is entered either a symbol or hexadecimal address (prefixed by a $) and the number of bytes (up to 256) to include in the watch. So to watch how execution of a program affects the 100 bytes starting from location $600, just enter $600 as the address to watch and 100 as the number of bytes to include in the watch. Press the [View Watch] button to display the addresses currently being watched. Up to 12 addresses can be watched at any one time.
Memory Map
The memory map for the simulator. The memory is organised into 256 pages of 256 bytes each, giving a total of 65536 bytes,
Page 0 | $00 – $FF: zero page memory. |
---|---|
Page 1 | $100 – $1FF: the stack. |
Page 2 | Pages 2-247 contain assembled code and data. By default code is assembled and executed starting from the address $200. |
Page 248 | Pages 248-251 can be used to directly load ASCII strings. |
Page 252 | Pages 252-255 are used by the symbol table. |
Example Programs
Below are some programs I have written and tested for the simulator.
Big Number Bitwise Rotation
Rotate left by one bit. Keep on pressing the [Run Code] button to repeat. The brk instruction here acts as a breakpoint.
; ================================================ ldy #0 lda #1 FILL ; fill the zero page with data sta 0,y iny cpy #0 bne FILL REPEAT ldx #$FF clc php ROTL ; rotate the zero page bytes left by 1 bit ; the bytes are in big endian byte order plp rol 0,x php dex cpx #$ff bne ROTL ; or the carry bit from byte $0 back into byte $FF lda #0 plp adc #0 ora $FF sta $FF brk jmp REPEAT ; ================================================ |
Rotate right by one bit. Keep on pressing the [Run Code] button to repeat. The brk instruction here acts as a breakpoint.
; ================================================ ldy #0 lda #1 FILL ; fill the zero page with data sta 0,y iny cpy #0 bne FILL REPEAT ldx #0 clc php ROTR ; rotate the zero page bytes right by 1 bit ; the bytes are in big endian byte order plp ror 0,x php inx cpx #0 bne ROTR ; or the carry bit from byte $FF back into byte $0 lda #0 plp ror ora 0 sta 0 brk jmp REPEAT ; ================================================ |
CRC-16
CRC-16 long division method. The first 2 bytes of MSG become the CRC.
; ================================================ ; CRC-16 ; ================================================ org 0 ; Polynomial = 80 05 P0 byt $80 P1 byt $05 ; the initial xor INIT byt $ff,$ff ; the final xor FINAL byt 0,0 ; ================================================ org $200 ; xor INIT into the first 2 bytes of MSG lda MSG eor INIT sta MSG lda MSG+1 eor INIT+1 sta MSG+1 CRC ; CRC-16 - long division method ; get the next bit from MSG jsr SHL ; if the bit is zero - do nothing bcc NEXT ; xor the polynomial into the first 2 bytes of MSG lda MSG eor P0 sta MSG lda MSG+1 eor P1 sta MSG+1 NEXT ; decrement the bit and byte counters dec X8 lda X8 cmp #0 bne CRC lda #8 sta X8 dec LEN lda LEN cmp #0 bne CRC ; xor FINAL into the first 2 bytes of MSG lda MSG eor FINAL sta MSG lda MSG+1 eor FINAL+1 sta MSG+1 ; the first 2 bytes of MSG now contain the CRC jmp STOP ; ================================================ SHL ; shift left the message bytes by 1 bit ; shl all 256 bytes starting from MSG ; MSG is in big endian byte order ; so start from the byte at MSG + $FF ldx #$ff clc php ROT plp rol MSG,x php dex cpx #$ff bne ROT ; need the carry bit - so pull PSW from the stack plp rts ; ================================================ ; count the number of bits in each byte X8 byt 8 ; the length of the input message (max = 256 bytes) LEN byt 16 ; ================================================ STOP brk ; ================================================ org $300 ; the input message - can be up to 256 bytes MSG byt $1,$2,$3,$4,$5,$6,$7,$8 byt $9,$0,$A,$B,$C,$D,$E,$F ; ================================================ |
CRC-16 table method. The 2 byte result is stored at zero page location 0.
; ================================================ ; CRC-16 - Table Method ; ================================================ CRC epz 0 INIT epz 16 FINAL epz 32 BIT epz 48 BYTES epz 49 T_ENTRY epz 52 ; ================================================ ; for Y = $0 to Y = $FF ldy #0 FILL_TABLE ; generate a 256 element table ; the low byte for each entry is stored from $600-$6FF ; the high byte for each entry is stored from $700-$7FF ; load the value of Y into the lower byte sty BYTES ; load zero into the higher byte lda #0 sta BYTES+1 ; count the bits lda #8 sta BIT XOR_P ; get the next bit asl BYTES+1 rol BYTES bcc NEXT ; xor the polynomial into BYTES, if the bit is set lda #$80 eor BYTES sta BYTES lda #$05 eor BYTES+1 sta BYTES+1 NEXT ; for each bit in BYTES dec BIT lda BIT cmp #0 bne XOR_P ; store the lower byte at $600 + Y lda BYTES sta $600,y ; store the higher byte at $700 + Y lda BYTES+1 sta $700,y ; get the next Y iny cpy #0 bne FILL_TABLE ; ================================================ ; set the value of the initial xor ($FFFF) lda #$ff sta INIT sta INIT+1 ; set the value of the final xor ($0000) lda #0 sta FINAL sta FINAL+1 ; ================================================ ; load the first 2 bytes of MSG into CRC lda MSG sta CRC lda MSG+1 sta CRC+1 ; xor INIT into the 2 bytes of CRC lda INIT eor CRC sta CRC lda INIT+1 eor CRC+1 sta CRC+1 ldy #2 CALC_CRC ; the first byte of CRC is the index to the table ldx CRC ; get the table entry lda $600,x sta T_ENTRY lda $700,x sta T_ENTRY+1 ; shift out the first CRC byte ; copy up the next CRC byte ; shift in the next MSG byte lda CRC+1 sta CRC lda MSG,y sta CRC+1 ; xor in the table entry lda CRC eor T_ENTRY sta CRC lda CRC+1 eor T_ENTRY+1 sta CRC+1 iny cpy LEN bne CALC_CRC PAD_1 ; the first padding (zero) byte ldx CRC lda $600,x sta T_ENTRY lda $700,x sta T_ENTRY+1 lda CRC+1 sta CRC lda #0 sta CRC+1 lda CRC eor T_ENTRY sta CRC lda CRC+1 eor T_ENTRY+1 sta CRC+1 PAD_2 ; the second padding (zero) byte ldx CRC lda $600,x sta T_ENTRY lda $700,x sta T_ENTRY+1 lda CRC+1 sta CRC lda #0 sta CRC+1 lda CRC eor T_ENTRY sta CRC lda CRC+1 eor T_ENTRY+1 sta CRC+1 ; xor FINAL into the 2 bytes of CRC lda FINAL eor CRC sta CRC lda FINAL+1 eor CRC+1 sta CRC+1 jmp STOP ; ================================================ ; the length of the input message (bytes) LEN byt 16 org $300 ; the input message - can be up to 256 bytes MSG byt $1,$2,$3,$4,$5,$6,$7,$8 byt $9,$0,$A,$B,$C,$D,$E,$F ; ================================================ STOP brk ; ================================================ |
CRC-32
CRC-32 long division method. The first 4 bytes of MSG become the CRC.
; ================================================ ; CRC-32 ; ================================================ org 0 ; Polynomial = 04 C1 1D B7 P0 byt $04 P1 byt $C1 P2 byt $1D P3 byt $B7 ; the initial xor INIT byt $ff,$ff,$ff,$ff ; the final xor FINAL byt $ff,$ff,$ff,$ff ; ================================================ org $200 ; xor INIT into the first 4 bytes of MSG lda MSG eor INIT sta MSG lda MSG+1 eor INIT+1 sta MSG+1 lda MSG+2 eor INIT+2 sta MSG+2 lda MSG+3 eor INIT+3 sta MSG+3 CRC ; CRC-32 - long division method ; get the next bit from MSG jsr SHL ; if the bit is zero - do nothing bcc NEXT ; xor the polynomial into the first 4 bytes of MSG lda MSG eor P0 sta MSG lda MSG+1 eor P1 sta MSG+1 lda MSG+2 eor P2 sta MSG+2 lda MSG+3 eor P3 sta MSG+3 NEXT ; decrement the bit and byte counters dec X8 lda X8 cmp #0 bne CRC lda #8 sta X8 dec LEN lda LEN cmp #0 bne CRC ; xor FINAL into the first 4 bytes of MSG lda MSG eor FINAL sta MSG lda MSG+1 eor FINAL+1 sta MSG+1 lda MSG+2 eor FINAL+2 sta MSG+2 lda MSG+3 eor FINAL+3 sta MSG+3 ; the first 4 bytes of MSG now contain the CRC jmp STOP ; ================================================ SHL ; shift left the message bytes by 1 bit ; shl all 256 bytes starting from MSG ; MSG is in big endian byte order ; so start from the byte at MSG + $FF ldx #$ff clc php ROT plp rol MSG,x php dex cpx #$ff bne ROT ; need the carry bit - so pull PSW from the stack plp rts ; ================================================ ; count the number of bits in each byte X8 byt 8 ; the length of the input message (max = 256 bytes) LEN byt 4 ; ================================================ STOP brk ; ================================================ org $300 ; the input message - can be up to 256 bytes MSG byt $31,$32,$33,$34 ; ================================================ |
Random Numbers
Use the LCG to generate a 256 byte random number.
; ================================================ ; Linear Congruential Generator ; X(n+1) = (aX(n) + c) mod m ; m = 1 00 00 00 00 ; a = 00 03 43 FD ; c = 00 26 9E C3 ; X(0) = 12 34 56 78 ; ================================================ ; use the LCG to randomly fill the ; 256 bytes starting at $300 ; ================================================ DWORD_1 epz 0 DWORD_2 epz 4 RESULT epz 8 RND equ $300 ; ================================================ ; set DWORD_1 equal to a = 00 03 43 FD lda #$FD sta DWORD_1 lda #$43 sta DWORD_1+1 lda #$03 sta DWORD_1+2 lda #$00 sta DWORD_1+3 ; set DWORD_2 equal to X(0) = 12 34 56 78 lda #$78 sta DWORD_2 lda #$56 sta DWORD_2+1 lda #$34 sta DWORD_2+2 lda #$12 sta DWORD_2+3 ; Y is the offset from RND = $300 ldy #0 REPEAT ; calculate: aX(n) jsr MULT ; add c = 00 26 9E C3 lda RESULT clc adc #$C3 sta DWORD_2 lda RESULT+1 adc #$9E sta DWORD_2+1 lda RESULT+2 adc #$26 sta DWORD_2+2 lda RESULT+3 adc #0 sta DWORD_2+3 ; store the result at RND + Y lda RESULT sta RND,y iny lda RESULT+1 sta RND,y iny lda RESULT+2 sta RND,y iny lda RESULT+3 sta RND,y iny cpy #0 bne REPEAT brk ; ================================================ MULT ; multiply two 4 byte values mod m = 1 00 00 00 00 ; X counts the bits ldx #0 ; initialise RESULT = 0 lda #0 sta RESULT sta RESULT+1 sta RESULT+2 sta RESULT+3 LOOP ; left shift RESULT asl RESULT rol RESULT+1 rol RESULT+2 rol RESULT+3 ; left shift DWORD_2 to set the carry bit asl DWORD_2 rol DWORD_2+1 rol DWORD_2+2 rol DWORD_2+3 bcc NEXT ; if the carry bit is set, add DWORD_1 to RESULT lda DWORD_1 clc adc RESULT sta RESULT lda DWORD_1+1 adc RESULT+1 sta RESULT+1 lda DWORD_1+2 adc RESULT+2 sta RESULT+2 lda DWORD_1+3 adc RESULT+3 sta RESULT+3 NEXT ; for each of the 32 bits in DWORD_2 inx cpx #32 bne LOOP rts ; ================================================ |
Big Number Subtraction
Two 256 byte random numbers are generated: RND_1 and RND_2. Then RND_1 is subtracted from RND_2. The subtraction is done in two different ways:
(1) SUBTRACT uses SBC to compute RND_2,x – RND_1,x for each of the 256 bytes.
(2) ADD_NOT uses ADC to compute RND_2,x + NOT(RND_1),x for each of the 256 bytes.
Note that for 6502 subtraction the carry bit must be initially set (SEC).
; ================================================ ; use the linear congruential generator to ; create two 256 byte random numbers, and ; then subtract one from the other ; ================================================ DWORD_1 epz 0 DWORD_2 epz 4 RESULT epz 8 TEMP epz 12 COUNT epz 14 RND_1 equ $400 RND_2 equ $500 DIFF equ $600 INV_SUM equ $700 ; ================================================ ; use the LCG to randomly fill the ; 256 bytes starting at $400 ; ================================================ ; set DWORD_1 equal to a = 00 03 43 FD lda #$FD sta DWORD_1 lda #$43 sta DWORD_1+1 lda #$03 sta DWORD_1+2 lda #$00 sta DWORD_1+3 ; set DWORD_2 equal to X(0) = 12 34 56 78 lda #$78 sta DWORD_2 lda #$56 sta DWORD_2+1 lda #$34 sta DWORD_2+2 lda #$12 sta DWORD_2+3 ldy #0 LOOP_1 ; calculate: aX(n) jsr MULT ; add c = 00 26 9E C3 lda RESULT clc adc #$C3 sta DWORD_2 lda RESULT+1 adc #$9E sta DWORD_2+1 lda RESULT+2 adc #$26 sta DWORD_2+2 lda RESULT+3 adc #0 sta DWORD_2+3 ; store the result at RND_1 + Y lda RESULT sta RND_1,y iny lda RESULT+1 sta RND_1,y iny lda RESULT+2 sta RND_1,y iny lda RESULT+3 sta RND_1,y iny cpy #0 bne LOOP_1 ; ================================================ ; use the LCG to randomly fill the ; 256 bytes starting at $500 ; ================================================ ldy #0 LOOP_2 ; calculate: aX(n) jsr MULT ; add c = 00 26 9E C3 lda RESULT clc adc #$C3 sta DWORD_2 lda RESULT+1 adc #$9E sta DWORD_2+1 lda RESULT+2 adc #$26 sta DWORD_2+2 lda RESULT+3 adc #0 sta DWORD_2+3 ; store the result at RND_2 + Y lda RESULT sta RND_2,y iny lda RESULT+1 sta RND_2,y iny lda RESULT+2 sta RND_2,y iny lda RESULT+3 sta RND_2,y iny cpy #0 bne LOOP_2 ; ================================================ ; subtract RND_1 from RND_2 jsr SUBTRACT ; ================================================ ; add NOT(RND_1) to RND_2 jsr ADD_NOT ; ================================================ STOP brk ; ================================================ MULT ; multiply two 4 byte values mod m = 1 00 00 00 00 ldx #0 ; initialise RESULT = 0 lda #0 sta RESULT sta RESULT+1 sta RESULT+2 sta RESULT+3 LOOP ; left shift RESULT asl RESULT rol RESULT+1 rol RESULT+2 rol RESULT+3 ; left shift DWORD_2 to set the carry bit asl DWORD_2 rol DWORD_2+1 rol DWORD_2+2 rol DWORD_2+3 bcc NEXT ; if the carry bit is set, add DWORD_1 to RESULT lda DWORD_1 clc adc RESULT sta RESULT lda DWORD_1+1 adc RESULT+1 sta RESULT+1 lda DWORD_1+2 adc RESULT+2 sta RESULT+2 lda DWORD_1+3 adc RESULT+3 sta RESULT+3 NEXT ; for each of the 32 bits in DWORD_2 inx cpx #32 bne LOOP rts ; ================================================ SUBTRACT ; DIFF = RND_2 - RND_1 ; big endian byte order ldx #$FF sec php S_LOOP lda RND_2,x plp sbc RND_1,x php sta DIFF,x dex cpx #$FF bne S_LOOP plp rts ; ================================================ ADD_NOT ; INV_SUM = RND_2 + NOT(RND_1) ; big endian byte order ldx #$FF sec php I_LOOP lda RND_1,x eor #$FF plp adc RND_2,x php sta INV_SUM,x dex cpx #$FF bne I_LOOP plp rts ; ================================================ |
Big Number Multiply
Two 256 byte random numbers are generated: RND_1 and RND_2. They are then multiplied together and the 512 byte result stored at OUTPUT_1 and OUTPUT_2.
; ================================================ ; use the linear congruential generator to ; create two 256 byte random numbers, and ; then multiply them together ; ================================================ DWORD_1 epz 0 DWORD_2 epz 4 RESULT epz 8 TEMP epz 12 COUNT epz 14 RND_1 equ $400 RND_2 equ $500 OUTPUT_1 equ $600 OUTPUT_2 equ $700 ; ================================================ ; use the LCG to randomly fill the ; 256 bytes starting at $400 ; ================================================ ; set DWORD_1 equal to a = 00 03 43 FD lda #$FD sta DWORD_1 lda #$43 sta DWORD_1+1 lda #$03 sta DWORD_1+2 lda #$00 sta DWORD_1+3 ; set DWORD_2 equal to X(0) = 12 34 56 78 lda #$78 sta DWORD_2 lda #$56 sta DWORD_2+1 lda #$34 sta DWORD_2+2 lda #$12 sta DWORD_2+3 ldy #0 LOOP_1 ; calculate: aX(n) jsr MULT ; add c = 00 26 9E C3 lda RESULT clc adc #$C3 sta DWORD_2 lda RESULT+1 adc #$9E sta DWORD_2+1 lda RESULT+2 adc #$26 sta DWORD_2+2 lda RESULT+3 adc #0 sta DWORD_2+3 ; store the result at RND_1 + Y lda RESULT sta RND_1,y iny lda RESULT+1 sta RND_1,y iny lda RESULT+2 sta RND_1,y iny lda RESULT+3 sta RND_1,y iny cpy #0 bne LOOP_1 ; ================================================ ; use the LCG to randomly fill the ; 256 bytes starting at $500 ; ================================================ ldy #0 LOOP_2 ; calculate: aX(n) jsr MULT ; add c = 00 26 9E C3 lda RESULT clc adc #$C3 sta DWORD_2 lda RESULT+1 adc #$9E sta DWORD_2+1 lda RESULT+2 adc #$26 sta DWORD_2+2 lda RESULT+3 adc #0 sta DWORD_2+3 ; store the result at RND_2 + Y lda RESULT sta RND_2,y iny lda RESULT+1 sta RND_2,y iny lda RESULT+2 sta RND_2,y iny lda RESULT+3 sta RND_2,y iny cpy #0 bne LOOP_2 ; ================================================ ; now multiply RND_1 by RND_2 jsr BIG_MULT ; ================================================ STOP brk ; ================================================ MULT ; multiply two 4 byte values mod m = 1 00 00 00 00 ldx #0 ; initialise RESULT = 0 lda #0 sta RESULT sta RESULT+1 sta RESULT+2 sta RESULT+3 LOOP ; left shift RESULT asl RESULT rol RESULT+1 rol RESULT+2 rol RESULT+3 ; left shift DWORD_2 to set the carry bit asl DWORD_2 rol DWORD_2+1 rol DWORD_2+2 rol DWORD_2+3 bcc NEXT ; if the carry bit is set, add DWORD_1 to RESULT lda DWORD_1 clc adc RESULT sta RESULT lda DWORD_1+1 adc RESULT+1 sta RESULT+1 lda DWORD_1+2 adc RESULT+2 sta RESULT+2 lda DWORD_1+3 adc RESULT+3 sta RESULT+3 NEXT ; for each of the 32 bits in DWORD_2 inx cpx #32 bne LOOP rts ; ================================================ BIG_MULT ; multiply RND_1 by RND_2 ; store the result in |OUTPUT_1|OUTPUT_2| ldx #0 stx COUNT LP_1 ldy #0 lda RND_2,x sta TEMP LP_2 jsr SHL asl TEMP bcc NEXT_BIT jsr ADD NEXT_BIT iny cpy #8 bne LP_2 inc COUNT ldx COUNT cpx #0 bne LP_1 rts ; ================================================ ADD ; add |RND_1| to |OUTPUT_1|OUTPUT_2| ; big endian byte order ldx #$FF clc php L1 ; add RND_1 to OUTPUT_2 lda RND_1,x plp adc OUTPUT_2,x php sta OUTPUT_2,x dex cpx #$FF bne L1 L2 ; add the carry bit to OUTPUT_1 plp bcc DONE lda #0 adc OUTPUT_1,x php sta OUTPUT_1,x dex cpx #$FF bne L2 DONE rts ; ================================================ SHL ; rotate the bytes left by 1 bit ; the bytes are in big endian byte order ldx #$FF clc php ROTL_2 ; the 256 bytes starting at OUTPUT_2 plp rol OUTPUT_2,x php dex cpx #$FF bne ROTL_2 ROTL_1 ; the 256 bytes starting at OUTPUT_1 plp rol OUTPUT_1,x php dex cpx #$FF bne ROTL_1 ; get the carry bit from (OUTPUT_1 + 0) lda #0 plp adc #0 ; or the carry bit into (OUTPUT_2 + FF) ora OUTPUT_2+$FF sta OUTPUT_2+$FF rts ; ================================================ |
Serpent-1 Encryption
A 32 byte key is used to generate a 560 byte key schedule. Pointers are used to access the key schedule because it is bigger than 256 bytes. The 16 byte input plain text is stored at zero page location 32. The 32 byte key is stored at zero page location 80. The 16 byte output cipher text is written to zero page location 48 once the program has finished execution.
; ================================================ ; Serpent-1 Encryption ; ================================================ ; pointers to the sub-key array PTR_KS_0 epz 0 PTR_KS_1 epz 2 PTR_KS_2 epz 4 PTR_KS_3 epz 6 PTR_KS_4 epz 8 INDEX epz 10 TEMP_W epz 12 ROUND epz 16 ; the plain text (16 bytes) P_TEXT epz 32 ; the cipher text (16 bytes) C_TEXT epz 48 ; the key (32 bytes) THE_KEY epz 80 ; ================================================ ; initialise the plain text org P_TEXT byt $00,$11,$22,$33 byt $44,$55,$66,$77 byt $88,$99,$AA,$BB byt $CC,$DD,$EE,$FF ; ================================================ ; initialise the key org THE_KEY byt $12,$34,$56,$78 byt $90,$AB,$CD,$EF byt $12,$34,$56,$78 byt $90,$AB,$CD,$EF byt $12,$34,$56,$78 byt $90,$AB,$CD,$EF byt $12,$34,$56,$78 byt $90,$AB,$CD,$EF ; ================================================ org $200 ldy #0 COPY_K ; copy the 32 byte key to the first ; 32 bytes of the key schedule lda THE_KEY,y sta KEY_SCHEDULE,y iny cpy #32 bne COPY_K ; generate the key schedule jsr GENERATE_KS jsr SUB_KS ldy #0 COPY_P ; copy the 16 byte plain text to ; |WORD_0|WORD_1|WORD_2|WORD_3| lda P_TEXT,y sta WORD_0,y iny cpy #16 bne COPY_P ; do the encryption jsr ENCRYPT ldy #0 COPY_C ; copy |WORD_0|WORD_1|WORD_2|WORD_3| ; back to the cipher text lda WORD_0,y sta C_TEXT,y iny cpy #16 bne COPY_C jmp STOP ; ================================================ ENCRYPT ; encrypt the message in: ; |WORD_0|WORD_1|WORD_2|WORD_3| ; initialise the pointers to the sub-keys lda #KEY_SCHEDULE+32 sta PTR_KS_0 lda /KEY_SCHEDULE+32 sta PTR_KS_0+1 lda #KEY_SCHEDULE+36 sta PTR_KS_1 lda /KEY_SCHEDULE+36 sta PTR_KS_1+1 lda #KEY_SCHEDULE+40 sta PTR_KS_2 lda /KEY_SCHEDULE+40 sta PTR_KS_2+1 lda #KEY_SCHEDULE+44 sta PTR_KS_3 lda /KEY_SCHEDULE+44 sta PTR_KS_3+1 ; 32 rounds - ROUND = 0 to 31 lda #0 sta ROUND LOOP ; xor the current sub-keys into the message jsr SUBKEY_XOR ; s-box index = ROUND mod 8 (SHL 4 bits) lda ROUND asl asl asl asl and #$70 sta S_INDEX ; apply the s-box jsr CALC_S ; skip the linear transform for the last round lda ROUND cmp #31 beq LAST ; apply the linear transform jsr LIN_TRANS ; next round inc ROUND jmp LOOP LAST ; do the final xor of the sub-key jsr SUBKEY_XOR rts ; ================================================ SUBKEY_XOR ; xor the current 16 byte sub-key into the text ldy #0 ; the first byte of each word lda (PTR_KS_0),y eor WORD_0 sta WORD_0 lda (PTR_KS_1),y eor WORD_1 sta WORD_1 lda (PTR_KS_2),y eor WORD_2 sta WORD_2 lda (PTR_KS_3),y eor WORD_3 sta WORD_3 iny ; the second byte of each word lda (PTR_KS_0),y eor WORD_0+1 sta WORD_0+1 lda (PTR_KS_1),y eor WORD_1+1 sta WORD_1+1 lda (PTR_KS_2),y eor WORD_2+1 sta WORD_2+1 lda (PTR_KS_3),y eor WORD_3+1 sta WORD_3+1 iny ; the third byte of each word lda (PTR_KS_0),y eor WORD_0+2 sta WORD_0+2 lda (PTR_KS_1),y eor WORD_1+2 sta WORD_1+2 lda (PTR_KS_2),y eor WORD_2+2 sta WORD_2+2 lda (PTR_KS_3),y eor WORD_3+2 sta WORD_3+2 iny ; the fourth byte of each word lda (PTR_KS_0),y eor WORD_0+3 sta WORD_0+3 lda (PTR_KS_1),y eor WORD_1+3 sta WORD_1+3 lda (PTR_KS_2),y eor WORD_2+3 sta WORD_2+3 lda (PTR_KS_3),y eor WORD_3+3 sta WORD_3+3 ; increment the pointers to the sub-key lda PTR_KS_0 clc adc #16 sta PTR_KS_0 lda #0 adc PTR_KS_0+1 sta PTR_KS_0+1 lda PTR_KS_1 clc adc #16 sta PTR_KS_1 lda #0 adc PTR_KS_1+1 sta PTR_KS_1+1 lda PTR_KS_2 clc adc #16 sta PTR_KS_2 lda #0 adc PTR_KS_2+1 sta PTR_KS_2+1 lda PTR_KS_3 clc adc #16 sta PTR_KS_3 lda #0 adc PTR_KS_3+1 sta PTR_KS_3+1 rts ; ================================================ LIN_TRANS ; the linear transformation ; LT: |WORD_0|WORD_1|WORD_2|WORD_3| ; w0 = ROL(w0,13) ldx #0 ROL_13 asl WORD_0 rol WORD_0+1 rol WORD_0+2 rol WORD_0+3 php pla and #1 ora WORD_0 sta WORD_0 inx cpx #13 bne ROL_13 ; w2 = ROL(w2,3) ldx #0 ROL_3 asl WORD_2 rol WORD_2+1 rol WORD_2+2 rol WORD_2+3 php pla and #1 ora WORD_2 sta WORD_2 inx cpx #3 bne ROL_3 ; w1 = w1 xor w0 xor w2 lda WORD_1 eor WORD_0 eor WORD_2 sta WORD_1 lda WORD_1+1 eor WORD_0+1 eor WORD_2+1 sta WORD_1+1 lda WORD_1+2 eor WORD_0+2 eor WORD_2+2 sta WORD_1+2 lda WORD_1+3 eor WORD_0+3 eor WORD_2+3 sta WORD_1+3 ; w3 = w3 xor w2 xor SHL(w0,3) lda WORD_0 sta TEMP_W lda WORD_0+1 sta TEMP_W+1 lda WORD_0+2 sta TEMP_W+2 lda WORD_0+3 sta TEMP_W+3 ldx #0 SHL_3 asl TEMP_W rol TEMP_W+1 rol TEMP_W+2 rol TEMP_W+3 inx cpx #3 bne SHL_3 lda WORD_3 eor WORD_2 eor TEMP_W sta WORD_3 lda WORD_3+1 eor WORD_2+1 eor TEMP_W+1 sta WORD_3+1 lda WORD_3+2 eor WORD_2+2 eor TEMP_W+2 sta WORD_3+2 lda WORD_3+3 eor WORD_2+3 eor TEMP_W+3 sta WORD_3+3 ; w1 = ROL(w1,1) ROL_1 asl WORD_1 rol WORD_1+1 rol WORD_1+2 rol WORD_1+3 php pla and #1 ora WORD_1 sta WORD_1 ; w3 = ROL(w3,7) ldx #0 ROL_7 asl WORD_3 rol WORD_3+1 rol WORD_3+2 rol WORD_3+3 php pla and #1 ora WORD_3 sta WORD_3 inx cpx #7 bne ROL_7 ; w0 = w0 xor w1 xor w3 lda WORD_0 eor WORD_1 eor WORD_3 sta WORD_0 lda WORD_0+1 eor WORD_1+1 eor WORD_3+1 sta WORD_0+1 lda WORD_0+2 eor WORD_1+2 eor WORD_3+2 sta WORD_0+2 lda WORD_0+3 eor WORD_1+3 eor WORD_3+3 sta WORD_0+3 ; w2 = w2 xor w3 xor SHL(w1,7) lda WORD_1 sta TEMP_W lda WORD_1+1 sta TEMP_W+1 lda WORD_1+2 sta TEMP_W+2 lda WORD_1+3 sta TEMP_W+3 ldx #0 SHL_7 asl TEMP_W rol TEMP_W+1 rol TEMP_W+2 rol TEMP_W+3 inx cpx #7 bne SHL_7 lda WORD_2 eor WORD_3 eor TEMP_W sta WORD_2 lda WORD_2+1 eor WORD_3+1 eor TEMP_W+1 sta WORD_2+1 lda WORD_2+2 eor WORD_3+2 eor TEMP_W+2 sta WORD_2+2 lda WORD_2+3 eor WORD_3+3 eor TEMP_W+3 sta WORD_2+3 ; w0 = ROL(w0,5) ldx #0 ROL_5 asl WORD_0 rol WORD_0+1 rol WORD_0+2 rol WORD_0+3 php pla and #1 ora WORD_0 sta WORD_0 inx cpx #5 bne ROL_5 ; w2 = ROL(w2,22) ldx #0 ROL_22 asl WORD_2 rol WORD_2+1 rol WORD_2+2 rol WORD_2+3 php pla and #1 ora WORD_2 sta WORD_2 inx cpx #22 bne ROL_22 rts ; ================================================ CALC_S ; apply the s-box to the 16 byte input block ; |WORD_0|WORD_1|WORD_2|WORD_3| TO_BITSLICE ; break up the input block into 32 x 4-bit bitslices ldx #0 BYTE_1 asl WORD_3 rol BIT_SLICE,x asl WORD_2 rol BIT_SLICE,x asl WORD_1 rol BIT_SLICE,x asl WORD_0 rol BIT_SLICE,x inx cpx #8 bne BYTE_1 BYTE_2 asl WORD_3+1 rol BIT_SLICE,x asl WORD_2+1 rol BIT_SLICE,x asl WORD_1+1 rol BIT_SLICE,x asl WORD_0+1 rol BIT_SLICE,x inx cpx #16 bne BYTE_2 BYTE_3 asl WORD_3+2 rol BIT_SLICE,x asl WORD_2+2 rol BIT_SLICE,x asl WORD_1+2 rol BIT_SLICE,x asl WORD_0+2 rol BIT_SLICE,x inx cpx #24 bne BYTE_3 BYTE_4 asl WORD_3+3 rol BIT_SLICE,x asl WORD_2+3 rol BIT_SLICE,x asl WORD_1+3 rol BIT_SLICE,x asl WORD_0+3 rol BIT_SLICE,x inx cpx #32 bne BYTE_4 ldx #0 SUB ; put each of the 32 bitslices through the s-box lda BIT_SLICE,x ora S_INDEX tay lda S_BOX,y sta BIT_SLICE,x inx cpx #32 bne SUB TO_WORD ; put the results back into the input block ldx #0 WD_1 lsr BIT_SLICE,x rol WORD_0 lsr BIT_SLICE,x rol WORD_1 lsr BIT_SLICE,x rol WORD_2 lsr BIT_SLICE,x rol WORD_3 inx cpx #8 bne WD_1 WD_2 lsr BIT_SLICE,x rol WORD_0+1 lsr BIT_SLICE,x rol WORD_1+1 lsr BIT_SLICE,x rol WORD_2+1 lsr BIT_SLICE,x rol WORD_3+1 inx cpx #16 bne WD_2 WD_3 lsr BIT_SLICE,x rol WORD_0+2 lsr BIT_SLICE,x rol WORD_1+2 lsr BIT_SLICE,x rol WORD_2+2 lsr BIT_SLICE,x rol WORD_3+2 inx cpx #24 bne WD_3 WD_4 lsr BIT_SLICE,x rol WORD_0+3 lsr BIT_SLICE,x rol WORD_1+3 lsr BIT_SLICE,x rol WORD_2+3 lsr BIT_SLICE,x rol WORD_3+3 inx cpx #32 bne WD_4 rts ; ================================================ GENERATE_KS ; generate the key schedule from the 32 byte key ; initialise the pointers to the 4 byte words ; within the key schedule ; Key[i-8] lda #KEY_SCHEDULE sta PTR_KS_4 lda /KEY_SCHEDULE sta PTR_KS_4+1 ; Key[i-5] lda #KEY_SCHEDULE+12 sta PTR_KS_3 lda /KEY_SCHEDULE+12 sta PTR_KS_3+1 ; Key[i-3] lda #KEY_SCHEDULE+20 sta PTR_KS_2 lda /KEY_SCHEDULE+20 sta PTR_KS_2+1 ; Key[i-1] lda #KEY_SCHEDULE+28 sta PTR_KS_1 lda /KEY_SCHEDULE+28 sta PTR_KS_1+1 ; Key[i] lda #KEY_SCHEDULE+32 sta PTR_KS_0 lda /KEY_SCHEDULE+32 sta PTR_KS_0+1 ; FOR INDEX = 0 TO INDEX = 131 lda #0 sta INDEX NEXT_KS ldy #0 lda (PTR_KS_4),y eor (PTR_KS_3),y eor (PTR_KS_2),y eor (PTR_KS_1),y eor #$b9 eor INDEX sta (PTR_KS_0),y iny lda (PTR_KS_4),y eor (PTR_KS_3),y eor (PTR_KS_2),y eor (PTR_KS_1),y eor #$79 sta (PTR_KS_0),y iny lda (PTR_KS_4),y eor (PTR_KS_3),y eor (PTR_KS_2),y eor (PTR_KS_1),y eor #$37 sta (PTR_KS_0),y iny lda (PTR_KS_4),y eor (PTR_KS_3),y eor (PTR_KS_2),y eor (PTR_KS_1),y eor #$9e sta (PTR_KS_0),y ldx #0 ROL_11 ldy #0 ; ROTL Key[i] by 11 bits lda (PTR_KS_0),y asl sta (PTR_KS_0),y iny lda (PTR_KS_0),y rol sta (PTR_KS_0),y iny lda (PTR_KS_0),y rol sta (PTR_KS_0),y iny lda (PTR_KS_0),y rol sta (PTR_KS_0),y ldy #0 php pla and #1 ora (PTR_KS_0),y sta (PTR_KS_0),y inx cpx #11 bne ROL_11 ; increment the pointers to the 4 byte ; words within the key schedule ; Key[i-8] lda PTR_KS_4 clc adc #4 sta PTR_KS_4 lda #0 adc PTR_KS_4+1 sta PTR_KS_4+1 ; Key[i-5] lda PTR_KS_3 clc adc #4 sta PTR_KS_3 lda #0 adc PTR_KS_3+1 sta PTR_KS_3+1 ; Key[i-3] lda PTR_KS_2 clc adc #4 sta PTR_KS_2 lda #0 adc PTR_KS_2+1 sta PTR_KS_2+1 ; Key[i-1] lda PTR_KS_1 clc adc #4 sta PTR_KS_1 lda #0 adc PTR_KS_1+1 sta PTR_KS_1+1 ; Key[i] lda PTR_KS_0 clc adc #4 sta PTR_KS_0 lda #0 adc PTR_KS_0+1 sta PTR_KS_0+1 ; increment the index counter ; exit if index = 132 inc INDEX lda INDEX eor #132 beq DONE jmp NEXT_KS DONE rts ; ================================================ SUB_KS ; apply the s-box to each 16 byte ; sub-key within the key schedule ; initialise the pointers to each ; 4 byte word within the sub-key lda #KEY_SCHEDULE+32 sta PTR_KS_0 lda /KEY_SCHEDULE+32 sta PTR_KS_0+1 lda #KEY_SCHEDULE+36 sta PTR_KS_1 lda /KEY_SCHEDULE+36 sta PTR_KS_1+1 lda #KEY_SCHEDULE+40 sta PTR_KS_2 lda /KEY_SCHEDULE+40 sta PTR_KS_2+1 lda #KEY_SCHEDULE+44 sta PTR_KS_3 lda /KEY_SCHEDULE+44 sta PTR_KS_3+1 ; the s-box index is initialised to 3 lda #$30 sta S_INDEX ; FOR INDEX = 0 TO INDEX = 32 (33 sub-keys) lda #0 sta INDEX NEXT_BLOCK ; copy the 16 byte sub-key to ; |WORD_0|WORD_1|WORD_2|WORD_3| ldy #0 lda (PTR_KS_0),y sta WORD_0 lda (PTR_KS_1),y sta WORD_1 lda (PTR_KS_2),y sta WORD_2 lda (PTR_KS_3),y sta WORD_3 iny lda (PTR_KS_0),y sta WORD_0,y lda (PTR_KS_1),y sta WORD_1,y lda (PTR_KS_2),y sta WORD_2,y lda (PTR_KS_3),y sta WORD_3,y iny lda (PTR_KS_0),y sta WORD_0,y lda (PTR_KS_1),y sta WORD_1,y lda (PTR_KS_2),y sta WORD_2,y lda (PTR_KS_3),y sta WORD_3,y iny lda (PTR_KS_0),y sta WORD_0,y lda (PTR_KS_1),y sta WORD_1,y lda (PTR_KS_2),y sta WORD_2,y lda (PTR_KS_3),y sta WORD_3,y ; apply the s-box jsr CALC_S ; the result is in: |WORD_0|WORD_1|WORD_2|WORD_3| ; copy back to the sub-key ldy #0 lda WORD_0 sta (PTR_KS_0),y lda WORD_1 sta (PTR_KS_1),y lda WORD_2 sta (PTR_KS_2),y lda WORD_3 sta (PTR_KS_3),y iny lda WORD_0,y sta (PTR_KS_0),y lda WORD_1,y sta (PTR_KS_1),y lda WORD_2,y sta (PTR_KS_2),y lda WORD_3,y sta (PTR_KS_3),y iny lda WORD_0,y sta (PTR_KS_0),y lda WORD_1,y sta (PTR_KS_1),y lda WORD_2,y sta (PTR_KS_2),y lda WORD_3,y sta (PTR_KS_3),y iny lda WORD_0,y sta (PTR_KS_0),y lda WORD_1,y sta (PTR_KS_1),y lda WORD_2,y sta (PTR_KS_2),y lda WORD_3,y sta (PTR_KS_3),y ; update the s-box index lda S_INDEX cmp #0 bne SKIP lda #$70 jmp INC_BLOCK SKIP sec sbc #$10 INC_BLOCK sta S_INDEX ; increment to the next 16 byte sub-key lda PTR_KS_0 clc adc #16 sta PTR_KS_0 lda #0 adc PTR_KS_0+1 sta PTR_KS_0+1 lda PTR_KS_1 clc adc #16 sta PTR_KS_1 lda #0 adc PTR_KS_1+1 sta PTR_KS_1+1 lda PTR_KS_2 clc adc #16 sta PTR_KS_2 lda #0 adc PTR_KS_2+1 sta PTR_KS_2+1 lda PTR_KS_3 clc adc #16 sta PTR_KS_3 lda #0 adc PTR_KS_3+1 sta PTR_KS_3+1 ; update the counter inc INDEX lda INDEX eor #33 beq QUIT jmp NEXT_BLOCK QUIT rts ; ================================================ STOP brk ; ================================================ org $800 ; the block containing the 16 byte input message ; the encryption, linear transform and s-box ; calculations are directly applied to this block WORD_0 byt 0,0,0,0 WORD_1 byt 0,0,0,0 WORD_2 byt 0,0,0,0 WORD_3 byt 0,0,0,0 ; the array of the 32 x 4-bit bitslices BIT_SLICE dfs 32,0 ; the upper 4-bits of s_index is the s-box number (0-7) ; this is ORed into a bitslice to get an offset ; from the start of the s-box array S_INDEX byt 0 ; the 8 s-box arrays S_BOX byt 3,8,15,1,10,6,5,11,14,13,4,2,7,0,9,12 byt 15,12,2,7,9,0,5,10,1,11,14,8,6,13,3,4 byt 8,6,7,9,3,12,10,15,13,1,14,4,0,11,5,2 byt 0,15,11,8,12,9,6,3,13,1,2,4,10,7,5,14 byt 1,15,8,3,12,0,11,6,2,5,4,10,9,14,7,13 byt 15,5,2,11,4,10,9,12,0,3,14,8,13,6,7,1 byt 7,2,12,5,8,4,6,11,14,9,1,15,13,3,10,0 byt 1,13,15,0,14,8,2,11,7,4,12,10,9,3,5,6 ; ================================================ ; the 560 byte key schedule ; the first 32 bytes are the actual key ; the next 528 bytes are the 33 x 16 byte sub-keys org $900 KEY_SCHEDULE dfs 560 ; ================================================ |
GOST 28147-89 ECB
The 32 byte key stored at KDS is used to encrypt the 8 byte input block stored at N1 and N2. Once the program has finished execution N1 and N2 store the result.
; ================================================ KDS epz 0 N1 epz 48 N2 epz 52 CM epz 80 P_XR epz 96 ROUND epz 98 TEMP epz 100 ; ================================================ org KDS hex 75713134 hex B60FEC45 hex A607BB83 hex AA3746AF hex 4FF99DA6 hex D1B53B5B hex 1B402A1B hex AA030D1B org N1 hex 11223344 hex 55667788 ; ================================================ org $200 ; ================================================ lda #0 sta ROUND RND_1_24 ; rounds 1 to 24 ; set the pointer to the round subkey X(R) ; P_XR = KDS + 4(ROUND mod 8) lda ROUND and #7 asl asl clc adc #KDS sta P_XR jsr CALC_ROUND ; update N1 and N2 jsr N1_TO_N2 jsr CM_TO_N1 inc ROUND lda ROUND cmp #24 bne RND_1_24 ; for the last 8 rounds, set ROUND = 7 and count down lda #7 sta ROUND RND_25_32 ; rounds 25 to 32 ; set the pointer to the round subkey X(R) ; P_XR = KDS + 4(ROUND mod 8) lda ROUND asl asl clc adc #KDS sta P_XR jsr CALC_ROUND dec ROUND lda ROUND cmp #$FF beq CM_TO_N2 ; update N1 and N2 jsr N1_TO_N2 jsr CM_TO_N1 jmp RND_25_32 CM_TO_N2 ; set N2 = CM lda CM sta N2 lda CM+1 sta N2+1 lda CM+2 sta N2+2 lda CM+3 sta N2+3 jmp STOP ; ================================================ CALC_ROUND ; CM = N1 ; CM = CM + X(R) ; CM = S_BOXES(CM) ; CM = ROTL(CM,11) ; CM = CM xor N2 jsr N1_TO_CM jsr CALC_CM1 jsr SUB_CM jsr ROTL_CM jsr CALC_CM2 rts ; ================================================ N1_TO_CM ; set CM = N1 lda N1 sta CM lda N1+1 sta CM+1 lda N1+2 sta CM+2 lda N1+3 sta CM+3 rts ; ================================================ CALC_CM1 ; calc CM = N1 + X(R) ; X(R) is the subkey for round R ; P_XR points to X(R) ldy #0 lda CM clc adc (P_XR),y sta CM iny lda CM+1 adc (P_XR),y sta CM+1 iny lda CM+2 adc (P_XR),y sta CM+2 iny lda CM+3 adc (P_XR),y sta CM+3 rts ; ================================================ SUB_CM ; apply the S boxes ; apply K8 to the upper 4 bits of CM lda CM+3 and #$F0 lsr lsr lsr lsr tax lda K8,x asl asl asl asl pha ; apply K7 to the next 4 bits of CM lda CM+3 and #$F tax lda K7,x ; recombine the nibbles and write back to CM sta CM+3 pla ora CM+3 sta CM+3 ; apply K6 to the next 4 bits of CM lda CM+2 and #$F0 lsr lsr lsr lsr tax lda K6,x asl asl asl asl pha ; apply K5 to the next 4 bits of CM lda CM+2 and #$F tax lda K5,x ; recombine the nibbles and write back to CM sta CM+2 pla ora CM+2 sta CM+2 ; apply K4 to the next 4 bits of CM lda CM+1 and #$F0 lsr lsr lsr lsr tax lda K4,x asl asl asl asl pha ; apply K3 to the next 4 bits of CM lda CM+1 and #$F tax lda K3,x ; recombine the nibbles and write back to CM sta CM+1 pla ora CM+1 sta CM+1 ; apply K2 to the next 4 bits of CM lda CM and #$F0 lsr lsr lsr lsr tax lda K2,x asl asl asl asl pha ; apply K1 to the lower 4 bits of CM lda CM and #$F tax lda K1,x ; recombine the nibbles and write back to CM sta CM pla ora CM sta CM rts ; ================================================ ROTL_CM ; ROTL CM by 11 bits ; copy up by 1 byte (8 bits) lda CM+3 pha lda CM+2 sta CM+3 lda CM+1 sta CM+2 lda CM sta CM+1 pla sta CM ; ROTL by 1 bit asl CM rol CM+1 rol CM+2 rol CM+3 php pla and #1 ora CM sta CM ; ROTL by 1 bit asl CM rol CM+1 rol CM+2 rol CM+3 php pla and #1 ora CM sta CM ; ROTL by 1 bit asl CM rol CM+1 rol CM+2 rol CM+3 php pla and #1 ora CM sta CM rts ; ================================================ CALC_CM2 ; calc CM = CM xor N2 lda CM eor N2 sta CM lda CM+1 eor N2+1 sta CM+1 lda CM+2 eor N2+2 sta CM+2 lda CM+3 eor N2+3 sta CM+3 rts ; ================================================ N1_TO_N2 ; set N2 = N1 lda N1 sta N2 lda N1+1 sta N2+1 lda N1+2 sta N2+2 lda N1+3 sta N2+3 rts ; ================================================ CM_TO_N1 ; set N1 = CM lda CM sta N1 lda CM+1 sta N1+1 lda CM+2 sta N1+2 lda CM+3 sta N1+3 rts ; ================================================ ; the S boxes K8 byt $1,$F,$D,$0,$5,$7,$A,$4,$9,$2,$3,$E,$6,$B,$8,$C K7 byt $D,$B,$4,$1,$3,$F,$5,$9,$0,$A,$E,$7,$6,$8,$2,$C K6 byt $4,$B,$A,$0,$7,$2,$1,$D,$3,$6,$8,$5,$9,$C,$F,$E K5 byt $6,$C,$7,$1,$5,$F,$D,$8,$4,$A,$9,$E,$0,$3,$B,$2 K4 byt $7,$D,$A,$1,$0,$8,$9,$F,$E,$4,$6,$C,$B,$2,$5,$3 K3 byt $5,$8,$1,$D,$A,$3,$4,$2,$E,$F,$C,$7,$6,$0,$9,$B K2 byt $E,$B,$4,$C,$6,$D,$F,$A,$2,$3,$8,$1,$0,$7,$5,$9 K1 byt $4,$A,$9,$2,$D,$8,$0,$E,$6,$B,$1,$C,$7,$F,$5,$3 ; ================================================ STOP brk ; ================================================ |