A Programmatic 6502 Assembler

I have had a lot of fun trying old-school programming and hardware design using modern tools.

One recurring problem with languages is a poor macro preprocessor. Verilog, for example, has a horrible one, and a lot of people resort to writing programs that generate Verilog because of it.

Here I have gone backwards another decade or two and implemented a 6502 assembler. Back in the 70s and early 80s there were many 6502 assemblers, but they tended to have inconsistent and byzantine directives and macros.

This assembler is a little different. Instead of the assembly language providing macros and code generation facilities for the assembler to interpret, the assembler is written in Python and you are supposed to call it from Python, feeding it instructions programmatically. This is akin to a macro language on steroids, where Python is the macro language.

The code is here: asm6502.py
You instantiate the assembler and feed it the lines of assembly, as a list of strings, through the assemble() method.
The module also contains a utility routine, go(), that assembles all the instructions as a demo.

It performs three passes: one to read the instructions, one to compute the address locations and build the symbol table, and a final pass to generate the memory map.
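
The second pass can be pictured as a running address counter that records the current address whenever it meets a label. The following is only a sketch under assumed data shapes, not the code in asm6502.py; build_symbol_table and the (label, size) tuples are hypothetical names chosen for illustration.

```python
def build_symbol_table(parsed, origin=0):
    """Walk (label, size_in_bytes) entries and record each label's address.

    'parsed' is a hypothetical stand-in for the result of the first pass:
    label is None for lines without one, size is 0 for label-only lines.
    """
    symbols = {}
    address = origin
    for label, size in parsed:
        if label is not None:
            symbols[label] = address
        address += size
    return symbols

# Sizes for: start: / LDA #imm / LDX #imm / loop: / STA abs,X / INX /
# SBC #imm / BPL rel / RTS
parsed = [("start", 0), (None, 2), (None, 2), ("loop", 0),
          (None, 3), (None, 1), (None, 2), (None, 2), (None, 1)]
symbols = build_symbol_table(parsed)
```

With these sizes, start resolves to $0000 and loop to $0004.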


The Object Code Map

The assembler keeps a complete map of the 64K memory space of the 6502 and populates the code and values into that map. The 'object_code' class variable is a list containing the map. Each untouched location is set to -1. Any other value is the 8-bit value at that location.

So after assembling the code into the map, it is possible to add other things to the map by assigning to the object_code list. For example:

a.object_code[0xfffc] = 0x00
a.object_code[0xfffd] = 0x10

This sets the reset vector to 0x1000. The 6502 reads the low byte of the vector from 0xFFFC and the high byte from 0xFFFD.
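
Because the 6502 stores 16-bit vectors low byte first, it is easy to get the two assignments in the wrong order. A small helper avoids that. This is a minimal sketch: set_vector is a hypothetical name, not part of asm6502.py, and the plain list below stands in for a.object_code.

```python
def set_vector(object_code, address, target):
    """Store a 16-bit target little-endian at address and address + 1."""
    if not 0 <= address <= 0xFFFE:
        raise ValueError("vector address out of range")
    if not 0 <= target <= 0xFFFF:
        raise ValueError("target must fit in 16 bits")
    object_code[address] = target & 0xFF              # low byte first
    object_code[address + 1] = (target >> 8) & 0xFF   # then high byte

# Stand-in for a.object_code: 64K entries, -1 meaning "untouched".
memory_map = [-1] * 0x10000
set_vector(memory_map, 0xFFFC, 0x1000)  # reset vector -> $1000
```

With a real assembler instance, the call would presumably be set_vector(a.object_code, 0xFFFC, 0x1000).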


Sending Assembly to the Assembler From Python

This Python assembles a few instructions:

import asm6502

lines = list()
lines.append("start: ")
lines.append(" LDA #$10 ")
lines.append(" LDX #$00 ")
lines.append("loop: ")
lines.append(" STA $1000,x ")
lines.append(" INX ")
lines.append(" SBC #$01 ")
lines.append(" BPL loop ")
lines.append(" RTS ")

a = asm6502.asm6502(debug=0)
a.assemble(lines)

This one inserts 10 NOPs programmatically:

import asm6502

lines = list()
lines.append("start: ")
lines.append(" LDA #$10 ")
lines.append(" LDX #$00 ")
lines.append("loop: ")
for i in range(10):  # range works in both Python 2 and Python 3
    lines.append(" NOP ")
lines.append(" STA $1000,x ")
lines.append(" INX ")
lines.append(" SBC #$01 ")
lines.append(" BPL loop ")
lines.append(" RTS ")

a = asm6502.asm6502()
a.assemble(lines)
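
The same idea extends to ordinary Python functions acting as macros. The sketch below only builds the list of strings that would be passed to assemble(); store_sequence is a hypothetical helper written for this illustration, not something provided by asm6502.py.

```python
def store_sequence(base, values):
    """Return assembly lines that store each value at base, base+1, ..."""
    lines = []
    for offset, value in enumerate(values):
        lines.append(" LDA #$%02X" % value)           # load the constant
        lines.append(" STA $%04X" % (base + offset))  # store it, absolute mode
    return lines

lines = ["start:"]
lines += store_sequence(0x0200, [0x10, 0x20, 0x30])
lines.append(" RTS")
```

The resulting list can then be handed to a.assemble(lines) like any hand-written one. Arguments such as the base address and the values play the role that macro parameters play in a traditional assembler.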

Assembler Output

Output looks something like this:

a=asm6502.asm6502(debug=0)
65C02 Assembler
a.assemble(lines)
LISTING
1 0000 : start
2 0000 : A9 10 lda #$10
3 0002 : A2 00 ldx #$00
4 0004 : loop
5 0004 : 9D 00 10 sta $1000,X
6 0007 : E8 inx
7 0008 : E9 01 sbc #$01
8 000A : 10 FA bpl loop
9 000C : 60 rts

SYMBOL TABLE
start = $0000
loop = $0004

OBJECT CODE
0000: A9 10 A2 00 9D 00 10 E8 E9 01 10 FA 60
*
a.object_code[0:14]
[169, 16, 162, 0, 157, 0, 16, 232, 233, 1, 16, 250, 96, -1]


Directives

There are a small number of directives:

; Comment
ORG address ; Sets the current assembly location
STR some_text ; Include text as ASCII bytes
DB comma_separated_list_of_bytes ; $ prefix for hex
DW comma_separated_list_of_16_bit_numbers ; $ prefix for hex
DDW comma_separated_list_of_32_bit_numbers ; $ prefix for hex
DQW comma_separated_list_of_64_bit_numbers ; $ prefix for hex
LE ; For multi-word data (DW, DDW and DQW), sets the encoding to little endian
BE ; For multi-word data (DW, DDW and DQW), sets the encoding to big endian

The assembler defaults to little endian.
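
To make the difference between LE and BE concrete, the following sketch shows the byte order one would expect for DW, DDW and DQW values under each setting. It illustrates the encoding only and is not the assembler's own code; encode_words is a hypothetical name.

```python
def encode_words(values, width_bytes, little_endian=True):
    """Split each value into width_bytes bytes in the requested order."""
    out = []
    for value in values:
        # Least significant byte first.
        chunk = [(value >> (8 * i)) & 0xFF for i in range(width_bytes)]
        if not little_endian:
            chunk.reverse()  # most significant byte first
        out.extend(chunk)
    return out

le_bytes = encode_words([0x1234], 2)                       # like DW $1234 after LE
be_bytes = encode_words([0x1234], 2, little_endian=False)  # like DW $1234 after BE
```

Here le_bytes is [0x34, 0x12] and be_bytes is [0x12, 0x34]. A width of 4 or 8 corresponds to DDW or DQW.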


Labels

A word followed by a colon makes a label. It can be on its own line, or in front of an instruction or directive.

alabel: ; A label on its own
anotherlabel: LDA #$10 ; A label with an instruction

Any address or 16-bit data field can be replaced with a declared label, and the label's address will be inserted there.


Assembling Into the Same Map

The assembler instance clears its state before assembling, except for the object_code map. This enables you to assemble multiple pieces of code into different locations, and they will all be added to the map.
The print_object_code() method displays the current object code map.

For example, the following code assembles a sequence, then modifies its origin, then reassembles it:

from asm6502 import asm6502
a = asm6502()
lines = [' ORG $1000', ' NOP', ' LDA #$20', 'here: NOP', ' DB 10,11,12,13', ' RTS']
a.assemble(lines)
lines[0] = ' ORG $2000'
a.assemble(lines)
a.print_object_code()

This yields the following memory map, with the same code in two places:

a.print_object_code()
OBJECT CODE
*
1000: EA A9 20 EA 0A 0B 0C 0D 60
*
2000: EA A9 20 EA 0A 0B 0C 0D 60
*
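
Since untouched locations hold -1, the populated regions of the map can also be recovered in Python for further processing. The following is a minimal sketch, with populated_ranges as a hypothetical helper and a plain list standing in for a.object_code.

```python
def populated_ranges(object_code):
    """Return [start_address, byte_list] for each run of entries != -1."""
    runs = []
    current = None
    for address, value in enumerate(object_code):
        if value == -1:
            current = None              # a gap ends the current run
        elif current is None:
            current = [address, [value]]
            runs.append(current)        # start a new run
        else:
            current[1].append(value)    # extend the current run
    return runs

# Stand-in map holding the same nine bytes at $1000 and $2000.
memory_map = [-1] * 0x10000
code = [0xEA, 0xA9, 0x20, 0xEA, 0x0A, 0x0B, 0x0C, 0x0D, 0x60]
memory_map[0x1000:0x1000 + len(code)] = code
memory_map[0x2000:0x2000 + len(code)] = code
runs = populated_ranges(memory_map)
```

Each entry in runs pairs a start address with the bytes that follow it, which is the shape an output generator for a programmer file format would need.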

TBD 1: Write a 65C02 simulator that runs from the object_code state generated by the assembler

TBD 2: Write an output generator for one or more of the flash/prom/eeprom programming formats

TBD 3: Give it decent error handling