Asi64

:: 6502, C64, racket

In my last few posts, I detailed some of my experience learning 6502 assembler for the Commodore 64. I started off using DASM, which seems to be quite a nice assembler, severely lacking in documentation. Then I discovered the very awesome KickAssembler, which includes a full blown scripting language on top of java, lots of other very nice features, and great documentation. This made me realise what a powerful modern assembler could be like - and perhaps I could go one step further with it.

Asi64 extends Racket to become a 6502 assembler. No need for scripting languages or half baked macro systems here - you have literally all of Racket and its amazing macro system on your side. It also has direct support for Vice, a popular Commodore 64 emulator, passing your labels and breakpoints along enabling a fluid programming and debugging cycle. Hats off to the fantasic KickAssembler, I have stolen many ideas from it :)

If you want to have a go, you get can asi64 from my github here or the racket package manager - see the github repo for a brief rundown of syntax and getting started. Currently, this is targetted at Windows - the assembler itself should work fine on any OS but it might need a small tweak in how it executes the emulator - PRs would be welcome!

By way of introduction to some bits of the assembler (definetly not even touching on its potential) we will write a relatively simple demo effect, a very basic starfield. (note this is not attempting to teach 6502 or Racket!)

Starfields!

To make a simple starfield we will use the C64’s character graphics. In this mode, the VIC-II graphics chip will point at an area in memory where a character set of our design is stored. A character is defined as an 8x8 grid of pixels. In memory, this is stored sequentially as 8 bytes, where the bits of each byte represent a pixel, going from the top row of the character downwards.

The VIC chip goes on its merry way rendering graphics. On every 8th scanline (known as a “bad line”) it will read the next row of video memory in and work out which characters it needs to display. This is fairly simple, the video memory is somewhere of our choosing and represents the 40*25 characters laid out sequentially. Each byte in the memory represents a character index from the character set to render. This means you can have 256 characters in a set, for a 2k total of memory.

Setup

First, let’s get the boring stuff out of the way, which is telling the VIC chip where to look for the video and character memory. In this example, we will use $0c00 for the video memory, $2000 for the character memory, and stick our program at $1000.

1
2
3
4
5
6
7
8
9
#lang asi64
(define vic-control $d018)
(C64{
	*= $1000
	lda @%00111000
	; screen at $0c00, characters at $2000
	sta vic-control 
	jmp (here) ;infinite loop
})

$d018 is the vic register which controls where it looks for things. Here we load the value 00111000 into it which sets things up how we want. The combination of lda, sta is so common that we can get asi64 to provide us a little abstraction over it by using the define-op macro.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
#lang asi64
(define vic-control $d018)
(define black 0)
(define background $d021)
(define-op (mov src dst){
	lda src
	sta dst
})

(C64{
	*= $1000
	(mov @%00111000 vic-control)
	(mov @black background)
	jmp (here) ;infinite loop
})

From now on I will mostly omit code that’s already been shown since assembly quickly gets large.

Character Design

What we will do is have 5 characters in a row and rotate pixels through them on each screen refresh. Therefore, character 0 we will start off with a single pixel in the centre of it. The next 4 characters will simply be blank, waiting to recieve the pixel. Then, the video memory will be tiled with chracters 0 – 5 to give the illusion of a (terrible) star field.

So let’s go ahead and build our characters. We know the memory starts at $2000, therefore that address represents the top row of pixels for the first character.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
	lda @0
	sta $2000
	sta $2001
	sta $2002
	sta $2003
	sta $2005
	sta $2006
	sta $2007

	; a single pixel in the centre!
	lda @%00001000
	sta $20004 

Well, this is boring. And we still need to do the next 4 characters. As ever in assembly you have to choose between writing loops or unrolling them - space vs speed - in this case we will stick to unrolled code. I think racket can help us out here though.

1
2
3
4
5
6
 	lda @0
	;splat the first 5 characters 
	(for ([i (in-range (* 8 5))])
	  {sta (+ $2000 i)})
	; load our humble pixel
	(mov @%00001000 $2004)

Much better. Ok, so we might have written over $2004 in the loop needlessly, but whatever, that could easily be rectified. Next up is to tile the screen with the 0–4 sequence. We could unroll some code to do this, or write a loop, but there is another option. We can simply write to the video memory directly as part of the assembly process.

1
2
3
	*= $0c00 ;asssemble at video memory location
	(for ([i (in-range 0 200)])
		(data 0 1 2 3 4))

Don’t know about you but I am already bored of keep writing this for .. in-range stuff. Let’s knock up a quick macro to do it for us.

1
2
3
4
5
6
7
(define-syntax (fori stx)
  (syntax-parse stx
    ([_ e f]
     ; breaking hygeine here, very sorry
     (with-syntax ([f2 (syntax->datum #'f)])
      #'(for ([i (in-range 0 e)])
                 f2)))))

Yes, this is a terrible macro! It breaks hygeine and won’t even work in some cases. I am just putting it here to show that you can!

Now we can write

1
2
3
4
5
	lda @0
	;splat the first 5 characters 
	(fori (* 8 5) {sta (+ $2000 i)})
	; load our humble pixel
	(mov @%00001000 $2004)

and

1
2
	*= $0c00 ;asssemble at video memory location
	(fori 200 (data 0 1 2 3 4))

Rawr!

Move it!

I’ll admit this looks like a pretty terrible star field. Even when moving it will look bad, but one problem at a time. Let’s get it moving! The basic operation will be to rotate $2004 to the right. This will move the bit along one place, and if it falls off the end of the byte it will end up in the processor’s carry bit. From there, if we rotate right the same row-byte of the next character ($200c) the bit will move from the carry into the most signficant bit of the next character’s byte. we can repeat this process, with a special case for the last character since it needs to wrap around back to the first one if the carry is set.

1
2
3
4
5
6
7
8
:update_starfield
	clc ; clear carry bit
	; ror our little guy through memory
	(fori 5 {ror (+ $2004 (* i 8))})
	bcc end+	
	ror $2004
:end    
	rts	

We will call this subroutine once each video frame. To achieve this we will burn cycles until the raster hits the bottom of the screen-ish (since we don’t want to update whilst the VIC is rendering!), call this procedure, and then wait again. Usually you’d do this with raster interrupts since obviously sitting waiting for the screen means you can’t do anything else, but it will suffice for this example.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
		lda @$ff
:loop 		cmp $d012 ; wait for line 256
		bne loop-
		jsr update_starfield+
		; do nothing for a while so 
		; this doesn't trigger more than
		; once on the same frame!
		(fori 100 {nop})
		lda @$ff
		jmp loop-

Making it not terrible

As amazing as the single pixel going across the screen in a lockstep obviously repeated pattern is, we can probably do better. The obvious problems are:

  • There is only one lonely pixel
  • The pixels should travel at different speeds to produce a parallax effect
  • The tiling should be broken up into different perumations on each row

If we had more pixels, it would mean more RORing of the relevant bytes. To make them move faster, we’d have to repeat the whole ROR cycle more than once. Clearly, this quickly escalates into tons of boring and hard to change code. Instead, we will get racket to help us out by writing a function that can generate a starfield update to our specifications. We will pass it a memory location of the character set, the amount of characters to rotate through, a list of tuples that indicate which rows of the character need rotating and at what speed.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
(struct starbank (location len rows))

(define (generate-starfield-updates-1 starbank)
  ;location - charset location where the first starfield char is
  ;len      - how many chars to ROL through
  ;rows     - which rows of the character contain stars (no point ROLing them otherwise)
  (define (char-index i)
    (+ (starbank-location starbank)  (* 8 i)))
  (for ([row (starbank-rows starbank)])
    ;extract the row number and speed
    (let ([row (car row)]
          [speed (cadr row)])
      (for ([s (in-range 0 speed)])
      	;repeat the ror cycle for speed times
        {clc ; clear the carry bit!
         (for ([char (in-range 0 (starbank-len starbank))])
           ;now we just ror each character in turn
           {ror (+ (char-index char) row)})
         ; special wrap-around case
         bcc end+
         ror (+ (char-index 0) row)
     :end rts }))))

This is pretty cool. It basically writes the same code as we did manually last time except now it can do it for multiple rows at different speeds. This makes it really easy to tinker with the values to produce a nice looking starfield.

What if we wanted to scroll the stars left instead of right? The basic principle is the same, we still need to rotate the pixels through memory, except we must rotate left, the wrap around goes from the first to the last character, and the characters must be processed in reverse. Since racket is allowing us to conditionally generate assembly code, we can interleave this as an option into our generator

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
(struct starbank (location len rows))

(define (generate-starfield-updates starbank going-right?)
  (define (char-index i)
    (+ (starbank-location starbank)  (* 8 i)))

  (for ([row (starbank-rows starbank)])
    (let ([row (car row)]
          [speed (cadr row)])
      (for ([s (in-range 0 speed)])        
        {clc 
         (for ([char (if going-right?
                         (in-range 0 (starbank-len starbank))
                         (in-range (- (starbank-len starbank) 1) -1 -1))])
           (if going-right?
               {ror (+ (char-index char) row)}
               {rol (+ (char-index char) row)}))

         ;finally wrap around if the carry is et
         bcc end+
         (if going-right?
             {ror (+ (char-index 0) row)}
             {rol (+ (char-index (- (starbank-len starbank) 1)) row)})
         :end}))))

It’s pretty much the same code, it just produces rols instead of rors and processes the things backwards.

You could extend this to also create the chracter memory for you, and have it return two functions that you can call and label somewhere in the program, one for generating and the other for updating. You can probably see how this could quickly become a build-your-own-tailored-compiler-kit if you wanted it to.

The last part was to re-arrange to video memory so each row has a different permuatation of the 0 1 2 3 4 sequence. There’s a number of ways we could do this of course. Randomness doens’t work very well, so I messed around a bit until I found a good pattern, and hardcoded it at compile-time.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
	*= $0c00 ;asssemble at video memory location
    (let ([perm1 '(0 1 2 3 4)]
          [perm2 '(4 0 1 2 3)]
          [perm3 '(3 4 0 1 2)]
          [perm4 '(2 3 4 0 1)]
          [order '(1 4 2 1 3 1 2 4
                     1 4 2 1 3 1 2 4
                     1 4 2 1 3 1 2 4
                     1)])

      (for ([next order])
        (for([count (in-range 8)])
          (case next
            [(1) (data perm1)]
            [(2) (data perm2)]
            [(3) (data perm3)]
            [(4) (data perm4)]))))                   

Omitting the code to stick some more pixels in our character set,our new update code becomes

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
		(define sb1 (starbank $2000 5 '((1 1) (3 2) (5 4))))
		lda @$ff
:loop 		cmp $d012 ; wait for line 256
		bne loop-
		jsr update-starfield
		; do nothing for a while so 
		; this doesn't trigger more than
		; once on the same frame!
		(fori 100 {nop})
		lda @$ff
		jmp loop-
:update-starfield
		(generate-starfield-updates sb1 #t)
		rts

The final result, I think you will agree is much better looking, even if it is still a bit rubbish! (actually, the gif is kinda terrible, it looks a lot better on the emulator and much better still on the real thing!)

Conclusion

Asi64 brings racket and 6502 together in a (hopefully!) glorious fusion. It is by no means finished, and will likely be rewritten again at least one more time. This is basically my first proper Racket project so I still have much to learn, but it is a lot more fun to program the Commodore 64 now ! (You could also use this to program the NES!). Comments are welcome.

Here is a picture of the lovely starfield running on the real hardware :)

The full code for this example can be found at the github repo here