Lookup Tables

Overview

When your processor runs at 1 MHz and lacks multiplication hardware, calculating sine values in real-time is impossible. Lookup tables solved this by pre-computing results and storing them in memory. Need sin(45°)? Look it up instantly. The technique traded RAM for speed—a bargain on systems where cycles were precious.

The problem

Operation	6502 cycles
Addition	2-4
Multiplication	100-200 (software)
Division	200-400 (software)
Sine/cosine	500+ (software)

Real-time games couldn't afford these costs.

The solution

Pre-calculate and store. The table below stores (sin(angle) + 1) × 127 so values fit in unsigned bytes (0-254), which is the common signed-sine convention on 8-bit hardware:

; 256 entries: angle 0-255 maps to one full circle
; entry = round((sin(angle * 2π / 256) + 1) * 127)
sine_table:
    .byte 127, 130, 133, 136, 139, 143, 146, 149   ; angles 0-7
    .byte 152, 155, 158, 161, 164, 167, 170, 173   ; angles 8-15
    ; ... continue for 256 entries

get_sine:
    ldx angle              ; X = angle (0-255), assume zero-page
    lda sine_table,x       ; A = sine value
    rts
    ; Cost: LDX zp (3) + LDA abs,X (4) + RTS (6) = 13 cycles
    ; vs ~500 cycles to compute sin() in software

Common lookup tables

Table	Use
Sine/cosine	Rotation, movement
Multiplication	Scaling, coordinates
Square root	Distance calculation
Screen addresses	Fast plotting
Colour cycling	Animation effects

Multiplication tables

8×8 multiplication via the quarter-squares identity:

a × b = ((a + b)² − (a − b)²) / 4

Pre-calculate f(n) = n² / 4 once for n = 0..510, then a × b = f(a + b) − f(a − b). The full multiply needs only two table lookups and one subtraction:

; quarter_squares[n] = (n*n) / 4 for n = 0..510
; Stored as two 256-byte tables (low byte / high byte)
qsq_lo:    .byte <(0*0/4), <(1*1/4), <(2*2/4), ...
qsq_hi:    .byte >(0*0/4), >(1*1/4), >(2*2/4), ...

; Multiply A × X, leaving result in (qsq_result_hi:qsq_result_lo)
; Caller has already computed (a+b) into Y and (a-b) into A
mul8x8:
    ldy_aplusb:                ; Y = a + b
    sty .ap1
    sty .ap2
    sec
    ; ... fetch qsq[a+b] - qsq[a-b]
    rts

The full quarter-squares routine is ~30 cycles compared to ~120 for shift-and-add multiply.

Screen address tables

Fast pixel plotting:

screen_lo:
    .byte <$0400, <$0428, <$0450, ...  ; Each row start
screen_hi:
    .byte >$0400, >$0428, >$0450, ...

plot_pixel:
    ldy row
    lda screen_lo,y
    sta ptr
    lda screen_hi,y
    sta ptr+1
    ; Now ptr points to row start

Memory vs speed tradeoff

Table size	Lookup cost (6502)	Calculation cost	Notes
256 bytes (8-bit index)	7 cycles (`LDX` + `LDA abs,X`)	100-500 cycles	Fits a full circle of sine, page-aligned for free
512 bytes (9-bit index)	9-12 cycles	Same	Need extra arithmetic to compose the index
1024 bytes	12-15 cycles	Same	Two-table form (low byte / high byte)
4096 bytes	15+ cycles	Same	Page-cross overhead becomes significant

The cost of the table is paid once at assembly time; the cost of the calculation is paid every time it runs.

Page alignment

Place tables at addresses ending $xx00 so LDA table,X never crosses a page. A page-cross adds 1 cycle to indexed addressing modes. For 256-byte tables this gives a free, predictable cost; for larger tables, splitting into per-page sections lets you keep the index inside one page.

Interpolation

When table resolution insufficient:

Look up nearest values
Interpolate between them
Still faster than calculation

Platform examples

C64

Screen row addresses
Colour cycling tables
Sprite multiplexer Y-sorts

ZX Spectrum

Screen address calculation (complex layout)
Attribute addresses
Trigonometry for 3D

NES

PPU address mapping
Collision tables
Sine for movement

Limitations

Limitation	Mitigation
Memory cost	Reduce precision (8-bit instead of 16-bit), or share tables (sine and cosine differ only by a 90° shift)
Table generation	Generate at assemble time using the assembler's macro language, or once at boot into RAM
Precision limits	Interpolate between adjacent entries if needed
Page-cross cost	Page-align all hot tables

Build-time vs runtime generation

Two strategies:

Assembled in: the table is part of the ROM/binary. No startup cost; uses ROM rather than RAM. Best when ROM is plentiful (cartridge era) or when the table is small.
Generated at boot: a one-shot routine fills a RAM table with computed values, then the program runs from there. Saves ROM at the cost of startup time and RAM. Common when a single table covers many similar uses or when ROM is tight.

Trading memory for speed