Skip to content
Techniques & Technology

Lookup Tables

Trading memory for speed

Lookup tables pre-calculated expensive operations like trigonometry and multiplication, storing results for instant retrieval on processors too slow for real-time maths.

commodore-64sinclair-zx-spectrumnintendo-entertainment-systemcommodore-amiga optimisationmathsperformance 1975–present

Overview

When your processor runs at 1 MHz and lacks multiplication hardware, calculating sine values in real-time is impossible. Lookup tables solved this by pre-computing results and storing them in memory. Need sin(45°)? Look it up instantly. The technique traded RAM for speed—a bargain on systems where cycles were precious.

The problem

Operation6502 cycles
Addition2-4
Multiplication100-200 (software)
Division200-400 (software)
Sine/cosine500+ (software)

Real-time games couldn't afford these costs.

The solution

Pre-calculate and store. The table below stores (sin(angle) + 1) × 127 so values fit in unsigned bytes (0-254), which is the common signed-sine convention on 8-bit hardware:

; 256 entries: angle 0-255 maps to one full circle
; entry = round((sin(angle * 2π / 256) + 1) * 127)
sine_table:
    .byte 127, 130, 133, 136, 139, 143, 146, 149   ; angles 0-7
    .byte 152, 155, 158, 161, 164, 167, 170, 173   ; angles 8-15
    ; ... continue for 256 entries

get_sine:
    ldx angle              ; X = angle (0-255), assume zero-page
    lda sine_table,x       ; A = sine value
    rts
    ; Cost: LDX zp (3) + LDA abs,X (4) + RTS (6) = 13 cycles
    ; vs ~500 cycles to compute sin() in software

Common lookup tables

TableUse
Sine/cosineRotation, movement
MultiplicationScaling, coordinates
Square rootDistance calculation
Screen addressesFast plotting
Colour cyclingAnimation effects

Multiplication tables

8×8 multiplication via the quarter-squares identity:

a × b = ((a + b)² − (a − b)²) / 4

Pre-calculate f(n) = n² / 4 once for n = 0..510, then a × b = f(a + b) − f(a − b). The full multiply needs only two table lookups and one subtraction:

; quarter_squares[n] = (n*n) / 4 for n = 0..510
; Stored as two 256-byte tables (low byte / high byte)
qsq_lo:    .byte <(0*0/4), <(1*1/4), <(2*2/4), ...
qsq_hi:    .byte >(0*0/4), >(1*1/4), >(2*2/4), ...

; Multiply A × X, leaving result in (qsq_result_hi:qsq_result_lo)
; Caller has already computed (a+b) into Y and (a-b) into A
mul8x8:
    ldy_aplusb:                ; Y = a + b
    sty .ap1
    sty .ap2
    sec
    ; ... fetch qsq[a+b] - qsq[a-b]
    rts

The full quarter-squares routine is ~30 cycles compared to ~120 for shift-and-add multiply.

Screen address tables

Fast pixel plotting:

screen_lo:
    .byte <$0400, <$0428, <$0450, ...  ; Each row start
screen_hi:
    .byte >$0400, >$0428, >$0450, ...

plot_pixel:
    ldy row
    lda screen_lo,y
    sta ptr
    lda screen_hi,y
    sta ptr+1
    ; Now ptr points to row start

Memory vs speed tradeoff

Table sizeLookup cost (6502)Calculation costNotes
256 bytes (8-bit index)7 cycles (LDX + LDA abs,X)100-500 cyclesFits a full circle of sine, page-aligned for free
512 bytes (9-bit index)9-12 cyclesSameNeed extra arithmetic to compose the index
1024 bytes12-15 cyclesSameTwo-table form (low byte / high byte)
4096 bytes15+ cyclesSamePage-cross overhead becomes significant

The cost of the table is paid once at assembly time; the cost of the calculation is paid every time it runs.

Page alignment

Place tables at addresses ending $xx00 so LDA table,X never crosses a page. A page-cross adds 1 cycle to indexed addressing modes. For 256-byte tables this gives a free, predictable cost; for larger tables, splitting into per-page sections lets you keep the index inside one page.

Interpolation

When table resolution insufficient:

  1. Look up nearest values
  2. Interpolate between them
  3. Still faster than calculation

Platform examples

C64

  • Screen row addresses
  • Colour cycling tables
  • Sprite multiplexer Y-sorts

ZX Spectrum

  • Screen address calculation (complex layout)
  • Attribute addresses
  • Trigonometry for 3D

NES

  • PPU address mapping
  • Collision tables
  • Sine for movement

Limitations

LimitationMitigation
Memory costReduce precision (8-bit instead of 16-bit), or share tables (sine and cosine differ only by a 90° shift)
Table generationGenerate at assemble time using the assembler's macro language, or once at boot into RAM
Precision limitsInterpolate between adjacent entries if needed
Page-cross costPage-align all hot tables

Build-time vs runtime generation

Two strategies:

  • Assembled in: the table is part of the ROM/binary. No startup cost; uses ROM rather than RAM. Best when ROM is plentiful (cartridge era) or when the table is small.
  • Generated at boot: a one-shot routine fills a RAM table with computed values, then the program runs from there. Saves ROM at the cost of startup time and RAM. Common when a single table covers many similar uses or when ROM is tight.

See also