The Atari Panther - Part 5 - Fun with object lists

In this 5th part we will see how complex it is to drive the Panther Chip efficiently.

A 3D rendering of the Atari Panther. Author unknown.

Objects List processing

As explained, the Panther works by processing an Object List every scanline. On every scanline it has to go through all the Objects, fetch their definitions from SRAM/ROM and check whether they contribute to the current scanline. If they do, it fetches graphic data from SRAM/ROM while drawing into the line buffer, then updates the Object in memory and proceeds with the next Object.

Here in pseudo-code:

int y=0;  // current scanline
do {
  if(isScanLineVisible(y)) {
    void* op = objectListPointer;
    while(op!=null && !isScanLineTimeComplete()) {
      Object obj = fetchObject(op);
      switch(obj.type) {
        case BITMAP_OBJECT:
          int l = y - obj.y;   // current bitmap line
          if(l>=0 && l<obj.h) {
            drawLineInBufferAtX(obj.bitmapLines[l], lineBuffer, obj.x);
          }
          break;
        case CLUT_OBJECT:
          ...      
          ...      
      }
      updateObject(op);
      op = obj.next;
    }
    waitScanLineTimeComplete();
    copyAndClear(lineBuffer,backLineBuffer);
  }
  if(isLastScanLine(y)) { y=0; } else { y=y+1; }
} while(true)

This is quite similar to how the Atari 7800 works and very different from framebuffer systems (Amiga) and sprites & tiles systems (Super NES and Megadrive).

Note how Bitmap Objects that appear early in the Object List are shown under the others.
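
The ordering effect can be illustrated with a minimal Python sketch of one scanline. It is only a model of the pseudo-code above: the dictionary fields (x, y, h, rows) and the function name are hypothetical, and transparency is not modelled.

```python
def render_scanline(objects, y, width, background=0):
    """Draw every object that covers scanline y into a fresh line buffer."""
    line = [background] * width
    for obj in objects:                      # list order is drawing order
        l = y - obj["y"]                     # line of the bitmap on this scanline
        if 0 <= l < obj["h"]:
            for i, px in enumerate(obj["rows"][l]):
                x = obj["x"] + i
                if 0 <= x < width:
                    line[x] = px             # later objects overwrite earlier ones
    return line

# Two overlapping one-line objects (hypothetical data).
first = {"x": 2, "y": 0, "h": 1, "rows": [[1, 1, 1, 1]]}
second = {"x": 4, "y": 0, "h": 1, "rows": [[2, 2, 2, 2]]}

print(render_scanline([first, second], 0, 10))
print(render_scanline([second, first], 0, 10))
```

Swapping the two objects in the list changes which one wins in the overlapping pixels, which is all the "priority" the Object List provides in this model.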

There are multiple types of Objects, each with its own function:

  • Bitmap object
    • any size
    • 1, 2, 4 or 8 bits per pixel
    • zoomable and shrinkable
    • optionally Run Length Encoded
  • CLUT object copies an array of bytes from anywhere to the CLUT at a given scanline
  • Branch to another Object based on the scan line currently processed
  • Copy an array from anywhere to anywhere
  • Write a constant to a given address
  • Add a constant to a given address
  • Interrupt the 68000 at a given scan line

All the Objects, except the Bitmap Objects, can also be stored in ROM. This is because Bitmap Objects are updated during drawing and must therefore live in SRAM. A Bitmap Object contains a pointer to the bitmap data. This pointer can point to ROM or SRAM.

An inefficient display list

As already seen, parsing the Objects List can eat a substantial part of the SRAM bandwidth. It is therefore important that the Objects List not only shows the graphics as expected but also does so efficiently. Let’s see why with an example.

We want to simulate a tile layer using 12 rows of 20 Objects. Each Bitmap Object is 16x16 pixels. We create an Object List with 240 Bitmap Objects by linking them from left to right and from top to bottom.

Parsing a Bitmap Object definition requires 4 read accesses and 2 write accesses to SRAM (2 mclk/access) if the Bitmap Object is visible on the current scanline, or 1 read access if it is not visible.

Fetching 4 bit per pixel bitmap data from ROM requires 4 mclk/word. A word is 16 bits and contains 4 pixels, so this is 1 mclk/pixel.

Each of the 16 scanlines composing the first row is completed in 20 * (6 * 2) + 20 * 16 = 240 + 320 = 560 mclk. This is the sum of the mclk needed to parse the Object definitions and to fetch the pixels.

The second row requires an additional 20 * (1 * 2) = 40 mclk to skip the 20 Objects composing row 1, and every further row adds another 40 mclk. So, the formula is

mclk(row) = 560 + 40 * (row-1)

Effect of Object List parsing on free mclk

row:                     tiles               : free cycles
 01: O O O O O O O O O O O O O O O O O O O O : 464 mclk
 02: O O O O O O O O O O O O O O O O O O O O : 424 mclk
 03: O O O O O O O O O O O O O O O O O O O O : 384 mclk
 04: O O O O O O O O O O O O O O O O O O O O : 344 mclk
 05: O O O O O O O O O O O O O O O O O O O O : 304 mclk
 06: O O O O O O O O O O O O O O O O O O O O : 264 mclk
 07: O O O O O O O O O O O O O O O O O O O O : 224 mclk
 08: O O O O O O O O O O O O O O O O O O O O : 184 mclk
 09: O O O O O O O O O O O O O O O O O O O O : 144 mclk
 10: O O O O O O O O O O O O O O O O O O O O : 104 mclk
 11: O O O O O O O O O O O O O O O O O O O O : 064 mclk
 12: O O O O O O O O O O O O O O O O O O O O : 024 mclk

Note how at row 12 we need 1000 mclk to draw a scanline. This leaves just 1024-1000=24 mclk for other layers or sprites, practically nothing.
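
As a cross-check, the arithmetic above can be reproduced with a short Python sketch. It simply encodes the article's formula; the constant and function names are made up for the example, and the 1024 mclk per scanline budget is the one used in the subtraction above.

```python
MCLK_PER_SCANLINE = 1024   # scanline budget used in the article
MCLK_PER_ACCESS = 2        # one SRAM access
TILES_PER_ROW = 20
TILE_WIDTH = 16            # pixels
ROWS = 12

VISIBLE_COST = (4 + 2) * MCLK_PER_ACCESS   # 4 reads + 2 writes per visible Object
SKIPPED_COST = 1 * MCLK_PER_ACCESS         # 1 read per Object that is skipped
PIXEL_COST = 1                             # 4 bpp from ROM: 4 mclk per 4-pixel word

def naive_mclk(row):
    """mclk spent on one scanline of the given tile row (1-based)."""
    visible = TILES_PER_ROW * (VISIBLE_COST + TILE_WIDTH * PIXEL_COST)
    skipped = TILES_PER_ROW * SKIPPED_COST * (row - 1)
    return visible + skipped

for row in range(1, ROWS + 1):
    used = naive_mclk(row)
    print(f"row {row:2d}: {used:4d} mclk used, {MCLK_PER_SCANLINE - used:3d} free")
```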

This example shows that with a naive approach the Panther cannot compete with the Megadrive or the SNES and that the clever design of Object Lists is at the core of Panther’s programming.

A better display list

Ideally, every scanline should process only the Objects visible on that scanline. Taken to the extreme, this would require a specific list for every scanline. Of course this is not feasible, because it requires too much SRAM and too much CPU time to prepare the lists.

An intermediate approach is needed, like dividing the screen into horizontal buckets and creating an Objects List for each. Branch Objects can be useful to skip parts of the list after a given scanline.
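
A minimal sketch of the bucket idea, in Python for readability (on the real machine this would be 68000 code). The function name, the dictionary fields and the bucket height are assumptions made for the example, not part of any Panther API.

```python
def build_buckets(objects, screen_height, bucket_height):
    """Return one object list per horizontal band of the screen."""
    count = (screen_height + bucket_height - 1) // bucket_height
    buckets = [[] for _ in range(count)]
    for obj in objects:                       # keep the original drawing order
        top = max(obj["y"], 0)
        bottom = min(obj["y"] + obj["h"], screen_height) - 1
        if top > bottom:
            continue                          # completely off screen
        for b in range(top // bucket_height, bottom // bucket_height + 1):
            buckets[b].append(obj)            # an object may span several bands
    return buckets

# Hypothetical objects on a 192-line screen split into 16-line bands.
objs = [
    {"name": "a", "y": 0, "h": 16},
    {"name": "b", "y": 10, "h": 16},
    {"name": "c", "y": 200, "h": 16},
]
buckets = build_buckets(objs, 192, 16)
print([[o["name"] for o in bucket] for bucket in buckets[:3]])
```

Smaller buckets mean fewer wasted fetches per scanline but more lists to build and store, which is the trade-off described above.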

As an exercise, we can apply Branch Objects to the previous example. Every row starts with a Branch Object (Bx) that normally points to the first tile of its row (Tx,1) but, once the row is no longer visible, branches to the Branch Object of the next row (Bx+1). The last tile of every row (Tx,N) points to the head of a list of sprites (S1) that are above the tilemap.

B1 T1,1 ... T1,N \
...    ...        >→ S1 … SN
BM TM,1 ... TM,N /

Fetching a Branch Object requires 1 read access when not branching and 2 read accesses when branching. The Object Processor always starts from B1, so it has to skip all the rows already displayed. 1 access is 2 mclk.

So, the formula for the tiles is:

mclk(row) = 560 + 2 + 4 * (row-1)

Effect of Object List optimizations

row:                     tiles               : free cycles
 01: O O O O O O O O O O O O O O O O O O O O : 462 mclk
 02: O O O O O O O O O O O O O O O O O O O O : 458 mclk
 03: O O O O O O O O O O O O O O O O O O O O : 454 mclk
 04: O O O O O O O O O O O O O O O O O O O O : 450 mclk
 05: O O O O O O O O O O O O O O O O O O O O : 446 mclk
 06: O O O O O O O O O O O O O O O O O O O O : 442 mclk
 07: O O O O O O O O O O O O O O O O O O O O : 438 mclk
 08: O O O O O O O O O O O O O O O O O O O O : 434 mclk
 09: O O O O O O O O O O O O O O O O O O O O : 430 mclk
 10: O O O O O O O O O O O O O O O O O O O O : 426 mclk
 11: O O O O O O O O O O O O O O O O O O O O : 422 mclk
 12: O O O O O O O O O O O O O O O O O O O O : 418 mclk

Much better! But we can do even better by keeping the free mclk constant.
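
The Branch Object formula can be checked the same way. This Python sketch only encodes the costs stated above (1 read when not branching, 2 reads when branching, 2 mclk per access); the names are invented for the example.

```python
MCLK_PER_SCANLINE = 1024   # scanline budget used in the article
MCLK_PER_ACCESS = 2
ROW_DRAW_COST = 560        # 20 visible tiles: parsing plus pixel fetch
ROWS = 12

BRANCH_TAKEN = 2 * MCLK_PER_ACCESS       # Branch Object that jumps to the next row
BRANCH_NOT_TAKEN = 1 * MCLK_PER_ACCESS   # Branch Object that falls into its own row

def branch_mclk(row):
    """mclk spent on one scanline of the given tile row (1-based)."""
    return ROW_DRAW_COST + BRANCH_NOT_TAKEN + BRANCH_TAKEN * (row - 1)

for row in range(1, ROWS + 1):
    used = branch_mclk(row)
    print(f"row {row:2d}: {used} mclk used, {MCLK_PER_SCANLINE - used} free")
```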

An even better display list

We assume that the HW reads the Object List Pointer register only when the Object List processing starts and that changing it during the processing has no immediate side effects on Object List parsing.

The trick is to use the Write Object to change the Object List Pointer register at the end of every row.

W1 B1 T1,1 ... T1,N \
...   ...    ...     >→ S1 … SN
WM BM TM,1 ... TM,N /

The frame starts with the Object List Pointer register pointing to B1.

Bx normally leads to Tx,1, but when row x is complete it branches to Wx+1, which changes the Object List Pointer register so that it points to Bx+1. In other words, only the first scanline of a row executes the Wx object; the other scanlines of the row don’t. A Write Object requires 4 mclk.

The formula is:

mclk(row) = 4 + 4 + 560 = 568  ; on the first scanline of a row
mclk(row) = 4 + 560 = 564      ; on the other 15 scanlines of a row

Effect of further Object List optimizations

row:                     tiles               : free cycles (other scanlines/first scanline)
 01: O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
 02: O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
 03: O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
 04: O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
 05: O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
 06: O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
 07: O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
 08: O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
 09: O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
 10: O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
 11: O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
 12: O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
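
A small Python sketch of these two formulas, taken at face value. The Write Object cost of 4 mclk comes from the text; the other 4 mclk term is copied from the formula and is assumed here to be the Branch Object. All names are invented for the example.

```python
MCLK_PER_SCANLINE = 1024   # scanline budget used in the article
ROW_DRAW_COST = 560        # 20 visible tiles: parsing plus pixel fetch
WRITE_OBJECT_COST = 4      # stated in the text
BRANCH_TERM = 4            # assumption: the remaining 4 mclk term of the formula

def pointer_rewrite_mclk(first_scanline_of_row):
    """mclk spent on one scanline, independent of the row number."""
    cost = BRANCH_TERM + ROW_DRAW_COST
    if first_scanline_of_row:
        cost += WRITE_OBJECT_COST   # Wx runs only once per row
    return cost

for first in (True, False):
    used = pointer_rewrite_mclk(first)
    label = "first scanline " if first else "other scanlines"
    print(f"{label}: {used} mclk used, {MCLK_PER_SCANLINE - used} free")
```

The point of the exercise is that the row number no longer appears in the function: the cost stays flat from the first row to the last.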

This should give an idea of the flexibility and complexity that an Object List can reach. Other ideas are using the Copy Object or the Write Object to modify the Object List dynamically.

Another matter that complicates Objects List processing is that, on every scanline, the Panther Chip rewrites the description of each Bitmap Object visible on that scanline (the data address and Y size fields). Therefore the Bitmap Object descriptions must be restored before the beginning of a new frame. This can be done by the CPU or by the Object List itself, using the Write Object or the Copy Object.
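
The restore step can be sketched in Python as follows. The way the two fields change per scanline (data address advanced by one line, Y size decremented) is an assumption made for the example, as are the field and function names; on the real machine the restore would be done by the 68000 or by Write/Copy Objects.

```python
import copy

def simulate_frame(obj, stride):
    """Hypothetical model of the per-scanline rewrite over one frame."""
    while obj["height"] > 0:
        obj["data"] += stride     # assumed: pointer moves to the next bitmap line
        obj["height"] -= 1        # assumed: remaining lines count down

def restore(objects, pristine):
    """Put back the fields that the drawing process has modified."""
    for obj, saved in zip(objects, pristine):
        obj["data"] = saved["data"]
        obj["height"] = saved["height"]

objects = [{"data": 0x1000, "height": 16}]
pristine = copy.deepcopy(objects)         # master copy, never touched by drawing

simulate_frame(objects[0], stride=8)      # 16 pixels at 4 bpp = 8 bytes per line
print(objects[0])                         # fields consumed by the frame
restore(objects, pristine)
print(objects[0])                         # ready for the next frame
```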

As if this were not enough, horizontal clipping of Bitmap Objects is important to avoid wasting time rendering pixels that are not visible. The CPU has to clip the Bitmap Objects by correctly configuring their definitions.
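
A minimal sketch of the clipping computation the CPU would have to do, written in Python for readability. The function name and the returned fields are hypothetical; how the result maps onto the actual Bitmap Object fields is not covered here.

```python
def clip_horizontally(x, width, screen_width):
    """Return the visible part of an object, or None if nothing is visible.

    'skip' is the number of source pixels to drop on the left, i.e. how far
    the bitmap data pointer would have to be advanced.
    """
    left = max(x, 0)
    right = min(x + width, screen_width)
    if left >= right:
        return None
    return {"x": left, "width": right - left, "skip": left - x}

# A 16-pixel-wide object on a 320-pixel-wide screen (20 tiles of 16 pixels).
print(clip_horizontally(-4, 16, 320))    # partly off the left edge
print(clip_horizontally(312, 16, 320))   # partly off the right edge
print(clip_horizontally(400, 16, 320))   # completely off screen
```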

Closing

This was not an easy one! Understanding Object Lists processing will allow us to further understand pros and cons of this machine.

Written on August 23, 2023