The Atari Panther - Part 5 - Fun with object lists

In this fifth part we will see how complex it is to drive the Panther chip efficiently.

A 3D rendering of the Atari Panther. Author unknown.

Object List processing

As explained, the Panther works by processing an Object List on every scanline. On each scanline it has to go through all the Objects, fetch their definitions from SRAM/ROM and check whether they contribute to the current scanline. If they do, it fetches the graphic data from SRAM/ROM while drawing into the line buffer, then updates the Object in memory and proceeds to the next Object.

Here in pseudo-code:

int y = 0;   // current scanline
do {
  if (isScanLineVisible(y)) {
    void* op = objectListPointer;
    // walk the list until it ends or the scanline time runs out
    while (op != null && !isScanLineTimeComplete()) {
      Object obj = fetchObject(op);
      switch (obj.type) {
        case BITMAP_OBJECT:
          int l = y - obj.y;   // current bitmap line
          if (l >= 0 && l < obj.h) {
            drawLineInBufferAtX(obj.bitmapLines[l], lineBuffer, obj.x);
          }
          break;
        case CLUT_OBJECT:
          ...
          ...
      }
      updateObject(op);
      op = obj.next;
    }
    waitScanLineTimeComplete();
    copyAndClear(lineBuffer, backLineBuffer);
  }
  if (isLastScanLine(y)) { y = 0; } else { y = y + 1; }
} while (true);

This is quite similar to how the Atari 7800 works and very different from framebuffer systems (Amiga) and sprite-and-tile systems (Super NES and Megadrive).

Note how Bitmap Objects that appear early in the Object List are shown under the others.
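
The ordering rule can be checked with a small Python sketch of a single scanline pass. The names (BitmapObject, render_scanline) and the list-of-rows bitmap layout are assumptions made for illustration, not the Panther's actual data format, and every pixel is treated as opaque for simplicity.

```python
from dataclasses import dataclass


@dataclass
class BitmapObject:
    x: int
    y: int
    rows: list  # one list of pixel values per bitmap line (hypothetical layout)


def render_scanline(objects, y, width):
    """Draw every object that covers scanline y into a fresh line buffer."""
    line = [0] * width
    for obj in objects:                  # list order is drawing order
        l = y - obj.y                    # bitmap line hit by this scanline
        if 0 <= l < len(obj.rows):
            for i, pixel in enumerate(obj.rows[l]):
                if 0 <= obj.x + i < width:
                    line[obj.x + i] = pixel   # later objects overwrite earlier ones
    return line


background = BitmapObject(x=0, y=0, rows=[[1] * 8, [1] * 8])
sprite = BitmapObject(x=2, y=1, rows=[[2] * 3])

line0 = render_scanline([background, sprite], y=0, width=8)
line1 = render_scanline([background, sprite], y=1, width=8)
```

On scanline 1 the sprite, which comes later in the list, covers three pixels of the background that was drawn first.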

There are multiple types of Objects, each with its own function:

Bitmap object
- any size
- 1, 2, 4 or 8 bits per pixel
- zoomable and shrinkable
- optionally Run Length Encoded
CLUT object: copies an array of bytes from anywhere to the CLUT at a given scanline
Branch to another Object based on the scanline currently being processed
Copy an array from anywhere to anywhere
Write a constant to a given address
Add a constant to a given address
Interrupt the 68000 at a given scanline

All the Objects, except the Bitmap Objects, can also be stored in ROM. This is because Bitmap Objects are updated during drawing, so they must live in writable memory. A Bitmap Object contains a pointer to the bitmap data. This pointer can point to ROM or SRAM.

An inefficient display list

As already seen, parsing the Object List can eat a substantial part of the SRAM bandwidth. It is therefore important that the Object List not only shows the graphics as expected but also does so efficiently. Let’s see why with an example.

We want to simulate a tile layer using 12 rows of 20 Objects. Each Bitmap Object is 16x16 pixels. We create an Object List with 240 Bitmap Objects by linking them from left to right and from top to bottom.

Parsing a Bitmap Object definition requires 4 read accesses and 2 write accesses to SRAM (2 mclk/access) if the Bitmap Object is visible on the current scanline, or 1 read access if it is not visible.

Fetching 4-bit-per-pixel bitmap data from ROM requires 4 mclk/word. A word is 16 bits and contains 4 pixels, so this is 1 mclk/pixel.

Each of the 16 scanlines composing the first row is completed in 20 * (6 * 2) + 20 * 16 = 240 + 320 = 560 mclk. This sums the mclk needed to parse the Object definitions and to fetch the pixels.

The second row requires an additional 20 * (1 * 2) = 40 mclk to skip the 20 Objects composing row 1. So, the formula is

mclk(row) = 560 + 40 * (row-1)

Effect of Object List parsing on free mclk

row:                     tiles               : free cycles
O O O O O O O O O O O O O O O O O O O O : 464 mclk
O O O O O O O O O O O O O O O O O O O O : 424 mclk
O O O O O O O O O O O O O O O O O O O O : 384 mclk
O O O O O O O O O O O O O O O O O O O O : 344 mclk
O O O O O O O O O O O O O O O O O O O O : 304 mclk
O O O O O O O O O O O O O O O O O O O O : 264 mclk
O O O O O O O O O O O O O O O O O O O O : 224 mclk
O O O O O O O O O O O O O O O O O O O O : 184 mclk
O O O O O O O O O O O O O O O O O O O O : 144 mclk
O O O O O O O O O O O O O O O O O O O O : 104 mclk
O O O O O O O O O O O O O O O O O O O O : 064 mclk
O O O O O O O O O O O O O O O O O O O O : 024 mclk

Note how at row 12 we need 1000 mclk to draw a scanline. This leaves just 1024-1000=24 mclk for other layers or sprites, practically nothing.
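
The free-cycle column can be reproduced with a few lines of Python. This is only a sketch that mirrors the formula in the text; the constant and function names are my own, and the 1024 mclk budget per scanline is the figure used in the subtraction above.

```python
MCLK_PER_SCANLINE = 1024   # budget per scanline used in the text
ROWS = 12
TILES_PER_ROW = 20
TILE_WIDTH = 16            # pixels, fetched from ROM at 1 mclk/pixel
MCLK_PER_ACCESS = 2        # one SRAM access
VISIBLE_ACCESSES = 6       # 4 reads + 2 writes for a visible Bitmap Object
SKIPPED_ACCESSES = 1       # 1 read for a Bitmap Object that is not visible


def naive_mclk(row):
    """mclk per scanline while drawing tile row `row` (1-based)."""
    visible = TILES_PER_ROW * (VISIBLE_ACCESSES * MCLK_PER_ACCESS + TILE_WIDTH)
    skipped = (row - 1) * TILES_PER_ROW * SKIPPED_ACCESSES * MCLK_PER_ACCESS
    return visible + skipped


free = [MCLK_PER_SCANLINE - naive_mclk(row) for row in range(1, ROWS + 1)]
for row, cycles in enumerate(free, start=1):
    print(f"row {row:2d}: {naive_mclk(row):4d} mclk used, {cycles:3d} mclk free")
```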

This example shows that with a naive approach the Panther cannot compete with the Megadrive or the SNES and that the clever design of Object Lists is at the core of Panther’s programming.

A better display list

Ideally, every scanline should process only the Objects visible on that scanline. Taken to the extreme, this would require a specific list for every scanline. Of course this is not feasible, because it requires too much SRAM and too much CPU processing to prepare the lists.

An intermediate approach is needed, like dividing the screen into horizontal buckets and creating an Object List for each. Branch Objects can be useful to skip parts of the list after a given scanline.
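
As a rough illustration of the bucket idea, the following Python sketch groups objects by the horizontal bands they overlap. It covers only the grouping step a CPU routine might perform; the function name, the (name, y, height) tuples and the 192-line screen (12 rows of 16 pixels, as in the example) are assumptions, and how the resulting groups would be encoded as real Object Lists is left open.

```python
def build_buckets(objects, screen_height, bucket_height):
    """Return one list of object names per horizontal band of the screen."""
    bucket_count = -(-screen_height // bucket_height)   # ceiling division
    buckets = [[] for _ in range(bucket_count)]
    for name, y, height in objects:
        first = max(y, 0) // bucket_height
        last = min(y + height - 1, screen_height - 1) // bucket_height
        # An object spanning several bands is listed in each of them;
        # an object entirely off-screen yields an empty range.
        for b in range(first, last + 1):
            buckets[b].append(name)
    return buckets


objects = [("tile", 0, 16), ("sprite", 10, 16), ("hud", 176, 16)]
buckets = build_buckets(objects, screen_height=192, bucket_height=16)
```

Smaller buckets mean fewer wasted fetches per scanline but more lists to prepare, which is the trade-off described above.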

As an exercise, we can apply Branch Objects to the previous example. Every row starts with a Branch Object (Bx) that normally points to the first tile of its row (Tx,1), but when the row is no longer visible it branches to the Branch Object of the next row (Bx+1). The last tile of every row (Tx,N) points to the head of a list of sprites (S1) that are drawn above the tilemap.

B1 T1,1 ... T1,N \
...    ...        >→ S1 … SN
BM TM,1 ... TM,N /

Fetching a Branch Object requires 1 read access when not branching and 2 read accesses when branching. The Object Processor always starts from B1, so it has to branch past all the rows already displayed. 1 access is 2 mclk.

So, the formula for the tiles is:

mclk(row) = 560 + 2 + 4 * (row-1)

Effect of Object List optimizations

row:                     tiles               : free cycles
O O O O O O O O O O O O O O O O O O O O : 462 mclk
O O O O O O O O O O O O O O O O O O O O : 458 mclk
O O O O O O O O O O O O O O O O O O O O : 454 mclk
O O O O O O O O O O O O O O O O O O O O : 450 mclk
O O O O O O O O O O O O O O O O O O O O : 446 mclk
O O O O O O O O O O O O O O O O O O O O : 442 mclk
O O O O O O O O O O O O O O O O O O O O : 438 mclk
O O O O O O O O O O O O O O O O O O O O : 434 mclk
O O O O O O O O O O O O O O O O O O O O : 430 mclk
O O O O O O O O O O O O O O O O O O O O : 426 mclk
O O O O O O O O O O O O O O O O O O O O : 422 mclk
O O O O O O O O O O O O O O O O O O O O : 418 mclk
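
The same kind of check works for the Branch Object layout. The sketch below simply evaluates the formula given above, with names of my own choosing: the current row's Branch Object costs one read, and every earlier row costs two reads because its Branch Object branches.

```python
MCLK_PER_SCANLINE = 1024
ROWS = 12
TILES_MCLK = 560        # parsing and drawing the 20 visible tiles
MCLK_PER_ACCESS = 2


def branch_mclk(row):
    """mclk per scanline for tile row `row` (1-based) with Branch Objects."""
    own_branch = 1 * MCLK_PER_ACCESS                  # Bx read, not branching
    earlier_rows = (row - 1) * 2 * MCLK_PER_ACCESS    # each earlier Bx branches
    return TILES_MCLK + own_branch + earlier_rows


free = [MCLK_PER_SCANLINE - branch_mclk(row) for row in range(1, ROWS + 1)]
for row, cycles in enumerate(free, start=1):
    print(f"row {row:2d}: {cycles} mclk free")
```

The free cycles now drop by only 4 mclk per row instead of 40.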

Much better! But we can do even better by keeping the free mclk constant.

An even better display list

We assume that the HW reads the Object List Pointer register only when the Object List processing starts and that changing it during the processing has no immediate side effects on Object List parsing.

The trick is to use the Write Object to change the Object List Pointer register at the end of every row.

W1 B1 T1,1 ... T1,N \
...   ...    ...     >→ S1 … SN
WM BM TM,1 ... TM,N /

The frame starts with the Object List Pointer register pointing to B1.

Bx normally leads to Tx,1, but when row x is complete it branches to Wx+1, which changes the Object List Pointer register so that it points to Bx+1. In other words, only the first scanline of a row executes the Wx Object; the other scanlines of the row don’t. A Write Object requires 4 mclk.

The formula is:

mclk(row) = 4 + 4 + 560 = 568  ; on the first scanline of a row
mclk(row) = 4 + 560 = 564      ; on the other 15 scanlines of a row

Effect of further Object List optimizations

row:                     tiles               : free cycles (other scanlines/first scanline)
O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
O O O O O O O O O O O O O O O O O O O O : 460/456 mclk
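
To see the per-scanline pattern over a whole frame, here is a small Python sketch. It takes the two costs exactly as they appear in the formula above (4 mclk for the Branch Object, 4 mclk for the Write Object) without trying to justify them further; the names and the 192-scanline frame (12 rows of 16 scanlines) are assumptions drawn from the example.

```python
MCLK_PER_SCANLINE = 1024
ROWS = 12
ROW_HEIGHT = 16         # scanlines per tile row
TILES_MCLK = 560        # parsing and drawing the 20 visible tiles
BRANCH_MCLK = 4         # Branch Object cost as it appears in the formula
WRITE_MCLK = 4          # Write Object cost


def scanline_mclk(scanline):
    """mclk used on a given scanline (0-based) of the tile layer."""
    cost = BRANCH_MCLK + TILES_MCLK
    if scanline % ROW_HEIGHT == 0:      # first scanline of a row also runs Wx
        cost += WRITE_MCLK
    return cost


free = [MCLK_PER_SCANLINE - scanline_mclk(y) for y in range(ROWS * ROW_HEIGHT)]
print(min(free), max(free))
```

The budget no longer depends on the row number: only one scanline in sixteen pays for the extra Write Object.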

This should give an idea of the flexibility and complexity that an Object List can reach. Other ideas are using the Copy Object or the Write Object to modify the Object List dynamically.

Another matter that complicates Object List processing is that, at every scanline, the Panther chip rewrites the descriptions of the Bitmap Objects visible on that scanline (the data address and Y size fields). Therefore the Bitmap Object descriptions must be restored before the beginning of a new frame. This can be done by the CPU or by the Object List itself, using the Write Object or the Copy Object.
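
A CPU-side restore could look roughly like the Python sketch below. The dictionary layout and the field names data_addr and height are hypothetical stand-ins for the two fields mentioned above, and the loop that modifies them merely imitates a frame of drawing; it is not a model of what the hardware actually writes.

```python
def save_fields(objects):
    """Record the two fields that are rewritten while drawing."""
    return [(o["data_addr"], o["height"]) for o in objects]


def restore_fields(objects, saved):
    """Put the recorded values back before the next frame starts."""
    for o, (data_addr, height) in zip(objects, saved):
        o["data_addr"] = data_addr
        o["height"] = height


objects = [
    {"data_addr": 0x1000, "height": 16},
    {"data_addr": 0x2000, "height": 16},
]
saved = save_fields(objects)

# Stand-in for one frame of drawing (assumed behaviour, for illustration only).
for o in objects:
    o["data_addr"] += 16 * 8
    o["height"] = 0

restore_fields(objects, saved)
```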

As if this were not enough, horizontal clipping of Bitmap Objects is important to avoid wasting time rendering pixels that are not visible. The CPU has to clip the Bitmap Objects by configuring their definitions correctly.
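
The arithmetic behind such clipping is simple, as this Python sketch shows. The function name is my own, and the 320-pixel width is an assumption taken from the tile example (20 tiles of 16 pixels); how the result is written into a Bitmap Object definition is not covered here.

```python
def clip_horizontally(x, width, screen_width=320):
    """Return (visible_x, visible_width, skipped_pixels), or None if off-screen.

    skipped_pixels is the number of leading source pixels cut off on the left.
    """
    left = max(x, 0)
    right = min(x + width, screen_width)
    if right <= left:
        return None                     # nothing of the object is visible
    return left, right - left, left - x


partly_left = clip_horizontally(-4, 16)     # 4 pixels hang off the left edge
partly_right = clip_horizontally(312, 16)   # 8 pixels hang off the right edge
fully_out = clip_horizontally(320, 16)      # starts past the right edge
```

With 1 mclk per pixel for 4-bit ROM data, every clipped pixel is one mclk given back to the scanline budget.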

Closing

This was not an easy one! Understanding Object List processing will allow us to better understand the pros and cons of this machine.

Written on August 23, 2023