Significance of the Study
Chapter II. RISC/ Nintendo 64
The N64 CPU is part of the MIPS R4000 family of processors. It consists of the following components:
an execution unit with a 64-bit register file for integer and floating-point operations
a 16 KB instruction cache
an 8 KB writeback data cache
a 32-entry TLB (Translation Lookaside Buffer) for virtual address to physical address calculation
The Nintendo 64 game runs in kernel mode with 32-bit addressing. 64-bit integer operations are available in this mode. However, the 32-bit C calling convention is used to maximize performance.
Description
The R4300i is a low-cost RISC microprocessor optimized for demanding consumer applications. The R4300i provides performance equivalent to a high-end PC at a cost point to enable set-top terminals, games and portable consumer devices. The R4300i is compatible with the MIPS R4000 family of RISC microprocessors and will run all existing MIPS software. Unlike its predecessors designed for use in workstations, the R4300i is expected to lower the cost of systems in which it is used, a requirement for price-sensitive consumer products. The R4300i is also an effective embedded processor, supported by currently available development tools and providing very high performance at a low price-point.
The Nintendo 64 system consists of a number of hardware components that work together to produce the graphics and audio for the game. The heart of the system is the Reality CoProcessor (RCP). Attached to the RCP are the memory chips, the N64 CPU, and some miscellaneous I/O chips.
The RCP runs the graphics and audio microcode. The display portion of the RCP renders into the graphics frame buffer located in main memory. The video and audio portions of the RCP use DMA to read the frame buffer and audio data from main memory and drive the video and audio DACs. Figure 1-1 below is a block diagram of the N64 system.
Figure 1-1 N64 Hardware Block Diagram
The CPU and RCP are both processors that can execute at the same time. Threads execute on the CPU and tasks execute on the RCP. Accesses to main memory from threads and tasks also occur in parallel.
The game program runs on the N64 CPU as a collection of threads, each of which has its own stack. The operating system is a collection of routines that can be called in a thread. The operating system controls which thread is running on the CPU. A thread can access all of physical memory.
Tasks run on the RCP, which is a microcode engine that processes a task list. Task lists are generated by a thread running on the N64 CPU and are stored in main memory. The game program creates the task list, calls an OS routine to load the appropriate microcode, and then starts the RCP running to process the task list. The microcode on the RCP reads the task list from main memory. The RCP task can also write into main memory.
Memory Management System (MMU)
The VR4300 processor has a 32-bit physical addressing range of 4 Gbytes. However, since it is rare for systems to implement a physical memory space this large, the CPU provides a logical expansion of memory space by translating addresses composed in the large virtual address space into available physical memory addresses. The VR4300 processor supports the following two addressing modes:
32-bit mode, in which the virtual address space is divided into 2 Gbytes per user process and 2 Gbytes for the kernel
64-bit mode, in which the virtual address is expanded to 1 Tbyte (2^40 bytes) of user virtual address space
The main memory in the system is used in parallel by the R4300 CPU, the RSP microcode engine, the RDP graphics pipeline, and the other I/O interfaces of the RCP. The software is responsible for defining the memory map.
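The sizes quoted for the addressing modes follow directly from the address widths. As a minimal sketch of that arithmetic (the function name `address_space_bytes` is an illustrative assumption, not part of any N64 SDK), the following C helper computes the number of bytes addressable with a given number of address bits:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes addressable with `bits` address bits.
 * Limited to bits < 64 so the result fits in a uint64_t. */
uint64_t address_space_bytes(unsigned bits)
{
    assert(bits < 64);
    return (uint64_t)1 << bits;
}
```

With this helper, 32 bits gives the 4-Gbyte physical range, 31 bits gives each 2-Gbyte half of the 32-bit virtual space, and 40 bits gives the 1-Tbyte user space of 64-bit mode.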
The N64 CPU can use either physical or virtual addresses. The TLB maps virtual addresses into physical addresses (see NOTE). It is anticipated that programs will mainly use KSEG0 (cached, unmapped) addresses for instructions and data.
The RSP hardware uses physical addresses. The microcode imposes a segmented addressing scheme to generate the physical addresses. Bits 24 through 27 of the segmented address are used to index into a 16-entry table to obtain the base address of the segment. The upper 4 bits are masked off. The lower 24 bits are an offset into the segment. This scheme makes it easy to create dynamic RSP task lists.
The RDP hardware uses physical addresses. The RSP microcode translates the segmented addresses stored in the task list into physical addresses.
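The segmented addressing scheme can be sketched in C as follows. This is a host-side illustration of the bit layout described in the text, not the actual RSP microcode; the type `SegmentTable` and the function names are assumptions introduced for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SEGMENTS 16u

/* Hypothetical model of the 16-entry segment base table. */
typedef struct {
    uint32_t base[NUM_SEGMENTS];   /* physical base address of each segment */
} SegmentTable;

/* Bits 24..27 select the segment; the upper 4 bits (28..31) are masked off. */
uint32_t segment_index(uint32_t seg_addr)
{
    return (seg_addr >> 24) & 0x0Fu;
}

/* Bits 0..23 are the offset into the segment. */
uint32_t segment_offset(uint32_t seg_addr)
{
    return seg_addr & 0x00FFFFFFu;
}

/* Physical address = base of the selected segment + offset. */
uint32_t segment_to_physical(const SegmentTable *table, uint32_t seg_addr)
{
    assert(table != NULL);
    return table->base[segment_index(seg_addr)] + segment_offset(seg_addr);
}
```

For example, if segment 2 has base 0x00100000, the segmented address 0x02000040 resolves to physical address 0x00100040. Because only the table entry changes when a buffer moves, a task list that refers to data by segment can be reused without rewriting its addresses.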
The N64 CPU has an 8 KB writeback data cache. This means that when the CPU writes a variable, it may not be written to main memory until later. Since the RSP reads the task list directly from main memory, the dynamic portion of the task list must be flushed from the data cache before the RSP starts.
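To make the hazard concrete, here is a toy host-side model of a writeback cache with a single 16-byte line. It is a simplified sketch under stated assumptions (one line, a 64-byte memory, and hypothetical names such as `ToyCache` and `toy_rsp_read`); it does not model the real VR4300 cache or any SDK routine. It only shows that a CPU store is invisible to a device reading main memory until the dirty line is written back.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 16u
#define MEM_SIZE  64u   /* must be a multiple of LINE_SIZE */

/* Hypothetical one-line writeback cache in front of a tiny main memory. */
typedef struct {
    uint8_t  memory[MEM_SIZE];  /* main memory: what the RSP reads */
    uint8_t  line[LINE_SIZE];   /* the single cached line */
    uint32_t line_addr;         /* line-aligned address held in the cache */
    int      valid;
    int      dirty;
} ToyCache;

void toy_init(ToyCache *c)
{
    memset(c, 0, sizeof *c);
}

/* Write the cached line back to main memory if it holds modified data. */
void toy_writeback(ToyCache *c)
{
    if (c->valid && c->dirty) {
        memcpy(&c->memory[c->line_addr], c->line, LINE_SIZE);
        c->dirty = 0;
    }
}

/* CPU store: updates only the cached copy and marks the line dirty. */
void toy_cpu_write(ToyCache *c, uint32_t addr, uint8_t value)
{
    uint32_t base = addr & ~(LINE_SIZE - 1u);
    assert(addr < MEM_SIZE);
    if (!c->valid || c->line_addr != base) {
        toy_writeback(c);                              /* evict the old line */
        memcpy(c->line, &c->memory[base], LINE_SIZE);  /* fill the new line */
        c->line_addr = base;
        c->valid = 1;
    }
    c->line[addr - base] = value;
    c->dirty = 1;
}

/* RSP-style read: bypasses the CPU cache and sees main memory only. */
uint8_t toy_rsp_read(const ToyCache *c, uint32_t addr)
{
    assert(addr < MEM_SIZE);
    return c->memory[addr];
}
```

In this model, a value stored with `toy_cpu_write` is not returned by `toy_rsp_read` until `toy_writeback` has run, which mirrors why the dynamic portion of the task list has to be flushed before the RSP is started.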
You also need to be careful with DMA operations. Before a DMA that writes from main memory to a device, the data buffer must be written back (flushed) from the cache so that main memory holds the current data. Before a DMA that reads from a device into main memory, the data buffer must be invalidated in the cache. If the cache invalidate does not occur, a later writeback from the cache may destroy data that has just been transferred into main memory by the read DMA. It is also a good idea to align I/O buffers on the 16-byte data cache line size to avoid cache line tearing. Tearing occurs when a buffer and an unrelated variable share a cache line: the potential writeback of the variable could destroy data read into the I/O buffer.
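Whether a buffer is exposed to tearing can be checked from its address and size alone. The sketch below assumes the 16-byte line size stated in the text; the function names are hypothetical helpers written for this example, not SDK calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 16u

/* Address of the start of the cache line that contains `addr`. */
uintptr_t line_floor(uintptr_t addr)
{
    return addr & ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
}

/* True when the buffer starts on a line boundary and covers whole lines,
 * so no unrelated variable can share any of its cache lines. */
int buffer_owns_its_lines(uintptr_t addr, size_t size)
{
    return (addr % DCACHE_LINE_SIZE == 0) && (size % DCACHE_LINE_SIZE == 0);
}

/* True when byte ranges [a, a+a_size) and [b, b+b_size) touch at least
 * one common cache line. Both sizes must be non-zero. */
int shares_cache_line(uintptr_t a, size_t a_size, uintptr_t b, size_t b_size)
{
    assert(a_size > 0 && b_size > 0);
    uintptr_t a_first = line_floor(a);
    uintptr_t a_last  = line_floor(a + a_size - 1);
    uintptr_t b_first = line_floor(b);
    uintptr_t b_last  = line_floor(b + b_size - 1);
    return a_first <= b_last && b_first <= a_last;
}
```

A 24-byte buffer at 0x1008 shares the line at 0x1000 with a 4-byte variable at 0x1004, so a writeback of that variable could overwrite the first bytes of the buffer. Moving the buffer to 0x1010 and padding it to 32 bytes removes the overlap.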
Please note the following alignment restrictions:
8-byte alignment for most DMA
Access to PI using DMA:
Alignment for main memory: 8 bytes
Alignment for ROM: 2 bytes (see NOTE)
64-byte alignment for color frame buffers (cfb) and Z buffer
8-byte alignment for textures
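These restrictions can be checked before a transfer is issued. The following C sketch encodes the alignments listed above as constants and provides power-of-two alignment helpers. The enum names and functions (`is_aligned`, `align_up`, `pi_dma_addresses_ok`) are illustrative assumptions, not part of the N64 libraries.

```c
#include <assert.h>
#include <stdint.h>

/* Alignment requirements, in bytes, taken from the list above. */
enum {
    ALIGN_DMA_RAM     = 8,   /* main memory side of most DMA, including PI */
    ALIGN_PI_ROM      = 2,   /* ROM side of a PI DMA */
    ALIGN_FRAMEBUFFER = 64,  /* color frame buffers and Z buffer */
    ALIGN_TEXTURE     = 8    /* textures */
};

/* True when `addr` is a multiple of `alignment` (a power of two). */
int is_aligned(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1u)) == 0);
    return (addr & (alignment - 1u)) == 0;
}

/* Round `addr` up to the next multiple of `alignment` (a power of two). */
uint32_t align_up(uint32_t addr, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1u)) == 0);
    return (addr + alignment - 1u) & ~(alignment - 1u);
}

/* Check both ends of a PI DMA: 8-byte main memory, 2-byte ROM. */
int pi_dma_addresses_ok(uint32_t ram_addr, uint32_t rom_addr)
{
    return is_aligned(ram_addr, ALIGN_DMA_RAM)
        && is_aligned(rom_addr, ALIGN_PI_ROM);
}
```

A frame buffer allocator could, for instance, pass its candidate address through `align_up(addr, ALIGN_FRAMEBUFFER)` so that the buffer always starts on a 64-byte boundary.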
The RCP is a collection of processors, memory interfaces, and control logic. The Reality Signal Processor (RSP) is the microcode engine that executes audio and graphics tasks. The Reality Display Processor (RDP) is the graphics display pipeline that renders into the frame buffer. The memory interfaces provide access to main memory for the CPU, RSP, RDP, video interface, audio interface, peripheral devices, and serial game controllers. It is very important to remember that these interfaces may be active at the same time and that the RSP and RDP are running in parallel. Please see the RCP block diagram in Figure 1-2.