KISS: Keep it simple, stupid. Now there's an acronym rarely encountered in the computer industry. Even when it comes to RISC, a design philosophy that supposedly upholds elegance and simplicity, things are getting out of hand, with multimillion-transistor budgets and such complex features as four-way superscalar pipelines, out-of-order execution, and even multimedia instructions.

Mips Technologies has taken a conscious step in another direction with its new R5000 processor. As the latest Rx000-compatible CPU, the R5000 takes advantage of a relatively simple and efficient design to deliver high clock speeds (250 MHz by the end of this year) and competitive performance (at least an estimated 6.0 SPECint95 and 6.1 SPECfp95) at a very attractive price (less than $300).

The architects of the R5000 deliberately avoided complexity that would compromise their goal of making a fast, economical chip for low- to mid-range workstations. That's not because Mips is averse to advanced microprocessor design: The Mips R8000 and R10000 have all the sophisticated features mentioned earlier, and more. Where appropriate, the R5000 inherits some of those features, and it adds some new twists of its own, such as logic optimized for the single-precision floating-point (FP) math that characterizes today's 3-D graphics.

Simple, Not Stupid

To strike a balance between performance and simplicity, the R5000's architects made some interesting choices. Compatibility with the latest Mips software was a paramount consideration, so they retained the 64-bit architecture first introduced in the R4000 and the Mips IV instruction set of the R8000 and R10000. The R5000's 64-bit data paths and registers move twice as much data per operation as 32-bit ones, yet they can also operate in 32-bit mode to provide backward compatibility with older Mips software.

The R5000 also inherits many features of the high-end R10000, such as generous register files and primary caches. The R5000 has 32 integer and 32 FP registers, all of them 64 bits wide. Although the R10000 actually has 64 integer and 64 FP registers, half of those are programmer-invisible shadow registers for speculatively executed instructions. Since the R5000 doesn't execute instructions speculatively, all its registers can be programmer-visible architectural registers.
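Part of what lets 32-bit code run unchanged on 64-bit registers is a convention of the 64-bit MIPS architecture: 32-bit operations such as ADDU and LW leave their result sign-extended across the full 64-bit register. The Python sketch below models that rule. The function names are hypothetical, and the model simply uses the low 32 bits of each operand (the architecture itself treats operands that are not already properly sign-extended as giving an unpredictable result).

```python
MASK32 = 0xFFFFFFFF

def sign_extend_32(value: int) -> int:
    """Return the 64-bit register image of a 32-bit value (two's complement).

    Bit 31 of the 32-bit value is copied into bits 63..32.
    """
    value &= MASK32
    if value & 0x8000_0000:
        value |= 0xFFFF_FFFF_0000_0000
    return value

def addu32(rs: int, rt: int) -> int:
    """Hypothetical model of a 32-bit ADDU on a 64-bit register file:
    add the low 32 bits of each operand, wrap modulo 2**32 (no overflow trap),
    then sign-extend the result into 64 bits."""
    return sign_extend_32((rs & MASK32) + (rt & MASK32))

# 32-bit code that adds 1 to 0x7FFFFFFF sees the familiar wrap to a negative
# number; the 64-bit register holds that same negative value.
print(hex(addu32(0x7FFF_FFFF, 1)))  # 0xffffffff80000000
```

Because every 32-bit result is stored this way, a 32-bit program and a 64-bit comparison or branch see the same signed value in the register.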
The R5000 has separate primary caches for instructions and data; each is 32 KB and two-way set-associative, just like the R10000's. In fact, the R5000's caches take up as much die area as its logic. Large on-chip caches are becoming increasingly common in high-performance microprocessors because they help alleviate the memory-latency problem of modern system design.

The final important feature that the R5000 inherits from the R10000 is superscalar pipelining. The R10000 was the first single-chip superscalar CPU from Mips, and the engineers went all-out: They gave the R10000 four-way pipelines and the potential to execute as many as five instructions per cycle, though it can retire only four per cycle.

For the economy-model R5000, Mips pared down: The chip is two-way superscalar, with significant limitations on the types of instructions it can execute in parallel. For instance, the R5000 cannot execute two integer instructions simultaneously. Unfortunately, integer instructions are the most common operations in general-purpose software. Mips' thinking was that the R5000 would deliver sufficient integer performance for its target market even without that parallelism.

Instead, the architects optimized the R5000's pipelines for the instruction streams typically found in 2-D and 3-D graphics software: a mix of integer and single-precision floating-point operations. As a result, the R5000 can execute a single-precision FP instruction and a dissimilar instruction simultaneously. In other words, one instruction in the pair must be an integer-type or load/store operation; the R5000 cannot execute two single-precision FP instructions at once. Unduly restrictive? Mips engineers took into account how real-world graphics software works.

MADD About Math

Consider the math behind 3-D geometry processing.
To calculate the vertices of a 3-D object, a graphics program typically multiplies a 4x4 transform matrix of single-precision FP values against a 1x4 matrix of similar values representing a single vertex. The result is another 1x4 matrix. The graphics program must repeat this operation for every vertex in the object, potentially tens of thousands of times for a complex object in a CAD drawing.

To do this kind of math, the Mips IV instruction set includes both single-precision and double-precision multiply-add (MADD) operations, similar to the multiply-accumulate (MAC) instruction in a DSP. The main difference is that MADD uses four operands instead of MAC's three (A*B+C=D instead of A*B+C=C), so the addend isn't overwritten.

As implemented in the R5000, the single-precision MADD instruction has a repeat rate of one cycle and a latency of four cycles. The FPU is subpipelined, so it can calculate the multiplication and addition components of a matrix problem in parallel. Once the R5000's five-stage pipeline is primed, there can be a MADD instruction at a different stage of completion in every pipe stage, so the R5000 can issue a new MADD instruction every cycle.

The R5000 can also process an integer or load/store instruction at the same time as a single-precision MADD: For every MADD executing in the FP pipeline, an accompanying load/store instruction can be flowing through the second pipeline. Result: a highly tuned microarchitecture that rips through 3-D geometry calculations at speeds you'd normally expect from a more expensive processor.

Ordinary FP benchmarks may not tell the whole story, because they don't specifically measure this kind of single-precision performance. Mips claims that 3-D graphics software such as Pro-Engineer and UltraCAD will run faster on the R5000 than on CPUs with similar benchmark scores, such as the Pentium Pro.
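As a rough illustration of how the four-operand MADD maps onto the vertex transform described above, here is a Python sketch. The names `madd` and `transform_vertex` are hypothetical; the row-vector convention (1x4 vertex times 4x4 matrix) follows the description in the text, and Python floats are double precision, so this models the data flow rather than single-precision rounding.

```python
from typing import List, Sequence

def madd(a: float, b: float, c: float) -> float:
    """Four-operand multiply-add: returns d = a*b + c and leaves c untouched.
    (A DSP-style MAC would instead write the result back into c.)"""
    return a * b + c

def transform_vertex(v: Sequence[float], m: Sequence[Sequence[float]]) -> List[float]:
    """Multiply a 1x4 vertex v by a 4x4 matrix m using only madd().

    Each output element is a chain of four dependent MADDs, so one vertex
    costs 16 MADDs in total."""
    result = []
    for j in range(4):                 # output column j
        acc = 0.0
        for i in range(4):             # dot product of v with column j of m
            acc = madd(v[i], m[i][j], acc)
        result.append(acc)
    return result

# Translation by (10, 20, 30) in the row-vector convention: offsets in the last row.
translate = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [10.0, 20.0, 30.0, 1.0],
]
print(transform_vertex([1.0, 2.0, 3.0, 1.0], translate))  # [11.0, 22.0, 33.0, 1.0]
```

At a repeat rate of one MADD per cycle, 16 MADDs per vertex means the FP pipeline needs at least 16 issue cycles per vertex, with the accompanying loads and stores flowing through the other pipeline.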
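The four-cycle latency and one-cycle repeat rate quoted above only pay off when successive MADDs don't wait on each other. The hypothetical helper below is a deliberately simplified in-order issue model of the FP pipeline alone (it ignores the paired integer pipeline, cache misses, and everything else): an instruction issued at cycle t makes its result usable at cycle t + latency. It compares one long chain of dependent MADDs with four interleaved chains, which is the shape of the four output elements of a vertex transform.

```python
def madd_finish_cycle(num_chains: int, chain_len: int,
                      latency: int = 4, repeat: int = 1) -> int:
    """Cycle at which the last MADD result becomes usable.

    Models num_chains independent chains, each chain_len MADDs long, where every
    MADD needs the previous result in its own chain. Instructions issue in
    program order, interleaved round-robin across chains; an instruction issued
    at cycle t produces a result usable at cycle t + latency, and successive
    issues must be at least `repeat` cycles apart.
    """
    ready = [0] * num_chains   # cycle when each chain's latest result is usable
    last_issue = -repeat       # lets the very first MADD issue at cycle 0
    finish = 0
    for _ in range(chain_len):
        for c in range(num_chains):
            issue = max(last_issue + repeat, ready[c])
            ready[c] = issue + latency
            last_issue = issue
            finish = max(finish, ready[c])
    return finish

print(madd_finish_cycle(1, 16))  # one dependent chain of 16 MADDs: 64
print(madd_finish_cycle(4, 4))   # four interleaved chains of 4 MADDs: 19
```

Under these assumptions the same 16 MADDs finish at cycle 19 instead of 64, because each chain's next MADD issues just as its previous result becomes available. Since the R5000 issues instructions in order, arranging that interleaving is up to the compiler or programmer.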
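The R5000's pairing restriction can be captured in a few lines. This hypothetical Python model encodes what the text states, one single-precision FP instruction alongside one integer or load/store instruction, and additionally assumes (not spelled out in the text) that an integer instruction and a load/store cannot pair because both would need the integer pipeline. The issue counter is a greedy in-order sketch that ignores dependencies, latencies, and any alignment rules the real chip may have.

```python
from enum import Enum, auto
from typing import Sequence

class Kind(Enum):
    INTEGER = auto()     # integer ALU operation
    LOAD_STORE = auto()  # load or store
    FP_SINGLE = auto()   # single-precision FP operation, e.g. a MADD

def can_pair(a: Kind, b: Kind) -> bool:
    """Exactly one instruction of the pair is single-precision FP;
    the other is therefore an integer or load/store instruction."""
    return (a is Kind.FP_SINGLE) != (b is Kind.FP_SINGLE)

def issue_cycles(program: Sequence[Kind]) -> int:
    """Cycles to issue program in order, greedily pairing adjacent instructions.
    Ignores dependencies, latencies, alignment, and stalls."""
    cycles, i = 0, 0
    while i < len(program):
        if i + 1 < len(program) and can_pair(program[i], program[i + 1]):
            i += 2   # dual issue: FP plus integer or load/store
        else:
            i += 1   # single issue
        cycles += 1
    return cycles
```

Under this model, a MADD/load interleaved loop body such as `[Kind.FP_SINGLE, Kind.LOAD_STORE] * 8` issues in 8 cycles, while 16 back-to-back integer instructions take 16, which is the trade-off Mips accepted.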
BYTE hasn't yet verified Mips' claims, but Mips' parent company, Silicon Graphics (Mountain View, CA), is reporting significant performance gains on its latest R5000-based Indy workstations. According to SGI, the new Indys run 3-D graphics software about 83 percent faster than existing R4400-based systems. And because they use early versions of the R5000 that run at 150 to 180 MHz, greater gains lie ahead when the R5000 reaches its target clock speed of 250 MHz.

Cutting Corners

To hold down costs, Mips left several advanced features out of the R5000. Having fewer pipelines is just one example. The R5000 also cannot execute instructions speculatively or out of order. This greatly reduces the chip's complexity, because it doesn't need tricky techniques for retiring instructions in their original program order.

The R5000 doesn't support dynamic branch prediction, either. However, it does have several branch-likely instructions that smart programmers and compilers can use as a sort of poor man's substitute. There's also an FP conditional-move instruction that eliminates a branch altogether.

In another cost-cutting measure, Mips eliminated the dedicated 128-bit secondary-cache bus found on the R4000 and R10000. The R5000 accesses its secondary cache over the general I/O bus, which is 64 bits wide. The maximum secondary-cache size is 2 MB. This is similar to the Pentium's cache interface, except that the R5000's bus can run at 100 MHz, compared to the Pentium bus's top speed of 66 MHz.

All these economies significantly reduce the R5000's pin count. And the R5000's die is downright tiny: 84 sq mm on a 0.32-micron process. A smaller die means higher yields, lower manufacturing costs, and faster clock speeds. With CPUs, that's the whole ball game.
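The two branch-avoidance tools in the R5000 are easiest to see as semantics. On MIPS, the instruction after a branch (its delay slot) normally always executes; a branch-likely instruction such as BEQL executes it only when the branch is taken and nullifies it otherwise. A compiler can therefore fill the delay slot with the first instruction of a target it expects to be taken, such as a loop's back edge, and the hardware throws that work away when the guess is wrong: a static hint rather than dynamic prediction. The conditional move is modeled loosely on the Mips IV conditional-move family (for example MOVN.S). All function names below are hypothetical, and Python itself still evaluates a conditional internally; the sketch models what the instructions do, not how the hardware does it.

```python
def delay_slot_executes(taken: bool, likely: bool) -> bool:
    """MIPS delay-slot rule: an ordinary branch always executes the instruction
    in its delay slot; a branch-likely executes it only if the branch is taken
    (on fall-through the delay-slot instruction is nullified)."""
    return taken or not likely

def cond_move(dest: float, src: float, cond: bool) -> float:
    """Conditional move: dest receives src only when cond is true; otherwise
    dest keeps its old value. There is no branch, so nothing to mispredict."""
    return src if cond else dest

def clamp_to_limit(x: float, limit: float) -> float:
    """Clamp in conditional-move style: y = x, then y = limit if x > limit."""
    return cond_move(x, limit, x > limit)
```

For example, `clamp_to_limit(7.0, 5.0)` yields 5.0 and `clamp_to_limit(3.0, 5.0)` yields 3.0, the kind of FP select that would otherwise need a compare and a branch.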
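The clock-rate comparison with the Pentium's cache interface translates directly into peak bandwidth. This sketch assumes both buses are 64 bits wide (true of the Pentium's external data bus) and move one transfer per bus clock with no wait states, so the figures are theoretical ceilings in millions of bytes per second, not measured throughput; the function name is hypothetical.

```python
def peak_bandwidth_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak: bytes per transfer x million transfers per second.
    Assumes one transfer per bus clock and no wait states or turnaround."""
    return (width_bits / 8) * clock_mhz

r5000 = peak_bandwidth_mb_s(64, 100)   # 800.0 MB/s
pentium = peak_bandwidth_mb_s(64, 66)  # 528.0 MB/s
print(f"R5000 {r5000:.0f} MB/s vs Pentium {pentium:.0f} MB/s "
      f"({r5000 / pentium:.2f}x)")
```

Roughly a 1.5x advantage at peak, which helps offset giving up the R4000's and R10000's dedicated 128-bit secondary-cache bus.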
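Why a small die helps yield can be sketched with the common first-order Poisson yield model, Y = exp(-A x D), where A is die area and D is defect density. This is a textbook approximation swapped in for illustration, not Mips' own cost model; the defect density and the 200-sq-mm comparison die below are hypothetical values chosen only to show the trend.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """First-order Poisson yield model: fraction of dies with zero defects."""
    area_cm2 = die_area_mm2 / 100.0   # 100 sq mm per sq cm
    return math.exp(-area_cm2 * defects_per_cm2)

D = 0.5  # hypothetical defect density, defects per sq cm
for area in (84.0, 200.0):  # R5000 die vs. a hypothetical larger die
    print(f"{area:5.0f} sq mm -> {poisson_yield(area, D):.0%} good dies")
```

Because area sits in the exponent, yield falls off faster than linearly as dies grow, and a smaller die also fits more candidates on each wafer.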
Inside the R5000

In stage 2, instructions access the R5000's register file. In stage 3, the integer unit and the FPU execute the instructions. Floating-point instructions may require multiple cycles to complete, but the FPU is subpipelined, so it can handle parts of four instructions simultaneously. In stage 4, instructions access the 32-KB primary data cache. In stage 5 (writeback), the CPU finally retires the results.

Tom R. Halfhill is a senior editor based in BYTE's San Mateo, California, bureau. You can send e-mail to him at thalfhill@bix.com.

Copyright 1994-1998 BYTE