Berkeley RISC is one of two seminal research projects into reduced instruction set computer (RISC) based microprocessor design that took place under the Defense Advanced Research Projects Agency VLSI Project. RISC was led by David Patterson (who coined the term RISC) at the University of California, Berkeley between 1980 and 1984. The other project, MIPS, took place a short distance away at Stanford University and ran from 1981 until 1984.
Berkeley's project was so successful that it became the name for all similar designs to follow; even the MIPS would become known as a "RISC processor". The Berkeley RISC design was later commercialized by Sun Microsystems as the SPARC architecture, and inspired the ARM architecture.
The RISC concept
Both RISC and MIPS were developed from the realization that the vast majority of programs used only a small minority of a processor's available instruction set. In a famous 1978 paper, Andrew S. Tanenbaum demonstrated that a complex 10,000 line high-level program could be represented using a simplified instruction set architecture with an 8-bit fixed-length opcode. This was roughly the same conclusion reached at IBM, whose studies found that their own code running on mainframes like the IBM 360 used only a small subset of all the instructions available. Both of these studies suggested that one could produce a much simpler CPU that would still run most real-world code. Another finding, not fully explored at the time, was Tanenbaum's note that 81% of the constants were either 0, 1, or 2.
RISC I also featured a two-stage instruction pipeline for additional speed, but without the complex instruction re-ordering of more modern designs. This makes conditional branches a problem, because the compiler has to fill the instruction following a conditional branch (the so-called branch delay slot), with something selected to be "safe" (i.e., not dependent on the outcome of the conditional). Sometimes the only suitable instruction in this case is <code>NOP</code>. A notable number of later RISC-style designs still require the consideration of branch delay.
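The delay-slot filling described above can be illustrated with a minimal sketch. It is not the Berkeley compiler's algorithm: the `Instr` record, the register names, and the `fill_delay_slot` helper are hypothetical, and the branch is assumed to read its condition from named registers. The sketch only tries the instruction immediately before the branch, moving it into the slot when the branch does not read its result and falling back to `NOP` otherwise.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record used only for this illustration."""
    op: str
    dest: Optional[str] = None      # register written, if any
    srcs: Tuple[str, ...] = ()      # registers read


NOP = Instr("NOP")


def fill_delay_slot(body: List[Instr], branch: Instr) -> List[Instr]:
    """Return body + branch + one delay-slot instruction.

    Assumption: an instruction taken from before the branch is "safe"
    because it would have executed on either path anyway. It may only be
    moved if the branch does not read the register it writes.
    """
    if body:
        candidate = body[-1]
        if candidate.dest not in branch.srcs:
            # Move the candidate past the branch into the delay slot.
            return body[:-1] + [branch, candidate]
    # Nothing safe to move: pad the slot with NOP.
    return body + [branch, NOP]


if __name__ == "__main__":
    add = Instr("ADD", "r3", ("r1", "r2"))
    sub = Instr("SUB", "r4", ("r5", "r6"))
    beq = Instr("BEQ", None, ("r4",))
    print([i.op for i in fill_delay_slot([sub, add], beq)])
    print([i.op for i in fill_delay_slot([add, sub], beq)])
```

In the first call the `ADD` is independent of the branch condition and fills the slot; in the second the `SUB` produces the value the branch tests, so the slot is padded with `NOP`.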
After a month of validation and debugging, the design was sent to the innovative MOSIS service for production on June 22, 1981, using a 2 μm (2,000 nm) process. A variety of delays forced the team to abandon their masks four separate times, and wafers with working examples did not arrive back at Berkeley until May 1982. The first working RISC I "computer" (actually a checkout board) ran on June 11. In testing, the chips proved to perform worse than expected. In general, an instruction took 2 μs to complete, while the original design allotted about 0.4 μs (five times as fast). The precise reasons for this problem were never fully explained. However, throughout testing it was clear that certain instructions did run at the expected speed, suggesting the problem was physical, not logical.
Had the design worked at full speed, performance would have been excellent. Simulations using a variety of small programs, comparing the 4 MHz RISC I to the 5 MHz 32-bit VAX 11/780 and the 5 MHz 16-bit Zilog Z8000, showed this clearly. Program size was about 30% larger than on the VAX but very close to that of the Z8000, validating the argument that the higher code density of CISC designs was not actually all that impressive in reality. In terms of overall performance, the simulations indicated that a full-speed RISC I would have been twice as fast as the VAX, and about four times as fast as the Z8000. The programs ended up performing about the same overall number of memory accesses because the large register file dramatically improved the odds the needed operand was already on-chip.
It is important to put this performance in context. Even though the RISC I hardware had run slower than the VAX, it made no difference to the importance of the design. RISC allowed for the production of a true 32-bit processor on a real chip die using what was already an older fab. Traditional designs simply could not do this; with so much of the chip surface dedicated to decoder logic, a true 32-bit design like the Motorola 68020 required newer fabs before becoming practical. Using the same fabs, RISC I could have largely outperformed the competition.
On February 12, 2015, IEEE installed a plaque at UC Berkeley to commemorate the contribution of RISC-I. The plaque reads:
- UC Berkeley students designed and built the first VLSI reduced instruction set computer in 1981. The simplified instructions of RISC-I reduced the hardware for instruction decode and control, which enabled a flat 32-bit address space, a large set of registers, and pipelined execution. A good match to C programs and the Unix operating system, RISC-I influenced instruction sets widely used today, including those for game consoles, smartphones and tablets.
RISC II
[Figure: RISC II die shot]
While the RISC I design ran into delays, work at Berkeley had already turned to the new Blue design. Work on Blue progressed more slowly than on Gold, due both to the lack of a pressing need now that Gold was going to fab, and to changeovers in the classes and students staffing the effort. This pace also allowed the team to add several new features that would end up improving the design considerably.
The key difference was simpler cache circuitry that eliminated one line per bit (from three to two), dramatically shrinking the register file size. The change also required much tighter bus timing, but this was a small price to pay, and several other parts of the design were sped up as well in order to meet the new timing requirements.
The savings due to the new design were tremendous. Whereas Gold contained a total of 78 registers in 6 windows, Blue contained 138 registers broken into 8 windows of 16 registers each, with another 10 globals. This expansion of the register file increases the chance that a given procedure can fit all of its local storage in registers, and increases the nesting depth. Nevertheless, the larger register file required fewer transistors, and the final Blue design, fabbed as RISC II, implemented all of the RISC instruction set with only 40,760 transistors.
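The register count for Blue can be checked with a short sketch: 8 windows of 16 registers plus 10 globals gives 138. The mapping function below goes further than the text and is an assumption: it supposes each procedure sees 32 logical registers, of which the first 10 are globals and the remaining 22 are windowed, so that adjacent windows share 6 registers for parameter passing. The names `physical_index`, `VISIBLE`, and the direction in which a call moves the window are hypothetical choices made for illustration.

```python
NUM_GLOBALS = 10       # from the text
REGS_PER_WINDOW = 16   # from the text: unique registers added per window
NUM_WINDOWS = 8        # from the text
VISIBLE = 32           # assumption: logical registers seen by one procedure


def total_physical_registers() -> int:
    """Globals plus the unique registers contributed by every window."""
    return NUM_GLOBALS + NUM_WINDOWS * REGS_PER_WINDOW


def physical_index(window: int, reg: int) -> int:
    """Map (window, logical register) to a physical register number.

    Assumed layout: logical 0..9 are globals shared by all windows;
    logical 10..31 index into a circular windowed file, with each window
    starting 16 registers after the previous one. Because 22 registers
    are visible but windows are only 16 apart, 6 overlap with a neighbour.
    """
    if not 0 <= reg < VISIBLE:
        raise ValueError("logical register out of range")
    if reg < NUM_GLOBALS:
        return reg
    windowed = NUM_WINDOWS * REGS_PER_WINDOW
    offset = (window * REGS_PER_WINDOW + (reg - NUM_GLOBALS)) % windowed
    return NUM_GLOBALS + offset


if __name__ == "__main__":
    print(total_physical_registers())
    # In this sketch a call moves from window w to window w - 1, so the
    # caller's logical r10 and the callee's logical r26 are the same cell.
    print(physical_index(3, 10) == physical_index(2, 26))
```

Under these assumptions, passing up to six arguments requires no memory traffic, since the caller writes them into registers the callee can already see.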
The other major change was to include an instruction-format expander, which invisibly "up-converted" 16-bit instructions into a 32-bit format.
This allowed smaller instructions, typically things with one or no operands, like <code>NOP</code>, to be stored in memory in a smaller 16-bit format, and for two such instructions to be packed into a single machine word. The instructions would be invisibly expanded back to 32-bit versions before they reached the arithmetic logic unit (ALU), meaning that no changes were needed in the core logic. This simple technique yielded a surprising 30% improvement in code density, making an otherwise identical program on Blue run faster than on Gold due to the decreased number of memory accesses.
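The expansion step can be sketched as follows. The bit layouts here are hypothetical and chosen only to make the idea concrete: the text does not specify how the 16-bit format was encoded or how a word was marked as holding two short instructions. The sketch assumes a short instruction carries a 7-bit opcode and a 5-bit destination, and that the 32-bit format places the opcode in bits 31..25 and the destination in bits 23..19.

```python
from typing import List


def pack_pair(first16: int, second16: int) -> int:
    """Pack two 16-bit short instructions into one 32-bit memory word."""
    return ((first16 & 0xFFFF) << 16) | (second16 & 0xFFFF)


def expand_short(short16: int) -> int:
    """Widen one short instruction to the (assumed) 32-bit format.

    Assumed short layout: opcode in bits 15..9, destination in bits 8..4.
    Fields the short form lacks are left as zero in the wide form.
    """
    opcode = (short16 >> 9) & 0x7F
    dest = (short16 >> 4) & 0x1F
    return (opcode << 25) | (dest << 19)


def expand_word(word: int) -> List[int]:
    """Split a packed word and expand both halves, in program order."""
    first16 = (word >> 16) & 0xFFFF
    second16 = word & 0xFFFF
    return [expand_short(first16), expand_short(second16)]


if __name__ == "__main__":
    a = (0x15 << 9) | (3 << 4)   # hypothetical opcode 0x15, dest r3
    b = (0x01 << 9)              # hypothetical opcode 0x01, no operands
    for wide in expand_word(pack_pair(a, b)):
        print(f"{wide:08x}")
```

Because the widening happens before execution, the rest of the datapath in this sketch only ever sees 32-bit instructions, which mirrors the point that no changes were needed in the core logic.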
RISC II proved to be much more successful in silicon, and in testing it outperformed almost all minicomputers on almost all tasks. For instance, performance ranged from 85% of VAX speed to 256% on a variety of loads. RISC II was also benchmarked against the famous Motorola 68000, then considered to be the best commercial chip implementation, and outperformed it by 140% to 420%.
Follow-ons
Work on the original RISC designs ended with RISC II, but the concept lived on at Berkeley. The basic core was re-used in SOAR in 1984, basically a RISC converted to run Smalltalk (in the same way that it could be claimed RISC ran C), and later in the similar VLSI-BAM that ran Prolog instead of Smalltalk. Another effort was SPUR, which was a full set of chips needed to build a full 32-bit workstation.
The RISC concept, as developed in the Berkeley RISC, Stanford MIPS, and IBM 801 projects, influenced several commercial ISAs in the mid-1980s. Acorn Computers, in collaboration with silicon partner VLSI Technology, developed the ARM architecture, shipping ARM Evaluation Systems with their second generation ARM chipsets from July 1986, and a range of desktop computers, branded Acorn Archimedes and advertised as capable of 4 MIPS, from June 6, 1987. Hewlett-Packard introduced its own PA-RISC ISA, also in 1986, in new models of its HP 3000 and HP 9000 series. Sun Microsystems, in collaboration with silicon partner Fujitsu, shipped their own SPARC ISA from July 8, 1987, in their Sun 4/260, a machine advertised as offering 10 MIPS. MIPS Computer Systems, founded in 1984 to commercialize the work of the Stanford MIPS project, developed the MIPS architecture and MIPS processors starting with the R2000; Silicon Graphics (SGI) replaced the Motorola 68000 series processors in their workstations with MIPS processors, eventually purchasing MIPS, and Digital Equipment Corporation used MIPS processors in their DECstation workstations. IBM developed the ROMP RISC processor, used in the IBM RT PC, and the POWER architecture, used in the RS/6000 series. By the late 1980s, most large chip vendors had followed, working on efforts like the Motorola 88000, Fairchild Clipper, and AMD 29000. The performance and efficiency of these systems exceeded those of the previous generation of CISC CPUs.
In the early 1990s, Apple, IBM, and Motorola formed the AIM alliance, which developed the PowerPC architecture, based on IBM's POWER architecture, with PowerPC processors sold both by IBM and Motorola, and used by Apple to replace the Motorola 68000 series processors in their Macintosh computers. Digital Equipment Corporation (DEC) had several RISC projects in development since the early 1980s, eventually settling on the DEC PRISM, but that project was canceled; in the early 1990s, a subsequent project produced the DEC Alpha.
On February 13, 2015, IEEE installed a plaque at Oracle Corporation in Santa Clara. The plaque reads:
- Sun Microsystems introduced the Scalable Processor Architecture (SPARC) RISC in July 1987. Building on UC Berkeley RISC and Sun compiler and operating system developments, SPARC architecture was highly adaptable to evolving semiconductor, software, and system technology and user needs. The architecture delivered the highest performance, scalable workstations and servers, for engineering, business, Internet, and cloud computing uses.
Techniques developed for and alongside the idea of the reduced instruction set have also been adopted in successively more powerful implementations and extensions of the traditional "complex" x86 architecture. Much of a modern microprocessor's transistor count is devoted to large caches, many pipeline stages, superscalar instruction dispatch, branch prediction and other modern techniques which are applicable regardless of instruction architecture. The amount of silicon dedicated to instruction decoding on a modern x86 implementation is proportionately quite small, so the distinction between "complex" and RISC processor implementations has become blurred.
See also
- RISC-V
- Power ISA
- PA-RISC
