Image: An L7A1045 DSP chip, as used in several Akai samplers and the Hyper Neo Geo 64 arcade board.
Image: The NeXTcube from 1990 had a Motorola 68040 (25 MHz) and a Motorola 56001 digital signal processor (also 25 MHz), which was directly accessible via an interface.
A digital signal processor (DSP) is a specialized microprocessor chip, with its architecture optimized for the operational needs of digital signal processing. DSPs are fabricated on metal–oxide–semiconductor (MOS) integrated circuit chips. They are widely used in audio signal processing, telecommunications, digital image processing, radar, sonar and speech recognition systems, and in common consumer electronic devices such as mobile phones, disk drives and high-definition television (HDTV) products. DSPs often use special memory architectures that are able to fetch multiple data or instructions at the same time.
Overview
Image: A typical digital processing system
Digital signal processing (DSP) algorithms typically require a large number of mathematical operations to be performed quickly and repeatedly on a series of data samples. Signals (perhaps from audio or video sensors) are constantly converted from analog to digital, manipulated digitally, and then converted back to analog form. Many DSP applications have constraints on latency; that is, for the system to work, the DSP operation must be completed within some fixed time, and deferred (or batch) processing is not viable.
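The latency constraint can be made concrete with a little arithmetic. The sketch below assumes a hypothetical system that processes audio in 64-sample blocks at 48 kHz on a 100 MHz processor. These figures and the helper names are illustrative and do not describe any particular product. Each block must be finished before the next one arrives, so the deadline is the block length divided by the sample rate.

```python
def block_deadline_s(block_size: int, sample_rate_hz: float) -> float:
    """Seconds between consecutive blocks; processing must finish within this."""
    return block_size / sample_rate_hz


def cycle_budget(block_size: int, sample_rate_hz: float, clock_hz: float) -> int:
    """Whole processor cycles available per block at the given clock rate."""
    return int(clock_hz * block_deadline_s(block_size, sample_rate_hz))


def meets_deadline(cycles_needed: int, block_size: int,
                   sample_rate_hz: float, clock_hz: float) -> bool:
    """True if an algorithm costing `cycles_needed` per block keeps up in real time."""
    return cycles_needed <= cycle_budget(block_size, sample_rate_hz, clock_hz)


# Hypothetical figures: 64-sample blocks, 48 kHz audio, 100 MHz clock.
print(block_deadline_s(64, 48_000) * 1000)    # deadline per block in ms
print(cycle_budget(64, 48_000, 100e6))        # cycles available per block
```

With these assumed figures, each block must be processed within about 1.33 ms, or roughly 133,333 cycles. An algorithm that needs more than that per block cannot keep up, however it is scheduled.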
Most general-purpose microprocessors and operating systems can execute DSP algorithms successfully, but are not suitable for use in portable devices such as mobile phones and PDAs because of power efficiency constraints.
The architecture of a DSP is optimized specifically for digital signal processing. Most also support some of the features of an applications processor or microcontroller, since signal processing is rarely the only task of a system. Some useful features for optimizing DSP algorithms are outlined below.
Architecture
Software architecture
By the standards of general-purpose processors, DSP instruction sets are often highly irregular. Traditional instruction sets are made up of more general instructions that allow them to perform a wider variety of operations, whereas instruction sets optimized for digital signal processing contain instructions for common mathematical operations that occur frequently in DSP calculations. Both traditional and DSP-optimized instruction sets can compute any arbitrary operation, but an operation that might require multiple ARM or x86 instructions might require only one instruction in a DSP-optimized instruction set.
One implication for software architecture is that hand-optimized assembly-code routines are commonly packaged into libraries for reuse, instead of relying on advanced compiler technologies to handle essential algorithms. Even with modern compiler optimizations, hand-optimized assembly code is often more efficient, so many common algorithms involved in DSP calculations are hand-written to take full advantage of the architectural optimizations.
Instruction sets
- multiply–accumulate (MAC) operations, including fused multiply–add (FMA)
  - used extensively in all kinds of matrix operations
    - convolution for filtering
    - dot product
    - polynomial evaluation
  - Fundamental DSP algorithms depend heavily on multiply–accumulate performance
    - FIR filters
    - Fast Fourier transform (FFT)
- related instructions:
  - SIMD
  - VLIW
- Specialized instructions for modulo addressing in ring buffers and bit-reversed addressing mode for FFT cross-referencing
- DSPs sometimes use time-stationary encoding to simplify hardware and increase coding efficiency.
- Multiple arithmetic units may require memory architectures to support several accesses per instruction cycle – typically supporting reading 2 data values from 2 separate data buses and the next instruction (from the instruction cache, or a 3rd program memory) simultaneously.
- Special loop controls, such as architectural support for executing a few instruction words in a very tight loop without overhead for instruction fetches or exit testing, for example zero-overhead looping and hardware loop buffers.
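To illustrate why MAC throughput matters, the following minimal sketch implements a direct-form FIR filter, y[n] = Σ h[k]·x[n−k]. Every step of the inner loop is one multiply–accumulate. On a DSP with a single-cycle MAC unit, parallel data fetches, and zero-overhead looping, that inner loop can map onto roughly one instruction per tap. The function name `fir_filter` is hypothetical, and Python is used only to show the structure of the computation, not its performance.

```python
from typing import Sequence


def fir_filter(x: Sequence[float], h: Sequence[float]) -> list[float]:
    """Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k].

    Samples before x[0] are treated as zero. Each inner-loop step is one
    multiply-accumulate (MAC) into a running accumulator.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0                      # plays the role of the accumulator register
        for k in range(len(h)):
            if n - k >= 0:             # skip taps that reach before the first sample
                acc += h[k] * x[n - k]  # one MAC
        y.append(acc)
    return y


# Feeding in an impulse returns the filter coefficients (the impulse response).
print(fir_filter([1, 0, 0, 0], [0.5, 0.25, 0.125]))
```

The per-output cost is one MAC per tap, which is why DSPs are often compared by how many MACs they can sustain per cycle.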
Data instructions
- Saturation arithmetic, in which operations that overflow clamp to the maximum (or minimum) value the register can hold rather than wrapping around: maximum + 1 stays at maximum instead of overflowing to minimum, as it does on many general-purpose CPUs. Various sticky-bit operation modes are sometimes available.
- Fixed-point arithmetic is often used to speed up arithmetic processing.
- Single-cycle operations to increase the benefits of pipelining.
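The difference between wraparound and saturation, and the kind of fixed-point format many DSPs use, can be sketched in a few lines. This example assumes 16-bit signed registers and the Q15 format (a 16-bit integer interpreted as a fraction in [−1, 1)). The helper names are hypothetical; real DSPs perform these steps in hardware, and their rounding behavior may differ from the simple shift used here.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def wrap16(v: int) -> int:
    """Two's-complement wraparound, as on many general-purpose CPUs."""
    return (v + 32768) % 65536 - 32768


def sat16(v: int) -> int:
    """Saturating result: clamp to the representable 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, v))


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 numbers (value = integer / 2**15).

    The full product is in Q30; an arithmetic right shift by 15 (which rounds
    toward negative infinity) returns it to Q15. Only -1.0 * -1.0 overflows,
    and saturation clamps that case to the largest Q15 value.
    """
    return sat16((a * b) >> 15)


print(wrap16(INT16_MAX + 1))      # wraps to the most negative value
print(sat16(INT16_MAX + 1))       # stays at the maximum
print(q15_mul(16384, 16384))      # 0.5 * 0.5 = 0.25 in Q15
```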
Program flow
- Floating-point unit integrated directly into the datapath
- Pipelined architecture
- Highly parallel multiplier–accumulators (MAC units)
- Hardware-controlled looping, to reduce or eliminate the overhead required for looping operations
Hardware architecture
Memory architecture
DSPs are usually optimized for streaming data and use special memory architectures that can fetch multiple data or instructions at the same time, such as the Harvard architecture or the modified von Neumann architecture, which use separate program and data memories (sometimes with concurrent access over multiple data buses).
Some DSPs rely on supporting code, rather than hardware, to account for cache hierarchies and their associated delays; this tradeoff allows for better performance. DMA is also used extensively.
Addressing and virtual memory
DSPs frequently use multitasking operating systems but typically have no support for virtual memory or memory protection. Operating systems that use virtual memory require more time for context switching among processes, which increases latency.
- Hardware modulo addressing
  - Allows circular buffers to be implemented without having to test for wrapping
- Bit-reversed addressing, a special addressing mode
  - Useful for calculating FFTs
- Exclusion of a memory management unit
- Address generation unit
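The two addressing modes above can be emulated in software to show what the hardware saves. In this sketch (class and function names are hypothetical), the circular buffer wraps its index with an explicit modulo operation, which a DSP's address generation unit would typically handle without extra instructions, and `bit_reversed_order` produces the index permutation used by an in-place radix-2 FFT.

```python
class CircularBuffer:
    """Delay line whose index wraps with a modulo, mimicking hardware modulo
    addressing (here the wrap costs an explicit % operation in software)."""

    def __init__(self, size: int) -> None:
        self.data = [0.0] * size
        self.pos = 0  # index of the next slot to overwrite

    def push(self, sample: float) -> None:
        self.data[self.pos] = sample
        self.pos = (self.pos + 1) % len(self.data)

    def tap(self, delay: int) -> float:
        """Sample pushed `delay` steps ago (0 = most recent); delay < size."""
        return self.data[(self.pos - 1 - delay) % len(self.data)]


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `index`, e.g. 0b001 -> 0b100 for bits=3."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(n: int) -> list[int]:
    """Index permutation for an in-place radix-2 FFT of length n (a power of two)."""
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


print(bit_reversed_order(8))
```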
History
Image: TRW TDC1010 multiplier-accumulator
Development
In 1976, Richard Wiggins proposed the Speak & Spell concept to Paul Breedlove, Larry Brantingham, and Gene Frantz at Texas Instruments' Dallas research facility. Two years later, in 1978, they produced the first Speak & Spell, whose technological centerpiece was the TMS5100, the industry's first digital signal processor. It also set other milestones, being the first chip to use linear predictive coding to perform speech synthesis. The chip was made possible with a 7 μm PMOS fabrication process.
In 1978, American Microsystems (AMI) released the S2811. The S2811 was the first integrated circuit chip specifically designed as a DSP, and was fabricated using vertical metal oxide semiconductor (VMOS, V-groove MOS), a technology that had previously not been mass-produced. It had an on-chip ADC/DAC with an internal signal processor, but it did not have a hardware multiplier and was not successful in the market.
In 1980, the first stand-alone, complete DSPs (NEC's μPD7720, based on the modified Harvard architecture, and AT&T's DSP1) were presented at the International Solid-State Circuits Conference '80. Both processors were inspired by research in public switched telephone network (PSTN) telecommunications. The μPD7720, introduced for voiceband applications, was one of the most commercially successful early DSPs.
NXP Semiconductors produces DSPs based on TriMedia VLIW technology, optimized for audio and video processing. In some products, the DSP core is hidden as a fixed-function block in a SoC, but NXP also provides a range of flexible single-core media processors. The TriMedia media processors support both fixed-point and floating-point arithmetic and have specific instructions for complex filters and entropy coding.
CSR produces the Quatro family of SoCs that contain one or more custom imaging DSPs optimized for processing document image data for scanner and copier applications.
Microchip Technology produces the PIC24-based dsPIC line of DSPs. Introduced in 2004, the dsPIC is designed for applications needing a true DSP as well as a true microcontroller, such as motor control and power supplies. The dsPIC runs at up to 40 MIPS and supports 16-bit fixed-point MAC, bit-reverse and modulo addressing, and DMA.
Most DSPs use fixed-point arithmetic because, in real-world signal processing, the additional range provided by floating point is often not needed, and the reduced hardware complexity brings a large speed and cost benefit. Floating-point DSPs may be invaluable in applications where a wide dynamic range is required. Product developers might also use floating-point DSPs to reduce the cost and complexity of software development in exchange for more expensive hardware, since it is generally easier to implement algorithms in floating point.
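A rough way to compare the range of the two formats is the ratio between the largest and smallest representable magnitudes, expressed in decibels. This is a simplification (it ignores quantization noise, headroom, and subnormal numbers), and the function names are hypothetical.

```python
import math


def fixed_point_dynamic_range_db(bits: int) -> float:
    """Full-scale-to-LSB ratio of a `bits`-bit integer format, in dB."""
    return 20 * math.log10(2 ** bits)


def float32_dynamic_range_db() -> float:
    """Largest finite float32 over smallest normalized float32, in dB."""
    largest = (2 - 2 ** -23) * 2.0 ** 127   # about 3.4e38
    smallest_normal = 2.0 ** -126           # about 1.18e-38
    return 20 * math.log10(largest / smallest_normal)


print(round(fixed_point_dynamic_range_db(16), 1))
print(round(float32_dynamic_range_db()))
```

By this measure, 16-bit fixed point spans about 96 dB (the familiar rule of thumb of roughly 6 dB per bit), while single-precision floating point spans over 1500 dB, far more than most real-world signals need.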
Generally, DSPs are dedicated integrated circuits; however, DSP functionality can also be implemented on field-programmable gate arrays (FPGAs).
Embedded general-purpose RISC processors are becoming increasingly DSP-like in functionality. For example, the OMAP3 processors include an ARM Cortex-A8 and a C6000 DSP.
In communications, a new breed of DSPs offering the fusion of both DSP functions and hardware acceleration functions is making its way into the mainstream. Such modem processors include ASOCS ModemX and CEVA's XC4000.
In May 2018, Huarui-2, designed by the Nanjing Research Institute of Electronics Technology of China Electronics Technology Group, passed acceptance. With a processing speed of 0.4 TFLOPS, the chip can achieve better performance than the mainstream DSP chips of the time. The design team has begun to create Huarui-3, which is intended to reach TFLOPS-level processing speed and to support artificial intelligence.
DSP-based tuners for analog radio
Image: Panasonic RF-2400D AM/FM radio, which has a modern DSP-based internal design.
External links
- DSP Online Book
- Pocket Guide to Processors for DSP - Berkeley Design Technology, Inc.
