In the C programming language, Duff's device is a way of manually implementing loop unrolling by interleaving two syntactic constructs of C: the - loop and a switch statement. Its discovery is credited to Tom Duff in November 1983, when Duff was working for Lucasfilm and used it to speed up a real-time animation program.
Loop unrolling attempts to reduce the overhead of conditional branching needed to check whether a loop is done, by executing a batch of loop bodies per iteration. To handle cases where the number of iterations is not divisible by the unrolled-loop increments, a common technique among assembly language programmers is to jump directly into the middle of the unrolled loop body to handle the remainder.
Duff implemented this technique in C by using C's case label fall-through feature to jump into the unrolled body.
Duff's technique
Duff was copying 16-bit unsigned integers ("shorts" in most C implementations) from an array into a memory-mapped output register, denoted in C by a pointer. He wanted to optimize the following K&R C code:
<syntaxhighlight lang="c">
send(to, from, count)
register short *to, *from;
register count;
{
do { /* count > 0 assumed */
- to = *from++;
} while (--count > 0);
}
</syntaxhighlight>
In K&R C, any unspecified type defaults to , including variables such as and the return type of . The Register (keyword)| keyword is used to suggest that a variable ought to be kept in a CPU register, avoiding memory. Functions could not return as it was not yet a reserved keyword, but many functions effectively did so by omitting the statement, which was optional. This signature could idiomatically, but not equivalently, be expressed in a modern C prototype as <syntaxhighlight lang="c" inline>void send(short* to, short* from, int count);</syntaxhighlight>.
This code assumes that the argument passed in is greater than zero. Since the output location is a memory-mapped register, the pointer is not incremented, as it would be in copying from memory to memory.
If were always divisible by eight, unrolling this loop eight-fold would produce the following:
<syntaxhighlight lang="c">
send(to, from, count)
register short *to, *from;
register count;
{
register n = count / 8;
do {
- to = *from++;
- to = *from++;
- to = *from++;
- to = *from++;
- to = *from++;
- to = *from++;
- to = *from++;
- to = *from++;
} while (--n > 0);
}
</syntaxhighlight>
Duff realized that to handle cases where is not divisible by eight, the assembly programmer's technique of jumping into the loop body could be implemented by interlacing the structures of a switch statement and a loop, putting the switch's labels at the points of the loop body that correspond to the remainder of :
Although valid in C, Duff's device goes against common C guidelines, such as the MISRA guidelines. Some compilers (e.g. CompCert) are restricted to such guidelines and thus reject Duff's device unless specifically instructed otherwise.
Simplified explanation
{| class="wikitable floatright"
|+
! A functionally equivalent version<br />with <code>switch</code> and <code>while</code> disentangled
|-
| <syntaxhighlight lang="c">
send(to, from, count)
register short *to, *from;
register count;
{
register n = (count + 7) / 8;
switch (count % 8) {
case 0: *to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
}
while (--n > 0) {
- to = *from++;
- to = *from++;
- to = *from++;
- to = *from++;
- to = *from++;
- to = *from++;
- to = *from++;
- to = *from++;
}
}
</syntaxhighlight>
|}
The basic idea of loop unrolling is that the number of instructions executed in a loop can be reduced by reducing the number of loop tests, sometimes reducing the amount of time spent in the loop. For example, in the case of a loop with only a single instruction in the block code, the loop test will typically be performed for every iteration of the loop, that is every time the instruction is executed. If, instead, eight copies of the same instruction are placed in the loop, then the test will be performed only every eight iterations, and this may gain time by avoiding seven tests. However, this only handles a multiple of eight iterations, requiring something else to handle any remainder of iterations. When numerous instances of Duff's device were removed from the XFree86 Server in version 4.0, there was an improvement in performance and a noticeable reduction in size of the executable. Therefore, before applying any program optimization, it should be benchmarked or its compiled output should be explored, to verify that it performs as expected on the target architecture, optimization level, and compiler. Additionally, the risk of the optimized code deployed on different platforms where it may not remain the fastest option should be considered.
For the purpose of memory-to-memory copies (which, as mentioned above, was not the original use of Duff's device), the standard C library provides the function <code>memcpy</code>; it will not perform worse than a memory-to-memory copy version of this code, and may contain architecture-specific optimizations that make it significantly faster.
See also
- Computed GOTO
- Coroutine Duff's device can be used to implement coroutines in C/C++ (see Tatham external link)
- Jensen's device
References
Further reading
External links
- Description and original mail by Duff at Lysator
- Simon Tatham's coroutines in C utilizes the same switch/case trick
- Adam Dunkels's Protothreads - Lightweight, Stackless Threads in C also uses nested switch/case statements (see also The lightest lightweight threads, Protothreads)
- Pigeon's device is a related technique: intertwined switch/case and if/else statements
<!--
Please be cautious adding more external links.
Wikipedia is not a collection of links and should not be used for advertising.
Excessive or inappropriate links will be removed.
See Wikipedia:External links and Wikipedia:Spam for details.
If there are already suitable links, propose additions or replacements on
the article's talk page, or submit your link to the relevant category at
DMOZ (dmoz.org) and link there using . -->
