The dangling else is a problem in programming of parser generators in which an optional else clause in an if–then(–else) statement can make nested conditional statements ambiguous. Formally, the reference context-free grammar of the language is ambiguous, meaning there is more than one correct parse tree.

Description

In many programming languages, one may write conditionally executed code in two forms: the if-then form, or the if-then-else form. (The else clause is optional.):

if a then s

if b then s1 else s2

Ambiguous interpretation becomes possible when there are nested statements; specifically when an if-then-else form replaces the statement <code>s</code> inside the above if-then construct:

if a then if b then s1 else s2

In this example, <code>s1</code> gets executed if and only if <code>a</code> is true and <code>b</code> is true. But what about <code>s2</code>? One person might be sure that <code>s2</code> gets executed whenever <code>a</code> is false (by attaching the else to the first if), while another person might be sure that <code>s2</code> gets executed only when <code>a</code> is true and <code>b</code> is false (by attaching the else to the second if). In other words, someone could interpret the previous statement as being equivalent to either of the following unambiguous statements:

if a then { if b then s1 } else s2

if a then { if b then s1 else s2 }

The dangling-else problem dates back to ALGOL 60, and subsequent languages have resolved it in various ways. In LR parsers, the dangling else is the archetypal example of a shift-reduce conflict.

Examples

Concrete examples follow.

C

In C, the grammar reads, in part:

<pre>

statement = ...

| selection-statement

selection-statement = ...

| IF ( expression ) statement

| IF ( expression ) statement ELSE statement</pre>

Thus, without further rules, the statement

<syntaxhighlight lang="C">

if (a) if (b) s; else s2;

</syntaxhighlight>

could ambiguously be parsed as if it were either:

<syntaxhighlight lang="C">

if (a)

{

if (b)

s;

else

s2;

}

</syntaxhighlight>

or:

<syntaxhighlight lang="C">

if (a)

{

if (b)

s;

}

else

s2;

</syntaxhighlight>

The C standard clarifies that an <code>else</code> block is associated with the nearest <code>if</code>. allowing for unambiguous context-free grammars, in particular. Programming languages like Pascal, C, and Java follow this convention, so there is no ambiguity in the semantics of the language, though the use of a parser generator may lead to ambiguous grammars. In these cases alternative grouping is accomplished by explicit blocks, such as <code>begin...end</code> in Pascal and <code>{...}</code> in C.

Depending on the compiler construction approach, one may take different corrective actions to avoid ambiguity:

  • If the parser is produced by an SLR, LR(1), or LALR LR parser generator, the programmer will often rely on the generated parser feature of preferring shift over reduce whenever there is a conflict.

Possible solutions are:

  • Having an "end if" symbol delimiting the end of the if construct. Examples of such languages are ALGOL 68, Ada, Eiffel, PL/SQL, Visual Basic, Modula-2, and AppleScript.
  • Disallowing the statement following a "then" to be an "if" itself (it may however be a pair of statement brackets containing only an if-then-clause). Some implementations of ALGOL 60 follow this approach.
  • Requiring braces (parentheses) when an "else" follows an "if".
  • Requiring every "if" to be paired with an "else". Haskell works this way as a result of its purely functional semantics.
  • Using different keywords for the one-alternative and two-alternative "if" statements. S-algol uses <code>if e do s</code> for the one-alternative case and <code>if e1 then e2 else e3</code> for the general case.
  • Requiring braces unconditionally, like Swift and Rust. This is effectively true in Python as its indentation rules delimit every block, not just those in "if" statements.

Avoiding the conflict in LR parsers

The above example could be rewritten in the following way to remove the ambiguity :

<pre>

statement: open_statement

| closed_statement

;

open_statement: IF '(' expression ')' statement

| IF '(' expression ')' closed_statement ELSE open_statement

;

closed_statement: non_if_statement

| IF '(' expression ')' closed_statement ELSE closed_statement

;

non_if_statement: ...

;

</pre>

Any other statement-related grammar rules may also have to be duplicated in this way if they may directly or indirectly end with a <code>statement</code> or <code>selection-statement</code> non-terminal.

However, we give grammar that includes both of if and while statements.

<pre>

statement: open_statement

| closed_statement

;

open_statement: IF '(' expression ')' statement

| IF '(' expression ')' closed_statement ELSE open_statement

| WHILE '(' expression ')' open_statement

;

closed_statement: simple_statement

| IF '(' expression ')' closed_statement ELSE closed_statement

| WHILE '(' expression ')' closed_statement

;

simple_statement: ...

;

</pre>

Finally, we give the grammar that forbids ambiguous IF statements.

<pre>

statement: open_statement

| closed_statement

;

open_statement: IF '(' expression ')' statement

| IF '(' expression ')' closed_statement ELSE open_statement

| WHILE '(' expression ')' open_statement

;

closed_statement: simple_statement

| IF '(' expression ')' closed_statement ELSE closed_statement

| WHILE '(' expression ')' closed_statement

;

simple_statement: ...

;

</pre>

With this grammar the statement <code>if (a) if (b) c else d</code> can only be parsed one way, because the other interpretation (<code>if (a) {if (b) c} else d</code>) is produced as

<pre>

statement

open_statement

IF '(' expression ')' closed_statement ELSE open_statement

'if' '(' 'a' ')' closed_statement 'else' 'd'

</pre>

and then the parsing fails trying to match <code>closed_statement</code> to "if (b) c". An attempt with <code>closed_statement</code> fails in the same way. The other parse, <code>if (a) {if (b) c else d}</code>) succeeds:

<pre>

statement

open_statement

IF '(' expression ')' statement

IF '(' expression ')' closed_statement

IF '(' a ')' (IF '(' expression ')' closed_statement ELSE closed_statement)

IF '(' a ')' (IF '(' b ')' c ELSE 'd')

</pre>

See also

  • Lexer hack
  • Most vexing parse

References