The Packrat parser is a type of parser that shares similarities with the recursive descent parser in its construction. However, it differs because it takes parsing expression grammars (PEGs) as input rather than LL grammars.
In 1970, Alexander Birman laid the groundwork for packrat parsing by introducing the "TMG recognition scheme" (TS), and "generalized TS" (gTS). TS was based upon Robert M. McClure's TMG compiler-compiler, and gTS was based upon Dewey Val Schorre's META compiler-compiler.
Birman's work was later refined by Aho and Ullman; and renamed as Top-Down Parsing Language (TDPL), and Generalized TDPL (GTDPL), respectively. These algorithms were the first of their kind to employ deterministic top-down parsing with backtracking.
Bryan Ford developed PEGs as an expansion of GTDPL and TS. Unlike CFGs, PEGs are unambiguous and can match well with machine-oriented languages. PEGs, similar to GTDPL and TS, can also express all LL(k) and LR(k). Ford also introduced Packrat as a parser that uses memoization techniques on top of a simple PEG parser. This was done because PEGs have an unlimited lookahead capability resulting in a parser with exponential time performance in the worst case.
Syntax
The packrat parser takes in input the same syntax as PEGs: a simple PEG is composed of terminal and nonterminal symbols, possibly interleaved with operators that compose one or several derivation rules. During the early stages of development, it was found that a production that is left-recursive can be transformed into a right-recursive production. This modification significantly simplifies the task of a Packrat parser. Nonetheless, if there is an indirect left recursion involved, the process of rewriting can be quite complex and challenging. If the time complexity requirements are loosened from linear to superlinear, it is possible to modify the memoization table of a Packrat parser to permit left recursion, without altering the input grammar. When using packrat parsing and memoization, it's noteworthy that the parsing function for each nonterminal is solely based on the input string. It does not depend on any information gathered during the parsing process. Essentially, memoization table entries do not affect or rely on the parser's specific state at any given time.
Packrat parsing stores results in a matrix or similar data structure that allows for quick look-ups and insertions. When a production is encountered, the matrix is checked to see if it has already occurred. If it has, the result is retrieved from the matrix. If not, the production is evaluated, the result is inserted into the matrix, and then returned. When evaluating the entire <math>m*n</math> matrix in a tabular approach, it would require <math>\Theta(mn)</math> space.
{| class="wikitable"
!Operator
!Semantics
|-
|Cut
<math>\begin{array}{l}
\alpha \uparrow \beta / \gamma \\
(\alpha \uparrow \beta)*
\end{array}
</math>
|if <math>\alpha
</math> is recognized but <math>\beta
</math> is not, skip the evaluation of the alternative.
In the first case don't evaluate <math>\gamma
</math> if <math>\alpha
</math> was recognized
The second rule is can be rewritten as <math>N \rightarrow \alpha \uparrow \beta N / \epsilon
</math> and the same rules can be applied.
|}
When a Packrat parser uses cut operators, it effectively clears its backtracking stack. This is because a cut operator reduces the number of possible alternatives in an ordered choice. By adding cut operators in the right places in a grammar's definition, the resulting Packrat parser only needs a nearly constant amount of space for memoization. However, the fundamental problem that packrat parsers require O(n) space has not been resolved. || Packrat (partial memoization) || JavaScript || || || , MIT
|-
| Pegasus || Recursive descent, Packrat (selectively) || C# || || || , MIT
|-
| PetitParser || Packrat || Smalltalk, Java, Dart || || || , MIT
|-
| PyPy rlib || Packrat || Python || || || , MIT
|-
| Rats! || Packrat || Java || || || , GNU LGPL
|-
| go-packrat || Packrat || Go || Identical || All || , GPLv3
|}
See also
- CYK algorithm
- Context-free grammar
- Parsing algorithms
- Earley parser
References
External links
- Packrat Parsing: Simple, Powerful, Lazy, Linear Time
- Parsing Expression Grammars: A Recognition-Based Syntactic Foundation
