created: 01/10/01; revised: 10/14/06, 07/15/07
This is the first of two chapters that discuss how scanners are built. A scanner is a program that reads a stream of characters and divides it into a sequence of groups called tokens. A token is a basic symbol of a programming language. For example, in Java the tokens are the keywords, identifiers, separators, literals, and operators. A scanner is the first phase of a compiler. Natural language processing programs also use scanners.
A scanner is usually implemented as a finite-state transducer. This chapter extends the ideas of the previous sections.
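To make the idea of a scanner as a finite-state transducer concrete, here is a minimal sketch in Java. It is only an illustration, not the scanner developed in these chapters: the class name TinyScanner, the state names, and the simplified token rules (identifiers, unsigned integer literals, and single-character operators or separators) are all assumptions made for this example. The machine reads one character at a time, changes state, and emits a token whenever it leaves a state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class: a tiny hand-written scanner driven by a state machine.
class TinyScanner {
    // The states of the machine (names chosen for this sketch).
    private enum State { START, IN_IDENTIFIER, IN_NUMBER }

    // Reads the input one character at a time and returns the tokens found.
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        State state = State.START;

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);

            // If c cannot extend the token being built, emit that token
            // and return to the start state before examining c.
            boolean endsIdentifier = (state == State.IN_IDENTIFIER) && !Character.isLetterOrDigit(c);
            boolean endsNumber = (state == State.IN_NUMBER) && !Character.isDigit(c);
            if (endsIdentifier || endsNumber) {
                tokens.add(current.toString());
                current.setLength(0);
                state = State.START;
            }

            if (state == State.START) {
                if (Character.isLetter(c)) {
                    current.append(c);
                    state = State.IN_IDENTIFIER;
                } else if (Character.isDigit(c)) {
                    current.append(c);
                    state = State.IN_NUMBER;
                } else if (!Character.isWhitespace(c)) {
                    // Assumption: every other visible character is a one-character token.
                    tokens.add(String.valueOf(c));
                }
                // Whitespace in the start state is skipped.
            } else {
                // Still inside an identifier or a number: extend the current token.
                current.append(c);
            }
        }

        // Emit a token that was still being built when the input ended.
        if (state != State.START) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

For example, TinyScanner.scan("sum = a1 + 42;") yields the six tokens sum, =, a1, +, 42, and ;. A real scanner would also classify each token (keyword, identifier, literal, and so on) and handle multi-character operators, comments, and string literals.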
The Java class Scanner (beginning with Java 1.5) implements the idea of a scanner and can be used for many scanning tasks.
However, sometimes you need to write your own scanner, and the ideas discussed in this chapter will be useful when you do.
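Before writing a scanner by hand, it helps to see what the library class already handles. The following is a small sketch using java.util.Scanner with its default behavior, which splits the input at whitespace. The class name ScannerDemo and the two helper methods are hypothetical names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical example class showing two simple uses of java.util.Scanner.
class ScannerDemo {
    // Returns the whitespace-separated tokens of the text, in order.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            while (in.hasNext()) {
                result.add(in.next());
            }
        }
        return result;
    }

    // Adds up the tokens that look like integers and skips all other tokens.
    static int sumOfInts(String text) {
        int sum = 0;
        try (Scanner in = new Scanner(text)) {
            while (in.hasNext()) {
                if (in.hasNextInt()) {
                    sum += in.nextInt();
                } else {
                    in.next(); // not an integer: discard this token
                }
            }
        }
        return sum;
    }
}
```

Here ScannerDemo.words("to be or not") gives the four tokens to, be, or, and not, and ScannerDemo.sumOfInts("3 apples and 4 pears") gives 7. Because the default delimiter is whitespace, input such as "a1+42" comes back as a single token, which is one reason a programming-language scanner is usually written by hand or generated by a tool.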
Would a spelling checker program use a scanner?