created: 3/5/2001; revised: 10/15/06, 07/29/07
The scanner in the previous chapter worked by reading characters one by one, processing each one before reading the next. It was always able to process the character it had just read. Often with scanners this is not the case. Sometimes a scanner must read one character beyond the end of a token in order to determine that the token has ended. This chapter shows how scanners do this.
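As a minimal sketch of the idea (not this chapter's scanner), the Python function below splits a string into integer tokens and single-character symbol tokens. The function name `scan` and the token rules are assumptions chosen only for illustration. The scanner cannot know that a number has ended until it has looked at the first character that is not a digit, and that character must be left in place because it belongs to the next token.

```python
def scan(text):
    """Split text into integer tokens and single-character symbol tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            # Whitespace separates tokens but is not a token itself.
            pos += 1
        elif ch.isdigit():
            start = pos
            # Keep looking at characters until one is not a digit.
            # That character is one past the end of the number, so it
            # is examined here but not consumed.
            while pos < len(text) and text[pos].isdigit():
                pos += 1
            tokens.append(text[start:pos])
        else:
            # Any other character is treated as a one-character token.
            tokens.append(ch)
            pos += 1
    return tokens


print(scan("12+345 * 6"))  # prints ['12', '+', '345', '*', '6']
```

Here the `+` is what tells the scanner that `12` is complete, yet `+` is not part of that token. Because the loop leaves `pos` pointing at it, the next pass through the outer loop processes it as a token of its own.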
What are the tokens in the following section of an HTML file?
<h1>Important Heading</h1> <p> <span style="color:blue">Many words</span> </p>
Regard each complete HTML tag as a token.