<h1>Important Heading</h1>


When the < has been read.

One Character Pushback


The < is part of the next token, so it can't be discarded; somehow it must be saved. This is done by pretending to send it back to the input stream, so that the next time a character is read, the pushed-back character is returned. To make this work, we write our own read method that holds onto the pushed-back character. We'll get to the details in a while.
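Before the details, here is a minimal sketch of the pushback idea in Python. It is not the read method developed later in this chapter. The class name `PushbackReader` and the method name `unread` are made up for illustration, and the sketch assumes that a single character of pushback is enough.

```python
import io


class PushbackReader:
    """Wraps a text stream and lets the caller push back one character.

    Hypothetical helper for illustration; names are assumptions.
    """

    def __init__(self, stream):
        self._stream = stream
        self._pushed = None  # holds at most one pushed-back character

    def read(self):
        """Return the next character, or '' at end of input."""
        if self._pushed is not None:
            ch = self._pushed
            self._pushed = None  # the saved character is handed out once
            return ch
        return self._stream.read(1)

    def unread(self, ch):
        """Save ch so that the next call to read() returns it."""
        if self._pushed is not None:
            raise RuntimeError("only one character of pushback is supported")
        self._pushed = ch


reader = PushbackReader(io.StringIO("g</h1>"))
reader.read()        # 'g', the last character of the word
ch = reader.read()   # '<', which ends the word but belongs to the next token
reader.unread(ch)    # pretend to send it back
print(reader.read()) # prints <
```

The stream itself is never modified. The saved character lives in the wrapper, which is why the text above says we are only pretending to send it back.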

For now, think about the finite automaton. It starts in the start state and remains there as white space characters are consumed. A < character sends it to the tag state, and anything else sends it to the word state.

While in the tag state or in the word state, the automaton reads through characters until it hits a delimiter: a > character for the tag state and either white space or a < character for the word state.
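The three states described above can be sketched as a short Python generator. Treat this as one possible rendering under stated assumptions, not as the program this chapter builds. The names `tokens`, `START`, `TAG`, and `WORD` are invented here, and the handling of end of input (an unfinished tag or word is emitted as it stands) is an assumption, since the description above does not cover that case.

```python
import io

# State names are illustrative assumptions.
START, TAG, WORD = "start", "tag", "word"


def tokens(text):
    """Yield tags and words from text using a three-state automaton."""
    stream = io.StringIO(text)
    pushed = []  # holds at most one pushed-back character

    def read():
        # Return the pushed-back character if there is one.
        return pushed.pop() if pushed else stream.read(1)

    state, buf = START, []
    while True:
        ch = read()  # '' means end of input
        if state == START:
            if ch == "":
                return
            if ch.isspace():
                continue  # white space keeps us in the start state
            buf.append(ch)
            state = TAG if ch == "<" else WORD
        elif state == TAG:
            if ch == "":
                yield "".join(buf)  # assumption: emit an unterminated tag
                return
            buf.append(ch)
            if ch == ">":  # the '>' delimiter is part of the tag
                yield "".join(buf)
                state, buf = START, []
        else:  # WORD
            if ch == "" or ch.isspace() or ch == "<":
                if ch == "<":
                    pushed.append(ch)  # '<' belongs to the next token
                yield "".join(buf)
                state, buf = START, []
                if ch == "":
                    return
            else:
                buf.append(ch)


print(list(tokens("<h1>Important Heading</h1>")))
# prints ['<h1>', 'Important', 'Heading', '</h1>']
```

Notice that pushback is needed only in the word state. A tag's closing > is part of the tag, and white space after a word can be thrown away, but the < that ends a word must be seen again by the start state.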


Mentally label the automaton.