import java.io.*;

class WebScanner
{
  MyPushbackReader in;  // reader that can push a character back
  String token;         // the current token
  int ich;              // input character as int
  char ch;              // current input character

  WebScanner( Reader rdr )
  {
    in = new MyPushbackReader( rdr );
  }

  private boolean isWhiteSpace( char ch )
  {
    return ch==' ' || ch=='\t' || ch=='\n' || ch=='\r';
  }

  String nextToken() throws IOException
  {
    . . . . .
  }
}
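The class MyPushbackReader is not defined in this section. The scanner needs only two operations from it: read(), and unread() to push one character back onto the input. As an assumption, the standard java.io.PushbackReader, which provides read() and unread(int) with a one-character pushback buffer by default, can play the same role. This small demo (the class and method names are hypothetical) shows that a character pushed back is the next one read:

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Demonstrates read()/unread() with the standard java.io.PushbackReader,
// assumed here to behave like MyPushbackReader for a single character.
class PushbackDemo
{
  // Reads the first character of text, pushes it back, and reads it again.
  // text must be non-empty; both reads return the same character.
  static String readTwice( String text )
  {
    try ( PushbackReader in = new PushbackReader( new StringReader( text ) ) )
    {
      int first = in.read();    // consumes the first character
      in.unread( first );       // pushes it back onto the input
      int second = in.read();   // reads the very same character again
      return "" + (char)first + (char)second;
    }
    catch ( IOException e )
    {
      throw new UncheckedIOException( e );
    }
  }

  public static void main( String[] args )
  {
    System.out.println( readTwice( "<html>" ) );
  }
}
```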
Now examine the nextToken() method.
The states of the automaton are represented as integers. The final state is not actually implemented; when the automaton reaches the final state, the method returns to its caller. The scanner reads characters one at a time until it reaches the final state or hits end of file (signaled by a -1). If it hits end of file, it returns null.
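Reader.read() returns an int rather than a char so that -1 can signal end of file: every char converts to an int from 0 to 65535, so -1 can never be mistaken for a real character. That is why the scanner reads into ich and casts to ch only after the -1 test. Had the result been cast to char first, -1 would turn into the char value 65535 and a test such as ch != -1 would always be true. A minimal sketch of the same loop, counting the < characters in a string (the names EofDemo and countOpenBrackets are hypothetical):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class EofDemo
{
  // Counts the '<' characters in text, reading until read() signals
  // end of file by returning -1.
  static int countOpenBrackets( String text )
  {
    try ( Reader in = new StringReader( text ) )
    {
      int count = 0;
      int ich;                              // int, so that -1 can be detected
      while ( (ich = in.read()) != -1 )     // same loop test as nextToken()
      {
        char ch = (char)ich;                // cast only after the -1 test
        if ( ch == '<' ) count++;
      }
      return count;
    }
    catch ( IOException e )
    {
      throw new UncheckedIOException( e );
    }
  }

  public static void main( String[] args )
  {
    System.out.println( countOpenBrackets( "<p>Hi</p>" ) );
  }
}
```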
While in the start state, the scanner skips over white space. The first non-whitespace character causes a transition to the tag state if it is a < and to the word state otherwise. That first character of the token is appended to the buffer.
While in the tag state, the scanner gathers up characters for the tag token until it sees a > character. It then appends that character to end the tag and returns the tag token to the caller.
While in the word state, the scanner gathers up characters one by one until it sees a whitespace character or a < character.
When the word state sees a whitespace character, it returns the token to the caller.
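The transitions described above can also be written as a pure function from (state, character) to the next state, with the buffer handling left out. This is a sketch only: the enum State, the class Transitions, and the method next() are hypothetical names, and the original code uses the int constants start, word, and tag rather than an enum. FINAL stands for the final state, which nextToken() never stores; reaching it is the point where the method returns.

```java
class Transitions
{
  enum State { START, WORD, TAG, FINAL }

  static boolean isWhiteSpace( char ch )
  {
    return ch==' ' || ch=='\t' || ch=='\n' || ch=='\r';
  }

  // Returns the state the automaton moves to after reading ch in state s.
  static State next( State s, char ch )
  {
    switch ( s )
    {
      case START:
        if ( isWhiteSpace( ch ) ) return State.START;  // skip white space
        if ( ch == '<' )          return State.TAG;    // a tag begins
        return State.WORD;                            // a word begins
      case WORD:
        if ( isWhiteSpace( ch ) || ch == '<' ) return State.FINAL;  // word ends
        return State.WORD;
      case TAG:
        return ( ch == '>' ) ? State.FINAL : State.TAG;            // '>' ends a tag
      default:
        return State.FINAL;  // FINAL has no outgoing transitions
    }
  }

  // Prints the state reached after each character of a short input.
  public static void main( String[] args )
  {
    State s = State.START;
    for ( char ch : "  <b>".toCharArray() )
    {
      s = next( s, ch );
      System.out.println( "'" + ch + "' -> " + s );
    }
  }
}
```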
String nextToken() throws IOException
{
  final int start = 1;   // states of the automaton
  final int word  = 2;
  final int tag   = 3;

  int state;
  StringBuffer buff = new StringBuffer();
  state = start;

  while ( (ich = in.read()) != -1 )
  {
    ch = (char)ich;

    if ( state==start && isWhiteSpace(ch) )
      state = start;                      // skip white space
    else if ( state==start && ch == '<' )
    {
      state = tag;                        // a tag begins
      buff.append( ch );
    }
    else if ( state==start )
    {
      state = word;                       // a word begins
      buff.append( ch );
    }
    else if ( state==word && isWhiteSpace(ch) )
    {
      token = buff.toString().trim();     // white space ends a word
      return token;
    }
    else if ( state==word && ch == '<' )
    {
      in.unread( ch );
      token = buff.toString().trim();
      return token;
    }
    else if ( state==word )
      buff.append( ch );
    else if ( state==tag && ch == '>' )
    {
      buff.append( ch );                  // '>' ends a tag
      token = buff.toString().trim();
      return token;
    }
    else if ( state==tag )
      buff.append( ch );
  }

  return null;  // end of file
}
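To try the method without MyPushbackReader, here is a self-contained sketch. It assumes the standard java.io.PushbackReader can stand in for MyPushbackReader; the names TinyWebScanner and tokenize() are hypothetical. The if-chain is regrouped by state, but the transitions and the token text are the same as in nextToken() above.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TinyWebScanner
{
  private final PushbackReader in;

  TinyWebScanner( Reader rdr )
  {
    in = new PushbackReader( rdr );   // pushback buffer of one character
  }

  private static boolean isWhiteSpace( char ch )
  {
    return ch==' ' || ch=='\t' || ch=='\n' || ch=='\r';
  }

  // Same automaton as nextToken() above; returns null at end of file.
  String nextToken() throws IOException
  {
    final int start = 1, word = 2, tag = 3;
    int state = start;
    StringBuilder buff = new StringBuilder();
    int ich;
    while ( (ich = in.read()) != -1 )
    {
      char ch = (char)ich;
      if ( state == start )
      {
        if ( isWhiteSpace( ch ) ) continue;        // skip white space
        state = ( ch == '<' ) ? tag : word;        // first char picks the state
        buff.append( ch );
      }
      else if ( state == word )
      {
        if ( ch == '<' )
        {
          in.unread( ch );
          return buff.toString().trim();
        }
        if ( isWhiteSpace( ch ) ) return buff.toString().trim();
        buff.append( ch );
      }
      else  // state == tag
      {
        buff.append( ch );
        if ( ch == '>' ) return buff.toString().trim();  // '>' ends the tag
      }
    }
    return null;  // end of file: a token still being built is discarded
  }

  // Collects every token in text into a list.
  static List<String> tokenize( String text )
  {
    TinyWebScanner scanner = new TinyWebScanner( new StringReader( text ) );
    List<String> tokens = new ArrayList<>();
    try
    {
      for ( String t = scanner.nextToken(); t != null; t = scanner.nextToken() )
        tokens.add( t );
    }
    catch ( IOException e )
    {
      throw new UncheckedIOException( e );
    }
    return tokens;
  }

  public static void main( String[] args )
  {
    System.out.println( tokenize( "<p>Hello world</p>\n" ) );
  }
}
```

Running main prints [<p>, Hello, world, </p>]. As in the original, a word at the very end of the input with no white space after it is lost, because reaching end of file returns null.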
When the word state sees a < character, it must do something in addition to returning the token to the caller.
What must it do?