
Answer:

import java.io.*;

class WebScanner
{
  MyPushbackReader in;
  String token;        // The current token
  int    ich;          // input character as int
  char   ch;           // current input character

  WebScanner ( Reader rdr )
  {
    in = new MyPushbackReader( rdr );
  }

  private boolean isWhiteSpace( char ch )
  {
    if ( ch==' ' || ch=='\t' || ch=='\n' || ch=='\r' ) 
      return true;
    else
      return false;
  }

  String nextToken() throws IOException
  {  
    . . . . .
  } 

}

NextToken

[Figure: transducer diagram]

Now examine the nextToken method. The states of the automaton are represented as integers. The final state is not actually implemented; when the automaton reaches the final state, the method returns to its caller.

The scanner reads characters one at a time until it reaches the final state or hits end of file (signaled by a -1). If it hits end of file, it returns null.
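The read loop is why ich is declared as an int rather than a char: read() needs a value outside the character range to signal end of file. Here is a minimal sketch of that loop by itself, assuming a StringReader as the input source. The class name ReadLoopSketch and both readAll methods are made up for illustration and are not part of WebScanner.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadLoopSketch
{
  // Copies every character from rdr into a String, one read() at a time.
  static String readAll( Reader rdr ) throws IOException
  {
    StringBuilder buff = new StringBuilder();
    int ich;                       // int, so that -1 can signal end of file

    while ( (ich = rdr.read()) != -1 )
    {
      char ch = (char)ich;         // ich holds a real character here
      buff.append( ch );
    }
    return buff.toString();
  }

  // Hypothetical convenience wrapper so callers need not handle IOException.
  static String readAll( String text )
  {
    try
    {
      return readAll( new StringReader( text ) );
    }
    catch ( IOException e )
    {
      throw new UncheckedIOException( e );
    }
  }

  public static void main( String[] args )
  {
    System.out.println( readAll( "<p>Hi</p>" ) );
  }
}
```

The cast to char happens only after the comparison with -1, so the end-of-file value is never mistaken for a character.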

While in the start state, the scanner skips over white space. The first non-whitespace character causes a transition to the word state (for any ordinary character) or the tag state (for a < character). That first character of the token is stored in the buffer.

While in the tag state, the scanner gathers up characters for the tag token until it sees a > character. Then it ends the tag with that character and returns the tag token to the caller.

While in the word state, the scanner gathers up characters one by one until it sees a whitespace character or a < character.

When the word state sees a whitespace character, it returns the token to the caller.
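The transitions just described can be written down as a small function from (state, character) to the next state. This is only a sketch of the automaton, not of the scanner: it ignores the buffer entirely, and the constant done is a hypothetical stand-in for the final state, which the real method represents by returning. The class name TransitionSketch and the method next are likewise made up for illustration.

```java
class TransitionSketch
{
  static final int start = 1;   // states of
  static final int word  = 2;   // the automaton
  static final int tag   = 3;
  static final int done  = 4;   // hypothetical: stands for the final state

  static boolean isWhiteSpace( char ch )
  {
    return ch==' ' || ch=='\t' || ch=='\n' || ch=='\r';
  }

  // Returns the state the automaton moves to when it sees ch in state.
  static int next( int state, char ch )
  {
    if ( state==start )
    {
      if ( isWhiteSpace(ch) ) return start;   // skip leading white space
      if ( ch=='<' )          return tag;     // a tag begins
      return word;                            // anything else begins a word
    }

    if ( state==word )
    {
      if ( isWhiteSpace(ch) || ch=='<' ) return done;   // the word has ended
      return word;
    }

    if ( state==tag )
    {
      if ( ch=='>' ) return done;             // the tag has ended
      return tag;
    }

    return done;                              // no transitions out of done
  }

  public static void main( String[] args )
  {
    System.out.println( next( start, '<' ) == tag );
  }
}
```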

  String nextToken() throws IOException
  {  
    final int start=1; // states of 
    final int word =2; // the automaton
    final int tag  =3;
    int       state;
    StringBuffer buff = new StringBuffer();
    state = start;

    while ( (ich = in.read()) != -1 )
    {
      ch = (char)ich;

      if ( state==start && isWhiteSpace(ch) )
        state = start;

      else if ( state==start && ch == '<' )
      {
        state = tag;
        buff.append( ch );
      }

      else if ( state==start )
      {
        state = word;
        buff.append( ch );
      }

      else if ( state==word && isWhiteSpace(ch) )
      {
        token = buff.toString().trim() ;
        return token;
      }

      else if ( state==word && ch == '<' )
      {
        in.unread( ch );
        token = buff.toString().trim() ;
        return token;
      }

      else if ( state==word )
      {
        buff.append( ch );
      }

      else if ( state==tag && ch == '>' )
      {
        buff.append( ch );
        token = buff.toString().trim() ;
        return token;
      }

      else if ( state==tag )
      {
        buff.append( ch );
      }   
    } 
    return null;   // end of file
  }
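
To see the method at work, here is a sketch of a small driver that calls it until it returns null. MyPushbackReader is not shown on this page, so the sketch assumes that the standard java.io.PushbackReader can stand in for it. The class ScannerDemo and the helper scan are made up for illustration, and nextToken is condensed here into a static method with the same transitions.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ScannerDemo
{
  static final int start=1, word=2, tag=3;   // states of the automaton

  static boolean isWhiteSpace( char ch )
  {
    return ch==' ' || ch=='\t' || ch=='\n' || ch=='\r';
  }

  // Condensed version of nextToken(); returns null at end of file.
  static String nextToken( PushbackReader in ) throws IOException
  {
    StringBuilder buff = new StringBuilder();
    int state = start;
    int ich;

    while ( (ich = in.read()) != -1 )
    {
      char ch = (char)ich;

      if ( state==start )
      {
        if ( isWhiteSpace(ch) ) continue;        // skip leading white space
        state = ( ch=='<' ) ? tag : word;
        buff.append( ch );
      }
      else if ( state==word )
      {
        if ( isWhiteSpace(ch) ) return buff.toString();
        if ( ch=='<' )
        {
          in.unread( ch );                       // leave '<' for the next call
          return buff.toString();
        }
        buff.append( ch );
      }
      else                                       // state==tag
      {
        buff.append( ch );
        if ( ch=='>' ) return buff.toString();
      }
    }
    return null;   // end of file
  }

  // Hypothetical helper: collects every token found in text.
  static List<String> scan( String text )
  {
    List<String> tokens = new ArrayList<>();
    try ( PushbackReader in = new PushbackReader( new StringReader( text ) ) )
    {
      String token;
      while ( (token = nextToken( in )) != null )
        tokens.add( token );
    }
    catch ( IOException e )
    {
      throw new UncheckedIOException( e );
    }
    return tokens;
  }

  public static void main( String[] args )
  {
    System.out.println( scan( "<p>Hello world</p>\n" ) );
  }
}
```

The sample input ends with a tag on purpose. As in the method above, a word that is still being gathered when end of file arrives is not returned, because the loop exits and the method returns null.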

QUESTION 9:

When the word state sees a < character it must do something in addition to returning the token to the caller. What must it do?