

No, sometimes it is part of a token, as in the literal -183.

Breaking tokens out of a stream of characters can get a little tricky, so it is nice that we have finite-state transducers to organize our thoughts.

URL Finder

The first example will be a utility program that reads an HTML file (or other text file) and finds every URL it contains. The file is an ordinary text file with the URLs embedded throughout the text. For example, if the input file is this:

Here is an interesting URL: http://chortle  
and here is another one http://chortle.ccsu.edu/cs151/Notes/Chap04/ch04_12.html of great interest.
This one is familiar http://google.com as is this http://www.audubon.org/photography

The output to the monitor will be this:



Such a utility might be used in a Web crawler or in a link-verifier program.

A URL can have many parts. But for our purposes, a URL looks like this:

http:// followed by a string of characters

Uppercase HTTP, or even mixed-case HttP, is allowed. Once the http:// part is seen, the rest (for us) is a string of characters. (Java's URL class is much more powerful than this.)
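
Recognizing the http:// part without regard to case might be sketched as follows. This is only an illustration, not the program developed in this chapter: the class name SchemeMatcher and the methods startsUrlAt and findScheme are made up for this example. It reports where each URL starts and deliberately does not decide where a URL ends.

```java
public class SchemeMatcher {
    // The prefix that marks the start of a URL (for our purposes).
    private static final String SCHEME = "http://";

    // True if "http://" appears in text at index pos, in any mix of upper and lower case.
    // (Hypothetical helper, named for this sketch only.)
    public static boolean startsUrlAt(String text, int pos) {
        // regionMatches with ignoreCase = true compares without regard to case
        // and returns false if pos is out of range.
        return text.regionMatches(true, pos, SCHEME, 0, SCHEME.length());
    }

    // Index of the first "http://" at or after index from, or -1 if there is none.
    // (Hypothetical helper, named for this sketch only.)
    public static int findScheme(String text, int from) {
        for (int i = Math.max(from, 0); i + SCHEME.length() <= text.length(); i++) {
            if (startsUrlAt(text, i)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String line = "This one is familiar HttP://google.com as is this http://www.audubon.org/photography";
        int pos = findScheme(line, 0);
        while (pos >= 0) {
            System.out.println("A URL starts at index " + pos);
            // Resume the search just past the prefix that was found.
            pos = findScheme(line, pos + SCHEME.length());
        }
    }
}
```

The sketch uses String.regionMatches() so that HTTP, http, and HttP are all treated alike without making a lowercase copy of the whole line.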


How do you know where the URL ends?