No, sometimes it is part of a token, as in the literal -183.
Breaking tokens out of a stream of characters can get a little tricky, so it is nice we have finite-state transducers to organize our thoughts.
The first example will be a utility program that reads an HTML file (or other text file) and finds every URL it contains. The file is an ordinary text file with URLs embedded throughout the text. For example, if the input file is this:
Here is an interesting URL: http://chortle and here is another one http://chortle.ccsu.edu/cs151/Notes/Chap04/ch04_12.html of great interest. This one is familiar http://google.com as is this http://www.audubon.org/photography
The output to the monitor will be this:
http://chortle
http://chortle.ccsu.edu/cs151/Notes/Chap04/ch04_12.html
http://google.com
http://www.audubon.org/photography
Done
Such a utility might be used in a Web crawler or a link verifier program.
A URL can have many parts, but for our purposes a URL looks like this:
http://string
Upper case HTTP or even mixed case HttP is allowed. Once the http:// part is seen, the rest (for us) is a string of characters. (The URL class of Java is much more powerful than this.)
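The case-insensitive prefix rule described above can be sketched in a few lines of Java. This is only an illustration, not the program these notes go on to develop: the class name UrlPrefix and the method hasPrefixAt are hypothetical names chosen for this sketch. It checks only whether http:// (in any mix of case) begins at a given position, and deliberately does not decide where the URL ends.

```java
class UrlPrefix {
    // The only prefix our simplified notion of a URL recognizes.
    private static final String PREFIX = "http://";

    // Returns true if text contains "http://" (in any mix of upper and
    // lower case) beginning at index start.
    static boolean hasPrefixAt(String text, int start) {
        // With ignoreCase set to true, regionMatches compares the two
        // regions without regard to case. It returns false, rather than
        // throwing, when the region would run past the end of text.
        return text.regionMatches(true, start, PREFIX, 0, PREFIX.length());
    }

    public static void main(String[] args) {
        System.out.println(hasPrefixAt("http://google.com", 0));   // true
        System.out.println(hasPrefixAt("HttP://google.com", 0));   // true
        System.out.println(hasPrefixAt("see HTTP://chortle", 4));  // true
        System.out.println(hasPrefixAt("ftp://example", 0));       // false
    }
}
```

Using regionMatches avoids building a lower-case copy of the input just to compare seven characters, which may matter when the check is repeated at many positions in a long file.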
How do you know where the URL ends?