A finite automaton is another way to describe a set of strings.
Here is a regular expression:
a+b+
In a Java program, a String
literal is delimited (is enclosed) by double quote marks.
Here is how the above regular expression would appear in a Java source program:
a+b+
The actual regular expression is the four characters inside the quotes. The quote marks themselves are not part of the regular expression. In the following pages, regular expressions will usually be delimited with quotes because that is how they would appear in programs.
The regular expression above describes strings that start with one or more characters 'a' followed by one or more characters 'b'. How this works will be explained later. The finite automaton at right describes the same strings as does the regular expression.
Space characters can be part of a regular expression. There might be leading and/or trailing spaces. The regular expression
danger
is not the same as the regular expression
danger
and is not the same as
danger
Aside: some programming languages (such as JavaScript) delimit regular exressions with slash, so the above three regular expressions would be written
/danger/
/ danger/
/danger /
The set of strings described by a regular expression (RE) is a regular language, a type of formal language. A formal language is a set of strings, where the strings are sequences of symbols selected from a finite alphabet, and the form of a string is described by rules. A RE is a way of writing a rule that determine what strings are in a formal language. A finite automaton is another way of showing the strings of a formal language.
(Review: ) Are there sets of strings that cannot be described with finite automata?