Regular Expressions
Mitchell uses the regular expression library from SML/NJ. That library is documented upstream as part of the SML/NJ Library.
Upstream Documentation Errata
There is an error in the documentation for the library. For the find,
prefix, and match functions, the type of the match is listed as
{pos : 'a, len : int} option MatchTree.match_tree
when it should be
{pos : 'a, len : int} MatchTree.match_tree
The option constructors that appear when pattern-matching on the results of
find and prefix below come from the StringCvt.reader type, not from the match
tree itself. Thus, a MatchTree.match_tree always contains at least one match,
which can be accessed using MatchTree.root.
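To see where the option comes from, it may help to look at a reader in isolation. The sketch below uses only the Basis Library and does not touch the regular expression library; the names getc, first, and describe are illustrative.

```sml
(* In the Basis Library, ('a, 'b) StringCvt.reader is an abbreviation for
     'b -> ('a * 'b) option
   so applying any reader yields NONE or SOME (item, restOfStream). *)
val getc : (char, substring) StringCvt.reader = Substring.getc

(* Reading from a non-empty stream gives SOME (firstChar, rest). *)
val first = getc (Substring.full "AB")

val describe =
  case first of
      NONE => "NONE"
    | SOME (c, rest) =>
        "SOME (" ^ Char.toString c ^ ", " ^ Substring.string rest ^ ")"

val _ = print (describe ^ "\n")   (* prints: SOME (A, B) *)
```

R.find and R.prefix return readers of the same shape, with a match tree in place of the character.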
Basic Usage
The regular expression library has a number of configurable parts. The examples
below show some reliable defaults and basic usage patterns. All of the examples
begin by creating the module using awk-like regular expression syntax and the
backtracking-based matching engine.
structure R = RegExpFn(structure P = AwkSyntax; structure E = BackTrackEngine)
Has Match
To check if any part of a string matches a regular expression, the following can be used.
(* Just check for a match *)
fun hasMatch reg str =
  let
    val r = R.compileString reg
    val s = Substring.full str
  in
    Option.isSome (R.find r Substring.getc s)
  end
val res = hasMatch "(A|B)B*" "CABBBCB"
val _ = print (Bool.toString res ^ "\n")
This results in the output
true
Prefix Has Match
To check if a prefix of a string matches a regular expression, the following can be used.
fun hasPrefixMatch reg str =
  let
    val r = R.compileString reg
    val s = Substring.full str
  in
    Option.isSome (R.prefix r Substring.getc s)
  end
val res = hasPrefixMatch "(A|B)B*" "CABBBCB"
val _ = print (Bool.toString res ^ "\n")
val res = hasPrefixMatch "(A|B)B*" "ABBBCB"
val _ = print (Bool.toString res ^ "\n")
This results in the output
false
true
Offset and Length of Match
To find the offset and length of a match, the following can be used:
fun matchPosition reg str =
  let
    val r = R.compileString reg
    val s = Substring.full str
  in
    case R.find r Substring.getc s of
        NONE => NONE (* No match *)
      | SOME (m, _) =>
          let
            val {pos=pos, len=len} = MatchTree.root m
            val (_, matchOffset, _) = Substring.base pos
          in
            (* If the string were from something other than Substring.full, we
               would have to subtract the offset of the outer substring.
             *)
            SOME (matchOffset, len)
          end
  end
val res = matchPosition "(A|B)B*" "CDABBBCB" (* SOME (2, 4) *)
val _ = case res of
    NONE => print "NONE\n"
  | SOME (off, len) =>
      print ("{off=" ^ Int.toString off ^ ", len=" ^ Int.toString len ^ "}\n")
This results in the output
{off=2, len=4}
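The comment in matchPosition about subtracting the outer offset can be illustrated without the regular expression library. The sketch below uses only the Basis Library; the value pos is a hand-built stand-in for the pos field of a match, and all names are illustrative.

```sml
(* The text to search is a substring of a longer string, not Substring.full. *)
val outer = Substring.extract ("xxCDABBBCB", 2, NONE)   (* "CDABBBCB" *)
val (_, outerOffset, _) = Substring.base outer          (* 2 *)

(* Stand-in for the pos field of a match that starts at "ABBB". *)
val pos = Substring.triml 2 outer

(* Substring.base reports the index into the underlying string ... *)
val (_, absOffset, _) = Substring.base pos              (* 4 *)

(* ... so the offset relative to the searched text needs a subtraction. *)
val relOffset = absOffset - outerOffset                 (* 2 *)

val _ = print (Int.toString absOffset ^ " " ^ Int.toString relOffset ^ "\n")
```

This prints 4 2: the first number is the index into the underlying string and the second is the offset within the text that was actually searched.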
Matching String
To get the first matching string, the following can be used:
fun matchedString reg str =
  let
    val r = R.compileString reg
    val s = Substring.full str
  in
    case R.find r Substring.getc s of
        NONE => NONE
      | SOME (MatchTree.Match ({pos=pos, len=len}, _), _) =>
          SOME (Substring.slice (pos, 0, SOME len))
  end
val _ = case matchedString "(A|B)B*" "CDABBBCB" of
    NONE => print "NONE\n"
  | SOME str => print (Substring.string str ^ "\n")
This results in the output
ABBB
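The pos and len fields can also drive a loop that collects every non-overlapping match. The following is a minimal sketch rather than a library feature: allMatches is a hypothetical helper, it assumes the corrected match type described in the errata above, and it assumes the regular expression library is already loaded (for example via $/regexp-lib.cm under SML/NJ).

```sml
(* Assumes the SML/NJ regular expression library is loaded. *)
structure R = RegExpFn(structure P = AwkSyntax; structure E = BackTrackEngine)

(* Hypothetical helper: return all non-overlapping matches, left to right. *)
fun allMatches reg str =
  let
    val r = R.compileString reg
    fun loop (s, acc) =
      case R.find r Substring.getc s of
          NONE => List.rev acc
        | SOME (m, _) =>
            let
              val {pos, len} = MatchTree.root m
              val matched =
                Substring.string (Substring.slice (pos, 0, SOME len))
              (* Resume after the match; step one character on an empty
                 match so that the loop always makes progress. *)
              val next = Substring.triml (Int.max (len, 1)) pos
            in
              if Substring.isEmpty pos
              then List.rev (matched :: acc)
              else loop (next, matched :: acc)
            end
  in
    loop (Substring.full str, [])
  end

val _ = List.app (fn m => print (m ^ "\n")) (allMatches "(A|B)B*" "CABBBCB")
```

The loop recomputes the remaining input from pos and len instead of relying on the stream returned by R.find, so it depends only on the fields already used in the examples above.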