Regular Expressions
Mitchell uses the regular expression library from SML/NJ. The documentation for that library is here.
Upstream Documentation Errata
There is an error in the documentation for the library. For the find
,
prefix
, and match
functions, the type of the match is listed as
{pos : 'a, len : int} option MatchTree.match_tree
when it should be
{pos : 'a, len : int} MatchTree.match_tree
The option
constructors that appear when pattern-matching on the results of
find
and prefix
below are part of the
StringCvt.reader
type. Thus, a MatchTree.match_tree
always contains at least one match, which
can be accessed using MatchTree.root
.
Basic Usage
The regular expression library has a number of configurable parts. The examples
below show some reliable defaults and basic usage patterns. All of the examples
begin by creating the module using awk
-like
regular expression
syntax
and the backtracking based matching engine.
structure R = RegExpFn(structure P = AwkSyntax; structure E = BackTrackEngine)
Has Match
To check if any part of a string matches a regular expression, the following can be used.
(* Just check for a match *)
fun hasMatch reg str =
let
val r = R.compileString reg
val s = Substring.full str
in
Option.isSome (R.find r Substring.getc s)
end
val res = hasMatch "(A|B)B*" "CABBBCB"
val _ = print (Bool.toString res ^ "\n")
This results in the output
true
Prefix Has Match
To check if a prefix of a string matches a regular expression, the following can be used.
fun hasPrefixMatch reg str =
let
val r = R.compileString reg
val s = Substring.full str
in
Option.isSome (R.prefix r Substring.getc s)
end
val res = hasPrefixMatch "(A|B)B*" "CABBBCB"
val _ = print (Bool.toString res ^ "\n")
val res = hasPrefixMatch "(A|B)B*" "ABBBCB"
val _ = print (Bool.toString res ^ "\n")
This results in the output
false
true
Offset and Length of Match
To find the offset and length of a match, the following can be used:
fun matchPosition reg str =
let
val r = R.compileString reg
val s = Substring.full str
in
case R.find r Substring.getc s of
NONE => NONE (* No match *)
| SOME (m, _) =>
let
val {pos=pos, len=len} = MatchTree.root m
val (_, matchOffset, _) = Substring.base pos
in
(* If the string were from something other than Substring.full, we
would have to subtract the offset of the outer substring.
*)
SOME (matchOffset, len)
end
end
val res = matchPosition "(A|B)B*" "CDABBBCB" (* true *)
val _ = case res of
NONE => print "NONE\n"
| SOME (off, len) =>
print ("{off=" ^ Int.toString off ^ ", len=" ^ Int.toString len ^ "}\n")
This results in the output
{off=2, len=4}
Matching String
To get the first matching string, the following can be used:
fun matchedString reg str =
let
val r = R.compileString reg
val s = Substring.full str
in
case R.find r Substring.getc s of
NONE => NONE
| SOME (MatchTree.Match ({pos=pos, len=len}, _), _) =>
SOME (Substring.slice (pos, 0, SOME len))
end
val _ = case matchedString "(A|B)B*" "CDABBBCB" of
NONE => print "NONE\n"
| SOME str => print (Substring.string str ^ "\n")
This results in the output
ABBB