Legend
| Namespace: RT.GenerexesFlexible Regular ExpressionsThe goal of Generex is to allow the same kind of pattern matching, analysis, parsing and searching as typical regular expression engines, but instead of limiting itself to matching characters in a string, it can match against any array of objects. For an example, imagine you have a list of log entries, and you need to find clusters of at least three consecutive log entries from the same day: 16/06/2013 15:15:50 17/06/2013 02:09:09 ┐ 17/06/2013 21:00:18 ├ 3 entries here 17/06/2013 23:44:28 ┘ 18/06/2013 09:39:34 19/06/2013 08:30:36 ┐ 19/06/2013 09:28:52 ├ 3 entries here 19/06/2013 14:34:12 ┘ 20/06/2013 00:07:58 20/06/2013 01:54:10 22/06/2013 07:31:00 23/06/2013 03:35:56 ┐ 23/06/2013 10:06:10 ├ 4 entries here 23/06/2013 12:05:59 │ 23/06/2013 23:43:35 ┘ 24/06/2013 19:46:16 25/06/2013 17:42:17 26/06/2013 11:29:37
This sounds like the typical kind of pattern matching that regular expressions are made for, but
instead of a Generex uses the power of generics to bring you regular expression functionality on any type. As a trade-off, instead of an extremely terse text-based syntax, it uses method calls to construct the regular expression. Here are only a few basic examples: Regex │ Generex ═════════╪════════════════════ a │ Generex.New('a') ab │ a.Then(b) a|b │ a.Or(b) a*? │ a.Repeat() a* │ a.RepeatGreedy() (?=a) │ a.LookAhead() (?<=a) │ a.LookBehind() (?>a) │ a.Atomic() For example, a regular expression such as the following: var regex = new Regex(@"(?:x[yz])*"); could be coded in Generex in either of the following ways: var regex = Generex.New('x').Then(Generex.New('y').Or('z')).RepeatGreedy(); var regex = Generex.New('x').Then(ch => ch == 'y' || ch == 'z').RepeatGreedy();
This is of course significantly more verbose, although some programmers may find it more readable. It is
certainly easier to remember the words “look behind” than it is to remember the fairly cryptic code
However, a far more compelling advantage is that this code is entirely compiled and type-checked by the C# compiler. There is no risk of having a runtime exception thrown due to syntax errors. You also won’t be using string interpolation to construct such an expression, which would conceivably open up injection vulnerabilities. Matching, replacing, etc.
Generex has the same matching methods as Regex ( Recursive regular expressionsNow, let’s look at a time-honoured classic of regular expression matching: The Matching Parentheses. ((()()(()))())(()(())()(())) The regular expression given on MSDN for matching these looks like this: ^(((?<Open>\())+((?<Close-Open>\)))+)*(?(Open)(?!))$ It should be immediately clear that this is a kludge. This expression uses features that were shoehorned into an existing syntax unsuitable for this task. In Generex, the parentheses matcher looks like this: var parenthesisMatcher = Generex<char>.Recursive(inner => Generex.New('(') .Then(inner) .Then(')') .RepeatGreedy()); Result objectsEvery match of a normal regular expression gives you an index and length, so you can extract the part of the string that matched. You can also use capturing groups to get submatches inside matches. But still, the only thing you can get out of that is a substring. You cannot use regular expressions to parse a structure and represent that structure as an object of your own choosing. For example, let’s say you have a string with matching parentheses as before, but this time, every parenthesis has a single-character name: (a(b(c)(d)(e(f)))(g))(h(i)(j(k))(l)(m(n)))
You can use a regular expression to check that the parentheses match up, but then all you get is a single
match that matches the whole string. You’re still left with the task of figuring out how the parentheses are
nested. For example, is the We can modify the recursive Generex to match this kind of string: var parenthesisMatcher = Generex.Recursive<char>(inner => Generex.New('(') .Then(Generex<char>.Any) // matches any single character .Then(inner) .Then(')') .RepeatGreedy()); but we can do better. Let’s declare ourselves a class that would contain information about such a parenthesis: public sealed class Parenthesis { public char Character; public Parenthesis[] InnerParentheses; }
We want Generex to instantiate and populate this class for us. For this, we use the var parenthesisMatcher = Generex.Recursive<char, Parenthesis[]>(inner => Generex.New('(') .Then( Generex<char>.Any // Matches any single character. .Process(m => m.Match[0]) // The character is now the result object. ) .Then(inner, // ch = the character we matched earlier (the previous result object) (ch, innerMatch) => new Parenthesis { Character = ch, InnerParentheses = innerMatch.Result // type Parenthesis[] } ) // The result object is now a Parenthesis object. .Then(')') .RepeatGreedy() // The result object is now IEnumerable<Parenthesis>. .Process(m => m.Result.ToArray())); // The result object is now Parenthesis[].
We can use this public override string ToString() { return string.Format("[ {0} {1}]", Character, string.Join<Parenthesis>(", ", InnerParentheses)); } Now just match the input string from earlier: var input = "(a(b(c)(d)(e(f)))(g))(h(i)(j(k))(l)(m(n)))"; var match = parenthesisMatcher.Match(input.ToCharArray()); Console.WriteLine(string.Join<Parenthesis>(", ", match.Result)); and we get the following output: [ a [ b [ c ], [ d ], [ e [ f ]]], [ g ]], [ h [ i ], [ j [ k ]], [ l ], [ m [ n ]]] Stringerex
Even though Generex is intended to match collections of objects, matching against strings is still the most
common use-case. The above examples use Additionally, there is a static class called Stringerexes, which has a few useful pre-defined building-blocks. For example, there is a Stringerexes.Integer which matches any integer (including negative) which you can just use anywhere inside a more complex Stringerex. It already has the integer associated with it as the result object. Additional FeaturesHere are a few Generex operators that do not exist in typical regular expressions. Note that each of these links link to a specific overload and there may be many others taking other parameter types:
Types in this namespace
|