John otac0n Gietzen
Dear RegEx Gurus,
I am writing an application to evaluate mathematical functions. The
first step in creating the expressions is tokenizing the input. I
decided to use one large regular expression to perform this
tokenization:
~\G([a-zA-Z]\w*\(|[a-zA-Z]\w*|(<=|>=|!=|<>|==|=)|0x[\da-fA-F.]*|0b[\d.]*|[\d.]*|\s*|.)~
Now, according to my intuition, this should work. However, whenever the
input contains a single character that is not explicitly recognized as
a token, the regex engine returns two matches: an empty one, followed
by one containing the character itself.
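For reference, here is a minimal Python sketch of the symptom with the
full pattern. It is only an approximation of the engines named in this
post: Python's re module has no \G, so Pattern.match(text, pos) stands
in for the anchor, and the names TOKEN and match_at and the sample
input "x+1" are made up for illustration.

```python
import re

# The tokenizer pattern from above, minus the ~ delimiters and \G.
# Pattern.match(text, pos) only tries a match starting exactly at pos,
# which plays the role of \G in this sketch.
TOKEN = re.compile(
    r"([a-zA-Z]\w*\(|[a-zA-Z]\w*|(<=|>=|!=|<>|==|=)"
    r"|0x[\da-fA-F.]*|0b[\d.]*|[\d.]*|\s*|.)"
)


def match_at(text, pos):
    """Return the text matched starting exactly at pos, or None."""
    m = TOKEN.match(text, pos)
    return None if m is None else m.group(0)


# "+" is not an explicitly recognized token in the pattern.
for pos in range(3):
    print(pos, repr(match_at("x+1", pos)))
```

At position 1 (the "+") the attempt succeeds with an empty string,
which is the extra empty match described above.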
To isolate this odd behavior, I have prepared the following example:
Match the string
abcdefghijklmnop
to the expression
~\G(a|b|c*|\w)~
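The reduced case can be reproduced with a short Python sketch. Again,
this is an assumption-laden stand-in rather than the Perl, PHP, or C#
code itself: Pattern.match(text, pos) replaces \G, and the names TEXT,
PATTERN, and match_at are illustrative only.

```python
import re

TEXT = "abcdefghijklmnop"

# Same alternation as the example, without the ~ delimiters and \G;
# Pattern.match(text, pos) anchors each attempt at pos instead.
PATTERN = re.compile(r"(a|b|c*|\w)")


def match_at(text, pos):
    """Return the text matched starting exactly at pos, or None."""
    m = PATTERN.match(text, pos)
    return None if m is None else m.group(0)


# Positions 0..4 cover "a", "b", "c", "d", "e".
results = [match_at(TEXT, pos) for pos in range(5)]
print(results)

# For comparison: \w on its own does match the "d" at position 3.
fallback = re.compile(r"\w").match(TEXT, 3).group(0)
print(repr(fallback))
```

In this sketch the attempts at "d" and "e" both succeed with an empty
string, because the c* alternative is tried before \w and is satisfied
by zero characters. That appears to be where the empty match comes
from; how each engine continues after it is the part I am asking about.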
This "anomaly" is seen in the Perl, PHP, and C# regex engines (which
makes me think that it is expected behavior). The final destination
for this regex is C#, so I cannot just ignore null entries. (The C#
regex engine stops after the first null match.) Any help or advice
would be much appreciated.
Sincerely,
John "Otac0n" Gietzen