Expression Parsing Help Needed

Theresa · Aug 5, 2008

I think this is close, but I don't quite understand this well enough
to build the expression I need to build. It is something I ran across
in existing code, need to modify, but have had no prior experience
doing. I found and have been using the following documentation to
help me experiment. http://support.alphasoftware.com/alphafivehelp/Xbasic/Overview_of_Regular_Expressions_marketing.htm

Here's the scenario...
Let’s say I have the following input strings:

"ACCT Gold-P4 Mainframe - Germany Europe/Middle"
"ACCT Gold-P1 Unix-AIX-Linux - France Europe/Middle"
"ACCT Silver-P4 Database Infra Support client system - Germany Europe/Middle"

I want to grab the portion of the string beginning after the P1/P4 and
ending with " - ". More specifically, I need to grab:
"Mainframe"
"Unix-AIX-Linux"
"Database Infra Support client system"
(There are thousands of other potential values, but these 3
demonstrate the basic diversity I could expect.)

This is how I parsed everything up to and including P4/P1 (no problems
with this part):
"[^ ]* [^-]*-[^ ]*…

But being new at this, I am unable to come up with a parsing
expression that successfully grabs those 3 different phrase
possibilities. This is the closest I've come (though I've tried many
other possibilities):
"[^ ]* [^-]*-[^ ]* ([^(?: - )*]*)(?: - )*.*"

Note: The “(?: - )” portion uses “non-marking parentheses” so that
it doesn’t spit out another sub-expression. Is that necessary? I'm
not sure; I just know it didn't work when I used parentheses that
weren't non-marking. But this returns:

“Mainframe” great!
“Unix-AIX-Linux” great!
“Database” BAD!!! Truncates value "Database Infra Support client system"

I have tinkered and tinkered, but without a thorough understanding of
the expression parsing syntax (the documentation tells what but not
why - and the "why" of it all is important for me to grasp it fully),
I'm unable to complete the parsing expression properly.

Any help would be greatly appreciated,
Theresa

Ben Morrow · Aug 5, 2008

Theresa said:
I think this is close, but I don't quite understand this well enough
to build the expression I need to build. It is something I ran across
in existing code, need to modify, but have had no prior experience
doing. I found and have been using the following documentation to
help me experiment. http://support.alphasoftware.com/alphafivehelp/Xbasic/Overview_of_Regular_Expressions_marketing.htm

If you're using Perl's regular expressions, you would be better off
reading Perl's documentation (start with perldoc perlretut). (If you're
not using Perl, why are you asking here?)

Here's the scenario...
Let=92s say I have the following input strings:

Please don't post quoted-printable content here. Stick to ASCII or (if
you must) un-encoded UTF8.

"ACCT Gold-P4 Mainframe - Germany Europe/Middle"
"ACCT Gold-P1 Unix-AIX-Linux - France Europe/Middle"
"ACCT Silver-P4 Database Infra Support client system - Germany Europe/Middle"

I want to grab the portion of the string beginning after the P1/P4 and
ending with " - ". More specifically, I need to grab:
"Mainframe"
"Unix-AIX-Linux"
"Database Infra Support client system"
(There are thousands of other potential values, but these 3
demonstrate the basic diversity I could expect.)

This is how I parsed everything up to and including P4/P1 (no problems
with this part):
"[^ ]* [^-]*-[^ ]*…

But being new at this, I am unable to come up with a parsing
expression that successfully grabs those 3 different phrase
possibilities. This is the closest I've come (though I've tried many
other possibilities):
"[^ ]* [^-]*-[^ ]* ([^(?: - )*]*)(?: - )*.*"

                      ^^^^^^^
This is not valid (or rather, it doesn't mean what you think). Character
classes *only* contain a list of characters (with special meanings for
'^' and '-'), not general expressions. You seem to be trying to use [^ ]
as a general 'don't match this' operator: that's not how it works.
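
A small sketch may make this concrete. It assumes Perl's regex engine (the original code may run on a different one), and the helper name class_prefix is made up for illustration. Inside the brackets, "(", "?", ":", ")" and "*" are ordinary characters, and " - " is read as a range from space to space, so the class simply means "any character except ( ? : space ) *". The hyphen is not excluded at all, which is consistent with "Unix-AIX-Linux" surviving while "Database Infra ..." stops at the first space.

```perl
use strict;
use warnings;

# Hypothetical helper: return the longest leading run of characters that
# the bracketed class from the original attempt accepts.
sub class_prefix {
    my ($text) = @_;
    # The class excludes only: ( ? : space ) *
    # " - " inside the brackets is the range space..space, i.e. just a space.
    my ($prefix) = $text =~ /^([^(?: - )*]*)/;
    return $prefix;
}

print class_prefix("Unix-AIX-Linux - France Europe/Middle"), "\n";
print class_prefix("Database Infra Support client system - Germany"), "\n";
```

Run as a script, the first line printed should be "Unix-AIX-Linux" and the second "Database": the run ends at the first space in both cases, never at the " - " sequence.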

I would write /-P\d+ (.*?) - / unless the 'P4' isn't necessarily a 'P',
in which case I would write /\w+-\w+ (.*?) - /. The ? on the end of .*?
is important: it says 'match as little as possible, provided the next
bit of the pattern will match'.
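
As a rough illustration of the first pattern, here is a sketch run against the three sample strings. The helper names extract_phrase and extract_phrase_greedy, and the fourth string with two " - " separators, are invented for the example and are not part of the original data.

```perl
use strict;
use warnings;

# Hypothetical helper: the suggested pattern with a non-greedy capture.
sub extract_phrase {
    my ($line) = @_;
    return $line =~ /-P\d+ (.*?) - / ? $1 : undef;
}

# Same pattern with a greedy capture, for comparison only.
sub extract_phrase_greedy {
    my ($line) = @_;
    return $line =~ /-P\d+ (.*) - / ? $1 : undef;
}

my @lines = (
    "ACCT Gold-P4 Mainframe - Germany Europe/Middle",
    "ACCT Gold-P1 Unix-AIX-Linux - France Europe/Middle",
    "ACCT Silver-P4 Database Infra Support client system - Germany Europe/Middle",
    "ACCT Gold-P4 Backup - Tape - Germany Europe/Middle",    # made-up input
);

for my $line (@lines) {
    my $lazy   = extract_phrase($line);
    my $greedy = extract_phrase_greedy($line);
    printf "lazy: [%s]  greedy: [%s]\n",
        defined $lazy   ? $lazy   : 'no match',
        defined $greedy ? $greedy : 'no match';
}
```

On the three original strings both versions agree, because each contains only one " - ". On the made-up fourth string the non-greedy capture stops at the first " - " ("Backup"), while the greedy one runs to the last ("Backup - Tape").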

Ben

Jim Gibson · Aug 5, 2008

Theresa said:
I think this is close, but I don't quite understand this well enough
to build the expression I need to build. It is something I ran across
in existing code, need to modify, but have had no prior experience
doing. I found and have been using the following documentation to
help me experiment.
http://support.alphasoftware.com/alphafivehelp/Xbasic/Overview_of_Regular_Expressions_marketing.htm

Here's the scenario...
Let’s say I have the following input strings:

"ACCT Gold-P4 Mainframe - Germany Europe/Middle"
"ACCT Gold-P1 Unix-AIX-Linux - France Europe/Middle"
"ACCT Silver-P4 Database Infra Support client system - Germany Europe/Middle"

I want to grab the portion of the string beginning after the P1/P4 and
ending with " - ". More specifically, I need to grab:
"Mainframe"
"Unix-AIX-Linux"
"Database Infra Support client system"
(There are thousands of other potential values, but these 3
demonstrate the basic diversity I could expect.)

Try this:

if ( $string =~ m{ \A ACCT \s+ (?:Gold|Silver) - P\d \s+ (.*) \s+ - }x ) {
    # process what is in $1
}

RedGrittyBrick · Aug 5, 2008

bugbear said:
Good spec.

/(P1|P4)(.*)( - )/

$2 should have what you want. $1 and $3 are uninteresting.

Why capture 1 and 3?

/P[14](.*) - /

Just my ¤0.02 worth.
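
One detail worth checking with the shorter pattern: as written, the capture starts immediately after the digit, so it includes the space that follows "P4". The sketch below compares it with a variant that moves that space outside the parentheses. The helper name first_capture is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a compiled pattern and return its first
# capture group, or undef when the line does not match.
sub first_capture {
    my ($pattern, $line) = @_;
    return $line =~ $pattern ? $1 : undef;
}

my $line = "ACCT Gold-P4 Mainframe - Germany Europe/Middle";

# Square brackets in the output make leading whitespace visible.
printf "[%s]\n", first_capture(qr/P[14](.*) - /,  $line);   # pattern as posted
printf "[%s]\n", first_capture(qr/P[14] (.*) - /, $line);   # space outside the capture
```

The first call should print "[ Mainframe]" with a leading space, the second "[Mainframe]". Both captures are greedy, so an input with more than one " - " would be cut at the last one rather than the first.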

Dr.Ruud · Aug 9, 2008

Theresa wrote:

Let’s say I have the following input strings:

"ACCT Gold-P4 Mainframe - Germany Europe/Middle"
"ACCT Gold-P1 Unix-AIX-Linux - France Europe/Middle"
"ACCT Silver-P4 Database Infra Support client system - Germany Europe/Middle"

[...] I need to grab:
"Mainframe"
"Unix-AIX-Linux"
"Database Infra Support client system"

Both "\w+" in the code below are probably better written as
"[[:alpha:]]+".

#!/usr/bin/perl
use strict;
use warnings;

while ( <DATA> ) {
    m~^ACCT \w+-P[14] (.*) - \w+ \S+$~
      and print $1, "\n";
}

__DATA__
ACCT Gold-P4 Mainframe - Germany Europe/Middle
ACCT Gold-P1 Unix-AIX-Linux - France Europe/Middle
