Expression Parsing Help Needed

Theresa

I think this is close, but I don't understand regular expressions well
enough to build the one I need. I ran across this in existing code that
I need to modify, and I have no prior experience with it. I found the
following documentation and have been using it to help me experiment:
http://support.alphasoftware.com/alphafivehelp/Xbasic/Overview_of_Regular_Expressions_marketing.htm

Here's the scenario...
Let’s say I have the following input strings:

"ACCT Gold-P4 Mainframe - Germany Europe/Middle"
"ACCT Gold-P1 Unix-AIX-Linux - France Europe/Middle"
"ACCT Silver-P4 Database Infra Support client system - Germany Europe/Middle"

I want to grab the portion of the string beginning after the P1/P4 and
ending with “ – “. More specifically, I need to grab:
"Mainframe"
"Unix-AIX-Linux"
"Database Infra Support client system"
(There are thousands of other potential values, but these 3
demonstrate the basic diversity I could expect.)

This is how I parsed everything up to and including P4/P1 (no problems
with this part):
"[^ ]* [^-]*-[^ ]*…

But being new at this, I am unable to come up with a parsing
expression that successfully grabs those 3 different phrase
possibilities. This is the closest I've come (though I've tried many
other possibilities):
"[^ ]* [^-]*-[^ ]* ([^(?: - )*]*)(?: - )*.*"

Note: The “(?: - )” portion uses “non-marking parentheses” so that
it doesn’t spit out another sub-expression. Is that necessary? I'm
not sure; I just know it didn't work when I used ordinary (marking)
parentheses. But this returns:

“Mainframe” great!
“Unix-AIX-Linux” great!
“Database” BAD!!! Truncates the value "Database Infra Support client system"

I have tinkered and tinkered, but without a thorough understanding of
the expression parsing syntax (the documentation tells what but not
why - and the "why" of it all is important for me to grasp it fully),
I'm unable to complete the parsing expression properly.

Any help would be greatly appreciated,
Theresa
 
Ben Morrow

Quoth Theresa:
I think this is close, but I don't quite understand this well enough
to build the expression I need to build. It is something I ran across
in existing code, need to modify, but have had no prior experience
doing. I found and have been using the following documentation to
help me experiment. http://support.alphasoftware.com/alphafivehelp/Xbasic/O=
verview_of_Regular_Expressions_marketing.htm

If you're using Perl's regular expressions, you would be better off
reading Perl's documentation (start with perldoc perlretut). (If you're
not using Perl, why are you asking here?)
Here's the senario...
Let=92s say I have the following input strings:

Please don't post quoted-printable content here. Stick to ASCII or (if
you must) un-encoded UTF8.
"ACCT Gold-P4 Mainframe - Germany Europe/Middle"
"ACCT Gold-P1 Unix-AIX-Linux - France Europe/Middle"
"ACCT Silver-P4 Database Infra Support client system - Germany Europe/
Middle"

I want to grab the portion of the string beginning after the P1/P4 and
ending with =93 =96 =93. More specifically, I need to grab:
"Mainframe"
"Unix-AIX-Linux"
"Database Infra Support client system"
(There are thousands of other potential values, but these 3
demonstrate the basic diversity I could expect.)

This is how I parsed everything up to and including P4/P1 (no problems
with this part):
"[^ ]* [^-]*-[^ ]*=85

But being new at this, I am unable to come up with a parsing
expression that successfully grabs those 3 different phrase
possibilities. This is the closest I've come (though I've tried many
other possibilities):
"[^ ]* [^-]*-[^ ]* ([^(?: - )*]*)(?: - )*.*"
                      ^^^^^^^
This is not valid (or rather, it doesn't mean what you think). Character
classes *only* contain a list of characters (with special meanings for
'^' and '-'), not general expressions. You seem to be trying to use [^ ]
as a general 'don't match this' operator: that's not how it works.
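
To make that concrete, here is a small sketch, assuming Perl's regex engine (the original expression may have been run by a different one). It tests single characters against the bracketed part of the pattern and then runs the whole expression; the helper name capture_with_class is made up for illustration. Note that the '-' sitting between the two spaces is read as a range from space to space, so the hyphen itself is not excluded, which is why "Unix-AIX-Linux" survives while anything containing a space is cut short.

```perl
use strict;
use warnings;

# Inside [...] the characters are taken literally, so [^(?: - )*] means
# "one character that is not '(', '?', ':', ' ', ')' or '*'".
# The " - " in the middle is a range from space to space, i.e. just a space.
my $class = qr/[^(?: - )*]/;

for my $char ('D', '-', ' ', '(', '?', ':', ')', '*') {
    my $verdict = $char =~ /\A$class\z/ ? 'allowed' : 'excluded';
    print "'$char' $verdict\n";
}

# Hypothetical helper: apply the expression from the question and
# return whatever the first capture group picked up.
sub capture_with_class {
    my ($line) = @_;
    return $line =~ /[^ ]* [^-]*-[^ ]* ([^(?: - )*]*)(?: - )*.*/ ? $1 : undef;
}

# The capture ends at the first space, so only one word comes back.
print capture_with_class(
    'ACCT Silver-P4 Database Infra Support client system - Germany Europe/Middle'
), "\n";
```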

I would write /-P\d+ (.*?) - / unless the 'P4' isn't necessarily a 'P',
in which case I would write /\w+-\w+ (.*?) - /. The ? on the end of .*?
is important: it says 'match as little as possible, provided the next
bit of the pattern will match'.
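
As a rough illustration of the first pattern, the sketch below runs it over the three sample strings from the question. The helper name extract_phrase and the line with a second " - " are invented here purely to contrast .*? with a plain .*; they do not come from the original data.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between "-P<digits> " and the
# first " - " that follows it, or undef when the line does not match.
sub extract_phrase {
    my ($line) = @_;
    return $line =~ /-P\d+ (.*?) - / ? $1 : undef;
}

my @samples = (
    'ACCT Gold-P4 Mainframe - Germany Europe/Middle',
    'ACCT Gold-P1 Unix-AIX-Linux - France Europe/Middle',
    'ACCT Silver-P4 Database Infra Support client system - Germany Europe/Middle',
);

print extract_phrase($_), "\n" for @samples;

# Made-up line containing two " - " separators.
my $tricky = 'ACCT Gold-P4 Mainframe - Germany - Europe/Middle';
my ($lazy)   = $tricky =~ /-P\d+ (.*?) - /;   # stops at the first " - "
my ($greedy) = $tricky =~ /-P\d+ (.*) - /;    # runs on to the last " - "
print "lazy:   $lazy\n";
print "greedy: $greedy\n";
```

On the three sample strings both forms give the same answer because each contains only one " - "; the difference only shows up when the separator can occur more than once.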

Ben
 
Jim Gibson

Theresa said:
I think this is close, but I don't quite understand this well enough
to build the expression I need to build. It is something I ran across
in existing code, need to modify, but have had no prior experience
doing. I found and have been using the following documentation to
help me experiment.
http://support.alphasoftware.com/alphafivehelp/Xbasic/Overview_of_Regular_Expr
essions_marketing.htm

Here's the senario...
Let’s say I have the following input strings:

"ACCT Gold-P4 Mainframe - Germany Europe/Middle"
"ACCT Gold-P1 Unix-AIX-Linux - France Europe/Middle"
"ACCT Silver-P4 Database Infra Support client system - Germany Europe/
Middle"

I want to grab the portion of the string beginning after the P1/P4 and
ending with “ – “. More specifically, I need to grab:
"Mainframe"
"Unix-AIX-Linux"
"Database Infra Support client system"
(There are thousands of other potential values, but these 3
demonstrate the basic diversity I could expect.)

Try this:

if ( $string =~ m{ \A ACCT \s+ (?:Gold|Silver) - P\d \s+ (.*) \s+ - }x ) {
    # process what is in $1
}
 
RedGrittyBrick

bugbear said:
Good spec.

/(P1|P4)(.*)( - )/

$2 should have what you want. $1 and $3 are uninteresting.

Why capture 1 and 3?

/P[14](.*) - /

Just my ¤0.02 worth.
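
One detail worth checking with the shorter pattern: because nothing sits between P[14] and the group, the space after "P4" lands inside the capture. The sketch below shows this on one of the sample strings; the second pattern, with the space moved outside the group, is only a suggested variant and not something posted in the thread.

```perl
use strict;
use warnings;

my $line = 'ACCT Gold-P4 Mainframe - Germany Europe/Middle';

# As posted: the space after "P4" becomes part of the capture.
my ($with_space) = $line =~ /P[14](.*) - /;

# Variant (an assumption, not from the thread): match that space
# outside the group so the capture starts at the first letter.
my ($trimmed) = $line =~ /P[14] (.*) - /;

# Brackets make any leading whitespace visible in the output.
print "[$with_space]\n";
print "[$trimmed]\n";
```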
 
Dr.Ruud

Theresa wrote:
Let’s say I have the following input strings:

"ACCT Gold-P4 Mainframe - Germany Europe/Middle"
"ACCT Gold-P1 Unix-AIX-Linux - France Europe/Middle"
"ACCT Silver-P4 Database Infra Support client system - Germany Europe/
Middle"

[...] I need to grab:
"Mainframe"
"Unix-AIX-Linux"
"Database Infra Support client system"


Both "\w+" in the code below are probably better written as
"[[:alpha:]]+".

#!/usr/bin/perl
use strict;
use warnings;

# Print the captured phrase for every DATA line that matches.
while ( <DATA> ) {
    m~^ACCT \w+-P[14] (.*) - \w+ \S+$~
        and print $1, "\n";
}

__DATA__
ACCT Gold-P4 Mainframe - Germany Europe/Middle
ACCT Gold-P1 Unix-AIX-Linux - France Europe/Middle
 
