matching patterns after regex?

Martin

Hi,

I have a string (see below) and ideally I would like to pull out the
decimal number which follows the bounding coordinate information. For
example, ideally from this string I would return...

s = '\nGROUP = ARCHIVEDMETADATA\n
GROUPTYPE = MASTERGROUP\n\n GROUP =
BOUNDINGRECTANGLE\n\n OBJECT =
NORTHBOUNDINGCOORDINATE\n NUM_VAL = 1\n
VALUE = 19.9999999982039\n END_OBJECT =
NORTHBOUNDINGCOORDINATE\n\n OBJECT =
SOUTHBOUNDINGCOORDINATE\n NUM_VAL = 1\n
VALUE = 9.99999999910197\n END_OBJECT =
SOUTHBOUNDINGCOORDINATE\n\n OBJECT =
EASTBOUNDINGCOORDINATE\n NUM_VAL = 1\n
VALUE = 10.6506458717851\n END_OBJECT =
EASTBOUNDINGCOORDINATE\n\n OBJECT =
WESTBOUNDINGCOORDINATE\n NUM_VAL = 1\n
VALUE = 4.3188348375893e-15\n END_OBJECT
= WESTBOUNDINGCOORDINATE\n\n END_GROUP


NORTHBOUNDINGCOORDINATE = 19.9999999982039
SOUTHBOUNDINGCOORDINATE = 9.99999999910197
EASTBOUNDINGCOORDINATE = 10.6506458717851
WESTBOUNDINGCOORDINATE = 4.3188348375893e-15

So far I have only managed to extract the numbers by doing
re.findall("[\d.]*\d", s), which returns

['1',
'19.9999999982039',
'1',
'9.99999999910197',
'1',
'10.6506458717851',
'1',
'4.3188348375893',
'15',
etc.

Now the first problem that I can see is that my string match chops off
the "e-15" part and I am not sure how to incorporate the potential for
that in my pattern match. Does anyone have any suggestions as to how I
could also match this? Ideally I would have a statement which printed
the number between the two bounding coordinate strings for example

NORTHBOUNDINGCOORDINATE\n NUM_VAL = 1\n
VALUE = 19.9999999982039\n END_OBJECT =
NORTHBOUNDINGCOORDINATE\n\n

Something that matched "NORTHBOUNDINGCOORDINATE" and printed the
decimal number before it hit the next occurrence of
"NORTHBOUNDINGCOORDINATE". But I am not sure how to do this. Any
suggestions would be appreciated.

Many thanks

Martin
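
A note on the exponent problem raised above: the pattern `[\d.]*\d` has no way to consume `e-15`, so the number is split in two. The following is a minimal sketch of one possible fix, assuming the values are plain decimal or scientific-notation floats. The names `FLOAT` and `sample` are illustrative and do not come from the thread.

```python
import re

# Hypothetical one-line sample based on the WEST entry in the question.
sample = "VALUE = 4.3188348375893e-15"

# The pattern from the question stops at the "e", so the exponent is split off.
print(re.findall(r"[\d.]*\d", sample))  # ['4.3188348375893', '15']

# Assumed float pattern: optional sign, digits with an optional fractional
# part (or a bare ".5" form), then an optional exponent such as e-15 or E+3.
FLOAT = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"

numbers = re.findall(FLOAT, sample)
print(numbers)  # ['4.3188348375893e-15']

# float() understands the exponent form directly.
values = [float(n) for n in numbers]
```

On its own this pattern would still pick up the `1` after each `NUM_VAL`, so it needs to be anchored to the `VALUE` keyword to return only the coordinates.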
 
Bernard

Hey Martin,

Here's a regex I've just tested:

(\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)

The first group corresponds to the whateverBOUNDINGCOORDINATE name and the
second group is the value.

Please provide some more entries if you'd like me to test my regex
some more :)

cheers

Bernard
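
For readers who want every coordinate at once, here is a hedged sketch of a stricter variant of the idea above. It assumes that `s` contains real newline characters and that each `OBJECT` line is followed by a `NUM_VAL` line and then a `VALUE` line, as in the question. The sample text is an abbreviated stand-in, and the names `FLOAT`, `PAIR` and `coords` are illustrative.

```python
import re

# Abbreviated stand-in for the metadata in the question (real newlines assumed).
s = """
  GROUP = BOUNDINGRECTANGLE

    OBJECT = NORTHBOUNDINGCOORDINATE
      NUM_VAL = 1
      VALUE = 19.9999999982039
    END_OBJECT = NORTHBOUNDINGCOORDINATE

    OBJECT = SOUTHBOUNDINGCOORDINATE
      NUM_VAL = 1
      VALUE = 9.99999999910197
    END_OBJECT = SOUTHBOUNDINGCOORDINATE

    OBJECT = EASTBOUNDINGCOORDINATE
      NUM_VAL = 1
      VALUE = 10.6506458717851
    END_OBJECT = EASTBOUNDINGCOORDINATE

    OBJECT = WESTBOUNDINGCOORDINATE
      NUM_VAL = 1
      VALUE = 4.3188348375893e-15
    END_OBJECT = WESTBOUNDINGCOORDINATE
"""

# Float with an optional exponent, so "e-15" stays attached to the number.
FLOAT = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"

PAIR = re.compile(
    r"\bOBJECT\s*=\s*(\w+BOUNDINGCOORDINATE)"  # name on the OBJECT line (not END_OBJECT)
    r"\s+NUM_VAL\s*=\s*\d+"                    # step over the NUM_VAL line explicitly
    r"\s+VALUE\s*=\s*(" + FLOAT + r")"         # capture the number after VALUE
)

# findall returns (name, value) tuples because the pattern has two groups.
coords = {name: float(value) for name, value in PAIR.findall(s)}

for name, value in coords.items():
    print(name, "=", value)
```

Spelling out the `NUM_VAL` line instead of using `.*` means the match cannot wander from one object into the next; the trade-off is that the pattern only fits blocks laid out exactly this way.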
 
Martin

Thanks Bernard it doesn't seem to be working for me...

I tried

re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)

Is that what you meant? Apologies if not; that results in a syntax
error:

In [557]: re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)
------------------------------------------------------------
File "<ipython console>", line 1
re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)
^
SyntaxError: unexpected character after line continuation character

Thanks
 
Steven D'Aprano

On Wed, 12 Aug 2009 05:12:22 -0700, Martin wrote:

I tried

re.findall((\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+),s)

You need to put quotes around strings.

In this case, because you're using regular expressions, you should use a
raw string:

re.findall(r"(\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)",s)

will probably work.
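
To illustrate the point about quoting and raw strings, here is a small sketch. The sample `text` and the simplified `pattern` are assumptions made for the demonstration and are not the regex from the thread.

```python
import re

# In a normal literal, Python turns "\n" into a single newline character.
regular_len = len("\n")
# In a raw literal, the backslash and the "n" are kept as two characters,
# which is what the regex engine expects to receive for escapes like \s and \d.
raw_len = len(r"\n")
print(regular_len, raw_len)  # 1 2

# Hypothetical one-line sample.
text = "VALUE = 19.9999999982039"

# The pattern is an ordinary Python string, so it must be quoted;
# the r prefix leaves \s and \d for the re module to interpret.
pattern = r"VALUE\s+=\s+([\d.]+)"
matches = re.findall(pattern, text)
print(matches)  # ['19.9999999982039']
```

Without the quotes, Python tries to parse the pattern as code, and the first backslash is read as a line continuation, which is what the SyntaxError is reporting.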
 
Martin

Thanks, I see.

So I tried it, and if I use it as it is, it matches the first instance:

In [594]: re.findall(r"(\w+COORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)",s)
Out[594]: [('NORTHBOUNDINGCOORDINATE', '1')]

So I adjusted the first part of the regex, on the basis I could sub
NORTH for SOUTH etc.

In [595]: re.findall(r"(NORTHBOUNDINGCOORDINATE).*\s+VALUE\s+=\s([\d\.\w-]+)",s)
Out[595]: [('NORTHBOUNDINGCOORDINATE', '1')]

But in both cases it returns the value that comes after NUM_VAL =
rather than the decimal value that comes after VALUE = ?
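
One way to make the result less sensitive to what `.*` happens to consume is to anchor on the opening `OBJECT = <name>` line and then skip lazily to the first `VALUE` that follows. The helper below is only a sketch under that assumption: `bounding_value` and the sample `text` are hypothetical, and the sample uses real newline characters.

```python
import re

def bounding_value(text, name):
    """Return the VALUE of the OBJECT called `name` as a float, or None."""
    pattern = (
        r"\bOBJECT\s*=\s*" + re.escape(name) + r"\b"  # opening OBJECT line only
        r".*?"                                        # lazily skip NUM_VAL etc.
        r"\bVALUE\s*=\s*(\S+)"                        # first VALUE after that line
    )
    # re.DOTALL lets "." cross line breaks; "\b" before OBJECT rejects END_OBJECT.
    match = re.search(pattern, text, re.DOTALL)
    return float(match.group(1)) if match else None

# Hypothetical two-object sample in the same layout as the question.
text = (
    "    OBJECT = NORTHBOUNDINGCOORDINATE\n"
    "      NUM_VAL = 1\n"
    "      VALUE = 19.9999999982039\n"
    "    END_OBJECT = NORTHBOUNDINGCOORDINATE\n\n"
    "    OBJECT = SOUTHBOUNDINGCOORDINATE\n"
    "      NUM_VAL = 1\n"
    "      VALUE = 9.99999999910197\n"
    "    END_OBJECT = SOUTHBOUNDINGCOORDINATE\n"
)

north = bounding_value(text, "NORTHBOUNDINGCOORDINATE")
south = bounding_value(text, "SOUTHBOUNDINGCOORDINATE")
missing = bounding_value(text, "EASTBOUNDINGCOORDINATE")
print(north, south, missing)
```

Swapping NORTH for SOUTH then only means changing the `name` argument. Note that `float()` raises `ValueError` if the text after `VALUE =` is not a number.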
 
Martin

I think I kind of got that to work...but I am clearly not quite
understanding how it works as I tried to use it again to match
something else.

In this case I want to print the values 0.000000 and 2223901.039333
from a string like this...

YDim=1200\n\t\tUpperLeftPointMtrs=(0.000000,2223901.039333)\n\t\t

I tried the following, which I thought would match the statement and
return the decimal number after the equals sign:

re.findall(r"(\w+UpperLeftPointMtrs)*=\s([\d\.\w-]+)", s)

where s is the string

Many thanks for the help
 
Bernard

You have to do it with two capture groups in the same regex:

regex = r"UpperLeftPointMtrs=\(([\d\.]+),([\d\.]+)"

The first group captures the number before the comma and the second one
captures the number after it :)
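
As a quick check of the regex above against the sample string from the question, here is a short sketch. The variable names `attempt`, `x` and `y` are illustrative only.

```python
import re

s = "YDim=1200\n\t\tUpperLeftPointMtrs=(0.000000,2223901.039333)\n\t\t"

# The earlier attempt requires a whitespace character right after "=",
# but here "=" is always followed by a digit or "(", so nothing matches.
attempt = re.findall(r"(\w+UpperLeftPointMtrs)*=\s([\d\.\w-]+)", s)
print(attempt)  # []

# The "(" is escaped as \( so it is matched literally rather than opening a group.
regex = r"UpperLeftPointMtrs=\(([\d\.]+),([\d\.]+)"
match = re.search(regex, s)

# groups() returns the two captured strings in order.
x, y = (float(g) for g in match.groups())
print(x, y)  # 0.0 2223901.039333
```

The character class `[\d\.]` accepts only digits and dots, so negative values or exponents would need a wider pattern.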

You should probably learn how to play with regexes.
I personally use a visual tool called RX Toolkit[1] that comes with
Komodo IDE.

[1] http://docs.activestate.com/komodo/4.4/regex.html
 
Mark Lawrence

Haven't tried it myself but how about this?
http://re-try.appspot.com/
 
Martin

Thanks Mark and Bernard. I have managed to get it working and I
appreciate the help with understanding the syntax. The web links are
also very useful, I'll give them a go.

Martin
 
