Regular Expressions

..:: sjf ::..

Hello,

Could you please help me build a regular expression?
My files contain the following piece of HTML code:

<FONT COLOR="#FF0000">A - TYPE1: any_text<BR>
B - TYPE2: any_text_2<BR>
C - TYPE2: any_text_3<BR>
w - any_text_15<BR>
</FONT>
html code
</BODY></HTML>

I need only the following data:
(B, any_text_2)
(C, any_text_3)
that is, only the entries marked TYPE2.

Thanks in advance
 
Diez B. Roggisch

You should use the htmlparser class to extract the text first. Then this
regular expression might help:

r"(.) TYPE. : (.*)"
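A minimal sketch of the two-step approach suggested above, assuming Python 3 and its standard `html.parser.HTMLParser` class. The names `TextExtractor`, `SAMPLE_HTML`, `TYPE2_LINE` and `extract_type2` are hypothetical. Note that the pattern as posted expects a layout like "B TYPE2 : text"; the sample in the question has " - " after the letter and no space before the colon, so the pattern below is adapted to that layout (an assumption, not the poster's exact regex).

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of a document and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text found between tags.
        self.chunks.append(data)


# The sample from the original question.
SAMPLE_HTML = """<FONT COLOR="#FF0000">A - TYPE1: any_text<BR>
B - TYPE2: any_text_2<BR>
C - TYPE2: any_text_3<BR>
w - any_text_15<BR>
</FONT>
html code
</BODY></HTML>"""

# Assumed layout: one character, " - ", "TYPE2: ", then the rest of the line.
TYPE2_LINE = re.compile(r"^(.) - TYPE2: (.*)$", re.MULTILINE)


def extract_type2(html):
    """Strip the tags first, then pick out the TYPE2 lines."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.chunks)
    return TYPE2_LINE.findall(text)


print(extract_type2(SAMPLE_HTML))
```

Because the tags are gone before the regex runs, the pattern only has to describe the text of one line, not the markup around it.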
 
Benjamin Arai

I would just use the re library, because regular expressions let you
get right down to the data on the first try anyway, without further
parsing. If you use the htmlparser library first, it may add some
unneeded processing time.

Benjamin Arai
Araisoft

Website: http://www.araisoft.com
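For comparison, a rough sketch of the regex-only route described in this post, run directly on the raw HTML. It assumes that every record sits on one line and ends with a literal <BR> tag, as in the sample; `RAW_HTML`, `TYPE2_RECORD` and `type2_records` are hypothetical names.

```python
import re

# The sample from the original question, used as-is (tags included).
RAW_HTML = """<FONT COLOR="#FF0000">A - TYPE1: any_text<BR>
B - TYPE2: any_text_2<BR>
C - TYPE2: any_text_3<BR>
w - any_text_15<BR>
</FONT>
html code
</BODY></HTML>"""

# Assumed layout: letter, " - TYPE2: ", text, then <BR> on the same line.
# The lazy (.*?) stops at the first <BR> instead of the last one.
TYPE2_RECORD = re.compile(r"(\w) - TYPE2: (.*?)<BR>", re.IGNORECASE)


def type2_records(html):
    """Return (letter, text) pairs for TYPE2 records in raw HTML."""
    return TYPE2_RECORD.findall(html)


print(type2_records(RAW_HTML))
```

This skips the parsing step, but the pattern now depends on the exact markup: a record that ends with a differently written tag, or wraps onto a second line, would be missed.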
 
Diez B. Roggisch

That depends on how well the HTML is written. You often end up writing
complicated regexes to extract data from certain special cases, and
sometimes even need two passes. So in general, it's better to use the right
tool for the job - if speed _is_ a concern you can still try to optimize.
 
..:: sjf ::..

One day, a certain Diez B. Roggisch typed: ;-)
you should utilize the htmlparser class to extract the text first. Then
this regular expression might help:
r"(.) TYPE. : (.*)"

Thanks. And now, let's assume that I have the following strings:
S1 = "B - TYPE2: any_text_2 TYPE3: any_text_23"
S2 = "C - TYPE2: any_text_3"

and I want one regular expression that produces only the following data:
("B", "any_text_2")
("C", "any_text_3")
that is, everything from TYPE3 to the end of the string should be omitted.
How do I do this?
 
Diez B. Roggisch

r"(.) TYPE. : ([^ ]*)"
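A small sketch of how the negated character class `[^ ]*` behaves on the two strings from the question. The pattern is adapted to the " - TYPE2: " layout of those strings, which is an assumption; `FIRST_WORD` and `first_word_after_type` are hypothetical names.

```python
import re

# [^ ]* matches any run of non-space characters, so the second group
# ends at the first space after the colon.
FIRST_WORD = re.compile(r"(.) - TYPE\d: ([^ ]*)")

S1 = "B - TYPE2: any_text_2 TYPE3: any_text_23"
S2 = "C - TYPE2: any_text_3"


def first_word_after_type(s):
    """Return (letter, first space-free token) or None if nothing matches."""
    match = FIRST_WORD.search(s)
    return match.groups() if match else None


print(first_word_after_type(S1))
print(first_word_after_type(S2))
```

The TYPE3 part is dropped only because a space separates it from the wanted text; any space inside the text itself would end the group just the same.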
 
..:: sjf ::..

One day, a certain Diez B. Roggisch typed: ;-)
r"(.) TYPE. : ([^ ]*)"

It works as long as any_text_2 and any_text_3 do _not_ contain spaces, but in
my files they do contain spaces, and then this regexp cuts any_text_2 (and
any_text_3) off after the first space :((
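One possible way around the space problem, sketched under the assumption that the unwanted tail always starts with a space followed by "TYPE", digits and a colon: use a lazy group and a lookahead so that the match stops just before the next TYPE marker or at the end of the string. `FIELD` and `type2_field` are hypothetical names, and the sample strings with spaces are made up for illustration.

```python
import re

# (.*?) grows one character at a time until the lookahead succeeds:
# either " TYPE<digits>:" follows, or the end of the string is reached.
FIELD = re.compile(r"(.) - TYPE2: (.*?)(?= TYPE\d+:|$)")


def type2_field(s):
    """Return (letter, TYPE2 text) with any later TYPE field cut off."""
    match = FIELD.search(s)
    return match.groups() if match else None


print(type2_field("B - TYPE2: any text 2 TYPE3: any text 23"))
print(type2_field("C - TYPE2: any text 3"))
```

The lookahead does not consume anything, so the TYPE3 part is simply left out of the match. If the wanted text can itself contain something like " TYPE4:", this pattern would cut it short there, so it only fits data where that cannot happen.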
 
Diez B. Roggisch

As long as you don't provide better examples, I can't come up with a better
solution. I suggest a read:

http://catb.org/~esr/faqs/smart-questions.html

And while I don't have a problem helping you, people (including me)
generally prefer that someone shows that he tries for himself and not only
tries to benefit from the helpfulness of other people. So how about trying
to figure out how regexes work for yourself?
 
