String Regex problem

Fazer · Nov 24, 2003

Hello,

I have a string which has a URL (beginning with http://) somewhere in
it. I want to detect such a URL and just spit out the URL. Since I
am very poor at regex, can someone show me how to do it using a few
examples?

Thanks a lot!
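For reference, the simplest regex version of what Fazer asks for might look like the sketch below. It assumes a URL is `http://` followed by a run of non-whitespace characters, so trailing punctuation such as a full stop stays attached; `find_http_urls` is a hypothetical helper name, not something from the thread.

```python
import re

# Assumption: a URL is "http://" followed by one or more non-whitespace characters.
HTTP_URL = re.compile(r"http://\S+")

def find_http_urls(text):
    """Return every http:// URL in text, left to right."""
    return HTTP_URL.findall(text)

print(find_http_urls("Docs: http://www.amk.ca/python/howto/regex/ (a good start)"))
```

Because `\S+` is greedy, a URL at the end of a sentence comes back with the full stop attached (`http://x.com.`); Andrei's longer pattern later in the thread is one way to trim that.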

djw · Nov 24, 2003

I would look here to improve your regex skills:

http://www.amk.ca/python/howto/regex/

Also, I find Kodos invaluable for developing and debugging regexes.
Highly recommended.

http://kodos.sourceforge.net

Of course, you could just use urlparse in the standard library...
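`urlparse` splits a single URL into its parts rather than searching a block of text, so one hedged way to use it here is to split the string into whitespace-separated tokens and keep those whose scheme and host look like a web URL. This sketch uses Python 3's `urllib.parse.urlparse` (in 2003 the module itself was called `urlparse`); `urls_via_urlparse` is a hypothetical name.

```python
from urllib.parse import urlparse

def urls_via_urlparse(text, schemes=("http", "https")):
    """Keep whitespace-separated tokens that parse as URLs with a known scheme and a host."""
    found = []
    for token in text.split():
        try:
            parts = urlparse(token)
        except ValueError:  # e.g. an unbalanced "[" in the host part
            continue
        # "http://www.python.org/doc/" gives scheme "http" and netloc "www.python.org";
        # "mailto:x@y.org" has no netloc, and plain words have no scheme.
        if parts.scheme in schemes and parts.netloc:
            found.append(token)
    return found

print(urls_via_urlparse("docs at http://www.python.org/doc/ or mailto:x@y.org"))
```

This still relies on whitespace to find token boundaries; `urlparse` only adds a sanity check on each candidate.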

Good luck,

Don

Skip Montanaro · Nov 24, 2003

Don> http://www.amk.ca/python/howto/regex/
...
Don> http://kodos.sourceforge.net

If you're a Mac Python person there's also Dinu Gherman's excellent
RegexPlor:

http://starship.python.net/crew/gherman/RegexPlor.html

Even if you're not, it's worth popping over there to watch the MPEG clip of
RegexPlor in action.

Skip

Andrei · Nov 25, 2003

Skip Montanaro wrote on Mon, 24 Nov 2003 21:35:48 -0600:

> Don> http://kodos.sourceforge.net
>
> If you're a Mac Python person there's also Dinu Gherman's excellent
> RegexPlor:
>
> http://starship.python.net/crew/gherman/RegexPlor.html

<snip>

I'm biased here, but Kiki (see http://project5.freezope.org/kiki) is
cross-platform and depends on wxPython rather than Qt, which is much
easier for Windows users.

Anyway, here's a regex I ripped out of my own code - you might want to
simplify it though:

"""Regex for finding URLs:
URLs start with http(s)/ftp/news (?:http|https|ftp|news)
followed by ://
then any number of non-whitespace characters including
letters, numbers, dots, forward slashes, commas, question marks,
ampersands, equals signs, dashes, underscores and pluses,
but ending in a non-dot and non-plus!

Result:

(?:http|https|ftp|news)://(?:[@a-zA-Z0-9,/%:\&+#\?=\-_~;]+\.*)+[a-zA-Z0-9,/%:\&#\?=\-_]

Tests:
Plain old link: http://www.mail.yahoo.com.
Containing numbers: ftp://bla.com/di~ng/co.rt,39,%93 or other
Go to news://bl_a.com/?ha-h+a&query=tb for more info.
A real link: <a href="http://x.com">http://x.com</a>.
ftp://verylong.org/url/must/be/chopped/to/pieces/oritwontfit.html
(long one)
<IMG src="http://b.com/image.gif" /> (a plain image tag)
<a href=http://fixedlink.com/orginialinvalid.html>fixed</a> (original
invalid HTML)
Link containing an anchor
<b>"http://myhomepage.com/index.html#01"</b>.
"""

--
Yours,

Andrei

Fazer · Nov 25, 2003

Wow, awesome! Thanks a lot for Kodos; I hope I find it useful. I
have actually found a better solution that doesn't use a regex at all.

Here's my solution and I think it works well:
[x for x in moo.split(' ') if x.startswith('http://')]
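Fazer's one-liner assumes the text is in a variable `moo` and that words are separated by single spaces. A self-contained sketch of the same idea is below. Two changes are assumptions, not part of his solution: `split()` with no argument also breaks on tabs and newlines (plain `split(' ')` would glue `http://...\nOr` into one token), and `rstrip` removes trailing punctuation. `urls_by_split` is a hypothetical name.

```python
def urls_by_split(text, trailing=".,;:!?)\"'>"):
    """Return whitespace-separated tokens that start with http://, minus trailing punctuation."""
    return [word.rstrip(trailing) for word in text.split() if word.startswith("http://")]

moo = "Go to http://www.python.org/.\nOr see\thttp://kodos.sourceforge.net"
print(urls_by_split(moo))
```

Like the original, this misses URLs glued to preceding text, such as `href="http://...`, which is where a regex earns its keep.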
