Use Regular Expressions to extract URLs

Jimbo · Apr 30, 2010

Hello

I am using regular expressions to grab URLs from a string (of HTML
code). I am getting on very well and I seem to be grabbing the full URL,
but I also get a '"' character at the end of it. Do you know how I can get
rid of the '"' character at the end of my URL?

Example of problem:

I get this when I extract a URL from a string:
http://google.com"

I want to get this
http://google.com

My regular expression:

Code:

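import re

def find_urls(string):
    """ Extract all URLs from a string & return as a list """
    # The lazy .*? stops at the first double quote, but ["] is part of the
    # match itself, so every URL comes back with a trailing '"'
    url_list = re.findall(r'(?:http://|www.).*?["]', string)
    return url_list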

Steven D'Aprano · Apr 30, 2010

Live dangerously and just drop the last character from string s no matter
what it is:

s = s[:-1]

Or be a little more cautious and test first:

if s.endswith('"'):
    s = s[:-1]
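Applied to the whole list that `find_urls` returns, the cautious version might look like the sketch below. It assumes the input is a list of strings like the ones the regex in the question produces; `strip_trailing_quote` is a hypothetical helper name, not anything from a library.

```python
def strip_trailing_quote(urls):
    """Drop one trailing '"' from each URL (hypothetical helper)."""
    cleaned = []
    for s in urls:
        # Only remove the last character when it really is a double quote
        if s.endswith('"'):
            s = s[:-1]
        cleaned.append(s)
    return cleaned


print(strip_trailing_quote(['http://google.com"', 'www.example.com']))
# ['http://google.com', 'www.example.com']
```

URLs that don't end in a quote pass through unchanged, which is the point of testing first rather than blindly slicing.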

Or fix the problem at the source. Using regexes to parse HTML is always
problematic. You should consider using a proper HTML parser. Otherwise,
try this regex:

r'"(http://(?:www\.)?.*?)"'
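Because that pattern contains exactly one capturing group, `re.findall` returns only the text inside the group, so the quotes are matched but left out of the result. A minimal sketch, assuming the optional prefix is written as `(?:www\.)?` and that the URLs sit inside double-quoted attribute values; `find_quoted_urls` and the sample HTML are made up for illustration.

```python
import re

# One capturing group: findall returns just the URL, not the surrounding quotes.
# Assumption: the optional "www." prefix is written as (?:www\.)?
QUOTED_URL = re.compile(r'"(http://(?:www\.)?.*?)"')


def find_quoted_urls(html):
    """Return every http:// URL that appears inside double quotes."""
    return QUOTED_URL.findall(html)


sample = '<a href="http://google.com">G</a> <a href="http://www.python.org/">P</a>'
print(find_quoted_urls(sample))
# ['http://google.com', 'http://www.python.org/']
```

The lazy `.*?` will still match spaces or other junk up to the next quote, so this only behaves well on tidy, quoted attributes; for messier markup an HTML parser remains the safer choice.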

Novocastrian_Nomad · Apr 30, 2010

Or perhaps more generically:

import re
string = 'scatter "http://wwww.yahoo.com quotes and text anywhere www.google.com" "www.bing.com" or not'
print(re.findall(r'(?:http://|www\.)[^"\s]+', string))

['http://wwww.yahoo.com', 'www.google.com', 'www.bing.com']

Walter Overby · May 1, 2010

A John Gruber post from November 2009 seems relevant. I have not tried his
regex in any language.

http://daringfireball.net/2009/11/liberal_regex_for_matching_urls

Regards,

Walter.

Similar Threads

Utility to locate errors in regular expressions	3	May 24, 2013
How to extract all values except the last value in a string separated by comma in sql	2	Jun 15, 2023
Understanding '?' in regular expressions	2	Nov 16, 2012
Regular expressions, help?	7	Apr 19, 2012
regular expressions and matching delimeters	17	May 21, 2014
Large regular expressions	1	Mar 15, 2010
extract from json	6	Mar 7, 2014
Python Regular Expressions	4	Jun 22, 2011
