How to strip the domain name in Python?

Marko.Cain.23 · Apr 14, 2007

Hi,

I have a list of URLs like this:

http://www.cnn.com
www.yahoo.com
http://www.ebay.co.uk

I am trying to strip out the domain name using the following code:

pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
match = re.findall(pattern, line)

if (match):
    s1, s2 = match[0]

    print s2

But none of the sites matched. Can you please tell me what I am
missing?

Thank you.

Alex Martelli · Apr 14, 2007

You're using backslashes in your RE pattern, to start with, while
the URLs contain forward slashes (or no slashes at all, in the case
of the second one).

Anyway, forget REs and use the standard library module urlparse,
specifically its urlparse.urlsplit function.

Alex
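A minimal sketch of what urlsplit returns for the three sample URLs. It assumes Python 3, where the function lives in urllib.parse rather than in the urlparse module named in the post. The point worth noticing is that a URL without "//" has an empty network location and its text lands in the path instead.

```python
from urllib.parse import urlsplit  # Python 2 spelling: from urlparse import urlsplit

urls = ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]

for url in urls:
    parts = urlsplit(url)
    # netloc is only filled in when the URL contains "//";
    # a bare "www.yahoo.com" ends up in parts.path instead.
    print(repr(parts.netloc), repr(parts.path))

# 'www.cnn.com' ''
# '' 'www.yahoo.com'
# 'www.ebay.co.uk' ''
```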

Michael Bentley · Apr 14, 2007

Change re.compile("http:\\\\(.*)\.(.*)", re.S) to
re.compile("http:\/\/(.*)\.(.*)", re.S)
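A quick check of this change, sketched in Python 3. The patterns are written as raw strings for readability; they should be the same regular expressions as the two non-raw literals in the post. The variable names original and changed are only for illustration.

```python
import re

line = "http://www.cnn.com"

# The original pattern expects "http:" followed by a literal backslash.
original = re.compile(r"http:\\(.*)\.(.*)", re.S)
# The changed pattern expects "http://".
changed = re.compile(r"http://(.*)\.(.*)", re.S)

print(original.findall(line))            # []
print(changed.findall(line))             # [('www.cnn', 'com')]
print(changed.findall("www.yahoo.com"))  # [] (no "http://" prefix to match)
```

The changed pattern matches, but the first group is greedy, so it runs up to the last '.' and the second group only receives the final label.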

Marko.Cain.23 · Apr 14, 2007

Thanks. I tried this, but when 'line' is http://www.cnn.com, 's2'
comes out as 'com'. I want 'cnn.com' (everything after the first '.').
How can I do that?

pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
match = re.findall(pattern, line)

if (match):
    s1, s2 = match[0]

    print s2
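One possible regex-only answer, not proposed by anyone in the thread, is to make the first group non-greedy so that it stops at the first '.' rather than the last. A minimal Python 3 sketch under that assumption:

```python
import re

# Same idea as the pattern in the post, except the first group is
# non-greedy (.*?), so it stops at the first '.' instead of the last.
pattern = re.compile(r"http://(.*?)\.(.*)", re.S)

for line in ["http://www.cnn.com", "http://www.ebay.co.uk"]:
    match = pattern.search(line)
    if match:
        s1, s2 = match.groups()
        print(s2)

# cnn.com
# ebay.co.uk
```

This still requires the "http://" prefix, and s2 would also pick up any path that follows the host name.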

Marko.Cain.23 · Apr 15, 2007

Can anyone please help me with my problem? I still can't solve it.

Basically, I want to keep only the text after the first '.' in the
URL:

http://www.cnn.com -> cnn.com

Marc 'BlackJack' Rintsch · Apr 15, 2007

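from urlparse import urlsplit

def get_domain(url):
    # Element 1 of the split result is the network location (host).
    net_location = urlsplit(url)[1]
    # Keep only the last two dot-separated parts of the host.
    return '.'.join(net_location.rsplit('.', 2)[-2:])

def main():
    print get_domain('http://www.cnn.com')

if __name__ == '__main__':
    main()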

Ciao,
Marc 'BlackJack' Rintsch
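A Python 3 rendering of the function in this post, run against all three sample URLs. It is a sketch that assumes urllib.parse.urlsplit in place of the Python 2 urlparse module; the logic is otherwise unchanged.

```python
from urllib.parse import urlsplit  # Python 3 home of urlsplit


def get_domain(url):
    """Return the last two dot-separated labels of the URL's host."""
    net_location = urlsplit(url).netloc
    return ".".join(net_location.rsplit(".", 2)[-2:])


for url in ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]:
    print(repr(get_domain(url)))

# 'cnn.com'
# ''        (no "//", so the host is empty)
# 'co.uk'   (two labels are not enough for a .co.uk name)
```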

Marko.Cain.23 · Apr 15, 2007

Thanks for your help.

But if the input string is "http://www.ebay.co.uk/", I only get
"co.uk".

How can I change it so that it works for both www.ebay.co.uk and www.cnn.com?

Steve Holden · Apr 16, 2007

>>> def get_domain(url):
...     net_location = urlsplit(url)[1]
...     return net_location.split(".", 1)[1]
...
regards
Steve
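The same approach sketched in Python 3, with a few inputs to show where it holds and where it does not. The function name get_domain is carried over from the earlier post, and urllib.parse is assumed in place of urlparse.

```python
from urllib.parse import urlsplit


def get_domain(url):
    """Drop everything up to and including the first '.' of the host."""
    net_location = urlsplit(url).netloc
    return net_location.split(".", 1)[1]


print(get_domain("http://www.cnn.com"))      # cnn.com
print(get_domain("http://www.ebay.co.uk/"))  # ebay.co.uk
print(get_domain("http://cnn.com"))          # com (there is no "www." to drop)

try:
    get_domain("www.yahoo.com")
except IndexError:
    # Without "//" the host is empty, so there is nothing after a first '.'.
    print("no network location in 'www.yahoo.com'")
```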

Michael Bentley · Apr 16, 2007

from urlparse import urlsplit

def get_domain(url):
    net_location = (
        urlsplit(url)[1]
        and urlsplit(url)[1].split('.')
        or urlsplit(url)[2].split('.')
    ) # tricksy way to get long line into email
    if net_location[0].lower() == 'www':
        net_location = net_location[1:]
    return '.'.join(net_location)

def main():
    testItems = ['http://www.cnn.com',
                 'www.yahoo.com',
                 'http://www.ebay.co.uk']

    for testItem in testItems:
        print get_domain(testItem)

if __name__ == '__main__':
    main()
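A Python 3 sketch of the same idea: fall back to the path when the network location is empty, then drop a leading "www". Two things here are assumptions rather than part of the post: the use of urllib.parse, and trimming the fallback path at the first "/" so that an input such as www.yahoo.com/news still yields a host name.

```python
from urllib.parse import urlsplit


def get_domain(url):
    """Return the host of url with a leading 'www' label removed."""
    parts = urlsplit(url)
    # Fall back to the path when there is no "//" and netloc is empty.
    # Trimming at the first "/" is an addition, not in the original post.
    host = parts.netloc or parts.path.split("/", 1)[0]
    labels = host.split(".")
    if labels[0].lower() == "www":
        labels = labels[1:]
    return ".".join(labels)


for test_item in ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]:
    print(get_domain(test_item))

# cnn.com
# yahoo.com
# ebay.co.uk
```

Note that netloc can also carry a port or user information (for example host:8080), which this sketch leaves in place.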
