Looking for browser emulator

Roy Smith

I've got to write some tests in Python which simulate getting a page of
HTML from an HTTP server, finding a link, clicking on it, and then
examining the HTML on the next page to make sure it has certain features.

I can use urllib to do the basic fetching, and lxml gives me the tools
to find the link I want and extract its href attribute. What's missing
is dealing with turning the href into an absolute URL that I can give to
urlopen(). Browsers implement all sorts of stateful logic such as "if
the URL has no hostname, use the same hostname as the current page".
I'm talking about something where I can execute this sequence of calls:

urlopen("http://foo.com:9999/bar")
urlopen("/baz")

and have the second one know that it needs to get
"http://foo.com:9999/baz". Does anything like that exist?

I'm really trying to stay away from Selenium and go strictly with
something I can run under unittest.
 
Jon Clements

How about lxml.html.make_links_absolute()?
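
A minimal sketch of that suggestion, assuming lxml is installed and that you already have the page's HTML and the URL it was fetched from. The HTML snippet and the names BASE_URL and PAGE are made up for illustration; only the foo.com URL comes from the question.

```python
import lxml.html

# URL the page was fetched from (taken from the example in the question).
BASE_URL = "http://foo.com:9999/bar"

# Hypothetical page content with two relative links.
PAGE = '<html><body><a href="/baz">next</a> <a href="qux">other</a></body></html>'

doc = lxml.html.fromstring(PAGE)

# Rewrite every link in the parsed document against the base URL.
doc.make_links_absolute(BASE_URL)

# The href attributes are now absolute and can be passed straight to urlopen().
hrefs = doc.xpath("//a/@href")
print(hrefs)
```

After the call, each href you pull out of the document should already be a full URL, so the test code does not need to track the current hostname itself.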
 
Gary Herron

Try mechanize:
http://wwwsearch.sourceforge.net/mechanize/
It is billed as "Stateful programmatic web browsing in Python."

It handles clicking on links, cookies, logging in and out, and filling in
forms in the same way as a "real" browser, but it's all under
programmatic control from Python.

On Ubuntu, it's the python-mechanize package.
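
If a full browser emulator is more than the tests need, the narrower behaviour from the original question (resolving each href against the current page) can be sketched with the standard library alone. This is a rough illustration, not how mechanize works internally: it assumes Python 3, the class name PageSession is invented here, and redirects, cookies and forms are not handled.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


class PageSession:
    """Hypothetical helper: remembers the last URL opened so that
    relative hrefs are resolved against it, as a browser would."""

    def __init__(self, opener=urlopen):
        # The opener is injectable so tests can run without a network.
        self._opener = opener
        self.current_url = None

    def resolve(self, href):
        # With no current page, the href must already be absolute.
        if self.current_url is None:
            return href
        return urljoin(self.current_url, href)

    def open(self, href):
        url = self.resolve(href)
        response = self._opener(url)
        # Simplification: a redirect would change the real base URL,
        # but this sketch just records the URL that was requested.
        self.current_url = url
        return response
```

Used as session.open("http://foo.com:9999/bar") followed by session.open("/baz"), the second call requests http://foo.com:9999/baz. Passing a fake opener keeps everything runnable under unittest without a live server.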
 
