searching string url

googlinggoogler

Hiya,

I'm trying to find a way of searching an HTML file (I've grabbed it
with FancyURLopener). Basically, in the HTML file there is a series of
links in the following format:

<A HREF="../../company/11/13/820.htm">some name</A>

So I want to search the file for "../../company/" and then get the 13
characters after it so that I can work with it as a URL, if you see what I
mean?

Very gratefully

David
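
For reference, the find-and-slice approach described above can be sketched with nothing but `str.find` and slicing. This is a minimal sketch assuming the page has already been read into a string; the sample HTML and the helper name `company_paths` are made up for illustration, and it relies on every path being exactly 13 characters long, as in the example link.

```python
PREFIX = "../../company/"
LENGTH = 13  # "11/13/820.htm" is exactly 13 characters


def company_paths(page):
    """Return the LENGTH characters that follow each occurrence of PREFIX."""
    paths = []
    start = page.find(PREFIX)
    while start != -1:
        begin = start + len(PREFIX)
        paths.append(page[begin:begin + LENGTH])
        # Keep searching after the text we just sliced out.
        start = page.find(PREFIX, begin)
    return paths


page = ('<A HREF="../../company/11/13/820.htm">some name</A>\n'
        '<A HREF="../../company/12/04/117.htm">other name</A>')
print(company_paths(page))  # ['11/13/820.htm', '12/04/117.htm']
```

The fixed length is the fragile part: a shorter path such as `1/2/3.htm` would be sliced together with the `">` that follows it, which is one reason the replies below point at regular expressions and HTML parsers.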
 
Devan L

Sounds somewhat like homework. So I won't just give you a code
solution. Use the regular expression (re) module to match the URLs.
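
A minimal sketch of the `re` approach, assuming the links always use double-quoted `href` values. Capturing everything up to the closing quote, rather than a fixed 13 characters, means paths of other lengths still come out whole; the pattern and the name `company_paths` are illustrative, not taken from the thread.

```python
import re

# Match href="../../company/..." case-insensitively (the page uses HREF)
# and capture the path up to, but not including, the closing quote.
COMPANY_LINK = re.compile(r'href="\.\./\.\./company/([^"]+)"', re.IGNORECASE)


def company_paths(page):
    """Return the captured path for every company link in the page."""
    return COMPANY_LINK.findall(page)


page = '<A HREF="../../company/11/13/820.htm">some name</A>'
print(company_paths(page))  # ['11/13/820.htm']
```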
 
Robert Kern

You want to use BeautifulSoup.

http://www.crummy.com/software/BeautifulSoup/
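
BeautifulSoup is a third-party package, so this sketch uses the standard library's `html.parser.HTMLParser` instead, as a dependency-free stand-in that shows the same idea of parsing tags rather than matching raw text. The class name and the prefix default are hypothetical. With BeautifulSoup the equivalent is roughly a `find_all("a", href=True)` call over the parsed soup.

```python
from html.parser import HTMLParser


class CompanyLinkParser(HTMLParser):
    """Collect href values of <a> tags that point into ../../company/."""

    def __init__(self, prefix="../../company/"):
        super().__init__()
        self.prefix = prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names,
        # so <A HREF=...> arrives here as "a" with an "href" attribute.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith(self.prefix):
                self.links.append(value)


parser = CompanyLinkParser()
parser.feed('<A HREF="../../company/11/13/820.htm">some name</A>'
            '<a href="/about.htm">About</a>')
parser.close()
print(parser.links)  # ['../../company/11/13/820.htm']
```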

 
googlinggoogler

Thanks for the rapid replies, I cracked the problem 15 seconds after
posting here, doh!

Anyway, to the original replier - I wish it was homework ;-), that
would mean I wouldn't be trying to find myself a job as a recent
graduate... I decided to crawl something similar to the yellow pages
(do you have them in the US?) for my select area and then find all
pages corresponding to my ideal field of work, and grab their details
into a txt file.

Trouble is I keep thinking of cool new bits to add, Python truly is a
beautiful language. Ideally I would like to somehow write all the
information into a Word mail merge - but I think that requires more
research!

Cheers!

David
 
Peter Hansen

> Anyway, to the original replier - I wish it was homework ;-), that
> would mean I wouldn't be trying to find myself a job as a recent
> graduate... I decided to crawl something similar to the yellow pages

Sounds like a useful task, and a good way to learn more about Python
while you're job-hunting, as well as helping you with the hunt.

> for my select area and then find all
> pages corresponding to my ideal field of work, and grab their details
> into a txt file.
>
> Trouble is I keep thinking of cool new bits to add, Python truly is a
> beautiful language. Ideally I would like to somehow write all the
> information into a Word mail merge - but I think that requires more
> research!

This sounds suspiciously like you are thinking some kind of "shotgun"
approach to sending out your resume might be a good idea. If that's so,
as an employer, I strongly recommend against it. Instead, spend the
time you would otherwise have spent building this tool (or at least the
word mail merge part) and dig deeper into those companies you do find,
filtering out the ones you wouldn't want to work at. From the
remainders, adapt your cover letter to *custom-fit* each opportunity.

Mass-mailed resumes with generic cover letters are a good way to kill
trees but not a particularly effective way to get noticed by an employer,
at least not noticed in a good way...

-Peter
 
Aahz

> Mass-mailed resumes with generic cover letters are a good way to kill
> trees but not a particularly effective way to get noticed by an employer,
> at least not noticed in a good way...

Excellent advice! Speaking of which, my company has an ad up on
www.python.org in the jobs area. Look for Printra. Craigslist is
supposed to be a good place, but we've been getting spammed pretty
heavily from there.
 
Mike Meyer

> Anyway, to the original replier - I wish it was homework ;-), that
> would mean I wouldn't be trying to find myself a job as a recent
> graduate... I decided to crawl something similar to the yellow pages
> (do you have them in the US?) for my select area and then find all
> pages corresponding to my ideal field of work, and grab their details
> into a txt file.

I'm actually working on a general framework for doing this kind of
thing. It's designed specifically for walking through a collection of
pages from a web-based search engine, applying extra criteria to the
results, and then running a bit of code on any that pass that check.
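
Mike doesn't show his framework, so the following is only a hypothetical sketch of the shape he describes: walk a sequence of result pages, pull candidate items out of each, apply an extra criterion, and run some code on the survivors. All names are invented; fetching is left out, so `pages` is simply any iterable of page text.

```python
from typing import Callable, Iterable, List


def run_search(pages: Iterable[str],
               extract: Callable[[str], List[str]],
               accept: Callable[[str], bool],
               action: Callable[[str], None]) -> int:
    """Walk result pages, filter the extracted items, and act on the survivors.

    Returns the number of items that were passed to `action`.
    """
    count = 0
    for page in pages:
        for item in extract(page):
            if accept(item):  # the "extra criteria" step
                action(item)  # the "bit of code" run on each match
                count += 1
    return count


found = []
n = run_search(["alpha beta", "gamma delta"], str.split,
               lambda word: len(word) > 4, found.append)
print(n, found)  # 3 ['alpha', 'gamma', 'delta']
```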

It works for one site, but my attempt to try it on a second site
turned up a fundamental flaw. My first site used full URLs for
everything, so I happily passed soup between various methods. The
second site used relative URLs for everything, and it all broke.
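
A common way to avoid that breakage, sketched here under the assumption that each page's URL is kept alongside its soup: resolve every link against the page it came from with the standard library's `urllib.parse.urljoin` as soon as it is extracted, so later stages only ever see absolute URLs. The example.com addresses are placeholders and `absolutize` is a hypothetical helper.

```python
from urllib.parse import urljoin


def absolutize(page_url, hrefs):
    """Resolve each href against the URL of the page it was found on."""
    return [urljoin(page_url, href) for href in hrefs]


base = "http://www.example.com/search/results/page1.htm"
print(absolutize(base, ["../../company/11/13/820.htm",
                        "http://other.example.org/x.htm"]))
# ['http://www.example.com/company/11/13/820.htm', 'http://other.example.org/x.htm']
```

Absolute links pass through unchanged, so the same call works for sites like Mike's first one.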

> Trouble is I keep thinking of cool new bits to add, Python truly is a
> beautiful language. Ideally I would like to somehow write all the
> information into a Word mail merge - but I think that requires more
> research!

Given a working scrape, the only extra work is how to get it into a
mail merge. That depends on your platform and the software you're
using to send the mail. Shouldn't be all that hard.
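
As one hedged example, assuming the mail-merge software can read a delimited text file with a header row (Word's mail merge can use one as a data source), the scraped details could be written out with the standard `csv` module. The field names and `write_merge_source` are hypothetical.

```python
import csv
import io

FIELDS = ["company", "address", "url"]  # hypothetical column names


def write_merge_source(rows, stream):
    """Write scraped records as CSV with a header row for a mail-merge data source."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = [{"company": "Example Ltd", "address": "1 High Street",
         "url": "http://www.example.com/"}]
buf = io.StringIO()
write_merge_source(rows, buf)
print(buf.getvalue())
```

For a real file, open it with `open("contacts.csv", "w", newline="")` and pass that instead of the `StringIO` buffer; `newline=""` prevents the extra `\r` that would otherwise appear at line ends on Windows.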

<mike
 
googlinggoogler

Cheers for all your replies,

Peter Hansen: Couldn't agree more with you about not effectively spamming
companies with my resume; that's why I'm crawling the local newspaper's
website, to find scraps of information about companies :)

I wasn't planning to automate everything, more the boring addresses and
names really - I hate typing that stuff up. I was originally planning
just to compile a list of companies that appear suitable and then go
through them by hand.

Cheers

Dave
 
Paul McGuire

Dave -

Check out the URL extractor example that ships with pyparsing. It
handles many kinds of URL formats.

Download pyparsing at pyparsing.sourceforge.net.

-- Paul
 
