OT: regex to find email

Josh Close

I've been trying to find a good regex to parse emails, but haven't
found any to my liking. I basically need to have

( r'[a-z0-9\.\-\_]@[a-z0-9\.\-\_]', re.IGNORECASE )

but the first part can't start with .-_ and the last part has to have
a . in it (first/last being before/after the @).

Thanks.

-Josh
 
Peter Hansen

Josh said:
I've been trying to find a good regex to parse emails, but haven't
found any to my liking. I basically need to have

( r'[a-z0-9\.\-\_]@[a-z0-9\.\-\_]', re.IGNORECASE )

but the first part can't start with .-_ and the last part has to have
a . in it (first/last being before/after the @).

Use this instead:

from email.Utils import parseaddr

See the docs for assistance.

-Peter
 
Peter Hansen

Josh said:
I've been trying to find a good regex to parse emails, but haven't
found any to my liking. I basically need to have

( r'[a-z0-9\.\-\_]@[a-z0-9\.\-\_]', re.IGNORECASE )

but the first part can't start with .-_ and the last part has to have
a . in it (first/last being before/after the @).

Ignore my response (which I cancelled so some folks won't see it
anyway). 'Twas bad advice; I didn't read the post. :-(

-Peter
 
Josh Close

Josh said:
I've been trying to find a good regex to parse emails, but haven't
found any to my liking. I basically need to have

( r'[a-z0-9\.\-\_]@[a-z0-9\.\-\_]', re.IGNORECASE )

but the first part can't start with .-_ and the last part has to have
a . in it (first/last being before/after the @).

Use this instead:

from email.Utils import parseaddr

The docs say this.

parseaddr(address)
Parse address - which should be the value of some
address-containing field such as To: or Cc: - into its constituent
realname and email address parts. Returns a tuple of that information,
unless the parse fails, in which case a 2-tuple of ('', '') is
returned.

I need to find all addresses in a message, not just in the headers. It
looks like all of these functions expect the value of a specific field
such as To:, From: or Cc: (I guess those wouldn't necessarily be headers,
then).

If I missed something there, let me know, 'cause this would be a lot easier.

-Josh
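
The difference between the two approaches can be sketched as follows. This is only an illustration: parseaddr is the real library function (the module is spelled email.utils in current Python), while ADDRESS_RE and find_addresses are hypothetical names, and the pattern simply encodes the constraints from the original question rather than anything the RFCs require.

```python
import re
from email.utils import parseaddr  # spelled email.Utils in older Python releases

# parseaddr expects the value of one address field, not free text.
realname, address = parseaddr("Josh Close <josh@example.com>")

# Hypothetical pattern following the constraints stated in the question:
# the local part may not start with . - or _, and the domain needs a dot.
ADDRESS_RE = re.compile(
    r"(?<![a-z0-9._-])"                # do not start in the middle of a longer token
    r"[a-z0-9][a-z0-9._-]*"            # local part
    r"@"
    r"[a-z0-9_-]+(?:\.[a-z0-9_-]+)+",  # domain with at least one dot
    re.IGNORECASE,
)

def find_addresses(text):
    """Return every substring of text that matches ADDRESS_RE, in order."""
    return ADDRESS_RE.findall(text)
```

Calling find_addresses on a whole message body returns a plain list of strings, so it can be applied to any text, not only to header values. A pattern this strict will skip some addresses that are perfectly legal.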
 
Jorgen Grahn

Josh said:
I've been trying to find a good regex to parse emails, but haven't
found any to my liking. I basically need to have

( r'[a-z0-9\.\-\_]@[a-z0-9\.\-\_]', re.IGNORECASE )

but the first part can't start with .-_ and the last part has to have
a . in it (first/last being before/after the @).

I've seen no references to RFC 2822 in this thread ... please note that what
all these regexes catch is unlikely to be exactly the set of all valid RFC
2822 addresses.

A quick look suggests (among other things) that addresses may start with '-'
or '_' and /lots/ of other characters, and the domain part does not (of
course) need to contain a '.'.

People get more than annoyed when some input form tells them that their
email address is invalid ...

/Jorgen
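
To make the point concrete, here is a rough sketch of the plain "dot-atom" form of an RFC 2822 addr-spec. It deliberately leaves out quoted local parts, comments and domain literals, so it is still not the full grammar; the names ATEXT, DOT_ATOM, ADDR_SPEC and is_dot_atom_addr are made up for this example.

```python
import re

# Characters RFC 2822 allows in an atom ("atext"): letters, digits and
# a fairly long list of punctuation, including - and _.
ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]"

# dot-atom: one or more atoms separated by single dots.
DOT_ATOM = rf"{ATEXT}+(?:\.{ATEXT}+)*"

# Simplified addr-spec: dot-atom on both sides of the @.
ADDR_SPEC = re.compile(rf"{DOT_ATOM}@{DOT_ATOM}")

def is_dot_atom_addr(candidate):
    """True if the whole string is dot-atom@dot-atom."""
    return ADDR_SPEC.fullmatch(candidate) is not None
```

Under this sketch both "-dash@example.com" and "_x@localhost" are accepted, while ".lead@example.com" and "a..b@example.com" are not, because a dot-atom can neither begin with a dot nor contain two dots in a row.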
 
Fredrik Lundh

Jorgen said:
I've seen no references to RFC 2822 in this thread ... please note that what
all these regexes catch is unlikely to be exactly the set of all valid RFC
2822 addresses.

the perl faq is also required reading:

http://www.perldoc.com/perl5.6/pod/perlfaq9.html#How-do-I-check-a-valid-mail-address-

Q. How do I check a valid mail address?

A. You can't, at least, not in real time. Bummer, eh?

Without sending mail to the address and seeing whether there's a human
on the other end to answer you, you cannot determine whether a mail
address is valid.

what morally sound reasons are there to scrape mail addresses from text
documents, btw?

</F>
 
Josh Close

Fredrik said:
the perl faq is also required reading:

http://www.perldoc.com/perl5.6/pod/perlfaq9.html#How-do-I-check-a-valid-mail-address-

Q. How do I check a valid mail address?

A. You can't, at least, not in real time. Bummer, eh?

Without sending mail to the address and seeing whether there's a human
on the other end to answer you, you cannot determine whether a mail
address is valid.

what morally sound reasons are there to scrape mail addresses from text
documents, btw?

Well, I do know that a lot of email hosting companies like hotmail,
yahoo, etc. have standards for user names which are probably a lot
stricter than RFC 2822. That's what I was going by, though it could
miss a few names/domains.

-Josh
 
Carl Scharenberg

Fredrik Lundh said:
the perl faq is also required reading:

http://www.perldoc.com/perl5.6/pod/perlfaq9.html#How-do-I-check-a-valid-mail-address-

Q. How do I check a valid mail address?

A. You can't, at least, not in real time. Bummer, eh?

Without sending mail to the address and seeing whether there's a human
on the other end to answer you, you cannot determine whether a mail
address is valid.

what morally sound reasons are there to scrape mail addresses from text
documents, btw?

</F>

Just as an example: I run the mailing list for a dance club that is
only active during the academic year. So our first email each
September has dozens of bounces from no-longer-valid addresses that
need to be removed from the list. I just paste the email containing
all the bounce notification text into a file and use a regex to grab
all the email addresses into a list and generate the proper removal
commands for majordomo. It beats copy-pasting each bad email address
individually from the email containing big lists of bounced addresses.

Webscrapers suck, though. As soon as I put up a webpage with my email
address my spam volume shot way up. I need to replace it with a gif
showing my address.

Carl
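
The bounce-cleanup workflow described above might look roughly like this. Everything here is an assumption made for illustration: the names ADDRESS_RE and removal_commands, the exclude parameter, the deliberately loose pattern, and the "unsubscribe <list> <address>" command form, which should be checked against the local majordomo setup.

```python
import re

# Loose, hypothetical pattern; good enough for pulling addresses out of
# bounce text, not a validator.
ADDRESS_RE = re.compile(
    r"[a-z0-9][a-z0-9._+-]*@[a-z0-9-]+(?:\.[a-z0-9-]+)+",
    re.IGNORECASE,
)

def removal_commands(bounce_text, list_name, exclude=()):
    """Build one removal command per distinct address found in bounce_text.

    exclude holds addresses to skip, such as the list's own address or
    the mail system's sender address, which also show up in bounces.
    """
    seen = {addr.lower() for addr in exclude}
    commands = []
    for addr in ADDRESS_RE.findall(bounce_text):
        key = addr.lower()
        if key in seen:
            continue  # already handled, or explicitly excluded
        seen.add(key)
        commands.append(f"unsubscribe {list_name} {addr}")
    return commands
```

The returned lines can be pasted into a single message to the list server. Duplicates are compared case-insensitively, and the first spelling seen is the one that ends up in the command.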
 
