galileo228 said:
Code:
fileHandle = open('/Users/Matt/Documents/python/results.txt','r')
names = fileHandle.readlines()
Now, the 'names' list has values looking like this:
['(e-mail address removed)\n', '(e-mail address removed)\n', etc]. So I ran the following code:
And that did the trick! 'names' now has ['aaa12', 'bbb34', etc].
Obviously this only worked because all of the domain names were the
same. If they were not, then based on your comments and my own
research, I would've had to use regex and split(), which looked
massively complicated to learn.
The complexities stemmed from several factors that, with more
details, could have made the solutions less daunting:
(a) you mentioned "finding" the email addresses -- this makes
it sound like there's other junk in the file that has to be
sifted through to find "things that look like an email address".
If the sole content of the file is lines containing only email
addresses, then "find the email address" is a bit like [1]
(b) you omitted the detail that the domains are all the same.
Even if they're not the same, (a) reduces the problem to a much
easier task:
s = set()
for line in open('results.txt'):
    # keep everything before the last '@', lower-cased
    s.add(line.rsplit('@', 1)[0].lower())
print(s)
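To see why a single rsplit('@', 1) is enough, here is a rough sketch that runs the same idea over an in-memory list instead of a file. The sample addresses and the local_parts() helper name are made up for illustration, and it assumes every non-blank line holds exactly one address:

```python
def local_parts(lines):
    """Return the set of lower-cased local parts (the text before the last '@')."""
    result = set()
    for line in lines:
        line = line.strip()  # drop the trailing newline left by readlines()
        if not line:
            continue  # skip blank lines
        result.add(line.rsplit('@', 1)[0].lower())
    return result

# Hypothetical sample data; note the differing domains and the duplicate
sample = ['AAA12@example.com\n', 'bbb34@example.org\n', 'aaa12@example.com\n']
names = local_parts(sample)
print(sorted(names))
```

Because the result is a set, duplicates collapse on their own, and the domains never need to match.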
If it was previously a CSV or tab-delimited file, Python offers
batteries-included processing to make it easy:
import csv
f = open('results.txt', newline='') # Python 3; Python 2 needs mode 'rb' instead
r = csv.DictReader(f) # CSV
# r = csv.DictReader(f, delimiter='\t') # tab delim
s = set()
for row in r:
    s.add(row['Email'].lower())
f.close()
or even
f = open(...)
r = csv.DictReader(...)
s = set(row['Email'].lower() for row in r)
f.close()
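For anyone who wants to try the DictReader approach without a file on disk, here is a small self-contained sketch. The tab-delimited sample text and its 'Name' column are invented for illustration; only the 'Email' header is taken from the snippets above:

```python
import csv
import io

# Hypothetical tab-delimited export with a header row
data = "Name\tEmail\nMatt\tAAA12@example.com\nTim\tbbb34@example.org\n"

# io.StringIO stands in for an open file object
with io.StringIO(data) as f:
    reader = csv.DictReader(f, delimiter='\t')
    emails = set(row['Email'].lower() for row in reader)

print(sorted(emails))
```

DictReader takes its keys from the first row, so row['Email'] only works if the header is spelled exactly that way.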
Hope this gives you more ideas to work with.
-tkc
[1]
http://jacksmix.files.wordpress.com/2007/05/findx.jpg