Bart Van der Donck
Hello,
I have been assigned a task to extract an email address from the
body of a .msg source file.
The source file looks odd and displays differently in various
plaintext readers. It looks like some sort of half-binary, half-ASCII
format (including the headers). The body of the file is more or less
consistent. The address to be extracted is in the following format:
"- n a m e @ h o s t . c o m "
All text in the source file has such "spaces" between the characters.
Depending on the reader, these spaces are displayed as an EOL, a space,
or nothing at all. Binary characters seem to be inserted randomly;
sometimes I can recognize a repeated string. Is anyone familiar with
this format? The messages were saved from MS Outlook.
I tried many variants; my best attempt so far is:
if (/(-)(\s\s\s)(.+)(@)(.+)(\.)(.+)(\s\s\s)/gs) { ...
But still no success. I suspected an encoding issue (Unicode/UTF?),
but the source file seems too different for that.
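A minimal sketch of how the encoding hypothesis could be tested, assuming (not confirmed) that the body text is UTF-16LE. In Perl, `\s` matches tab, newline, form feed, carriage return, vertical tab and space, but not a NUL byte, so a pattern built on `\s\s\s` would not match NUL-separated characters. The helper below, `extract_addresses` (a hypothetical name), instead pulls out runs of UTF-16LE-encoded printable ASCII from the raw bytes, strips the NULs, and applies a deliberately simplified address pattern (not a full RFC 5322 matcher) to the resulting plain text.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumption: text in the .msg is UTF-16LE, so each
# printable ASCII character is followed by a NUL byte.
sub extract_addresses {
    my ($raw) = @_;
    my @found;
    # Collect runs of at least 4 UTF-16LE-encoded printable ASCII characters.
    while ($raw =~ /((?:[\x20-\x7E]\x00){4,})/g) {
        (my $run = $1) =~ tr/\x00//d;    # drop the NUL high bytes
        # Simplified pattern for "- name@host.com" in the decoded text.
        while ($run =~ /-\s*([\w.+-]+@[\w-]+(?:\.[\w-]+)+)/g) {
            push @found, $1;
        }
    }
    return @found;
}

# Usage: perl extract.pl message.msg
if (@ARGV) {
    my $path = $ARGV[0];
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $raw = do { local $/; <$fh> };
    close $fh;
    print "$_\n" for extract_addresses($raw);
}
```

Working on the raw bytes, rather than decoding the whole file, avoids feeding non-text binary data to a UTF-16 decoder.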
Thanks