Help with a regexp please

Nigel Scott

Hi

I am writing a piece of perl for processing emails and part of the
process involves finding the boundaries of multiple MIME parts.

I am trying to extract the boundary from the headers using a pattern
like this:

my $pattern = ".*boundary *= *[\'\"]*(.*)[\'\"]*.*";

This is to cover the cases where the boundary itself may be contained in
double quotes, single quotes or no quotes at all. For some reason
though, if the boundary is contained in double quotes, e.g.

Content-Type: multipart/mixed;
boundary="----=_Part_174034_7372797.1070374686532"

and I use:
my $boundary =~ s/$pattern/$1/is;

$boundary becomes ----=_Part_174034_7372797.1070374686532'
with an extra single quote on the end.

I have tried looking at various perl and regexp tutorials, but I can't
work out what is wrong with my pattern.

Any help appreciated,
Nige.
 
Brian McCauley

Nigel Scott said:
I am writing a piece of perl for processing emails and part of the
process involves finding the boundaries of multiple MIME parts.

There are modules to do that, you know.

Nigel Scott said:
my $pattern = ".*boundary *= *[\'\"]*(.*)[\'\"]*.*";
^^^^^^^^^^^

Firstly, it's easier to see what's what if you quote the regex using qr//
rather than qq().

my $pattern = qr/.*boundary *= *['"]*(.*)['"]*.*/;

Now let's focus on just one bit of that

/['"]*(.*)['"]*/

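If you have two greedy subexpressions in a regex, the first one gets
first bite, and the character class ['"] is a subset of the character
class . (dot). The (.*) therefore swallows any closing quote and the
trailing ['"]* is left to match nothing, so the above is equivalent to: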

/['"]*(.*)/
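
To see that equivalence concretely, here is a minimal sketch. The header text is the one from the original post; the variable names ($header, $greedy, $shorter) are just for illustration, and the leading and trailing .* of the original pattern are left out because they do not affect what is captured.

```perl
use strict;
use warnings;

# Header taken from the original post (two lines, joined with a newline).
my $header = qq{Content-Type: multipart/mixed;\n}
           . qq{boundary="----=_Part_174034_7372797.1070374686532"};

# Original idea: optional quotes on both sides of a greedy capture.
# The capture runs to the end of the line, closing quote included,
# so the trailing ['"]* has nothing left to match.
my ($greedy) = $header =~ /boundary *= *['"]*(.*)['"]*/i;

# Same pattern with the trailing quote class dropped entirely.
my ($shorter) = $header =~ /boundary *= *['"]*(.*)/i;

print "with trailing class:    $greedy\n";
print "without trailing class: $shorter\n";
```

Both captures come out the same, with the closing quote still attached to the boundary string.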

Perhaps you meant

/(['"]?)(.*)\1/

For real examples of parsing MIME headers see the source code of the
modules you should be using anyhow.
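
As a rough sketch of how the backreference version might be used (extract_boundary is a hypothetical helper name, not something from a module, and \s* is used in place of the literal spaces): the optional opening quote lands in $1, \1 requires the same character to close, and the boundary itself is in $2.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the boundary value, or undef if the
# header has no boundary parameter.
sub extract_boundary {
    my ($header) = @_;

    # $1 = optional opening quote (may be empty)
    # $2 = the boundary text
    # \1 = whatever $1 matched, so a quoted value must close with the
    #      same quote character it opened with
    return $header =~ /boundary\s*=\s*(['"]?)(.*)\1/i ? $2 : undef;
}

my $header = qq{Content-Type: multipart/mixed;\n}
           . qq{boundary="----=_Part_174034_7372797.1070374686532"};

my $boundary = extract_boundary($header);
print defined $boundary ? "$boundary\n" : "no boundary found\n";
```

This assumes the boundary is the last thing on its line. Because (.*) is still greedy, an unquoted boundary followed by further parameters (for example "; charset=...") would capture those too, which is one more reason to look at how the MIME modules handle it.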

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
 
Nige

Brian said:
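If you have two greedy subexpressions in a regex, the first one gets
first bite, and the character class ['"] is a subset of the character
class . (dot). The (.*) therefore swallows any closing quote and the
trailing ['"]* is left to match nothing, so the above is equivalent to: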

/['"]*(.*)/

Perhaps you meant

/(['"]?)(.*)\1/

For real examples of parsing MIME headers see the source code of the
modules you should be using anyhow.

Hi Brian - thanks for the reply.

I have actually installed the MIME::Parser modules and attempted to use
them, however I end up with empty files for each part of the message,
and only certain parts are written. The debug output from the module simply
says something along the lines of "writing to file" and then finishes
with some timing stats. I don't have the exact output as I am now at
home. Also, I don't really need the full MIME parsing functionality -
all I need is to extract the inline text/plain parts from the message,
hence my attempts you see above.

I've had a read up about greedy expressions and understand that my
pattern is wrong, so I'll give it another go with a ? instead, tomorrow at
work.

Thanks again,
Nige
 
