JavaMail and base64 problem

Martin Gregorie

When I create and store a mail message by constructing a MimeMessage from
an InputStream and writing it to an mbox file using the mstor provider
some or all of its content will be encoded as base64 if the message
contains the "Content-Transfer-Encoding: base64" header.

However, when the messages are read back into Message objects, using
JavaMail and the mstor provider, some base64 encoded parts of the content
cause an exception. The exception message says that the base64 encoding
length isn't a multiple of 4 bytes.

I've never seen this error when reading messages from mbox files created
by Postfix - only when the message has been written by JavaMail/mstor and
then read back by it.

I'm using JavaMail 1.4.1 and mstor 0.9.11 on a Linux (Fedora 8) system.

Has anybody else seen this problem and, if so, how did you get round it?
 
Martin Gregorie

An earlier reply said:
An example of the before and after versions of the Base64 text might be
useful here. If the length is not a multiple of 4 then either it is
being truncated or something is being added to the end.

Difficult, as they are buried in very large files, but I'll see what I
can do.

If the decode fails the message gets spat into a single message mbox file
where I can look at it. The failing region in this file is plain text in
the examples I've looked at, not base64 encoded, and doesn't seem to be
truncated. The exception can occur at the end of an attachment
(attachment has "Content-Transfer-Encoding: base64" and the rest of the
message is present) or at the end of the message body (message header has
"Content-Transfer-Encoding: base64").

The main point is that the files are not modified between being written
by program A and being read into program B, so I wasn't expecting any
problem of this type because both programs use the same set of supporting
class libraries and are run in the same environment.
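
One way to narrow down what is in the failing region is to scan the extracted body directly. The following is a minimal sketch in plain Java, not the JavaMail or mstor decoder; the class and method names are hypothetical. It counts the characters that belong to the base64 alphabet (line breaks are ignored) and reports the first character that is neither base64 nor whitespace, so stray plain text after an encoded part shows up either as a foreign character or as a count that is not a multiple of 4.

```java
// Hypothetical diagnostic helper; not part of JavaMail or mstor.
final class Base64BodyCheck {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=";

    /** Number of base64 alphabet characters (including '=') in the body. */
    static int significantLength(String body) {
        int count = 0;
        for (int i = 0; i < body.length(); i++) {
            if (ALPHABET.indexOf(body.charAt(i)) >= 0) {
                count++;
            }
        }
        return count;
    }

    /** Index of the first character that is neither base64 nor whitespace, or -1. */
    static int firstForeignChar(String body) {
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            boolean whitespace = c == '\r' || c == '\n' || c == ' ' || c == '\t';
            if (!whitespace && ALPHABET.indexOf(c) < 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String clean = "SGVsbG8s\r\nIHdvcmxk\r\n";
        String withTrailingText = clean + "abc\r\n"; // stray text made of base64 characters
        System.out.println(significantLength(clean) % 4);            // remainder for a clean body
        System.out.println(significantLength(withTrailingText) % 4); // remainder with extra text
        System.out.println(firstForeignChar(clean + "-- \r\n"));     // position of the first '-'
    }
}
```

A non-zero remainder, or a foreign character close to the end of the part, would suggest that something was appended to the encoded data rather than that the data was truncated.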

I assume that the base64 encode/decode is in the depths of JavaMail
rather than in the mstor provider code and that, if the input bytestring
wasn't long enough it would be padded with '=' so its length is a
multiple of 4 bytes. Do these sound like reasonable assumptions?
 
Martin Gregorie

An earlier reply said:
Base64 expects you to pad to a multiple of 4 with =

I know - I've read the RFC.

The problem is that I'm building the MimeMessage object for output with
the MimeMessage(Folder, InputStream) constructor which accepts the entire
message (headers and all) from the InputStream and parses them. To insert
the padding myself I'd have to parse the stream to decide if there is a
Content-Transfer-Encoding header that says base64 encoding is required
and, if so, find the right place(s) to add the padding into the stream.
They could be in the middle, for base64 attachments, or at the end, for a
single part base64 message. This is the job that I'd expect the
MimeMessage constructor to handle, since it's already parsing the input
stream - at least that's what the Javadocs say it does.

Neither MimeMessage nor MimeBodyPart give me any control over content
transfer encoding apart from manipulating the list of headers. The only
way I can get that control involves operating at a very low level and
directly using the methods in MimeUtility but I really want to avoid
doing that. I'm currently storing the headers and content for each
message in a single database row as two CLOB fields, which makes for a
nice, simple schema. I really don't want to split the parts of the
message content out because that would require me to use a recursive
structure of arbitrary depth (message parts can be multi-parts and so ad
infinitum). That is do-able, of course, but makes database handling
considerably more complex so it's to be avoided if possible. Apart from
this one extract/reload task I have nothing else that would be helped by
using a more complex schema.
 
Arne Vajhøj

Martin said:
I know - I've read the RFC.

As far as I know, the MIME standard does not mandate that the length
be a multiple of 4, and the MimeUtility class does not enforce it either.

It is part of standard base64, and implemented in MimeUtility, that no
lines are longer than 76 bytes.

When \r\n is inserted, the length is no longer a multiple
of 4 (until another is inserted).
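
The effect of the line breaks on the length can be sketched as follows. This uses the JDK's own java.util.Base64 MIME encoder (Java 8 or later) as a stand-in, on the assumption that it wraps lines the same way; it is not JavaMail's MimeUtility, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; uses java.util.Base64, not JavaMail's MimeUtility.
final class MimeLineWrapDemo {
    /** Encodes with the JDK MIME encoder: lines of at most 76 characters joined by CRLF. */
    static String mimeEncode(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    /** Removes CR and LF so that only the base64 characters remain. */
    static String stripLineBreaks(String encoded) {
        return encoded.replace("\r", "").replace("\n", "");
    }

    public static void main(String[] args) {
        byte[] data = new byte[100]; // 100 bytes encode to 136 base64 characters
        String wrapped = mimeEncode(data);
        String unwrapped = stripLineBreaks(wrapped);
        System.out.println("with CRLF:    " + wrapped.length() % 4);   // length modulo 4
        System.out.println("without CRLF: " + unwrapped.length() % 4); // length modulo 4
        // The JDK MIME decoder skips line separators while decoding.
        byte[] decoded = Base64.getMimeDecoder().decode(wrapped);
        System.out.println("round trip:   " + Arrays.equals(data, decoded));
    }
}
```

So a length check only makes sense after the line breaks have been discarded, which is what a MIME-aware decoder is expected to do.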

That does not explain your real problem though.

Arne
 
Joshua Cranmer

Arne said:
As far as I know, the MIME standard does not mandate that the length
be a multiple of 4, and the MimeUtility class does not enforce it either.

<http://tools.ietf.org/html/rfc4648#section-3.2> says:
In some circumstances, the use of padding ("=") in base-encoded data is
not required or used. In the general case, when assumptions about the
size of transported data cannot be made, padding is required to yield
correct decoded data.

Implementations MUST include appropriate pad characters at the end of
encoded data unless the specification referring to this document
explicitly states otherwise.

<http://tools.ietf.org/html/rfc2045#section-6.8> says:
Special processing is performed if fewer than 24 bits are available at
the end of the data being encoded. A full encoding quantum is always
completed at the end of a body. When fewer than 24 input bits are
available in an input group, zero bits are added (on the right) to form
an integral number of 6-bit groups. Padding at the end of the data is
performed using the "=" character.

In other words, you are required to pad.
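
The padding rule quoted above can be seen with a small sketch. It uses java.util.Base64 from the JDK (Java 8 or later) rather than JavaMail, and the class and method names are hypothetical: every group of up to 3 input bytes becomes one 4-character quantum, with "=" filling out the last one.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; uses java.util.Base64, not JavaMail.
final class Base64PaddingDemo {
    /** Padded base64 length for n input bytes: one 4-character quantum per 3 bytes. */
    static int paddedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    /** Encodes without line breaks, with '=' padding. */
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            byte[] data = new byte[n];
            Arrays.fill(data, (byte) 'M');
            String encoded = encode(data);
            // The encoded length always matches paddedLength(n).
            System.out.println(n + " byte(s) -> " + encoded
                + " (" + encoded.length() + " chars, expected " + paddedLength(n) + ")");
        }
    }
}
```

For example, the single byte 'M' encodes to "TQ==" and the two bytes "MM" to "TU0=".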
 
Arne Vajhøj

Joshua said:
<http://tools.ietf.org/html/rfc4648#section-3.2> says:
In some circumstances, the use of padding ("=") in base-encoded data is
not required or used. In the general case, when assumptions about the
size of transported data cannot be made, padding is required to yield
correct decoded data.

Implementations MUST include appropriate pad characters at the end of
encoded data unless the specification referring to this document
explicitly states otherwise.

<http://tools.ietf.org/html/rfc2045#section-6.8> says:
Special processing is performed if fewer than 24 bits are available at
the end of the data being encoded. A full encoding quantum is always
completed at the end of a body. When fewer than 24 input bits are
available in an input group, zero bits are added (on the right) to form
an integral number of 6-bit groups. Padding at the end of the data is
performed using the "=" character.

In other words, you are required to pad.

I think you should have read the rest of my post.

Padding is required. But the encoder may insert \r\n to
limit line length to 76, in which case the length
is not always a multiple of 4.

Arne
 
Martin Gregorie

Joshua said:
Implementations MUST include appropriate pad characters at the end of
encoded data unless the specification referring to this document
explicitly states otherwise.

<http://tools.ietf.org/html/rfc2045#section-6.8> says: Special
processing is performed if fewer than 24 bits are available at the end
of the data being encoded. A full encoding quantum is always completed
at the end of a body. When fewer than 24 input bits are available in an
input group, zero bits are added (on the right) to form an integral
number of 6-bit groups. Padding at the end of the data is performed
using the "=" character.

In other words, you are required to pad.

Some clarification:

I do the following with all the mail:

Postfix -> mbox -> JavaMail.MimeMessage -> headers --> } fields in a DB row
                                        -> body    --> }

If I do this:
DB->headers+body->attachment->JavaMail.MimeMessage --> Postfix --> MUA

the retrieved message is correctly formatted and readable as an
attachment. Base64 attachments (e.g. images) are viewable.

However, if I do this:
DB->headers+body->JavaMail.MimeMessage --> mstor provider --> mbox file

mbox file --> mstor provider -> JavaMail.MimeMessage

then a small proportion of the messages that originally contained Base64
attachments or a non-Mime Base64-encoded body (some M$ MUAs do this) will
fail to be read back from the mbox file with the "Base64 not a multiple
of 4 bytes" exception. In both cases the complete MIME body is treated as
a data stream and not parsed by my code.

Headers+body means that I concatenate the headers and body to create a
single InputStream that's passed to the MimeMessage constructor, which is
then written to an mbox file by JavaMail using the mstor provider. I have
to ensure that the stream ends with CRLFCRLF before constructing the
MimeMessage. If I omit this step the blank line between the message body
and the next message's 'From ' envelope header is omitted and mstor
becomes incapable of parsing the mbox file, so it's quite possible that
the provider is doing Base64 encode/decode as well. My guess is that the
provider module is causing the problem since messages sent to Postfix via
the pop3 provider don't cause problems, but a small fraction of the
messages sent to the mbox file via the mstor provider are not readable by
mstor+JavaMail.
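
The concatenation step described above might look roughly like this. It is a sketch with hypothetical names, using only the JDK. It assumes that the stored headers may or may not end with a line break and that the text can be treated as ISO-8859-1 so bytes pass through unchanged. The resulting stream is what would then be handed to the MimeMessage constructor, which is left out here because it needs JavaMail on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; none of these names come from JavaMail or mstor.
final class MessageStreamBuilder {
    private static final String CRLF = "\r\n";

    /** Appends CRLF as needed so that the text ends with CRLF CRLF. */
    static String ensureTrailingBlankLine(String text) {
        String result = text;
        if (!result.endsWith(CRLF)) {
            result += CRLF;
        }
        if (!result.endsWith(CRLF + CRLF)) {
            result += CRLF;
        }
        return result;
    }

    /** Joins stored headers and body: headers, blank line, body, blank line. */
    static InputStream toMessageStream(String headers, String body) {
        // Assumption: strip any trailing line breaks, then add exactly one blank line.
        String headerBlock = headers.replaceAll("[\\r\\n]+$", "") + CRLF + CRLF;
        // ISO-8859-1 maps chars 0-255 to single bytes, so 8-bit content survives.
        byte[] headerBytes = headerBlock.getBytes(StandardCharsets.ISO_8859_1);
        byte[] bodyBytes = ensureTrailingBlankLine(body).getBytes(StandardCharsets.ISO_8859_1);
        return new SequenceInputStream(
            new ByteArrayInputStream(headerBytes), new ByteArrayInputStream(bodyBytes));
    }

    /** Reads a whole stream into a string, for inspection only. */
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String headers = "Subject: test\r\nContent-Transfer-Encoding: base64\r\n";
        String body = "SGVsbG8sIHdvcmxk";
        String message = readAll(toMessageStream(headers, body));
        System.out.println(message.endsWith(CRLF + CRLF));
    }
}
```

Dumping the stream with readAll before it goes to the constructor gives a "before" copy that can be compared byte for byte with what ends up in the mbox file.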

As I said at the start of this thread, has anybody seen this problem
before? Alternatively, has anybody done something similar and NOT seen
the problem?

The purpose of this post is to see if anybody on this newsgroup can
confirm or deny this guess.

IOW I want to know if the Base64 encoding is handled by the mstor module
or within JavaMail. Once I can determine that I can start a dialogue with
the appropriate author.


TIA,
Martin
 
