RMail and RFC-2047

Oliver Cromm

I was playing around with the RMail package and I was missing RFC-2047
support. I found the "module Rfc2047" in
<20031204151316.GC849@jupp%gmx.de>
but noticed the following:

In the regex to discover encoded words:

| WORD = %r{=\?([!#$%&'*+-/0-9A-Z\\^\`a-z{|}~]+)\?([BbQq])\?([!->@-~]+)\?=} # :nodoc:

I had to change % to \% to get it to run. Maybe that's just Cygwin.

The second thing is that the module doesn't correctly interpret the
"encoded-word - linear white space - encoded word" sequence, where
all the white space should be deleted.

So I added a regex to delete this whitespace before further processing:

module Rfc2047

WORD = %r{=\?([!#$\%&'*+-/0-9A-Z\\^\`a-z{|}~]+)\?([BbQq])\?([!->@-~]+)\?=} # :nodoc:
| WORDSEQ = %r{(=\?[!#$\%&'*+-/0-9A-Z\\^\`a-z{|}~]+\?[BbQq]\?[!->@-~]+\?=)\s*(=\?[!#$\%&'*+-/0-9A-Z\\^\`a-z{|}~]+\?[BbQq]\?[!->@-~]+\?=)}

[Comment skipped]
def Rfc2047.decode_to(target, from)
| from.gsub!(WORDSEQ, '\1\2')

out = from.gsub(WORD) do |word|
charset, encoding, text = $1, $2, $3

It works so far, but I wonder whether '\s*' is the correct expression
and whether there is a more efficient way to do this.
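
On the '\s*' question: RFC 2047 says that linear white space separating two adjacent encoded words is ignored on display, and \s covers space, tab, CR and LF, so folded header lines should be covered too. One thing to watch with the WORDSEQ approach is that gsub matches do not overlap, so in a run of three encoded words the middle word is consumed by the first match and the gap after it is left in place. Below is a minimal sketch of an alternative that uses a lookahead for the second word. ENCODED_WORD, BETWEEN_WORDS and collapse_encoded_word_gaps are names made up for this example, and the pattern is a simplified stand-in for WORD, not the module's own regex.

```ruby
# Simplified encoded-word pattern (an assumption for this sketch; the
# module's WORD regex is stricter about which characters are allowed).
ENCODED_WORD = /=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=/

# An encoded word followed by whitespace and then another encoded word.
# The lookahead does not consume the second word, so that word can start
# the next match and longer runs are collapsed in a single pass.
BETWEEN_WORDS = /(#{ENCODED_WORD.source})\s+(?=#{ENCODED_WORD.source})/

# Hypothetical helper: returns a copy of the header with the whitespace
# between adjacent encoded words removed. The argument is not modified.
def collapse_encoded_word_gaps(header)
  header.gsub(BETWEEN_WORDS, '\1')
end
```

Unlike from.gsub!, this returns a new string, so the caller's header is left untouched. Whitespace between an encoded word and ordinary text is kept.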


I also observed that decoding of non-Western character sets (Win-1251 to
Big5) to UTF-8 didn't work. Does anybody already suspect why, or do I have
to track down the error further?
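
To narrow this down, it may help to decode a single encoded word in isolation and compare the result with what the module produces. The following is only a sketch under assumptions: it needs a Ruby with String#encode (1.9 or later), decode_encoded_word is a made-up helper that is not part of RMail or the Rfc2047 module, and it is not a claim about where the module goes wrong.

```ruby
# Hypothetical helper for testing one encoded word by hand.
# Returns the input unchanged if it does not look like an encoded word.
def decode_encoded_word(word)
  m = /\A=\?([^?]+)\?([BbQq])\?([^?]*)\?=\z/.match(word)
  return word unless m

  charset, encoding, text = m[1], m[2], m[3]

  raw = if encoding.upcase == "B"
          text.unpack("m").first                 # base64 payload
        else
          text.tr("_", " ").unpack("M").first    # Q: "_" is a space, =XX is hex
        end

  # Tag the raw bytes with the declared charset, then transcode.
  # force_encoding raises ArgumentError for a charset name Ruby does not know.
  raw.force_encoding(charset).encode("UTF-8")
end
```

If this gives the right text for a Windows-1251 or Big5 subject line while the module does not, the charset name passed to the converter would be the first thing to check.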