Oliver Cromm
I was playing around with the RMail package and was missing RFC 2047
support. I found the "module Rfc2047" in
<20031204151316.GC849@jupp%gmx.de>
but noticed the following:
In the regex that detects encoded words:
| WORD = %r{=\?([!#$%&'*+-/0-9A-Z\\^\`a-z{|}~]+)\?([BbQq])\?([!->@-~]+)\?=} # :nodoc:
I had to change % to \% to get it to run. Maybe it's just Cygwin.
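The error is probably not Cygwin-specific. Inside a Ruby regex literal, "#$" starts interpolation of a global variable (like "#{...}"). Depending on the Ruby version, "#$%" is therefore either a syntax error, because $% is not a valid global variable name, or is taken literally. Writing "\%" avoids the error only because "#$\" then interpolates $\, the output record separator, which is normally nil. The character class quietly loses "#" and "$". Escaping the "#" instead (\#) suppresses interpolation altogether. A small sketch with a cut-down character class (the variable names are only for illustration):

```ruby
# "#$\" inside %r{...} interpolates the global $\ (normally nil, i.e. ""),
# so this class ends up as [!%&] and no longer contains "#" or "$".
with_backslash = %r{[!#$\%&]}

# Escaping "#" stops interpolation, so "#", "$" and "%" all stay in the class.
escaped_hash = %r{[!\#$%&]}

puts with_backslash.source
puts escaped_hash.match?("#")
puts with_backslash.match?("#")
```

Since charset names never contain "#" or "$", the "\%" version happens to work in practice, but "\#" expresses the intent.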
The second thing is that the module doesn't correctly handle the sequence
"encoded-word, linear white space, encoded-word", in which all the white
space should be deleted (RFC 2047, section 6.2).
So I added a regex to delete this whitespace before further processing:
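module Rfc2047
  WORD = %r{=\?([!#$\%&'*+-/0-9A-Z\\^\`a-z{|}~]+)\?([BbQq])\?([!->@-~]+)\?=} # :nodoc:
  # Added: an encoded word, optional white space, then a second encoded word
  WORDSEQ = %r{(=\?[!#$\%&'*+-/0-9A-Z\\^\`a-z{|}~]+\?[BbQq]\?[!->@-~]+\?=)\s*(=\?[!#$\%&'*+-/0-9A-Z\\^\`a-z{|}~]+\?[BbQq]\?[!->@-~]+\?=)}
  def Rfc2047.decode_to(target, from)
    # Added: drop the white space between the two words (modifies +from+ in place)
    from.gsub!(WORDSEQ, '\1\2')
    out = from.gsub(WORD) do |word|
      charset, encoding, text = $1, $2, $3
      # ... decoding of +text+ as in the original module ...
    end
    # ... rest of decode_to as in the original module ...
  end
end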
It works so far, but I wonder whether '\s*' is the correct expression
and whether there is a more efficient way to do this.
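Two things may be worth checking. First, a regex that consumes a *pair* of words (like WORDSEQ) misses runs of three or more: after "A B" is joined, B has already been consumed, so the white space between B and C survives. Second, RFC 2047's linear-white-space is spaces or tabs, each optionally preceded by a folded line break, which is narrower than \s*. A minimal sketch, assuming a simplified encoded-word pattern; ENCODED_WORD, LWSP, JOIN and join_encoded_words are hypothetical names, not part of Rfc2047 or RMail:

```ruby
# Simplified encoded word: =?charset?B-or-Q?text?= (no "?" or white space inside).
ENCODED_WORD = /=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=/

# RFC 2047 linear-white-space: 1*([CRLF] LWSP-char). Each repetition consumes
# exactly one space/tab, so there are no nested quantifiers to backtrack through.
LWSP = /(?:(?:\r?\n)?[ \t])+/

# Match one encoded word (captured) plus the white space after it, but only
# *look ahead* at the next encoded word so it can start the next match.
JOIN = /(#{ENCODED_WORD})#{LWSP}(?=#{ENCODED_WORD})/

def join_encoded_words(header)
  header.gsub(JOIN, '\1') # returns a new string; +header+ is left unchanged
end

# For comparison: the pair-matching approach.
pair_only = /(#{ENCODED_WORD})\s*(#{ENCODED_WORD})/
s = "=?US-ASCII?Q?a?= =?US-ASCII?Q?b?= =?US-ASCII?Q?c?="
puts s.gsub(pair_only, '\1\2') # the gap between the 2nd and 3rd word remains
puts join_encoded_words(s)     # all three words are joined
```

The lookahead keeps it to a single substitution pass, and using gsub rather than gsub! avoids silently modifying the string passed in by the caller.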
I also observed that decoding non-Western character sets (anything from
Win-1251 to Big5) to UTF-8 didn't work. Does anybody have an idea why, or
do I have to track down the error further?
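It is hard to tell from the excerpt why the module fails for these charsets. One way to narrow it down is to decode a known sample in isolation and compare. A minimal sketch, assuming Ruby 2.4 or later and String#encode instead of Iconv; WORD_RE, decode_word and decode_header are hypothetical names, not the module's API. Ruby's encoding table includes Windows-1251 and Big5, and an unknown charset name raises ArgumentError, which at least makes that failure visible:

```ruby
# Simplified encoded word with captures: charset, encoding (B/Q), encoded text.
WORD_RE = /=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=/

def decode_word(charset, encoding, text)
  raw =
    if encoding.upcase == 'B'
      text.unpack1('m')              # Base64 -> raw bytes
    else
      text.tr('_', ' ').unpack1('M') # "Q": "_" means space, =XX is one byte
    end
  # The bytes are in +charset+: label them as such, then transcode.
  # force_encoding raises ArgumentError for a charset name Ruby doesn't know.
  raw.force_encoding(charset).encode('UTF-8')
end

def decode_header(header)
  header.gsub(WORD_RE) { decode_word($1, $2, $3) }
end

puts decode_header("=?windows-1251?Q?=CF=F0=E8=E2=E5=F2?=")
puts decode_header("=?ISO-8859-1?Q?caf=E9_au_lait?=")
```

This would run after the white space between adjacent encoded words has been removed, so that words split across several encoded-words come out contiguous.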