James Edward Gray II
Here's the short story on the current situation with our mailing list
to Usenet gateway:
* Our Usenet host rejects multipart/alternative messages
because they are technically illegal Usenet posts
* This means that some emails do not reach comp.lang.ruby
(several messages each day according to the logs)
* We don't like this
To solve this, we want to enhance the gateway to convert
multipart/alternative messages into something we can legally post to
Usenet. I have two thoughts on this strategy:
1. If possible, we should gather all text/plain portions of an email
and post those with a content-type of text/plain
2. If that fails, we can just post the original body but force the
content-type to text/plain for maximum compatibility
Now I need all of you email and Usenet experts to tell me if that's a
sane strategy. If another approach would be better, please clue me in.
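To make the two steps concrete, here is a minimal sketch of the decision in plain Ruby. It is not the gateway code: the Part struct is a hypothetical stand-in for a parsed mail object, and plain_text_bodies and usenet_body are names invented for this illustration.

```ruby
# Hypothetical stand-in for a parsed email part; a real gateway would get
# these objects from its mail library instead.
Part = Struct.new(:content_type, :body, :parts) do
  def multipart?
    content_type.start_with?("multipart/")
  end
end

# Step 1: recursively gather the bodies of every text/plain part.
def plain_text_bodies(part)
  if part.multipart?
    part.parts.flat_map { |child| plain_text_bodies(child) }
  elsif part.content_type == "text/plain"
    [part.body]
  else
    []
  end
end

# Returns the body to post. In both branches the caller would label the
# outgoing message as text/plain.
def usenet_body(message, raw_body)
  texts = plain_text_bodies(message)
  if texts.empty?
    raw_body          # Step 2: nothing usable, fall back to the original body
  else
    texts.join("\n")
  end
end

plain   = Part.new("text/plain", "Hello, list!\n", [])
html    = Part.new("text/html", "<p>Hello, list!</p>", [])
message = Part.new("multipart/alternative", nil, [plain, html])
puts usenet_body(message, "raw body of the original email")
```

A multipart/alternative message normally carries one text/plain rendering, so the join only matters for nested or unusual messages.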
I've pretty much made it this far. The code at the bottom of this
message is the mail_to_news.rb script used by the gateway, rewritten
using this strategy.
If you aren't familiar with the gateway code, you can get details
from the articles at:
http://blog.grayproductions.net/categories/the_gateway
There's one problem left I know I haven't solved correctly. Help me
figure out a decent strategy for this last piece and we can deploy
the new code.
The outstanding issue is how to handle character sets for the
constructed message. You'll see in the code below that I just pull
the charset param from the original message, but after looking at a
few messages, I realize that this doesn't make sense. For example,
here are the relevant portions of a recent post that wasn't gated
correctly:
Content-Type: multipart/alternative; boundary=Apple-Mail-18-445454026
--Apple-Mail-18-445454026
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
    charset=US-ASCII;
    delsp=yes;
    format=flowed
As you can see, the overall email doesn't have a charset but each
text portion can. If we are going to merge these parts, what's the
best strategy for handling the charset?
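To illustrate where the charset actually lives, here is a small sketch that reads the charset parameter out of a Content-Type header value like the ones quoted above. The charset_param helper is a name invented for this example; a mail library would normally expose the same information per part.

```ruby
# Returns the charset parameter of a Content-Type header value, or nil if
# the header does not declare one. (charset_param is a hypothetical helper
# for illustration only.)
def charset_param(content_type)
  match = content_type.match(/;\s*charset\s*=\s*("([^"]*)"|[^;\s]+)/i)
  return nil unless match
  (match[2] || match[1]).strip
end

top_level = "multipart/alternative; boundary=Apple-Mail-18-445454026"
text_part = "text/plain;\n charset=US-ASCII;\n delsp=yes;\n format=flowed"

p charset_param(top_level)
p charset_param(text_part)
```

The first call finds nothing because the multipart container has no charset of its own; only the text part declares one.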
I thought of trying to convert them all to UTF-8 with Iconv, but I'm
not sure what to do if a part doesn't declare a charset or when Iconv
chokes on what is declared. Please share your opinions.
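One possible policy, sketched below, is to convert each part separately and fall back when the declared charset is missing or wrong. This is only a sketch under stated assumptions: it uses String#encode (Ruby 1.9 and later) instead of Iconv, which is no longer part of the standard library in current Ruby. The to_utf8 name, the US-ASCII default (the MIME default for text/plain), and the ISO-8859-1 fallback are choices made for this example, not anything the gateway does today.

```ruby
# Assumed policy (not from the original script):
#   * no declared charset -> treat the text as US-ASCII
#   * unknown charset name or undecodable bytes -> reinterpret the bytes as
#     ISO-8859-1, which assigns a character to every byte
DEFAULT_CHARSET  = "US-ASCII"
FALLBACK_CHARSET = "ISO-8859-1"

def to_utf8(text, charset = nil)
  name = charset.to_s.strip
  name = DEFAULT_CHARSET if name.empty?
  tagged = text.dup.force_encoding(name)   # ArgumentError on unknown names
  raise EncodingError, "invalid #{name} bytes" unless tagged.valid_encoding?
  tagged.encode("UTF-8")
rescue ArgumentError, EncodingError
  text.dup.force_encoding(FALLBACK_CHARSET).encode("UTF-8")
end

# Each part is converted on its own, then the results are merged, so the
# outgoing message can be labeled charset=UTF-8 as a whole.
parts = [
  ["caf\xE9".b,        "ISO-8859-1"],  # declared and correct
  ["na\xC3\xAFve".b,   "UTF-8"],       # already UTF-8
  ["plain ASCII text", nil]            # no charset declared
]
merged = parts.map { |text, charset| to_utf8(text, charset) }.join("\n")
puts merged
```

The fallback never raises, so a badly labeled message still gets posted, at the cost of possibly showing the wrong characters for bytes above 127.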
If you are feeling really adventurous, rewrite the relevant portion
of the code below, which I will bracket with FIX ME comments.
Here's the script:
#!/usr/bin/env ruby
# written by James Edward Gray II <[email protected]>
$KCODE = "u"
GATEWAY_DIR = File.join(File.dirname(__FILE__), "..").freeze
$LOAD_PATH << File.join(GATEWAY_DIR, "config") <<
              File.join(GATEWAY_DIR, "lib")
require "tmail"
require "servers_config"
require "nntp"
require "logger"
require "timeout"
# prepare log
log = Logger.new(ARGV.shift || $stdout)
log.datetime_format = "%Y-%m-%d %H:%M "
# build incoming and outgoing message object
incoming = TMail::Mail.parse($stdin.read)
outgoing = TMail::Mail.new
# skip any flagged messages
if incoming["X-Rubymirror"].to_s == "yes"
  log.info "Skipping message ##{incoming.message_id}, sent by news_to_mail"
  exit
elsif incoming["X-Spam-Status"].to_s =~ /\AYes/
  log.info "Ignoring Spam ##{incoming.message_id}: " +
           "#{incoming.subject}–#{incoming.from}"
  exit
end
# only allow certain headers through
%w[from subject in_reply_to transfer_encoding date].each do |header|
  outgoing.send("#{header}=", incoming.send(header))
end
outgoing.message_id = incoming.message_id.sub(/\.+>$/, ">")
%w[X-ML-Name X-Mail-Count X-X-Sender].each do |header|
  outgoing[header] = incoming[header].to_s if incoming.key? header
end
# doctor headers for Ruby Talk
outgoing.references = if incoming.key? "References"
  incoming.references
else
  if incoming.key? "In-Reply-To"
    incoming.reply_to
  else
    if incoming.subject =~ /^Re:/
      outgoing.reply_to = "<this_is_a_dummy_message-id@rubygateway>"
    end
  end
end
outgoing["X-Ruby-Talk"] = incoming.message_id
outgoing["X-Received-From"] = <<END_GATEWAY_DETAILS.gsub(/\s+/, " ")
This message has been automatically forwarded from the ruby-talk mailing list by
a gateway at #{ServersConfig::NEWSGROUP}. If it is SPAM, it did not originate at
#{ServersConfig::NEWSGROUP}. Please report the original sender, and not us.
Thanks! For more details about this gateway, please visit:
http://blog.grayproductions.net/categories/the_gateway
END_GATEWAY_DETAILS
outgoing["X-Rubymirror"] = "Yes"
# translate the body of the message, if needed
if incoming.multipart? and incoming.sub_type == "alternative"
  ### FIX ME ###
  # handle multipart/alternative messages
  # extract body
  body = ""
  extract_text = lambda do |message_or_part|
    if message_or_part.multipart?
      message_or_part.each_part { |part| extract_text[part] }
    elsif message_or_part.content_type == "text/plain"
      body += message_or_part.body
    end
  end
  extract_text[incoming]
  if body.empty?
    outgoing.body = "Note: the content-type of this message was altered by " +
                    "the gateway.\n\n#{incoming.body}"
  else
    outgoing.body = "Note: non-text portions of this message were stripped " +
                    "by the gateway.\n\n#{body}"
  end
  # set the content type of the new message
  outgoing.set_content_type( "text", "plain",
                             "charset" => incoming.type_param("charset") )
  ### END FIX ME ###
else
  %w[content_type body].each do |header|
    outgoing.send("#{header}=", incoming.send(header))
  end
end
log.info "Sending message ##{incoming.message_id}: " +
         "#{incoming.subject}–#{incoming.from}…"
log.info "Message looks like:\n#{outgoing.encoded}"
# connect to NNTP host
begin
  nntp = nil
  Timeout.timeout(30) do
    nntp = Net::NNTP.new( ServersConfig::NEWS_SERVER,
                          Net::NNTP::NNTP_PORT,
                          ServersConfig::NEWS_USER,
                          ServersConfig::NEWS_PASS )
  end
rescue Timeout::Error
  log.error "The NNTP connection timed out"
  exit -1
rescue
  log.fatal "Unable to establish connection to NNTP host: #{$!.message}"
  exit -1
end
# attempt to send newsgroup post
unless $DEBUG
  begin
    result = nil
    Timeout.timeout(30) { result = nntp.post(outgoing.encoded) }
  rescue Timeout::Error
    log.error "The NNTP post timed out"
    exit -1
  rescue
    log.fatal "Unable to post to NNTP host: #{$!.message}"
    exit -1
  end
  log.info "… Sent. nntp.post() result: #{result}"
end
__END__
Thanks for the help.
James Edward Gray II