Found a ruby bug in the URI class, what do I do?


Ben Johnson

I'm pretty sure this is a bug, but it seems so obvious that I suspect
I might be doing something wrong. Check it out:

=> true
URI::InvalidURIError: the scheme http does not accept registry part:
whatever_again.domain.com (or bad hostname?)
from
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/uri/generic.rb:195:in
`initialize'
from
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/uri/http.rb:78:in
`initialize'
from
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/uri/common.rb:488:in
`new'
from
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/uri/common.rb:488:in
`parse'
from (irb):45


When you add an underscore to the subdomain, it raises an exception, even
though that is a perfectly valid host name. I dug deeper and found the
following:

=> ["http", nil, "whatever.domain.com", nil, nil, "", nil, nil, nil]
=> ["http", nil, nil, nil, "whatever_again.domain.com", "", nil, nil,
nil]


Notice it's not recognizing the host in the second URI as a host, which
is really strange; it's actually saying it's the registry.

Anyway, can anyone shed some light on this, or do you have any ideas how
I can easily patch this?

Check out the split method source:
http://www.ruby-doc.org/stdlib/libdoc/uri/rdoc/classes/URI.html#M009379

That doesn't look too easy to patch, since it's one big regular expression
that breaks up the string.

Thanks for your help!
 

Hassan Schroeder

Ben said:
URI::InvalidURIError: the scheme http does not accept registry part:
whatever_again.domain.com (or bad hostname?)
When you add an underscore to the subdomain, it raises an exception, even
though that is a perfectly valid host name.

No, it's not. Check the DNS RFCs: A-Z, a-z, 0-9 and the hyphen are
the only legal characters.

FWIW,
 

Brian Candler

Hassan said:
No, it's not. Check the DNS RFCs: A-Z, a-z, 0-9 and the hyphen are
the only legal characters.

That's not strictly a DNS limitation. However, there are old (pre-DNS)
RFCs that say those are the only legal characters in a "hostname".

The key DNS RFCs are 1034 and 1035. RFC 1034 gives a very liberal
definition of a domain name (binary labels, each 1 to 63 bytes long,
maximum 255 bytes in total).

However, it then makes a recommendation:

"3.5. Preferred name syntax

The DNS specifications attempt to be as general as possible in the rules
for constructing domain names. The idea is that the name of any
existing object can be expressed as a domain name with minimal changes.
However, when assigning a domain name for an object, the prudent user
will select a name which satisfies both the rules of the domain system
and any existing rules for the object, whether these rules are published
or implied by existing programs.

For example, when naming a mail domain, the user should satisfy both the
rules of this memo and those in RFC-822. When creating a new host name,
the old rules for HOSTS.TXT should be followed. This avoids problems
when old software is converted to use domain names.

The following syntax will result in fewer problems with many
applications that use domain names (e.g., mail, TELNET)."

(then gives a BNF description of letters/digits/hyphens)
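
To make that rule concrete, here is a rough sketch of a letters/digits/hyphens check in Ruby. The names LDH_LABEL and ldh_hostname? are made up for this post and are not part of any library. It also assumes the later RFC 1123 relaxation that lets a label start with a digit, so it is slightly looser than the BNF in RFC 1034.

```ruby
# Hypothetical helper, not part of Ruby's URI library.
# One label: letters, digits and hyphens only, no hyphen at either end,
# at most 63 characters.
LDH_LABEL = /\A[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?\z/

# Returns true when every dot-separated label follows the
# "preferred name syntax"; false otherwise.
def ldh_hostname?(name)
  labels = name.split('.', -1)   # -1 keeps empty labels such as in "a..b"
  return false if labels.empty?  # the empty string has no labels at all
  labels.all? { |label| label =~ LDH_LABEL }
end

puts ldh_hostname?("whatever.domain.com")        # passes the check
puts ldh_hostname?("whatever_again.domain.com")  # fails: underscore
```

A check like this only says whether a name follows the recommendation. It says nothing about whether the name actually exists in the DNS.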

In practice, there *are* hosts out there which have underscores in
their hostnames, so a library which forbids this on the basis of ancient
HOSTS.TXT rules causes real problems.
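
If you only need the host and don't want to touch the library's regular expression, one possible workaround is to fall back to a much simpler pattern when URI.parse gives up. This is only a sketch: lenient_host is a name invented here, the fallback pattern is deliberately naive (no IPv6 literals, for instance), and some Ruby versions may accept the underscore outright, in which case the fallback is never reached.

```ruby
require 'uri'

# Hypothetical helper, not part of Ruby's URI library.
# Tries the standard parser first and only falls back to a naive
# pattern when the parser rejects the URI or reports no host.
def lenient_host(url)
  host = begin
    URI.parse(url).host
  rescue URI::InvalidURIError
    nil
  end
  return host if host

  # Naive fallback: scheme://[userinfo@]host[:port][/path...]
  # Does not handle IPv6 literals such as http://[::1]/.
  match = url.match(%r{\A[a-z][a-z0-9+.\-]*://(?:[^/?#@]*@)?([^/?#:]+)}i)
  match && match[1]
end

puts lenient_host("http://whatever_again.domain.com")
```

The rest of the URI (path, query and so on) would still need its own handling, so this is a stopgap rather than a fix for the parser.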

RFC 2822 (for E-mail addresses) is much more liberal.
 
