unsafe readline(), anything better?

Rob Muhlestein · Dec 28, 2006

Looking for a combination of readpartial() and readline() in order to
safely read a line of maxlen. IO.readline() appears to suffer from lacking
this maximum allowing DOS against servers reading HTTP headers, for
example, just by using sock.readline() alone, which is what
Net::HTTPResponse and others do:

1974 class << HTTPResponse
1975 def read_new(sock) #:nodoc: internal use only 1976 httpv,
code, msg = read_status_line(sock) 1977 res =
response_class(code).new(httpv, code, msg) 1978
each_response_header(sock) do |k,v| 1979 res.add_field k, v
1980 end
1981 res
1982 end
1983
1984 private
1985
1986 def read_status_line(sock) 1987 str = sock.readline
1988 m =
/\AHTTP(?:\/(\d+\.\d+))?\s+(\d\d\d)\s*(.*)\z/in.match(str) o r
1989 raise HTTPBadResponse, "wrong status line: #{str.dump}"
1990 m.captures
1991 end

I know this is fundamentally a problem with the popular readline C libs
out there. Wish they had an nreadline like the sprintf and snprintf
additional function. Sure I could readchar() or readbytes() watching each
read for a newline, but that is just unfun. Have I overlooked something
obvious in my search? Hoping to not have to write my own safe/buffered IO
layer like I've had to do with other langs.

If there is enough interest, maybe I'll hack a readmaxline() method into
an IO patch to submit. Actually, on second thought, how about adding a
second parameter:

ios.readline(sep_string=$/,maxlen=nil)

The tough question would then be whether to raise an LineTooLong exception
or just return what could be read of the line up to that point.

Thanks,

Rob

William James · Dec 28, 2006

Rob said:
Looking for a combination of readpartial() and readline() in order to
safely read a line of maxlen. IO.readline() appears to suffer from lacking
this maximum allowing DOS against servers reading HTTP headers, for
example, just by using sock.readline() alone, which is what
Net::HTTPResponse and others do:

1974 class << HTTPResponse
1975 def read_new(sock) #:nodoc: internal use only 1976 httpv,
code, msg = read_status_line(sock) 1977 res =
response_class(code).new(httpv, code, msg) 1978
each_response_header(sock) do |k,v| 1979 res.add_field k, v
1980 end
1981 res
1982 end
1983
1984 private
1985
1986 def read_status_line(sock) 1987 str = sock.readline
1988 m =
/\AHTTP(?:\/(\d+\.\d+))?\s+(\d\d\d)\s*(.*)\z/in.match(str) o r
1989 raise HTTPBadResponse, "wrong status line: #{str.dump}"
1990 m.captures
1991 end

I know this is fundamentally a problem with the popular readline C libs
out there. Wish they had an nreadline like the sprintf and snprintf
additional function. Sure I could readchar() or readbytes() watching each
read for a newline, but that is just unfun. Have I overlooked something
obvious in my search? Hoping to not have to write my own safe/buffered IO
layer like I've had to do with other langs.

If there is enough interest, maybe I'll hack a readmaxline() method into
an IO patch to submit. Actually, on second thought, how about adding a
second parameter:

ios.readline(sep_string=$/,maxlen=nil)

The tough question would then be whether to raise an LineTooLong exception
or just return what could be read of the line up to that point.

Thanks,

Rob

class File
def safe_readline(sep_string=$/,maxlen=nil)
buf_size = 1024
line = ""
while !self.eof?
s = read( [ buf_size, maxlen - line.size ].min )
line << s
if i = line.index( sep_string )
line = line[0,i+sep_string.size]
return [ line, true ]
end
return [ line[0,maxlen], false ] if maxlen &&
line.size >= maxlen
end
[ line, true ]
end
end

open('junk'){|h| p h.safe_readline("\n",9) }

Rob Muhlestein · Dec 28, 2006

Rob said:
Rob said:

Looking for a combination of readpartial() and readline() in order to
safely read a line of maxlen. IO.readline() appears to suffer from
lacking this maximum allowing DOS against servers reading HTTP headers,
for example, just by using sock.readline() alone, which is what
Net::HTTPResponse and others do:

1974 class << HTTPResponse
1975 def read_new(sock) #:nodoc: internal use only 1976
httpv, code, msg = read_status_line(sock) 1977 res =
response_class(code).new(httpv, code, msg) 1978
each_response_header(sock) do |k,v| 1979 res.add_field k, v
1980 end
1981 res
1982 end
1983
1984 private
1985
1986 def read_status_line(sock) 1987 str =
sock.readline 1988 m =
/\AHTTP(?:\/(\d+\.\d+))?\s+(\d\d\d)\s*(.*)\z/in.match(str) o r
1989 raise HTTPBadResponse, "wrong status line:
#{str.dump}" 1990 m.captures
1991 end

I know this is fundamentally a problem with the popular readline C libs
out there. Wish they had an nreadline like the sprintf and snprintf
additional function. Sure I could readchar() or readbytes() watching
each read for a newline, but that is just unfun. Have I overlooked
something obvious in my search? Hoping to not have to write my own
safe/buffered IO layer like I've had to do with other langs.

If there is enough interest, maybe I'll hack a readmaxline() method into
an IO patch to submit. Actually, on second thought, how about adding a
second parameter:

ios.readline(sep_string=$/,maxlen=nil)

The tough question would then be whether to raise an LineTooLong
exception or just return what could be read of the line up to that
point.

Thanks,

Rob

Click to expand...

class File
def safe_readline(sep_string=$/,maxlen=nil)
buf_size = 1024
line = ""
while !self.eof?
s = read( [ buf_size, maxlen - line.size ].min ) line << s
if i = line.index( sep_string )
line = line[0,i+sep_string.size]
return [ line, true ]
end
return [ line[0,maxlen], false ] if maxlen &&
line.size >= maxlen
end
[ line, true ]
end
end

open('junk'){|h| p h.safe_readline("\n",9) }

Thanks for the attempt, this is very close to the reader-flavor of
classes I've had to write in other langs. But it doesn't play nice
dealing with non-seekable streams like sockets, which can cause some
nasty IO socket blocking (or the nonblocking socket read run around).
For example:

Header1: something
Header2: else, followed by blank line
Content-Length: 55 (or whatever)

Here is the data portion that could be binary or text.

If I called h.safe_readline("\r\n",1024) and was working with an IO
socket there wouldn't be anything left by the time I want to read
the data. That is, if I ever had a chance to try since the read would
block my proggy from doing anything.

One solution is to create a Reader class and maintain a buffer which
keeps the extra overrun available. I was hoping at the binary level
during the byte read that readline does that it could keep a counter of
the number of bytes read as it reads them and throw or whatever when
maxlen exceeded. Doing a readbyte from most languages at the script/lang
level is usually way to costly, it would probably be too costly to weigh
down read() with a count and check for every byte, but perhaps not in
another safe_readline() native extension.

I do love that Ruby let's me add to IO itself, which handles adding
@prev_read buffer to store the overrun in for next read and is what
I'll do and post for review.

Thanks again,

Robert Klemme · Dec 29, 2006

I do love that Ruby let's me add to IO itself, which handles adding
@prev_read buffer to store the overrun in for next read and is what
I'll do and post for review.

I would also consider the alternative of creating a SafeLineReader that
can be stacked on *any* IO object (or rather any object that implements
#read with the proper protocol) as input stream much the same way Java
streams can be stacked. You gain flexibility but of course you loose
the easy "I just need to open a file and it works". I do not have a
clear bias either way because I love the simplicity of Ruby's IO classes
but on the other hand you can also stuff too much into a single class.

Yet another alternative is to put your code into a Module which you can
mix in to any IO object. You can still decide to to that on a general
basis for class IO and StringIO.

Just my 0.02EUR...

Kind regards

robert

Rob Muhlestein · Dec 29, 2006

I would also consider the alternative of creating a SafeLineReader that
can be stacked on *any* IO object (or rather any object that implements
#read with the proper protocol) as input stream much the same way Java
streams can be stacked. You gain flexibility but of course you loose the
easy "I just need to open a file and it works". I do not have a clear
bias either way because I love the simplicity of Ruby's IO classes but on
the other hand you can also stuff too much into a single class.

Glad to read your post. Couldn't sleep last night thinking through
this. Then this morning found that vulnerability in WEBrick based
on unsafe readlines (actually gets, but same deal). [see WEBrick DOS
Security Flaw thread]

The more I got into adding to IO the more I kept coming back to stack
design that you mention and that I'm familiar with. Mostly there are
a lot of methods in the IO and StringIO classes that would all have to
have equivalents that support the prev_read buffer. Also we face the
default duplex nature of IO and this is only for reads. So I do really
think stacking (the reader) stream is the way to go--especially with
support for things like '<<' in Ruby, which I love.

Maybe we could call it SafeReader so we can have its readline calling
a read_until (like I've done in other langs). In my few remaining
vacation hours, I'll try to write up a spec, perhaps in Rspec and post
for comment.

I'm still somewhat new to this Ruby stuff, but think I can get a gem
started on RubyForge for it, it would be my first. Then if there is enough
demand, we can implement one in C. By the way, anyone know of any gem
naming conventions to distinguish a pure Ruby gem from C extension
version, like Perl's XS suffix convention?

Also on a related topic, is there a convention or other reliable way to
avoid gem library directory and file name collisions? I did notice, for
example, webrick follows the webrick.rb and webrick/ convention as does
rspec and others. Are these names somehow registered publicly to avoid gem
collisions with others? Perhaps Brian or Ryan would care to set us newbies
straight on that. [I did do quite a bit of homework reading
http://rubygems.org particularly section 8. Distributing Gems]

Rob Muhlestein · Dec 29, 2006

Maybe we could call it SafeReader so we can have its readline calling a
read_until (like I've done in other langs). In my few remaining vacation
hours, I'll try to write up a spec, perhaps in Rspec and post for
comment.

Humm, from this last recent ruby CVS commit just today makes me wonder if
matz is listening to this thread. If so, matz, thanks for getting me giddy
about programming again, and thanks for taking a shot at addressing this!

matz 2006-12-30 04:21:50 +0900 (Sat, 30 Dec 2006)

New Revision: 11428

Modified files:
trunk/ChangeLog
trunk/ext/stringio/stringio.c
trunk/io.c
trunk/version.h

Log:
* ext/stringio/stringio.c (strio_gets): accepts limit argument.

* ext/stringio/stringio.c (strio_readline, strio_each,
strio_readlines): ditto.

* ext/stringio/stringio.c (strio_getline): add limit capability.

* io.c (rb_io_gets_m): accepts limit argument. [ruby-talk:231563]

* io.c (rb_io_readline, rb_io_readlines, rb_io_each_line,
argf_getline):
ditto.

* io.c (appendline): add limit capability.

* io.c (rb_io_getline_fast, rb_io_getline): ditto.

* io.c (rb_io_getline): small refactoring for DRY.

* io.c (rb_io_s_foreach, rb_io_s_readlines): small refactoring.

unsafe readline(), anything better?

Rob Muhlestein

William James

Rob Muhlestein

Robert Klemme

Rob Muhlestein

Rob Muhlestein

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads