Reading String Data as a File

Doug Jolley

I use Net::HTTP to collect some data as a string. I now need to pass
that string data to a Ruby method that is expecting to receive the data
from a file (i.e., the method expects the data to be stored in a file
and to have a path to the file passed to it as a parameter). Is there
any way to resolve this dilemma short of writing the string data to a
file and then reading it in from the file?

Thanks for any input.

... doug
 
Ryan Davis

I use Net::HTTP to collect some data as a string. I now need to pass
that string data to a Ruby method that is expecting to receive the data
from a file (i.e., the method expects the data to be stored in a file
and to have a path to the file passed to it as a parameter). Is there
any way to resolve this dilemma short of writing the string data to a
file and then reading it in from the file?

ri StringIO
 
Brian Candler

Ryan said:
ri StringIO

That will work if the code in question will accept an open File/IO
object as an argument.

If it takes only a pathname argument, then you're stuck with writing the
data to a file (ri Tempfile may help).

If you have control of the target code, then refactor it. e.g.

class Foo
  # original entry point
  def read_file(pathname)
    File.open(pathname, "rb") { |f| read_io(f) }
  end

  # entry point for already-open object, e.g. STDIN, a StringIO etc.
  def read_io(io)
    io.each_line { ... }
  end
end
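
A minimal sketch of the refactoring described above, with the elided block body replaced by a hypothetical line counter (the counting logic and the sample data are assumptions for illustration, not part of the post). Once the class exposes read_io, a string can be wrapped in a StringIO and passed in directly:

```ruby
require "stringio"

class Foo
  # Original entry point: takes a pathname and opens the file itself.
  def read_file(pathname)
    File.open(pathname, "rb") { |f| read_io(f) }
  end

  # Entry point for an already-open object (a File, STDIN, a StringIO, ...).
  # Counting lines is a stand-in for whatever the real code does.
  def read_io(io)
    count = 0
    io.each_line { count += 1 }
    count
  end
end

data = "first line\nsecond line\n" # pretend this came from Net::HTTP
line_count = Foo.new.read_io(StringIO.new(data))
puts line_count
```

This only helps when the target code offers (or can be changed to offer) an IO-based entry point.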
 
Doug Jolley

If it takes only a pathname argument, then you're
stuck with writing the data to a file

Unfortunately that is precisely my case and that is precisely what I was
trying to avoid. (And, unfortunately, I don't have any control over the
target code.)

Interestingly, a post that I found seemed to say that I could use the
StringIO approach in the case where a pathname argument was required.
The post said:
Any easy way to work with a string in a method that is expecting
a file is to create a new StringIO object and pass the result to
the method requiring a file type. For example:
some_method(StringIO.new("Your string here"))

He did say, "file". It's just that usually methods that follow that
form are expecting a path. Anyway, as one might expect, it didn't work
for me. I get the following error:

/test1:5:in `read': can't convert StringIO into String (TypeError)

As Ryan says, I guess that I'm stuck writing this out to a temp file.

Thanks to all who responded to my inquiry.

... doug
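
The error above can be reproduced with a short sketch. It assumes the target method ends up calling something like File.read on its argument (an assumption; the actual target code is not shown in the thread). The exact wording of the message differs between Ruby versions, so only the exception class is checked here:

```ruby
require "stringio"

caught = nil
begin
  # File.read expects a path (a String or an object convertible to one),
  # so handing it a StringIO fails with a TypeError.
  File.read(StringIO.new("Your string here"))
rescue TypeError => e
  caught = e
  puts "#{e.class}: #{e.message}"
end
```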
 
Tony Arcieri

Unfortunately that is precisely my case and that is precisely what I was
trying to avoid. (And, unfortunately, I don't have any control over the
target code.)


It's Ruby. You can always patch or alias_method_chain the target code if
you're willing to bear some slight brittleness.
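
A rough sketch of what such a patch might look like. alias_method_chain is a Rails (ActiveSupport) helper; this example swaps in plain Ruby's Module#prepend instead. LayoutLoader, its load and parse methods, and the AcceptIO module are all hypothetical stand-ins, since the real target code is not shown in the thread:

```ruby
require "stringio"

# Stand-in for third-party code that only accepts a pathname (hypothetical).
class LayoutLoader
  def load(path)
    parse(File.read(path))
  end

  def parse(text)
    text.upcase
  end
end

# The patch: intercept IO-like arguments and skip the File.read step.
# Anything else falls through to the original method via super.
module AcceptIO
  def load(source)
    return parse(source.read) if source.respond_to?(:read)
    super
  end
end

LayoutLoader.prepend(AcceptIO)

result = LayoutLoader.new.load(StringIO.new("<html></html>"))
puts result
```

The brittleness is visible even here: the patch has to know that load hands its data to parse, so it can break whenever the patched library changes its internals.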
 
Doug Jolley

It's Ruby. You can always patch or alias_method_chain the target code
if you're willing to bear some slight brittleness.

Good point. I've been considering whether I should re-think my position
that the underlying code is inaccessible. The truth is, the block of
data that I have in memory is actually a Rails layout. I was reluctant
to mention the Rails aspects in this forum. So, I don't know if I could
ever figure out what would need to be done; but, your idea is definitely
a good one. Thanks for the input.

... doug
 
Ryan Davis

It's Ruby. You can always patch or alias_method_chain the target code if
you're willing to bear some slight brittleness.

That is EXACTLY what I was coming back to say... Tony beat me to it.
 
Robert Klemme

2010/6/29 Tony Arcieri said:
It's Ruby. You can always patch or alias_method_chain the target code if
you're willing to bear some slight brittleness.

Is this always possible? Wouldn't you need some knowledge of the
inner workings of the target code? In this case for example, does it
open the file with File.open or maybe with File.foreach?

This is an interesting point of interface design: usually it is more
convenient to just pass a file name somewhere and that method opens
the file (or URL) and reads the data. But from a modularity point of
view it is generally better to pass an open IO like instance.

You can nicely layer this e.g.

class X
  # convenience method that will open the file for you
  def read_file(path)
    File.open(path) do |io|
      read io
    end
  end

  # yet another convenience method
  def read_url(url)
    ...
  end

  # read the data
  def read(io)
    io.each_line do |line|
      # whatever
    end
  end
end

The only drawback here is the additional method needed but convenience
comes at a price. :)

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
Brian Candler

Robert said:
Is this always possible? Wouldn't you need some knowledge of the
inner workings of the target code? In this case for example, does it
open the file with File.open or maybe with File.foreach?

You simply find that part of the code, and replace the offending
method(s) with something else. In the limit, you replace everything with
your own code :)

It would be convenient to be able to mock out File and Dir with a
virtual, in-RAM filesystem. I'm not aware of a library which does that,
but in principle I think it could be done.

This is an interesting point of interface design: usually it is more
convenient to just pass a file name somewhere and that method opens
the file (or URL) and reads the data. But from a modularity point of
view it is generally better to pass an open IO like instance.

Definitely. The original csv.rb in ruby 1.8 got this very badly wrong.

The new (faster_csv) interface is capable of this, but it suffers from
missing documentation. IIRR you have to do something like

FasterCSV.new($stdin).each do |row|
  p row
end

Since the documented "primary" interface is
FasterCSV.foreach("path/to/file.csv"), you have to dig through the code
to work out how to handle an open stream.
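
For readers on later Rubies: FasterCSV became the standard csv library in Ruby 1.9, and the same pattern works there with CSV.new. A minimal sketch, assuming the csv library is available (on recent Rubies it ships as a gem) and using a StringIO with made-up sample data in place of $stdin:

```ruby
require "csv"
require "stringio"

# Any IO-like object works here; a StringIO stands in for $stdin.
io = StringIO.new("name,qty\nwidget,3\n")

rows = []
CSV.new(io).each { |row| rows << row }
p rows
```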
 
Robert Klemme

2010/6/30 Brian Candler said:
You simply find that part of the code, and replace the offending
method(s) with something else. In the limit, you replace everything with
your own code :)

That's what I always wanted to do - seems I have to resurrect my
WorldDomination gem. :)

It would be convenient to be able to mock out File and Dir with a
virtual, in-RAM filesystem. I'm not aware of a library which does that,
but in principle I think it could be done.

Well, /tmp is in memory on many systems and writing a small file is
also a mostly in memory operation. Of course, this is not as cheap as
doing it completely in userland but probably sufficient for many
applications (although it's not really nice). At least one can use
Tempfile for this, e.g.

Tempfile "prefix", "/tmp" do |io|
io.write everything

io.seek 0
whatever_load_routine io
end

Definitely. The original csv.rb in ruby 1.8 got this very badly wrong.

The new (faster_csv) interface is capable of this, but it suffers from
missing documentation. IIRR you have to do something like

FasterCSV.new($stdin).each do |row|
  p row
end

Since the documented "primary" interface is
FasterCSV.foreach("path/to/file.csv"), you have to dig through the code
to work out how to handle an open stream.

Or have the idea to look at "ri CSV.new"...

Thanks for the hint. This is good to know.

Cheers

robert

 
Brian Candler

Robert said:
At least one can use Tempfile for this, e.g.

Tempfile "prefix", "/tmp" do |io|
io.write everything

io.seek 0
whatever_load_routine io
end

or rather:

Tempfile.open "prefix", "/tmp" do |io|
  io.write everything
  io.flush
  whatever_load_routine io.path
end
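
A self-contained sketch of this approach. Here whatever_load_routine is a hypothetical stand-in for the path-only method, the sample string is made up, and the default temporary directory is used instead of "/tmp" so the example does not depend on the platform:

```ruby
require "tempfile"

# Stand-in for a method that insists on a pathname (hypothetical).
def whatever_load_routine(path)
  File.read(path)
end

everything = "data fetched over HTTP\n"
loaded = nil

Tempfile.open("prefix") do |io|
  io.write(everything)
  io.flush # push buffered data to disk before the file is reopened by path
  # Assign inside the block so the sketch does not rely on what
  # Tempfile.open itself returns.
  loaded = whatever_load_routine(io.path)
end

puts loaded
```

Without the flush (or a close), the routine may reopen the path and find the file empty or incomplete, because the written data can still be sitting in Ruby's write buffer.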
 
James Edward Gray II

Definitely. The original csv.rb in ruby 1.8 got this very badly wrong.

The new (faster_csv) interface is capable of this, but it suffers from
missing documentation.

I agree that FasterCSV's documentation isn't perfect. I'm pretty sure
all of its functions are documented, but you would need to read the API
like a novel to find them. I've been trying more tutorial style
documentation lately, but there again it's hard to reference what you
specifically want to know.

I'm open to suggestions and I do take patches.

IIRR you have to do something like

FasterCSV.new($stdin).each do |row|
  p row
end

That works, yes.

Since the documented "primary" interface is
FasterCSV.foreach("path/to/file.csv"), you have to dig through the code
to work out how to handle an open stream.

That's mostly due to a pet peeve of mine. I often see code that slurps
when foreach() would have worked fine. That's why I try to push that as
a first choice.

Do you think it would help if I added Wrapping an IO under the Shortcut
Interface on this page?

http://fastercsv.rubyforge.org/classes/FasterCSV.html

James Edward Gray II
 
Robert Klemme

2010/6/30 Brian Candler said:
or rather:

Tempfile.open "prefix", "/tmp" do |io|
  io.write everything
  io.flush

I'd rather io.close instead of io.flush to release resources as soon
as possible.

  whatever_load_routine io.path
end

Ooops! Yes, of course. I copied the wrong example. Sorry for my confusion.

Cheers

robert

 
Brian Candler

Robert said:
I'd rather io.close instead of io.flush to release resources as soon
as possible.

But tempfile will want to close itself using the block form anyway.

In most versions of ruby, Tempfile with a block returns nil. A change
was committed so that it returns the (closed) object, but that hasn't
made it into either of the versions I have lying around here.

hello
=> nil
 
Brian Candler

James said:
I'm open to suggestions and I do take patches.

Specifically, I'd like to see how to parse CSV from stdin. You provide
an example in the opposite direction:

# FCSV($stderr) { |csv_err| csv_err << %w{my data here} } # to $stderr

A bit more experimentation suggests that

FCSV($stdin).each { |a,b,c| p a,b,c }

works, so if that's a reasonable way to drive the library, I'd like to
see that mentioned under shortcuts. (I thought I'd tried that before and
it failed, but I must have done something different)
 
Robert Klemme

But tempfile will want to close itself using the block form anyway.

Yes, but later. This can make a difference if you are low on file
descriptors. And you do not risk weird effects by the same process
opening the file twice.

In most versions of ruby, Tempfile with a block returns nil. A change
was committed so that it returns the (closed) object, but that hasn't
made it into either of the versions I have lying around here.

hello
=> nil

The non block form obviously returns the Tempfile instance and if you
want it to be returned from the block what stops you from explicitly
returning it?

IMHO the method with block should return whatever the implementor of the
block chooses. That is far more reusable than always returning the
Tempfile. Most of the time the Tempfile instance is of no use anyway
since it is closed then.

Kind regards

robert
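
A small sketch of the behaviour being argued for, on Rubies where the block form of Tempfile.open returns the block's value (the symbol :from_block is just an illustrative marker, not anything from the thread):

```ruby
require "tempfile"

value = Tempfile.open("example") do |io|
  io.puts "hello"
  :from_block # the last expression in the block
end

p value
```

On the older versions mentioned in this thread the same call is reported to return nil, so code that must run there should capture anything it needs inside the block instead.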
 
Brian Candler

Robert said:
The non block form obviously returns the Tempfile instance and if you
want it to be returned from the block what stops you from explicitly
returning it?

Only that it's a bit verbose:

tf = nil
Tempfile.open(...) do |io|
  tf = io
  ...
end
puts tf.path

http://redmine.ruby-lang.org/issues/show/504

IMHO the method with block should return whatever the implementor of the
block chooses. That is far more reusable than always returning the
Tempfile.

Maybe that's what the accepted patch does - I haven't tested it. It
would be consistent with File.open { ... } if it worked that way.

Anyway, I think we're talking about minutiae. You say that one should
close the file at the earliest opportunity to "save resources", but the
only resource we're talking about is one slot in the kernel file
descriptor table, and most apps aren't going to be constrained by that.
 
Robert Klemme

Only that it's a bit verbose:

tf = nil
Tempfile.open(...) do |io|
  tf = io
  ...
end
puts tf.path

No, I was talking about the other version which returns whatever the
block returns. You would do

tf = Tempfile.open(...) do |io|
  ...
  io
end
puts tf.path

Apparently people differ in their preferences.

Maybe that's what the accepted patch does - I haven't tested it. It
would be consistent with File.open { ... } if it worked that way.

Exactly that is what the patch does:

http://redmine.ruby-lang.org/repositories/diff/ruby-19?rev=19454

Anyway, I think we're talking about minutiae. You say that one should
close the file at the earliest opportunity to "save resources", but the
only resource we're talking about is one slot in the kernel file
descriptor table, and most apps aren't going to be constrained by that.

That's true. But I also have seen issues caused by files being opened
more than once by the same process. Plus, you'll notice much faster if
you try to write to the tempfile after you thought you were done when
you close the file. If the code is more complicated these bugs can be
hard to track. How much simpler is it if you see this:

irb(main):006:0> Tempfile.open "x" do |io|
irb(main):007:1* p io
irb(main):008:1> io.puts "hello"
irb(main):009:1> io.close
irb(main):010:1> io.puts "world"
irb(main):011:1> end
#<File:C:/Users/Robert/x20100630-4456-1ixsj0i-0>
IOError: closed stream
from (irb):10:in `block in irb_binding'
from /usr/local/lib/ruby19/1.9.1/tempfile.rb:199:in `open'
from (irb):6
from /usr/local/bin/irb19:12:in `<main>'

It's probably not that big a deal but I believe such discussions bring
benefit to the community by presenting alternative solutions to a
problem along with arguments. I always like this food for thought.
Thanks for sharing your thoughts!

Kind regards

robert
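
The irb session above can be condensed into a script. This is only a sketch of the same idea: the error is rescued so the script keeps running, and the variable name error is an arbitrary choice:

```ruby
require "tempfile"

error = nil

Tempfile.open("x") do |io|
  io.puts "hello"
  io.close # done writing; the file stays on disk and io.path remains usable
  begin
    io.puts "world" # a stray late write now fails immediately
  rescue IOError => e
    error = e
  end
end

puts "#{error.class}: #{error.message}"
```

With io.flush in place of io.close, the second write would succeed silently, which is the kind of bug the early close makes visible.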
 
