Sorting a logfile, how would you write it?

Frank Meyer · Aug 10, 2007

I've written a little ruby program which can sort logfiles with the
following format:

4.text text text
1.text text text
2.text text text
10.text text text
2.text2 text2 text2

The file is given as a command line parameter and after sorting the
entries it writes them back into this file.

The program is in the attachment.

What I want to know is: how would you write such a tool in Ruby? I'm
asking this because I'm still learning Ruby and I want to learn how to
do it in Ruby (and its design principles).

Thank you!

Turing

Attachments:
http://www.ruby-forum.com/attachment/86/test.rb

William James · Aug 10, 2007

File.open( ARGV.first, "r+" ){|file|
  array = file.readlines
  file.rewind
  file.truncate(0)
  file.puts array.sort_by{|s| s[/^\d+/].to_i }
}
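
For reference, the sort key in the snippet above comes from indexing the line with a regex and calling `to_i` on the result. A small sketch of what that expression yields for a few inputs (the `key_for` helper name is made up for illustration):

```ruby
# Hypothetical helper: extracts the leading number the same way the
# sort_by block above does.
def key_for(line)
  line[/^\d+/].to_i
end

puts key_for("10.text text text")   # leading digits are parsed as an integer
puts key_for("2.text2 text2 text2") # only the digits at the start count
puts key_for("no number here")      # no match gives nil, and nil.to_i is 0
```

So lines without a leading number are not rejected; they all get the key 0 and end up at the top of the file.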

Ryan Davis · Aug 11, 2007

Your version takes a lot of memory, is slow, and doesn't properly
sort the content of the line, just the number. Swap the two "2."
lines and you'll see what I mean. Using the right tool for the job
(`sort`) does wonders:

% ruby -e 'n = 1_000_000; File.open("blah.txt", "w") { |f| n.times { m = rand 5; f.puts "#{rand n}. file#{m} file#{m} file#{m}" } }'
% cp blah.txt blah2.txt
% time ruby -e 'File.open( ARGV.first, "r+" ) { |file| array = file.readlines; file.rewind; file.truncate(0); file.puts array.sort_by {|s| s[/^\d+/].to_i } }' blah.txt
real 0m8.182s ...
% time ruby -e 'path = ARGV.shift; system %(sort -n "#{path}" > "#{path}.tmp"); File.rename "#{path}.tmp", path' blah2.txt
real 0m3.175s ...
% cmp blah.txt blah2.txt
blah.txt blah2.txt differ: char 50, line 3
% head blah.txt blah2.txt
==> blah.txt <==
3. file4 file4 file4
4. file4 file4 file4
6. file3 file3 file3
6. file1 file1 file1
6. file0 file0 file0
7. file0 file0 file0
7. file4 file4 file4
8. file1 file1 file1
8. file3 file3 file3
8. file3 file3 file3

==> blah2.txt <==
3. file4 file4 file4
4. file4 file4 file4
6. file0 file0 file0
6. file1 file1 file1
6. file3 file3 file3
7. file0 file0 file0
7. file4 file4 file4
8. file1 file1 file1
8. file3 file3 file3
8. file3 file3 file3
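
The two outputs above differ only in how lines with the same number are ordered: `sort -n` falls back to comparing the whole line, while the `sort_by` block looks at the number alone. A minimal Ruby sketch of a key that adds such a fallback, assuming plain byte-wise string comparison is an acceptable tie-break (a locale-aware `sort` may order ties differently):

```ruby
lines = [
  "4.text text text\n",
  "1.text text text\n",
  "2.text2 text2 text2\n",
  "10.text text text\n",
  "2.text text text\n",
]

# Sort by the leading number first; ties fall back to comparing the full line.
sorted = lines.sort_by { |line| [line[/^\d+/].to_i, line] }
puts sorted
```

Arrays compare element by element, so the second entry is only consulted when the numbers are equal.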

Robert Klemme · Aug 11, 2007

It's a one liner:

ruby -i.bak -e 'puts ARGF.readlines.sort_by {|l| l[/^\d+/].to_i}' file

Less memory usage:

ruby -i.bak -e 'puts ARGF.readlines.sort! {|a,b| a[/^\d+/].to_i <=> b[/^\d+/].to_i}' file
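
To make the difference between the two one-liners concrete: `sort_by` computes each line's key once and sorts temporary key/line pairs, whereas `sort!` with a block reorders the array in place but re-runs the regex on every comparison. A rough sketch that counts key extractions; the `key` lambda and the counters are illustrative only:

```ruby
lines = [
  "4.text text text\n",
  "1.text text text\n",
  "2.text text text\n",
  "10.text text text\n",
  "2.text2 text2 text2\n",
]

key_calls = 0
# Illustrative key extractor that records how often it is invoked.
key = lambda do |line|
  key_calls += 1
  line[/^\d+/].to_i
end

# sort_by: one key per element, kept in a temporary array of pairs.
by_key = lines.sort_by { |line| key.call(line) }
sort_by_calls = key_calls

# sort!: no temporary pairs, but two key extractions per comparison.
key_calls = 0
in_place = lines.dup
in_place.sort! { |a, b| key.call(a) <=> key.call(b) }
sort_calls = key_calls

puts "sort_by key extractions: #{sort_by_calls}"
puts "sort!   key extractions: #{sort_calls}"
```

Neither method is stable, so the relative order of the two "2." lines is not guaranteed in either result.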

Kind regards

robert

William James · Aug 11, 2007

your version takes a lot of memory,

Wrong.

When the number of lines to sort is small,
it uses a small amount of memory.
When the number of lines to sort is medium,
it uses a medium amount of memory.
When the number of lines to sort is large,
it uses a large amount of memory.

is slow,

Everything is relative. If its speed is compared to the
speed of other versions written in scripting languages, it
is not slow.

and doesn't properly
sort the content of the line,

Wrong.

Looking at the source code of the original poster immediately
reveals that he wants to sort only on the number at the
beginning of the line.

just the number. swap the two "2."
lines and you'll see what I mean. Using the right tool for the job
(`sort`) does wonders:

% ruby -e 'n = 1_000_000; File.open("blah.txt", "w") { |f| n.times { m = rand 5; f.puts "#{rand n}. file#{m} file#{m} file#{m}" } }'
% cp blah.txt blah2.txt
% time ruby -e 'File.open( ARGV.first, "r+" ) { |file| array = file.readlines; file.rewind; file.truncate(0); file.puts array.sort_by {|s| s[/^\d+/].to_i } }' blah.txt
real 0m8.182s ...
% time ruby -e 'path = ARGV.shift; system %(sort -n "#{path}" > "#{path}.tmp"); File.rename "#{path}.tmp", path' blah2.txt

Wrong.

The original poster stated:

The file is given as a command line parameter and after sorting the
entries it writes them back into this file.

Your code makes no attempt to write to the original file; it uses
a temporary file.

Furthermore, your solution won't even run:

E:\Ruby>ruby -e 'path = ARGV.shift; system %(sort -n "#{path}" > "#{path}.tmp"); File.rename "#{path}.tmp", path' data
-e:1: unterminated string meets end of file

If your code is put in a file ...

E:\Ruby>ruby try.rb data
Input file specified two times.

... it still won't work.

Perhaps your attempt at a solution requires Unix, and you,
in your ignorance, or your thoughtlessness, or your
ignorance and your thoughtlessness, assumed that every
user of Ruby is a user of Unix.

William James · Aug 11, 2007

It's a one liner:

ruby -i.bak -e 'puts ARGF.readlines.sort_by {|l| l[/^\d+/].to_i}' file

It's my understanding that when you use -i, a temporary file
is created, the original file is deleted, and the temporary
file is renamed. Doesn't this cause unnecessary disk
fragmentation?

Less memory usage:

ruby -i.bak -e 'puts ARGF.readlines.sort! {|a,b| a[/^\d+/].to_i <=> b[/^\d+/].to_i}' file

Of course, you're trading speed for memory.

Eric Hodel · Aug 12, 2007

your version takes a lot of memory,

Wrong.

This method uses at least 2x the file size worth of memory. That's a
lot.

E:\Ruby>ruby -e 'path = ARGV.shift; system %(sort -n "#{path}"
-e:1: unterminated string meets end of file

I just ran it, it worked fine.

You'll probably have to redo the quoting for a non-Bourne-compatible
shell.
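
One way to sidestep shell quoting altogether is to pass the command to `system` as separate arguments and let Ruby do the redirection. This is only a sketch: it still assumes a Unix-style `sort` that understands `-n` is on the PATH (the Windows `sort` command is a different program), and `sort_with_external_tool` is a made-up name.

```ruby
# Hypothetical helper: numeric sort via the external `sort`, no shell involved.
def sort_with_external_tool(path)
  tmp_path = "#{path}.tmp"
  # With separate arguments no shell parses the command line, so the quoting
  # rules of the surrounding shell no longer matter; out: redirects stdout.
  ok = system("sort", "-n", path, out: tmp_path)
  raise "sort failed for #{path}" unless ok
  File.rename(tmp_path, path)
end
```

Saved in a script, it could be called as `sort_with_external_tool(ARGV.first)`.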

Perhaps your attempt at a solution requires Unix, and you,
in your ignorance, or your thoughtlessness, or your
ignorance and your thoughtlessness, assumed that every
user of Ruby is a user of Unix.

Please try to flame harder. This one just made me chuckle.

Eric Hodel · Aug 12, 2007

It's a one liner:

ruby -i.bak -e 'puts ARGF.readlines.sort_by {|l| l[/^\d+/].to_i}' file

It's my understanding that when you use -i, a temporary file
is created, the original file is deleted, and the temporary
file is renamed. Doesn't this cause unnecessary disk
fragmentation?

If I had a filesystem where I had to worry about fragmentation I
wouldn't care. The amount of time spent figuring out some best way
to "fix" it is going to be less than the time running a defragmenter
will take.

Gregory Brown · Aug 12, 2007

Please try to flame harder. This one just made me chuckle.

Me too. Besides, `sort` is still the right tool.

http://www.mingw.org/msys.shtml

William James · Aug 12, 2007

To do this safely you'll need a temporary file.
Slurping a file into memory, sorting it, then writing it back to the same
file is an unsound practice, i.e. not "rerunnable-safe". Suppose, for
example, you suffer a power failure half-way through writing back the file,
or the write fails due to "disk full" or "user disk quota exceeded" or for
any other reason. Oops, you've just corrupted your input file.

Of course. But I'm willing to take that minuscule chance when
I'm doing a write to a small file that takes a fraction of a
second.

The question remains: doesn't using a temp file cause more
disk fragmentation than writing directly to the original file?
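
The temporary-file approach described in the quoted paragraph can be done in pure Ruby as well. A minimal sketch, assuming the temporary file may live next to the original under a `.tmp` suffix (as in the `sort` example earlier in the thread) and that `File.rename` replaces the target in one step on the filesystem in use; `sort_log_safely` is a hypothetical name:

```ruby
# Hypothetical helper: sorts the log by its leading number without ever
# truncating the original file.
def sort_log_safely(path)
  sorted = File.readlines(path).sort_by { |line| line[/^\d+/].to_i }

  tmp_path = "#{path}.tmp"
  File.open(tmp_path, "w") do |tmp|
    tmp.write(sorted.join)
    tmp.fsync # push the data to disk before the rename
  end

  # The original is only replaced once the sorted copy is complete. If
  # anything above fails, the input file is still untouched.
  File.rename(tmp_path, path)
end
```

A failed run can leave a stray `.tmp` file behind, but the input survives and the script can simply be run again.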

Robert Klemme · Aug 12, 2007

It's a one liner:

ruby -i.bak -e 'puts ARGF.readlines.sort_by {|l| l[/^\d+/].to_i}' file

It's my understanding that when you use -i, a temporary file
is created, the original file is deleted, and the temporary
file is renamed.

Correct.

Doesn't this cause unnecessary disk
fragmentation?

Huh? Are you still on MS-DOS? I haven't heard anyone worry about disk
fragmentation in ages. I don't think that this is an issue for any
modern file system.

Less memory usage:

ruby -i.bak -e 'puts ARGF.readlines.sort! {|a,b| a[/^\d+/].to_i <=> b[/^\d+/].to_i}' file

Of course, you're trading speed for memory.

Where exactly do you see that trade-off? I was trading elegance for
memory. Sure, there are effects that could make one or the other
solution faster, but if I were really worried about speed then I'd
use "sort" anyway.

Kind regards

robert

Frank Meyer · Aug 12, 2007

Thanks for all your suggestions; they helped me a lot to learn more about
Ruby's library. I didn't know that there are so many handy functions.

And about the temporary file, I'm using it only for private purposes and
I didn't want to bother with creating a temporary file in my first
attempt to write a ruby program which can sort these log files.

Thank you all!

Turing

Gregory Brown · Aug 12, 2007

--- William James said:

Of course. But I'm willing to take that miniscule chance when
I'm doing a write to a small file that takes a fraction of a
second.

That may be an acceptable risk for a program written for private use.
Not so for a production program. After all, impatient users often
press [CTRL-C] in my experience, and that could cause corruption
if it occurred while the file was being rewritten.

You can of course capture that, but you're right that it's creating
additional unnecessary work.
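
Capturing the interrupt could look roughly like the following: the handler only records that Ctrl-C was pressed, and the caller decides what to do once the write has finished. This is a sketch under the assumption that SIGINT is the only interruption of concern (it does nothing for power loss or a full disk); `deferring_interrupt` and `sort_in_place` are made-up helper names.

```ruby
# Hypothetical helper: runs the block with Ctrl-C (SIGINT) deferred and
# returns true if an interrupt arrived while the block was running.
def deferring_interrupt
  interrupted = false
  previous = trap("INT") { interrupted = true }
  yield
  interrupted
ensure
  # Restore whatever handler was installed before.
  trap("INT", previous) if previous
end

# Hypothetical helper: rewrites the file in place, as in the first reply,
# but with the rewrite shielded from Ctrl-C.
def sort_in_place(path)
  lines = File.readlines(path)
  deferring_interrupt do
    File.open(path, "w") { |f| f.puts lines.sort_by { |l| l[/^\d+/].to_i } }
  end
end
```

A caller could exit after `sort_in_place` returns true, knowing the file was written completely.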
