Writing UNIX 'wc' program

Hey all,

I've been spending the last week learning Ruby. Prior to that, I had
spent some time learning Python. For various reasons, it looks like I'm
gravitating more to Ruby.

That being said, I decided to write a small program similar to the UNIX
'wc' program.

Right now it's very stripped down: it only accepts input data from
STDIN when STDIN is not a tty, and I haven't yet implemented the
command-line arguments.

For the context of this post, I only included the logic that gets/prints
the word count and gets/prints the length of the longest line since
these are the ones I have questions about. Here it is:


----- Beginning of Program -----

#!/usr/bin/env ruby

# Read input from stdin only if not a tty. The only reason I gave such a
# constraint here was just to see that I could do it. It's one of the
# first things I do in learning a new language
if not STDIN.tty?
  data = STDIN.read
end

exit if not data


# PRINT THE WORD COUNT

# I'm wondering if there's an easier way to do this. It would
# be nice if the String#count method accepted regex patterns
# and not just strings.

# As it stands, this approach creates a separate array of words
# just so I can take its length. I would've rather done this
# without the extra overhead but I guess it's no big deal: it works!
printf("Word Count: %d\n", data.split(/\s/).length)


# GET THE LENGTH OF THE LONGEST LINE

# If there's a more elegant solution than what I have below, I'm all
# ears
line_length = 0
data.split(/\n/).each do |line|
  line_length = line.length if line_length < line.length
end

printf("Longest Line Length: %d\n", line_length)

----- End of Program -----


My question here isn't correctness as much as elegance. I'm fairly sure
the solutions I've provided are correct (maybe); I'm just wondering if
anyone has a better solution.

Thanks,
Keith P. Boruff
 

Mikael Brockman

@*(&SPAM&)*optonline.net said:
[...]

# I'm wondering if there's an easier way to do this. It would
# be nice if the String#count method accepted regex patterns
# and not just strings.

# As it stands, this approach creates a separate array of words
# just so I can take its length. I would've rather done this
# without the extra overhead but I guess it's no big deal: it works!
printf("Word Count: %d\n", data.split(/\s/).length)

How about something like this?

| class String
|   def match_count(pattern)
|     count = 0
|     scan(pattern) { count += 1 }
|     count
|   end
|
|   def word_count
|     match_count(/\w+/)
|   end
| end
|
| puts "Word Count: #{data.word_count}"

Or put the methods outside of String if you prefer.

# GET THE LENGTH OF THE LONGEST LINE

# If there's a more elegant solution than what I have below, I'm all
# ears
line_length = 0
data.split(/\n/).each do |line|
  line_length = line.length if line_length < line.length
end

I'd write something like

| maximum_length = 0
| data.each_line do |line|
|   if line.length > maximum_length then
|     maximum_length = line.length
|   end
| end

which doesn't need to keep all lines in an array.


mikael
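
A minimal sketch of how the two counting approaches can differ, assuming a made-up sample string. The names word_count_by_split and word_count_by_scan are hypothetical, and /\S+/ is swapped in for /\w+/ on the assumption that a "word" is any run of non-whitespace, which is roughly how wc counts:

```ruby
# Hypothetical helper names; only String#split and String#scan are built in.
def word_count_by_split(text)
  # Splits on every single whitespace character, so doubled spaces and
  # blank lines produce empty strings that are counted as words.
  text.split(/\s/).length
end

def word_count_by_scan(text)
  # Counts runs of non-whitespace without building an array of words.
  count = 0
  text.scan(/\S+/) { count += 1 }
  count
end

sample = "two  spaces\n\nand a blank line\n"
puts word_count_by_split(sample)  # prints 8
puts word_count_by_scan(sample)   # prints 6
```

With /\w+/ instead of /\S+/, a contraction such as "don't" would count as two matches, so the choice of pattern depends on what you want to call a word.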
 

Gavin Sinclair

A couple of tips.

* ARGF is your friend when it comes to input; it is a virtual file
that reads from the files named on the command line, or from STDIN
if none are given.

* 'unless' is a substitute for 'if not'.

* 'puts' is worth knowing about (I use 'printf' 1% of the time):

puts "Word count: #{data.split(/\s/).length}"
puts "Longest Line Length: #{line_length}"

* As far as elegance goes, see how you like this:

longest_length = data.split(/\n/).map { |l| l.length }.max

The way you've done it will perform better though, I expect.

Cheers,
Gavin
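
Putting those tips together might look roughly like the sketch below. The method name report is made up for illustration, and StringIO stands in for real input so the example can run on its own. ARGF, unless, puts, and Array#max are standard Ruby:

```ruby
require "stringio"

# `report` is a hypothetical name. Called with no argument it reads from
# ARGF, i.e. the files named on the command line or STDIN.
def report(input = ARGF)
  data = input.read
  return [] unless data && !data.empty?

  longest = data.split(/\n/).map { |l| l.length }.max || 0
  ["Word count: #{data.split.length}",
   "Longest Line Length: #{longest}"]
end

# StringIO is used here only so the sketch does not wait on real input.
puts report(StringIO.new("one two\nthree\n"))
```

Note that data.split with no argument splits on runs of whitespace, so it does not count the empty strings that split(/\s/) can produce.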



gabriele renzi

On Sun, 27 Jun 2004 09:37:24 +0900, Gavin Sinclair said:
* As far as elegance goes, see how you like this:

longest_length = data.split(/\n/).map { |l| l.length }.max

The way you've done it will perform better though, I expect.

maybe even:
longest= data.split(/\n/).sort_by{ |l| l.length }.last

to get the line itself instead of its length (I believe you're
keeping the array in memory to handle the lines more than once)
 

Florian Gross

gabriele said:
longest= data.split(/\n/).sort_by{ |l| l.length }.last

to get the line itself instead of its length (I believe you're
keeping the array in memory to handle the lines more than once)

For something like this it would be great to have a .max_by built-in.

Are there any good reasons for not having it? I might write an RCR for it
soon.

Regards,
Florian Gross
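
For what it's worth, later Ruby versions do ship Enumerable#max_by. A minimal sketch, assuming a made-up data string:

```ruby
# Sample input invented for illustration.
data = "short\nthe longest line here\nmid length\n"

# max_by returns the element whose block value is largest, so this
# yields the longest line itself rather than just its length.
longest = data.each_line.map { |l| l.chomp }.max_by { |l| l.length }

puts longest         # prints: the longest line here
puts longest.length  # prints: 21
```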
 

Mikael said:
How about something like this?

| class String
|   def match_count(pattern)
|     count = 0
|     scan(pattern) { count += 1 }
|     count
|   end
|
|   def word_count
|     match_count(/\w+/)
|   end
| end
|
| puts "Word Count: #{data.word_count}"

Or put the methods outside of String if you prefer.

This is good. I'll give it a shot.

I'd write something like

| maximum_length = 0
| data.each_line do |line|
|   if line.length > maximum_length then
|     maximum_length = line.length
|   end
| end

which doesn't need to keep all lines in an array.

This is good too. However, this solution keeps the newline at the end of
each line in the iteration, so the longest line length for my test data
comes out one too high. The actual wc program doesn't seem to count the
newline. To fix this, I added a chomp to your solution:

maximum_length = 0
data.each_line do |line|

  line.chomp!

  if line.length > maximum_length then
    maximum_length = line.length
  end
end

Keith Boruff
 

George Ogata

Florian Gross said:
For something like this it would be great to have a .max_by built-in.

Are there any good reasons for not having it? I might write an RCR for
it soon.

#max, like #sort, takes a block. That's good enough, isn't it?
 

Richard Lionheart

Hi,

I like the following because:
1. It doesn't store the input in an array
2. As a toy program, it's easier to provide the data in-line rather than
through $stdin
3. It's quite succinct, IMHO

max_len = wd_cnt = 0
DATA.each_line do |line|
  line.chomp!
  max_len = line.length > max_len ? line.length : max_len
  # Note: pattern recognizes contractions (embedded apostrophe)
  wd_cnt += line.scan(/\w+'?(\w+)?/).size
end
puts "Max. len. = #{max_len}"
puts "Wd. count = #{wd_cnt}"

# Yogi'isms
__END__
When asked about his philosophy of life, he replied: "When you reach a fork
in the road, take it!"
When Yogi was told that Dublin, Ireland elected a Jewish mayor, he
exclaimed: "only in America!"
When asked "What time is it?", Yogi inquired: "You mean now?"
"No one goes to that restaurant any more: it's too crowded!"

HTH,
Richard
 

Kristof Bastiaensen

Hi,

If you want you could get rid of the loop using inject:

line_length = data.each_line.inject(0) { |m, l|
  l.length > m ? l.length : m } - 1

# The - 1 is for the extra newline character. (each_line is needed on
# Ruby 1.9 and later, where String is no longer Enumerable.)

I think however that for large files it may not be so
efficient (since the whole file has to be loaded in memory).

You could put everything in a loop for STDIN.each:

# your initialization code ...
line_length = 0
wc = 0
STDIN.each do |l|
  wc += l.split.length
  line_length = l.length if l.length > line_length
end
line_length -= 1

# show the result

Regards,
Kristof Bastiaensen
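
A sketch of the same one-pass idea wrapped in a method, so it can be tried without piping anything in. The name count_stream is hypothetical, StringIO stands in for STDIN, and chomp replaces the final "- 1" on the assumption that the last line may lack a trailing newline:

```ruby
require "stringio"

# `count_stream` is a made-up name; it accepts any IO-like object that
# responds to each_line, such as STDIN, a File, or a StringIO.
def count_stream(io)
  words = 0
  longest = 0
  io.each_line do |line|
    words += line.split.length
    length = line.chomp.length   # ignore the trailing newline, if any
    longest = length if length > longest
  end
  [words, longest]
end

words, longest = count_stream(StringIO.new("one two\nthree four five\n"))
puts "Word Count: #{words}"             # prints: Word Count: 5
puts "Longest Line Length: #{longest}"  # prints: Longest Line Length: 15
```

Only one line is held in memory at a time, so this should behave the same on large inputs; pass STDIN instead of the StringIO to read piped data.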
 

Florian Gross

George said:
#max, like #sort, takes a block. That's good enough, isn't it?

It's not as comfortable as a #max_by that would use a Schwartzian
transform IMHO.
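
To make the comparison concrete, here is a rough sketch of both styles on a made-up array of lines. The variable names are invented for illustration, and the second form is a Schwartzian transform written out by hand rather than any built-in:

```ruby
# Sample lines invented for illustration.
lines = ["short", "the longest line here", "mid length"]

# Enumerable#max with a block: the key (length) is computed inside
# each comparison.
by_block = lines.max { |a, b| a.length <=> b.length }

# Schwartzian transform by hand: decorate each line with its key,
# compare on the cached key, then undecorate.
decorated = lines.map { |l| [l.length, l] }
by_transform = decorated.max { |a, b| a[0] <=> b[0] }[1]

puts by_block      # prints: the longest line here
puts by_transform  # prints: the longest line here
```

For a cheap key like String#length the caching hardly matters; the appeal of a max_by-style method is mostly that the call site only has to state the key.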
 

Daniel Berger

For File.wc see "ptools", available on the RAA.

Regards,

Dan
 

Martin DeMello

Florian Gross said:
It's not as comfortable as a #max_by that would use a Schwartzian
transform IMHO.

It's irrelevant - max only traverses the list once anyway, so each list
element is preprocessed only once. The Schwartzian transform wouldn't
make any difference.

martin
 

Martin DeMello

Florian Gross said:
It's not as comfortable as a #max_by that would use a Schwartzian
transform IMHO.

Oops - ignore my other reply. This would indeed be syntactically neater.

martin
 
