My CPU Hates Me

Ari Brown

Pattern matching problem. This time, it doesn't print out anything
and just soaks up my CPU. I tried slowly adding more and more for it
to do, and it worked great -- until TABLE7. Then it just soaks up my
CPU and makes me cry. At first, when nothing was printing, I added
$stdout.flush to make it print. But it didn't print! This makes me
think that it's something in the when part.

What's going on?

Help!


lines.each do |line|
  case line
  when /^"(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)"$/
    TABLE1.puts("\"#{$1}\",\"#{$2}\",\"#{$3}\",\"#{$4}\",\"#{$5}\",\"#{$6}\",\"#{$7}\",\"#{$8}\",\"#{$9}\""); print '-'; $stdout.flush
    TABLE2.puts("\"#{$10}\",\"#{$11}\",\"#{$12}\""); print '-'; $stdout.flush
    TABLE3.puts("\"#{$13}\",\"#{$14}\",\"#{$15}\""); print '-'; $stdout.flush
    TABLE4.puts("\"#{$16}\",\"#{$17}\""); print '-'; $stdout.flush
    TABLE5.puts("\"#{$18}\",\"#{$19}\""); print '-'; $stdout.flush
    TABLE6.puts("\"#{$20}\",\"#{$21}\""); print '-'; $stdout.flush
    TABLE7.print("\"#{$22}\",\"#{$23}\""); print '!'; $stdout.flush
    TABLE7.print("\"#{$24}\",\"#{$25}\""); print '!'; $stdout.flush
    TABLE7.print("\"#{$26}\",\"#{$27}\""); print '!'; $stdout.flush
    TABLE7.print("\"#{$28}\",\"#{$29}\""); print '!'; $stdout.flush
    TABLE7.print("\"#{$30}\",\"#{$31}\",\"#{$32}\""); print '-'; $stdout.flush
    # TABLE8.puts("\"#{33}\",\"#{34}\""); print '-'; $stdout.flush

    puts; $stdout.flush
    print '.'; $stdout.flush
  else
    print '$'
  end
end



-------------------------------------------------------|
~ Ari
crap my sig won't fit
 
Michael Glaesemann

/
^"(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)"
,"(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)"
,"(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)"
,"(.*)","(.*)"$/

You don't happen to be trying to parse a CSV file by any chance? If
so, why not use FasterCSV?
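
For reference, here is a minimal sketch of that approach. It assumes Ruby 1.9 or later, where the FasterCSV code ships as the standard csv library, so it calls CSV.parse_line rather than the FasterCSV gem itself. The sample row is made up.

```ruby
require 'csv'

# Made-up sample row in the same quoted, comma-separated layout.
line = '"1","Smith, John","42"'

# parse_line applies the CSV quoting rules, so a comma inside a quoted
# field does not start a new field.
fields = CSV.parse_line(line)

puts fields.length
puts fields[1]
```

Each parsed row is a plain array, so the TABLE writes can index into it instead of relying on $1 through $32.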

Michael Glaesemann
grzm seespotcode net
 
Jim Clark

Ari said:
lines.each do |line|
case line
when
/^"(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)","(.*)"$/
There are several ways to optimize the regular expression, but the most
important thing is to not be greedy. What I mean by this is that (.*)
matches everything to the end of the line, and then the regular
expression backtracks to find the next " character specified. It will
first choose the " character closest to the end of the line, but that is
not the one you want, so it backtracks again and again, wasting CPU
cycles.
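
To see the greediness on a small scale, here is a sketch with a made-up two-field line (not taken from the original data):

```ruby
# Made-up sample line with two quoted fields.
line = '"a","b"'

# Greedy: .* runs to the end of the line, then backs up only as far as
# the last " character, so the capture swallows the separator as well.
greedy = line.match(/^"(.*)"/)[1]

puts greedy
```

With 32 such groups in one pattern, every group starts by grabbing the rest of the line and has to give characters back one at a time.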

Instead of being greedy and using "(.*)", your best bet would be to use
"([^"]*)". This assumes that there are no " characters within each
field. This stops the regex from getting past the next " character of
each field and eliminates all that backtracking.
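
A sketch of that pattern, built by repetition instead of typed out 32 times. The names field and row and the sample line are made up for illustration, and it assumes no field contains a " character.

```ruby
# One field: a quote, any run of non-quote characters, a closing quote.
field = '"([^"]*)"'

# 32 fields separated by commas, anchored at both ends of the line.
row = Regexp.new('^' + ([field] * 32).join(',') + '$')

# Made-up sample line: "f1","f2",...,"f32"
line = (1..32).map { |i| %("f#{i}") }.join(',')

m = row.match(line)
puts m[1]
puts m[32]
```

Because [^"]* cannot run past a quote, each group stops at the end of its own field and a malformed line fails quickly.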

Alternatively, you could look at splitting the line on the comma (see
http://www.ruby-doc.org/core/classes/String.html#M000818) and end up
with a nice array to reference each field. You'll still have the quotes
that you'll need to strip from each item (unless you use the three
character separator of "," and manually remove the leading " character
from the first element and the trailing " character from the last
element). This will likely be the fastest way since the regex doesn't
need to be evaluated. However, you may need more logic if some lines in
the text file, such as comment lines, should not be split.
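
A minimal sketch of the split approach using the three-character separator. The sample line is made up, and the code assumes every line is fully quoted and that no field contains the "," sequence.

```ruby
# Made-up sample line.
line = %("a","b c","d"\n)

# Drop the newline plus the leading and trailing quote, then split on the
# three-character separator so no per-field stripping is needed.
# The -1 limit keeps empty trailing fields instead of discarding them.
fields = line.chomp[1..-2].split('","', -1)

p fields
```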

Regards,
Jim
 
John Joyce

Ari, also check out unit testing in any of the Ruby books. You can
test your regex for failures as you go. Regex is one of those
cases where unit testing is immediately and obviously useful (though
it is truthfully useful all the time).
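
As a rough illustration, a test along these lines could look like the sketch below. It assumes the minitest library that ships with current Ruby versions, and ROW is a hypothetical two-field version of the pattern, kept short for the example.

```ruby
require 'minitest/autorun'

# Hypothetical two-field version of the pattern.
ROW = /^"([^"]*)","([^"]*)"$/

class RowPatternTest < Minitest::Test
  # A well-formed line should yield exactly its two field values.
  def test_matches_a_well_formed_line
    m = ROW.match('"a","b"')
    assert_equal %w[a b], m.captures
  end

  # A line missing its closing quote should not match at all.
  def test_rejects_a_line_with_a_missing_quote
    assert_nil ROW.match('"a","b')
  end
end
```

Running the file executes both tests, so a change to the pattern that breaks either case shows up immediately.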

Also, when reading giant files, even if you use readline or another
way to break the input into smaller parts, you should consider reading
x bytes of the file at a time. You can add a routine that finds where
the last \n appeared within those x bytes, rewinds the file to that
position, and then reads another x bytes from there, repeating until
end of file.
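
A sketch of that routine, under some assumptions: each_chunk_of_lines and chunk_size are made-up names, and a StringIO stands in for the real file so the example runs on its own.

```ruby
require 'stringio'

# Hypothetical helper: reads about chunk_size bytes at a time and yields
# only whole lines, rewinding to just after the last \n in each chunk.
def each_chunk_of_lines(io, chunk_size)
  until io.eof?
    chunk = io.read(chunk_size)
    unless io.eof?
      last_newline = chunk.rindex("\n")
      if last_newline
        # Move back so the partial line after the last \n is read next time.
        io.seek(last_newline + 1 - chunk.bytesize, IO::SEEK_CUR)
        chunk = chunk[0..last_newline]
      else
        # No newline in this chunk: extend it to the end of the current line.
        chunk << io.gets.to_s.b
      end
    end
    yield chunk
  end
end

# Made-up data standing in for a large file.
data = "a,b\nc,d\ne,f\n"
chunks = []
each_chunk_of_lines(StringIO.new(data), 5) { |c| chunks << c }

p chunks
```

Each yielded chunk can then be split into lines and matched, and the number of bytes consumed so far gives a simple progress figure.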

Regex is great, but it will become a big resource hog if you just let
it loose on a big file. Chop the work up into smaller tasks, and you
can report on the progress of the whole process.

John Joyce
 
Ari Brown

You don't happen to be trying to parse a CSV file by any chance? If
so, why not use FasterCSV?

FasterCSV.... I think I will! Thanks!

Ari
-------------------------------------------|
Nietzsche is my copilot
 
