jdm
I'm parsing some HTML and have a table-driven state machine that configures
itself by reading tuples (one per line) of "present state", "pattern to match",
and "next state" with this code:
lineArray=Array.new()
stateTable=Hash.new()
File.open("stateTable.txt") { |file|
  file.each { |line|
    lineArray=line.scan(/[^\s]+/)
    stateTable[lineArray[0]]=lineArray[1..-1]
  }
}
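To make the stored layout concrete, here is a minimal sketch of the same parsing applied to a string instead of a file. The table contents and the helper name load_state_table are hypothetical (the real stateTable.txt is not shown here), and skipping empty lines is an added assumption.

```ruby
# Hypothetical table contents; the real stateTable.txt is not shown in this post.
SAMPLE_TABLE = "html <html> title\ntitle <title> body\n"

# Same parsing as the loader above, applied to a string instead of a file.
def load_state_table(text)
  table = {}
  text.each_line do |line|
    fields = line.scan(/[^\s]+/)      # whitespace-separated fields of one tuple
    next if fields.empty?             # assumption: ignore blank lines
    table[fields[0]] = fields[1..-1]  # key: present state; value: remaining fields
  end
  table
end

table = load_state_table(SAMPLE_TABLE)
p table["html"]         # prints ["<html>", "title"]
p table["html"].length  # prints 2
```

With three fields per line, the value stored under each state is a two-element array: the pattern at index 0 and the next state at index 1.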
I've printed this hash out in several different ways with the same results: the
key-value pairs look as expected (no extraneous spaces, newlines, etc.). Once
the hash is set up, it drives a state machine with this code:
 1  state="html"
 2  while input=gets()                 # text lines are the s.m.'s "clock"
 3    if input.chomp().length>0        # skip blank lines
 4      if stateTable.has_key?(state)  # is current state defined by a tuple?
                                       # for now all states are defined
 5        if input=~Regexp.new(stateTable[state][0])  # change state if match
 6          state=stateTable[state][3]
 7        elsif                        # else complain
 8          print("\nline #{$NR}: no match on #{stateTable[state][0]}\n")
 9          exit
10        end
11      end  # if state in stateTable
12    end  # if input.chomp()
13  end  # while
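A single step of this loop can also be inspected in isolation. The sketch below is only a diagnostic aid, not part of the program: the helper name trace_step and the sample table are hypothetical, and it assumes each table line has exactly three whitespace-separated fields as described at the top. It reports every lookup the loop performs for one input line.

```ruby
# Hypothetical table in the layout the loader produces: state => [pattern, next state].
SAMPLE_STATES = { "html" => ["<html>", "title"], "title" => ["<title>", "body"] }

# Returns the values the loop consults for one state and one input line.
def trace_step(table, state, input)
  entry = table[state]
  {
    "state"        => state,
    "defined"      => table.has_key?(state),            # the test on line 4
    "pattern"      => entry && entry[0],                # the string used on line 5
    "matched"      => entry ? !(input =~ Regexp.new(entry[0])).nil? : nil,
    "entry[3]"     => entry && entry[3],                # the value assigned on line 6
    "entry.length" => entry && entry.length
  }
end

trace_step(SAMPLE_STATES, "html", "<html>\n").each do |label, value|
  puts "#{label}: #{value.inspect}"
end
```

Printing each value with inspect shows nil and stray whitespace explicitly, which a plain print would hide.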
I have confirmed multiple times and ways that stateTable["html"][0] contains
"<html>" yet the if on line 4 is never successful even though the first
non-blank line in the input is "<html>". I tried doing it manually by inserting
the following between lines 3 and 4:
if input=~/<html>/ ...
and this worked (moving the state machine to the next state which is "title")
but the problem repeated itself all over again in that state too. So I have no
problem pattern matching with regex literals, but I can't pattern match with
regexes derived from ostensibly identical strings read from a file.
For the conditional on line 5 I have also tried:
Regexp.new(Regexp.escape(stateTable[state][0]))
and
Regexp.new(stateTable[state][0].to_s)
and
Regexp.new(stateTable[state][0]).match(input) # returned nil
to no avail.
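As a reference point, the variants above can be compared side by side outside the state machine. This is a minimal sketch under stated assumptions: the table line and input line are hypothetical stand-ins for the real files, and compare_matches is a made-up helper name. It may help narrow down whether the string-to-regex conversion itself behaves differently from the literal.

```ruby
# Hypothetical table line and input line; neither file is shown in this post.
TABLE_LINE = "html <html> title\n"
INPUT_LINE = "<html>\n"

# Runs each form of the match and returns the results keyed by label.
def compare_matches(table_line, input)
  pattern = table_line.scan(/[^\s]+/)[1]  # second field, as the loader stores it
  {
    "literal"    => input =~ /<html>/,
    "Regexp.new" => input =~ Regexp.new(pattern),
    "escaped"    => input =~ Regexp.new(Regexp.escape(pattern)),
    "to_s"       => input =~ Regexp.new(pattern.to_s),
    "match"      => Regexp.new(pattern).match(input)
  }
end

compare_matches(TABLE_LINE, INPUT_LINE).each do |label, result|
  puts "#{label}: #{result.inspect}"
end
```

A result of nil for any label means that form did not match; an integer is the offset at which the match starts.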
For line 5 I initially had:
if input=~stateTable[state][0]
This didn't work either and generated the following warning:
warning: string=~string will be obsolete; use explicit regexp
I'm using Ruby version 1.8.1 (2003-12-25) on Windows (i386-mswin32).
The point of this post is not to get better ways to parse HTML (but feel free to
suggest them anyway); the point is to find out why I can't read a string from
a file and then use it (as expected) as a regex in a match operator expression.
I humbly await searing insight and enlightenment from the collective (to which
resistance is futile in any case).