string scan question

srinsriram

This is probably elementary, but I just haven't found the right, reliable way to do this (one that always works).

If a string has content in tags such as <TagName> content goes here </TagName>, what's the best way to put the content into an array? The content can contain whitespace characters (line breaks, tabs, etc.) that should be preserved in the array element. These tags are simple (no attributes).

I assume that the scan method is relevant, but I am having trouble constructing a regex that works reliably.
 
Peter Szinek

Try:

html_file.scan(/<TagName>(.+?)</).flatten

This will put the text contents of all <TagName> tags into an array.
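A quick way to see what this first suggestion does, using a made-up sample string (the html variable below is hypothetical, not from the thread). The lazy (.+?) stops at the first "<" after the opening tag, and without the m flag "." never matches a newline, so content that spans lines is not found.

```ruby
# Hypothetical sample input: two simple tags whose content fits on one line.
html = "<TagName>first</TagName> and <TagName>second</TagName>"

# scan returns one array per match (one entry per capture group), here
# [["first"], ["second"]]; flatten turns that into a plain array of strings.
contents = html.scan(/<TagName>(.+?)</).flatten
p contents # => ["first", "second"]

# Without the /m flag "." does not match "\n", so multi-line content is missed.
multi = "<TagName>\nline one\nline two\n</TagName>".scan(/<TagName>(.+?)</).flatten
p multi # => []
```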

Cheers,
Peter
 
srinsriram

Here is a simple test case

s = "<VariableValue>\nXXX\n</VariableValue>\n\n<VariableValue>\n<Choice> Administrative Support - Supervisors<Choice>\n</VariableValue>\n\n"

For the VariableValue tag there should be 2 array elements: "\nXXX\n" and "\n<Choice> Administrative Support - Supervisors<Choice>\n".

s.scan(/<VariableValue>(.+?)</).flatten returns an empty array
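The empty result can be reproduced with the test case written as one valid Ruby literal (the poster later confirms that the line breaks in the post were only wrapping). A short sketch of why nothing matches:

```ruby
# The test case from the post, as a double-quoted Ruby string so that
# "\n" is a real newline character.
s = "<VariableValue>\nXXX\n</VariableValue>\n\n<VariableValue>\n" \
    "<Choice> Administrative Support - Supervisors<Choice>\n</VariableValue>\n\n"

# Every <VariableValue> is immediately followed by "\n". Without /m, "." cannot
# match a newline, so (.+?) cannot even start after either opening tag.
result = s.scan(/<VariableValue>(.+?)</).flatten
p result # => []
```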
 
Peter Szinek

Ah OK, I did not know there can be other tags inside. This works better:

s.scan(/<VariableValue>(.+?)<\/VariableValue>/m).flatten

(Note the m flag: in Ruby it makes "." match newline characters too, so a match can span several lines.)
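Applied to the test case, the m version returns the two expected elements with their surrounding newlines intact. A minimal check:

```ruby
s = "<VariableValue>\nXXX\n</VariableValue>\n\n<VariableValue>\n" \
    "<Choice> Administrative Support - Supervisors<Choice>\n</VariableValue>\n\n"

# With /m, "." also matches "\n"; the lazy (.+?) still stops at the first
# closing </VariableValue>, so the two values are not merged into one.
values = s.scan(/<VariableValue>(.+?)<\/VariableValue>/m).flatten
p values
# => ["\nXXX\n", "\n<Choice> Administrative Support - Supervisors<Choice>\n"]
```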

However, to match your example exactly, I needed this:

s.scan(/<VariableValue>(.+?)<\/\n?VariableValue>/m).flatten

Are you sure there is a line break between / and the tag name?
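If the tag name varies, or stray whitespace can appear inside the closing tag, the same idea can be wrapped in a small helper. This is only a sketch: tag_contents is a hypothetical name, and \s* is a looser variant of the \n? used above.

```ruby
# Hypothetical helper: returns the content of every <tag>...</tag> pair in str,
# with whitespace inside the content preserved.
def tag_contents(str, tag)
  name = Regexp.escape(tag) # treat the tag name literally inside the regex
  # /m lets "." match newlines; \s* accepts optional whitespace (including a
  # line break) between "</" and the name, and before ">".
  # (.*?) rather than (.+?) so that an empty <tag></tag> yields "".
  str.scan(/<#{name}>(.*?)<\/\s*#{name}\s*>/m).flatten
end

# The test case, deliberately with a line break inside the last closing tag.
s = "<VariableValue>\nXXX\n</VariableValue>\n\n<VariableValue>\n" \
    "<Choice> Administrative Support - Supervisors<Choice>\n</\nVariableValue>\n\n"
p tag_contents(s, "VariableValue")
# => ["\nXXX\n", "\n<Choice> Administrative Support - Supervisors<Choice>\n"]
p tag_contents("<a></a><a> b </a>", "a") # => ["", " b "]
```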

Cheers,
Peter
 
Alex Young

If it's actually XML, just use REXML. Anything else is asking for
trouble, really.
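For input that really is well-formed XML, a minimal REXML sketch might look like the following. The xml string is a hypothetical example: unlike the test case, it has a single root element and properly closed tags, which REXML requires.

```ruby
require "rexml/document"

# Hypothetical well-formed input: one root element wrapping the values.
xml = "<Values><VariableValue>\nXXX\n</VariableValue>" \
      "<VariableValue>\nYYY\n</VariableValue></Values>"

doc = REXML::Document.new(xml)
# XPath.match returns all matching elements; Element#text returns the value of
# the first text node, and by default REXML leaves the surrounding newlines in.
values = REXML::XPath.match(doc, "//VariableValue").map(&:text)
p values # => ["\nXXX\n", "\nYYY\n"]
```

Note that Element#text only returns the first text node, so an element that mixes text and child elements would need a different accessor.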
 
srinsriram

No, there isn't one (that was just line wrapping in my post).
I didn't know about the multiline option (I seem to have missed it in the docs). That worked.

Thanks very much!
 
Alex Young

Alex said:
If it's actually XML, just use REXML. Anything else is asking for
trouble, really.
Sorry, I didn't notice that your <Choice> tags aren't matched. Is that
intentional? If so, ignore my suggestion - REXML clearly won't work.
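A quick way to confirm that REXML is ruled out for the test case: parsing it raises REXML::ParseException because the markup is not well-formed (the <Choice> tags are never closed). A sketch:

```ruby
require "rexml/document"

s = "<VariableValue>\nXXX\n</VariableValue>\n\n<VariableValue>\n" \
    "<Choice> Administrative Support - Supervisors<Choice>\n</VariableValue>\n\n"

rejected =
  begin
    REXML::Document.new(s)
    false
  rescue REXML::ParseException => e
    # Malformed markup makes REXML raise instead of guessing; the exact
    # message depends on the REXML version and which problem it hits first.
    puts "REXML rejected the input (#{e.class})"
    true
  end
p rejected # => true
```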
 
srinsriram

The content can be quite nonstandard and have mismatched tags etc. (like real-life HTML), so this is not XML.
I will use REXML when the input is XML. Thanks for your suggestion.
This group is very useful for newbies.
 
Peter Szinek

By the way, if the content is funky, you could still try Hpricot: it handles such crap surprisingly nicely, and as soon as you want to match anything more complicated than text between an opening and a closing tag, it will really make your life easier.

Cheers,
Peter
 
