parsing indented plain text

B Mills

Hi.

I'm still kind of new to Ruby and scripting in general, and am trying to
select specific entries from the plain text output of the Apple System
Profiler which arranges data like this:

Network:

    Internal Modem:

      Type: PPP (PPPSerial)
      Hardware: Modem
      BSD Device Name: modem
      IPv4:
          Configuration Method: PPP
      IPv6:
          Configuration Method: Automatic
      Proxies:
          FTP Passive Mode: Yes

    Built-in Ethernet:

      Type: Ethernet
      Hardware: Ethernet
      BSD Device Name: en0
      IPv4 Addresses: 10.3.9.249

So on and so forth. (There is an XML output option, but it uses the
Apple plist format, which provides very little useful structure and a
much larger file size.) I want to find a way to parse this based on the
number of leading spaces at the beginning of each line (and then
by colon), so I can get a specific property from a Hash:

profile['Network:']['Built-in Ethernet:']['IPv4 Addresses:']

I started out by using grep to create an array of each level of
indentation, and nesting loops to select a range between two indexes of
that level:

key1 = file.grep(/^\w/)
key1.each do |k1|
  range1 = file.index(key1[key1.index(k1)])
  if k1 == key1.last
    range2 = range1   # for the last key, range1...range1 is an empty range
  else
    range2 = file.index(key1[key1.index(k1) + 1])
  end
  stub = file[range1...range2]
  key2 = stub.grep(/^\s\s\s\s\w/)
  key2.each do |k2|
    ....rinse and repeat...
    key3.each do |k3|

...and so on until I run out of indentation levels. I'm running into
problems where the loop stops at the last element and won't dig down
into anything below it. It also doesn't handle context, as there are
duplicate values in the file. I also tend to have the problem of making
things more complicated than they have to be, so I figure there is a
more elegant and Ruby-ish way to do this. Any suggestions?
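
For the indentation-based approach described above, one possible sketch keeps a stack of [indent, hash] pairs and, for each line, pops back to the nearest entry with a smaller indent. This avoids one loop per level and keeps duplicate key names apart, since each key is stored under its own parent. `parse_indented` is a hypothetical helper name, the sample is a trimmed copy of the data above, and its indent widths (4, 6, and 10 spaces) are an assumption about the profiler output; the parser only compares widths, so it should not depend on the exact step. Keys are stored without the trailing colon.

```ruby
# Hypothetical helper: builds a nested Hash from indented "Key: value" lines.
def parse_indented(text)
  root  = {}
  stack = [[-1, root]]             # pairs of [indent width, hash at that level]

  text.each_line do |line|
    next if line.strip.empty?      # skip blank separator lines

    indent = line[/\A */].size     # number of leading spaces
    key, value = line.strip.split(/:\s*/, 2)

    # Pop until the top of the stack is shallower than this line: its parent.
    stack.pop while stack.last[0] >= indent
    parent = stack.last[1]

    if value.nil? || value.empty?
      parent[key] = {}             # "Key:" on its own opens a nested section
      stack.push([indent, parent[key]])
    else
      parent[key] = value          # "Key: value" is a leaf
    end
  end

  root
end

# Trimmed sample; the indent widths here are assumed.
SAMPLE = <<~TEXT
  Network:

      Internal Modem:

        Type: PPP (PPPSerial)
        IPv4:
            Configuration Method: PPP
        Proxies:
            FTP Passive Mode: Yes

      Built-in Ethernet:

        Type: Ethernet
        IPv4 Addresses: 10.3.9.249
TEXT

profile = parse_indented(SAMPLE)
puts profile['Network']['Built-in Ethernet']['IPv4 Addresses']  # prints 10.3.9.249
```

All values stay as plain strings here, so "Yes" is kept as "Yes" rather than converted to a boolean.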
 
Luke Ivers

This output is already structured as YAML, so you can load it with the YAML library.
If it is saved as a file named network.yml, you can do this:
require 'yaml'
profile = {}
File.open('network.yml') { |f| profile = YAML.load(f) }
You'll get the following:
=> {"Network"=>
     {"Built-in Ethernet"=>
       {"BSD Device Name"=>"en0",
        "IPv4 Addresses"=>"10.3.9.249",
        "Hardware"=>"Ethernet",
        "Type"=>"Ethernet"},
      "Internal Modem"=>
       {"Proxies"=>{"FTP Passive Mode"=>true},
        "BSD Device Name"=>"modem",
        "IPv4"=>{"Configuration Method"=>"PPP"},
        "IPv6"=>{"Configuration Method"=>"Automatic"},
        "Hardware"=>"Modem",
        "Type"=>"PPP (PPPSerial)"}}}

Meaning you can do profile["Network"]["Built-in Ethernet"]... (note that the keys have no trailing colon).
You get the idea.
Gotta go eat lunch now.
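
To try the YAML suggestion without a file on disk, here is a small sketch that loads a trimmed copy of the sample from a string. The indentation in the heredoc is an assumption about how the profiler lays out its output, and other sections of a real profile may contain lines that are not valid YAML, so the load is wrapped in a rescue.

```ruby
require 'yaml'

# Trimmed copy of the sample; the indent widths are assumed.
text = <<~PROFILE
  Network:

      Internal Modem:

        Type: PPP (PPPSerial)
        Proxies:
            FTP Passive Mode: Yes

      Built-in Ethernet:

        BSD Device Name: en0
        IPv4 Addresses: 10.3.9.249
PROFILE

begin
  profile = YAML.load(text)
  puts profile['Network']['Built-in Ethernet']['IPv4 Addresses']  # prints 10.3.9.249
  # Ruby's YAML parser reads a bare "Yes" as a boolean, not a string.
  p profile['Network']['Internal Modem']['Proxies']['FTP Passive Mode']
rescue Psych::SyntaxError => e
  warn "Not valid YAML: #{e.message}"
end
```

Because YAML converts some values (a bare Yes comes back as true, as in the output above), a hand-written parser may be the better fit when the raw strings matter.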
 
