parsing indented plain text

B Mills · Jan 31, 2007

Hi.

I'm still kind of new to Ruby and scripting in general, and am trying to
select specific entries from the plain text output of the Apple System
Profiler, which arranges data like this:

Network:

    Internal Modem:

      Type: PPP (PPPSerial)
      Hardware: Modem
      BSD Device Name: modem
      IPv4:
          Configuration Method: PPP
      IPv6:
          Configuration Method: Automatic
      Proxies:
          FTP Passive Mode: Yes

    Built-in Ethernet:

      Type: Ethernet
      Hardware: Ethernet
      BSD Device Name: en0
      IPv4 Addresses: 10.3.9.249

So on and so forth. (There is an XML output option, but it uses the
Apple plist format, which provides very little useful structure and a
much larger file size.) I want to find a way to parse this based on the
number of leading spaces at the beginning of each line (and then
by colon), so I can get a specific property from a Hash:

profile['Network:']['Built-in Ethernet:']['IPv4 Addresses:']

I started out by using grep to create an array of each level of
indentation, and nesting loops to select a range between two indexes of
that level:

key1 = file.grep(/^\w/)
key1.each do |k1|
  range1 = file.index(key1[key1.index(k1)])
  if k1 == key1.last
    range2 = range1
  else
    range2 = file.index(key1[key1.index(k1) + 1])
  end
  stub = file[range1...range2]
  key2 = stub.grep(/^\s\s\s\s\w/)
  key2.each do |k2|
    ....rinse and repeat...
    key3.each do |k3|

...and so on until I run out of indentation levels. I'm running into
problems where the loop stops at the last element and won't dig down
into anything below it. It also doesn't handle context, as there are
duplicate values in the file. I also tend to have the problem of making
things more complicated than they have to be, so I figure there is a
more elegant and Ruby-ish way to do this. Any suggestions?
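
One likely cause of the last-element symptom, sketched under the assumption that file is an array of lines (as File.readlines would return): for the last key, range2 = range1 makes file[range1...range2] an empty slice, so there is nothing left to grep. Working with line indexes instead of file.index also sidesteps the duplicate-value problem, because file.index always returns the first matching line. The sample lines (including the "Software:" section) and the names starts and sections are made up for illustration.

```ruby
# Hypothetical sample; a real script would read the profiler output instead.
file = [
  "Network:\n",
  "    Internal Modem:\n",
  "      Type: PPP (PPPSerial)\n",
  "Software:\n",
  "    System Version: (example)\n"
]

# Indexes of the top-level lines (no leading whitespace).
starts = file.each_index.select { |i| file[i] =~ /^\w/ }

# The original last-key branch: an exclusive range from i to i is empty.
last = starts.last
empty_stub = file[last...last]

# Pair each start with the next start, or with file.length for the last key.
sections = {}
starts.each_with_index do |start, n|
  stop = starts[n + 1] || file.length
  sections[file[start].strip] = file[(start + 1)...stop]
end
```

Here sections maps each top-level heading to the lines beneath it, and the last heading keeps its lines because its range ends at file.length rather than at its own index.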

Luke Ivers · Jan 31, 2007

This output is already structured as YAML, so Ruby's YAML library can parse it directly.
If it is saved as a file named network.yml, you can do this:
require 'yaml'
profile = {}
File.open('network.yml') { |f| profile = YAML.load(f) }
You'll get the following:
=> {"Network"=>
     {"Built-in Ethernet"=>
       {"BSD Device Name"=>"en0",
        "IPv4 Addresses"=>"10.3.9.249",
        "Hardware"=>"Ethernet",
        "Type"=>"Ethernet"},
      "Internal Modem"=>
       {"Proxies"=>{"FTP Passive Mode"=>true},
        "BSD Device Name"=>"modem",
        "IPv4"=>{"Configuration Method"=>"PPP"},
        "IPv6"=>{"Configuration Method"=>"Automatic"},
        "Hardware"=>"Modem",
        "Type"=>"PPP (PPPSerial)"}}}

Meaning you can do profile["Network"]["Built-in Ethernet"]...
You get the idea.
Gotta go eat lunch now.
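
If some part of the real profiler output turns out not to be valid YAML, the indentation idea from the question can also be done by hand with a stack. The following is only a rough sketch, assuming consistent space indentation (no tabs) and "Key: value" lines; parse_indented is a made-up name, not part of any library. Unlike YAML, it keeps every value as a String, so "Yes" stays "Yes".

```ruby
# Sketch: build a nested Hash from indented "Key: value" lines.
# Assumes space indentation; a line without a value opens a nested section.
def parse_indented(text)
  root = {}
  stack = [[-1, root]]          # pairs of [indent width, hash at that level]

  text.each_line do |line|
    next if line.strip.empty?   # skip blank lines

    indent = line[/\A */].length
    key, value = line.strip.split(/:\s*/, 2)

    # Drop levels that are as deep as, or deeper than, this line.
    stack.pop while stack.last[0] >= indent
    parent = stack.last[1]

    if value.nil? || value.empty?
      parent[key] = {}                  # "Key:" opens a nested section
      stack.push([indent, parent[key]])
    else
      parent[key] = value               # "Key: value" is a leaf
    end
  end

  root
end

# Shortened sample based on the output shown in the question.
sample = <<~TEXT
  Network:

      Internal Modem:

        Type: PPP (PPPSerial)
        IPv4:
            Configuration Method: PPP

      Built-in Ethernet:

        Type: Ethernet
        IPv4 Addresses: 10.3.9.249
TEXT

profile = parse_indented(sample)
address = profile["Network"]["Built-in Ethernet"]["IPv4 Addresses"]
```

The stack holds one entry per open indentation level, so the parser never needs to know how many levels there are, and repeated keys such as "Type" land under whichever section is currently open.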
