Reading a portion of a file

cmfvulcanius · Mar 8, 2007

I am using a script with a single file containing all data in multiple
sections. Each section begins with "#VS:CMD:command:START" and ends
with "#VS:CMD:command:STOP". There is a blank line in between each
section. I'm looking for the best way to grab one section at a time.
Will I have to read the entire file to a string and parse it further
or is it possible to grab the section directly when doing a read? I'm
guessing regex is the best possible way. Any help is greatly
appreciated.

Thanks

Rune Strand · Mar 8, 2007

Seems like something along these lines will do:

_file_ = "filepart.txt"

begin_tag = '#VS:CMD:command:START'
end_tag = '#VS:CMD:command:STOP'

sections = []
new_section = []
for line in open(_file_):
    line = line.strip()
    if begin_tag in line:
        new_section = []
    elif end_tag in line:
        sections.append(new_section)
    else:
        if line: new_section.append(line)

for s in sections: print s

If you want more control, flagging "inside_section" and
"outside_section" is an idea.

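The flag idea could look roughly like the sketch below. It is written for Python 3 (the code in this thread is Python 2), and the function name `read_sections`, the "#VS:" prefix check, and the ":START"/":STOP" suffix checks are assumptions made for illustration, not something the original poster confirmed.

```python
import io


def read_sections(lines, start_suffix=":START", stop_suffix=":STOP"):
    """Collect the non-blank lines between each START/STOP marker pair."""
    sections = []
    current = []
    inside_section = False  # True only between a START and its STOP
    for raw in lines:
        line = raw.strip()
        if line.startswith("#VS:") and line.endswith(start_suffix):
            inside_section = True
            current = []
        elif line.startswith("#VS:") and line.endswith(stop_suffix):
            if inside_section:
                sections.append(current)
            inside_section = False
        elif inside_section and line:
            current.append(line)
    return sections


# Small in-memory stand-in for the real file (hypothetical content).
sample = io.StringIO(
    "stray line before any section\n"
    "#VS:CMD:command:START\n"
    "first\n"
    "second\n"
    "#VS:CMD:command:STOP\n"
    "\n"
    "#VS:CMD:other:START\n"
    "third\n"
    "#VS:CMD:other:STOP\n"
)
sections = read_sections(sample)
for s in sections:
    print(s)
```

Because of the flag, lines that appear outside any START/STOP pair are ignored instead of being collected into the next section.
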
Jordan · Mar 8, 2007

You probably don't want to use regex for something this simple; it's
likely to make things even more complicated. Is there a space between
the begin_tag and the first word of a section (same question with the
end_tag)?

Jordan · Mar 8, 2007

Sent the post too soon. What is the endline character for the file
type? What type of file is it? An example section would be nice
too. Cheers.

cmfvulcanius · Mar 8, 2007

Ok, regex was my first thought because I used to use grep with Perl
and shell scripting to grab everything from one pattern to another
pattern. The file is just an unformatted file. What is below is
exactly what is in the file. There are no spaces between the beginning
and ending tags and the content. Would you recommend using spaces
there? And if so, why?

A sample of the file:

#VS:COMMAND:df:START
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vzfs 20971520 517652 20453868 3% /
tmpfs 2016032 44 2015988 1% /var/run
tmpfs 2016032 0 2016032 0% /var/lock
tmpfs 2016032 0 2016032 0% /dev/shm
tmpfs 2016032 44 2015988 1% /var/run
tmpfs 2016032 0 2016032 0% /var/lock
#VS:COMMAND:df:STOP

#VS:FILE:/proc/loadavg:START
0.00 0.00 0.00 1/32 14543
#VS:FILE:/proc/loadavg:STOP

#VS:FILE:/proc/meminfo:START
MemTotal: 524288 kB
MemFree: 450448 kB
Buffers: 0 kB
Cached: 0 kB
SwapCached: 0 kB
Active: 0 kB
Inactive: 0 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 524288 kB
LowFree: 450448 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
Mapped: 73840 kB
Slab: 0 kB
CommitLimit: 0 kB
Committed_AS: 248704 kB
PageTables: 0 kB
VmallocTotal: 0 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
#VS:FILE:/proc/meminfo:STOP

#VS:FILE:/proc/stat:START
cpu 67188 0 26366 391669264 656686 0 0
cpu0 24700 0 10830 195807826 373309 0 0
cpu1 42488 0 15536 195861438 283376 0 0
intr 0
swap 0 0
ctxt 18105366807
btime 1171391058
processes 26501285
procs_running 1
procs_blocked 0
#VS:FILE:/proc/stat:STOP

#VS:FILE:/proc/uptime:START
1962358.88 1577059.05
#VS:FILE:/proc/uptime:STOP
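
For comparison, the grep-style "everything from one pattern to another" idea can be sketched with the `re` module. This is only an illustration in Python 3, assuming the marker layout shown in the sample; the name `SECTION_RE` and the shortened sample text are made up for the example, and the approach does require reading the whole file into one string first.

```python
import re

# Opening marker, lazily matched body, and a closing marker that must
# repeat the same kind and name (via the backreferences \1 and \2).
SECTION_RE = re.compile(
    r"^#VS:(\w+):([^\n]+):START\n"
    r"(.*?)"
    r"^#VS:\1:\2:STOP$",
    re.MULTILINE | re.DOTALL,
)

# Shortened copy of the sample above, standing in for file.read().
text = (
    "#VS:COMMAND:df:START\n"
    "Filesystem 1K-blocks Used Available Use% Mounted on\n"
    "/dev/vzfs 20971520 517652 20453868 3% /\n"
    "#VS:COMMAND:df:STOP\n"
    "\n"
    "#VS:FILE:/proc/loadavg:START\n"
    "0.00 0.00 0.00 1/32 14543\n"
    "#VS:FILE:/proc/loadavg:STOP\n"
)

# Map each section name to the list of lines in its body.
sections = {
    match.group(2): match.group(3).splitlines()
    for match in SECTION_RE.finditer(text)
}
print(sections["/proc/loadavg"])
```

The line-by-line loop suggested earlier avoids holding the whole file in memory, so the regex version is mainly attractive when the file is small.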

attn.steven.kuo · Mar 9, 2007

You can use iterators:

import StringIO
import itertools

def group(line):
    if line[-6:-1] == 'START':
        group.current = group.current + 1
    return group.current

group.current = 0

data = """
#VS:COMMAND:df:START
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/vzfs 20971520 517652 20453868 3% /
tmpfs 2016032 44 2015988 1% /var/run
tmpfs 2016032 0 2016032 0% /var/lock
tmpfs 2016032 0 2016032 0% /dev/shm
tmpfs 2016032 44 2015988 1% /var/run
tmpfs 2016032 0 2016032 0% /var/lock
#VS:COMMAND:df:STOP

#VS:FILE:/proc/loadavg:START
0.00 0.00 0.00 1/32 14543
#VS:FILE:/proc/loadavg:STOP

#VS:FILE:/proc/meminfo:START
MemTotal: 524288 kB
MemFree: 450448 kB
Buffers: 0 kB
Cached: 0 kB
SwapCached: 0 kB
Active: 0 kB
Inactive: 0 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 524288 kB
LowFree: 450448 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
Mapped: 73840 kB
Slab: 0 kB
CommitLimit: 0 kB
Committed_AS: 248704 kB
PageTables: 0 kB
VmallocTotal: 0 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
#VS:FILE:/proc/meminfo:STOP

#VS:FILE:/proc/stat:START
cpu 67188 0 26366 391669264 656686 0 0
cpu0 24700 0 10830 195807826 373309 0 0
cpu1 42488 0 15536 195861438 283376 0 0
intr 0
swap 0 0
ctxt 18105366807
btime 1171391058
processes 26501285
procs_running 1
procs_blocked 0
#VS:FILE:/proc/stat:STOP

#VS:FILE:/proc/uptime:START
1962358.88 1577059.05
#VS:FILE:/proc/uptime:STOP
""".lstrip("\n");

fh = StringIO.StringIO(data)

# Drop blank lines, then group consecutive lines by section number.
sections = itertools.groupby(
    itertools.ifilter(lambda line: len(line) > 1, fh),
    lambda line: group(line))

for key, section in sections:
    for line in section:
        print key, line,
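
The code above is Python 2 (`StringIO`, `itertools.ifilter`, the `print` statement). A rough Python 3 equivalent of the same `groupby` technique might look like this; the helper name `make_key` and the shortened sample data are assumptions for the sketch, and a closure replaces the function attribute used as a counter.

```python
import io
import itertools


def make_key():
    """Return a key function that counts the START markers seen so far."""
    count = 0

    def key(line):
        nonlocal count
        if line.rstrip().endswith(":START"):
            count += 1
        return count

    return key


# Shortened stand-in for the sample data.
data = io.StringIO(
    "#VS:FILE:/proc/loadavg:START\n"
    "0.00 0.00 0.00 1/32 14543\n"
    "#VS:FILE:/proc/loadavg:STOP\n"
    "\n"
    "#VS:FILE:/proc/uptime:START\n"
    "1962358.88 1577059.05\n"
    "#VS:FILE:/proc/uptime:STOP\n"
)

# Skip blank lines, then group consecutive lines by section number.
non_blank = (line for line in data if line.strip())
grouped = {
    number: [line.rstrip("\n") for line in lines]
    for number, lines in itertools.groupby(non_blank, key=make_key())
}
for number, lines in grouped.items():
    print(number, lines)
```

As in the original, each group still contains its own START and STOP marker lines.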

Vulcanius · Mar 9, 2007

Here is the code I've come up with. Please feel free to critique it
and let me know what you would change. Also, as you can see I call
"open(SERVER,'r')" twice; but I want to only call it once, what would
the best way to do this be?

------------------------------------------------------------

import re

SERVER = "192.168.1.60"

# Pull all data from server file.
FILE = open(SERVER,'r')
ALLINFO = FILE.read()

# Grab a list of all sections in the server file.
SECTIONS = re.findall("(?m)^\#VS:\w*:.*:", ALLINFO)

# Remove duplicates from the list.
if SECTIONS:
    SECTIONS.sort()
    LAST = SECTIONS[-1]
    for I in range(len(SECTIONS)-2, -1, -1):
        if LAST == SECTIONS[I]: del SECTIONS[I]
        else: LAST = SECTIONS[I]

# Pull data from each section and assign it a dictionary item.
# Data can be called using SECTIONDICT['section'], e.g. SECTIONDICT['df']
SECTIONDICT = {}
for SECT in SECTIONS:
    PRESECTNAME1 = SECT[9:len(SECT) - 1]
    PRESECTNAME2 = PRESECTNAME1.split("/")
    SECTNAME = PRESECTNAME2[len(PRESECTNAME1.split("/")) - 1]
    START = SECT + "START"
    STOP = SECT + "STOP"
    for LINE in open(SERVER,'r'):
        LINE = LINE.strip()
        if START in LINE:
            SECTIONLISTTEMP = []
        elif STOP in LINE:
            SECTIONDICT[SECTNAME] = SECTIONLISTTEMP
            SECTIONLISTTEMP = []
            print "-" * 80
            print "SECTION: %s" % SECTNAME
            print SECTIONDICT[SECTNAME]
        else:
            if LINE:
                SECTIONLISTTEMP.append(LINE)

FILE.close()

------------------------------------------------------------
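
One way to open the file only once is to build the dictionary in a single pass, taking the section name from each START marker as it is read. The sketch below is Python 3 and only an illustration: `parse_sections` and the temporary sample file are hypothetical. It also derives the name by splitting on ":" instead of slicing from offset 9, because that offset only fits the "#VS:FILE:" prefix and not "#VS:COMMAND:".

```python
import os
import tempfile


def parse_sections(path):
    """Read the file once and map each section's short name to its lines."""
    sections = {}
    name = None  # short name of the section being read, if any
    body = []
    with open(path) as handle:  # the only open() call
        for raw in handle:
            line = raw.strip()
            if line.startswith("#VS:") and line.endswith(":START"):
                # "#VS:FILE:/proc/loadavg:START" -> "/proc/loadavg" -> "loadavg"
                full_name = line.split(":", 2)[2][:-len(":START")]
                name = full_name.split("/")[-1]
                body = []
            elif line.startswith("#VS:") and line.endswith(":STOP"):
                if name is not None:
                    sections[name] = body
                name = None
            elif name is not None and line:
                body.append(line)
    return sections


# Write a shortened sample to a temporary file so the sketch is runnable.
sample = (
    "#VS:COMMAND:df:START\n"
    "/dev/vzfs 20971520 517652 20453868 3% /\n"
    "#VS:COMMAND:df:STOP\n"
    "\n"
    "#VS:FILE:/proc/loadavg:START\n"
    "0.00 0.00 0.00 1/32 14543\n"
    "#VS:FILE:/proc/loadavg:STOP\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
section_dict = parse_sections(tmp.name)
os.remove(tmp.name)

print(section_dict["df"])
print(section_dict["loadavg"])
```

With this layout neither the regex pre-scan nor the duplicate-removal step is needed, since each name is seen exactly once at its START marker.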

Gabriel Genellina · Mar 10, 2007

You got a reply yesterday from rune.strand@g... that works without regexps
and looks pretty functional; have you seen it?

SECTIONDICT = {}
for SECT in SECTIONS:
    PRESECTNAME1 = SECT[9:len(SECT) - 1]
    PRESECTNAME2 = PRESECTNAME1.split("/")

Ugh... don't use UPPERCASE names for variables, please!
Better to follow this style guide: http://www.python.org/dev/peps/pep-0008/
