How to find a label quickly?

wang frank · May 4, 2007

Hi,

I am a new user on Python and I really love it.

I have a big text file with each line like:

label 3
teststart 5
endtest 100
newrun 2345

I opened the file by uu=open('test.txt','r') and then read the data as
xx=uu.readlines()

xx contains the list of lines. I want to find specific labels
and read their data. Currently, I
do this by
for ss in xx:
    zz=ss.split( )
    if zz[0] = endtest:
        index=zz[1]

Since the file is big and I need to find more labels, this code runs slowly.
Is there any way to speed up the process? I thought of converting the data xx
from a list to a dictionary, so I can get the index quickly based on the
label. Can I do that efficiently?

Thanks

Frank


Larry Bates · May 4, 2007

Are the labels unique? That is, are labels never repeated in the file? If
not, you are going to need to do some processing, because dictionary keys
must be unique.
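
One possible form of that processing, if labels do repeat, is to keep every value per label in a list. This is only a sketch: the helper name group_by_label and the sample lines (including the repeated endtest) are made up for illustration, and it assumes whitespace-separated two-field lines as in the question.

```python
from collections import defaultdict


def group_by_label(lines):
    """Map each label to the list of all values seen for it, in file order."""
    grouped = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        label, value = fields
        grouped[label].append(value)
    return grouped


# Hypothetical sample data with one repeated label.
sample = ["label 3", "endtest 100", "newrun 2345", "endtest 250"]
grouped = group_by_label(sample)
print(grouped["endtest"])
```

Passing an open file object instead of the sample list works the same way, since the function only iterates over lines.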

Do you have control over the format of the test.txt file? If so, a small
change would put it into a format that the ConfigParser module can handle,
which would make lookups faster because it uses dictionaries.

[labels]
label=3
teststart=5
endtest=100
newrun=2345

With this you can have different sections [section] with labels under each
section. Use ConfigParser to read this and then get options with
getint(section, option).
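
A minimal sketch of reading that layout, assuming Python 3, where the module is spelled configparser (it was ConfigParser in Python 2). The text is parsed from a string here so the example is self-contained; with a real file, parser.read("test.txt") would be used instead.

```python
import configparser  # named ConfigParser in Python 2

# The sample section from above, inlined so the example runs on its own.
text = """\
[labels]
label=3
teststart=5
endtest=100
newrun=2345
"""

parser = configparser.ConfigParser()
parser.read_string(text)  # parser.read("test.txt") would load a real file
endtest = parser.getint("labels", "endtest")  # value converted to int
print(endtest)
```

Note that the Python 3 parser is strict by default and rejects a section that repeats an option, so this route also assumes unique labels.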

-Larry

Miki · May 4, 2007

Hello Frank,

I am a new user on Python and I really love it.

The more you know, the deeper the love

I have a big text file with each line like:

label 3
teststart 5
endtest 100
newrun 2345

I opened the file by uu=open('test.txt','r') and then read the data as
xx=uu.readlines()

This reads the whole file to memory, which might be a problem.

xx contains the list of lines. I want to find specific labels
and read their data. Currently, I
do this by
for ss in xx:
    zz=ss.split( )
    if zz[0] = endtest:
        index=zz[1]

Since the file is big and I need to find more labels, this code runs slowly.
Is there any way to speed up the process? I thought of converting the data xx
from a list to a dictionary, so I can get the index quickly based on the
label. Can I do that efficiently?

IMO a better way is to not load the whole file into memory:
# Untested
# Labels we care about; each value stays None until the label is seen.
labels = dict.fromkeys(["endtest", "other_label"])
with open("test.txt") as f:
    for line in f:  # reads one line at a time
        label, value = line.split()
        if label in labels:
            labels[label] = value

Another option is to use a fast external program (such as egrep):
from os import popen
labels = {}
for line in popen("egrep 'endtest|other_label' test.txt"):
    label, value = line.strip().split()
    labels[label] = value

HTH,

Duncan Booth · May 4, 2007

wang frank said:
Hi,

I am a new user on Python and I really love it.

I have a big text file with each line like:

label 3
teststart 5
endtest 100
newrun 2345

I opened the file by uu=open('test.txt','r') and then read the data as
xx=uu.readlines()

First suggestion: never use readlines() unless you really want all the
lines in a list. Iterating over the file will probably be faster
(especially if some of the time you can abort the search without reading
all the way to the end).

xx contains the list of lines. I want to find specific
labels and read their data. Currently, I
do this by
for ss in xx:
    zz=ss.split( )
    if zz[0] = endtest:
        index=zz[1]

Ignoring the fact that what you wrote wouldn't compile, you could try:

if ss.startswith('endtest '):
    ...
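
Combining the prefix test with the earlier advice to iterate and stop early might look like the sketch below. The function name find_first is hypothetical, and an in-memory io.StringIO stands in for the open file. Unlike the loop in the question, which keeps the last match, this returns the first one.

```python
import io


def find_first(lines, label):
    """Return the value on the first line starting with `label`, else None."""
    prefix = label + " "  # trailing space avoids matching e.g. 'endtest2'
    for line in lines:
        if line.startswith(prefix):
            return line.split()[1]  # stop scanning as soon as we have a hit
    return None


# Stand-in for open('test.txt'); the contents mirror the sample in the question.
sample_file = io.StringIO("label 3\nteststart 5\nendtest 100\nnewrun 2345\n")
index = find_first(sample_file, "endtest")
print(index)
```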

Since the file is big and I need to find more labels, this code runs
slowly. Is there any way to speed up the process? I thought of converting
the data xx from a list to a dictionary, so I can get the index quickly
based on the label. Can I do that efficiently?

Yes, if you need to do this more than once you want to avoid scanning the
file repeatedly. So long as you are confident that every line in the file
is exactly two fields:

lookuptable = dict(s.split() for s in uu)

is about as efficient as you are going to get.
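
If that confidence is lacking, note that the one-liner raises ValueError on any line that does not split into exactly two fields, including blank lines. A more tolerant variant, sketched here with a hypothetical helper build_lookup and an in-memory stand-in for the file, skips such lines at the cost of a little speed:

```python
import io


def build_lookup(lines):
    """Build {label: value}, ignoring lines that are not exactly two fields."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            table[fields[0]] = fields[1]  # a repeated label keeps its last value
    return table


# Stand-in for the open file uu; note the blank line in the middle.
uu = io.StringIO("label 3\nteststart 5\n\nendtest 100\nnewrun 2345\n")
lookuptable = build_lookup(uu)
index = int(lookuptable["endtest"])  # values are strings until converted
print(index)
```

Once the table is built, each further label lookup is a single dictionary access rather than another pass over the file.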
