Remove empty strings from list

Helvin

Hi,

Sorry, I did not want to bother the group, but I really do not
understand this seemingly trivial problem.
I am reading from a textfile, where each line has 2 values, with
spaces before and between the values.
I would like to read in these values, but of course, I don't want the
whitespaces between them.
I have looked at documentation, and how strings and lists work, but I
cannot understand the behaviour of the following:
    line = f.readline()
    line = line.lstrip() # take away whitespace at the beginning of the readline.
    list = line.split(' ') # split the str line into a list

    # the list has empty strings in it, so now, remove these empty strings
    for item in list:
        if item is ' ':
            print 'discard these: ',item
            index = list.index(item)
            del list[index] # remove this item from the list
        else:
            print 'keep this: ',item
The problem is, when my list is: ['44', '', '', '', '', '', '0.000000000\n']
The output is:
len of list: 7
keep this: 44
discard these:
discard these:
discard these:
So finally the list is: ['44', '', '', '0.000000000\n']
The code above removes all the empty strings in the middle, all except
two. My code seems to miss two of the empty strings.

Would you know why this is occurring?

Regards,
Helvin
 
Chris Rebert

Block quoting from http://effbot.org/zone/python-list.htm
"""
Note that the for-in statement maintains an internal index, which is
incremented for each loop iteration. This means that if you modify the
list you’re looping over, the indexes will get out of sync, and you
may end up skipping over items, or process the same item multiple
times.
"""

That is why your code skips over some elements and does not remove them.
Moral: Don't modify a list while iterating over it. Use the loop to
create a separate, new list from the old one instead.
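
To make the skipping concrete, here is a minimal sketch that mimics the loop with an explicit index, using the list from the question (without the trailing newline). The function name remove_while_iterating is made up for illustration, and the while loop is only a model of the iterator's internal index, not CPython's actual implementation.

```python
def remove_while_iterating(items):
    """Model of a for loop that deletes from the list it is walking."""
    items = list(items)      # work on a copy so the caller's list is untouched
    index = 0                # stands in for the iterator's internal index
    visited = []             # (index, value) pairs the loop actually looked at
    while index < len(items):
        value = items[index]
        visited.append((index, value))
        if value == '':
            # Deleting shifts every later element one slot to the left...
            del items[items.index(value)]
        index += 1           # ...but the index still moves forward.
    return items, visited


data = ['44', '', '', '', '', '', '0.000000000']
result, visited = remove_while_iterating(data)
print(result)
print(visited)
```

Only four positions are ever visited, which matches the one "keep this" and three "discard these" lines in the question, and two empty strings survive.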

Cheers,
Chris
 
Dave Angel

(list is already a defined name, so you really should call it something
else.)


As Chris says, you're modifying the list while you're iterating through
it, and that's undefined behavior. Why not do the following?

mylist = line.strip().split(' ')
mylist = [item for item in mylist if item]

DaveA
 
Dennis Lee Bieber

All of which can be condensed into a simple

    for ln in f:
        wrds = ln.strip()
        # do something with the words -- no whitespace to be seen
 
Steven D'Aprano

I have looked at documentation, and how strings and lists work, but I
cannot understand the behaviour of the following: ....
    for item in list:
        if item is ' ':
            print 'discard these: ',item
            index = list.index(item)
            del list[index]
....

Moral: Don't modify a list while iterating over it. Use the loop to
create a separate, new list from the old one instead.


This doesn't just apply to Python; it is good advice in every language
I'm familiar with. At the very least, if you have to modify a list
in place and you are deleting or inserting items, work *backwards*:

    for i in xrange(len(alist) - 1, -1, -1):  # start at the last valid index
        item = alist[i]
        if item == 'delete me':
            del alist[i]


This is almost never the right solution in Python, but as a general
technique, it works in all sorts of situations. (E.g. when varnishing a
floor, don't start at the doorway and varnish towards the end of the
room, because you'll be walking all over the fresh varnish. Do it the
other way, starting at the end of the room, and work backwards towards
the door.)

In Python, the right solution is almost always to make a new copy of the
list. Here are three ways to do that:


    newlist = []
    for item in alist:
        if item != 'delete me':
            newlist.append(item)


newlist = [item for item in alist if item != 'delete me']

newlist = filter(lambda item: item != 'delete me', alist)



Once you have newlist, you can then rebind it to alist:

alist = newlist

or you can replace the contents of alist with the contents of newlist:

alist[:] = newlist


The two have a subtle difference in behavior that may not be apparent
unless you have multiple names bound to alist.
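
A small sketch of that difference, assuming a second name (here called alias or other, both made up for this example) is bound to the same list before the filtering happens:

```python
# Case 1: rebinding. 'alist' ends up naming a brand new list object.
alist = ['keep', 'delete me', 'keep too']
alias = alist                       # second name for the same list object
newlist = [item for item in alist if item != 'delete me']
alist = newlist
rebinding_alias = list(alias)       # snapshot of what 'alias' still sees

# Case 2: slice assignment. The existing list object is changed in place.
blist = ['keep', 'delete me', 'keep too']
other = blist                       # second name for the same list object
blist[:] = [item for item in blist if item != 'delete me']
slice_alias = list(other)           # snapshot of what 'other' sees

print(rebinding_alias)   # the old contents, 'delete me' included
print(slice_alias)       # the filtered contents
```

After rebinding, the other name keeps pointing at the old, unfiltered list; after slice assignment, every name bound to that list sees the filtered contents.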
 
Bruno Desthuilliers

Helvin wrote:
(snip)
    line = f.readline()
    line = line.lstrip() # take away whitespace at the beginning of the readline.

file.readline returns the line with the ending newline character (which
is considered whitespace by the str.strip method), so you may want to
use line.strip instead of line.lstrip.

    list = line.split(' ')

Slightly OT, but: don't use the names of builtin types or functions as
identifiers, because this shadows the builtin object.

Also, the default behaviour of str.split is to split on whitespace and
remove the delimiter. You would get better results by not specifying the
delimiter here:

>>> " a  a  a  a ".split(' ')
['', 'a', '', 'a', '', 'a', '', 'a', '']
>>> " a  a  a  a ".split()
['a', 'a', 'a', 'a']

    # the list has empty strings in it, so now, remove these empty strings

A problem you could have avoided right from the start !-)

    for item in list:
        if item is ' ':

Don't use identity comparison when you want to test for equality. It
happens to kind of work in your above example but only because CPython
implements a cache for _some_ small strings, but you should _never_ rely
on it.
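
As a rough illustration of the difference between the two comparisons, the sketch below builds a second string at run time so that it has the same contents as a literal. The variable names are made up for this example, and whether the two strings end up being the same object is an implementation detail that can vary between interpreters and versions.

```python
a = 'delete me'
b = ' '.join(['delete', 'me'])   # same characters, assembled at run time

same_value = (a == b)    # equality: compares the contents
same_object = (a is b)   # identity: asks whether both names refer to one object

print(same_value)
print(same_object)
```

The equality test is the one that answers "do these strings hold the same text?"; the identity result says nothing reliable about that.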


Also, this is surely not your actual code : ' ' is not an empty string,
it's a string with a single space character. The empty string is ''. And
FWIW, empty strings (like most empty sequences and collections, all
numerical zeros, and the None object) have a false value in a boolean
context, so you can just test the string directly:

    for s in ['', 0, 0.0, [], {}, (), None]:
        if not s:
            print "'%s' is empty, so it's false" % str(s)

            print 'discard these: ',item
            index = list.index(item)
            del list[index] # remove this item from the list

And then you do have a big problem : the internal pointer used by the
iterator is not in sync with the list anymore, so the next iteration
will skip one item.

As a general rule: *don't* add or remove elements to/from a sequence while
iterating over it. If you really need to modify the sequence while
iterating over it, do a reverse iteration - but there are usually better
solutions.

        else:
            print 'keep this: ',item

The problem is,

Make it a plural - there's more than 1 problem here !-)
when my list is: ['44', '', '', '', '', '', '0.000000000\n']
The output is:
len of list: 7
keep this: 44
discard these:
discard these:
discard these:
So finally the list is: ['44', '', '', '0.000000000\n']
The code above removes all the empty strings in the middle, all except
two. My code seems to miss two of the empty strings.

Would you know why this is occurring?


cf above... and below:
>>> alist = ['44', '', '', '', '', '', '0.000000000']
>>> for i, it in enumerate(alist):
...     print 'i : %s - it : "%s"' % (i, it)
...     if not it:
...         del alist[i]
...     print "alist is now %s" % alist
...
i : 0 - it : "44"
alist is now ['44', '', '', '', '', '', '0.000000000']
i : 1 - it : ""
alist is now ['44', '', '', '', '', '0.000000000']
i : 2 - it : ""
alist is now ['44', '', '', '', '0.000000000']
i : 3 - it : ""
alist is now ['44', '', '', '0.000000000']

Ok, now for practical answers:

1/ in the above case, use line.strip().split(), and you'll have no more
problems !-)

2/ as a general rule, if you need to filter a sequence, don't try to do
it in place (unless it's a *very* big sequence and you run into memory
problems but then there are probably better solutions).

The common idioms for filtering a sequence are:

* filter(predicate, sequence):

the 'predicate' param is a callback function which takes an item from the
sequence and returns a boolean value (True to keep the item, False to
discard it). The following example will filter out even integers:

    def is_odd(n):
        return n % 2

    alist = range(10)
    odds = filter(is_odd, alist)
    print alist
    print odds

Alternatively, filter() can take None as its first param, in which case
it will filter out items that have a false value in a boolean context, i.e.:

alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
result = filter(None, alist)
print result


* list comprehensions

Here you directly build the result list:

alist = range(10)
odds = [n for n in alist if n % 2]

alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
result = [item for item in alist if item]
print result
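
A quick check that the two idioms agree on the sample list above. This is only a sketch: the filter() call is wrapped in list() so that it behaves the same on Python 3, where filter() returns an iterator rather than a list, and the variable names are chosen for this example.

```python
alist = ['', 'a', 0, 1, [], [1], None, object, False, True]

# filter(None, ...) keeps the items that are true in a boolean context.
filtered = list(filter(None, alist))

# The list comprehension expresses the same test explicitly.
comprehension = [item for item in alist if item]

print(filtered)
print(filtered == comprehension)
```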



HTH
 
Bruno Desthuilliers

Dave Angel wrote:
(snip)
As Chris says, you're modifying the list while you're iterating through
it, and that's undefined behavior. Why not do the following?

mylist = line.strip().split(' ')
mylist = [item for item in mylist if item]

Mmmm... because the second line is plain useless when calling
str.split() without a delimiter ?-)

mylist = line.strip().split()

will already do the RightThing(tm).
 
Bruno Desthuilliers

Dennis Lee Bieber wrote:
(snip)
All of which can be condensed into a simple

    for ln in f:
        wrds = ln.strip()
        # do something with the words -- no whitespace to be seen


I assume you meant:
wrds = ln.strip().split()

?-)
 
Rhodri James

In this case, your life would be improved by using

l = line.split()

instead of

l = line.split(' ')

and not getting the empty strings in the first place.
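
As a sketch of how that plays out for the file described in the question, the snippet below splits one sample line both ways and then converts the two fields. The helper name parse_line is hypothetical, and treating the first value as an int and the second as a float is an assumption based on the sample list ['44', ..., '0.000000000\n'].

```python
def parse_line(line):
    """Split a line on runs of whitespace and convert its two fields."""
    fields = line.split()   # no argument: leading/trailing whitespace is ignored
    if len(fields) != 2:
        raise ValueError('expected 2 values, got %d' % len(fields))
    # Assumed types: integer first, float second.
    return int(fields[0]), float(fields[1])


sample = '   44      0.000000000\n'
with_space = sample.split(' ')   # contains empty strings and the newline
without_arg = sample.split()     # just the two values
print(with_space)
print(without_arg)
print(parse_line(sample))
```

Because split() with no argument also discards the trailing newline, no separate strip() call is needed before it.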
 
Bruno Desthuilliers

Sion Arrowsmith wrote:
So will

    mylist = line.split()

Yeps, it's at least the second time someone reminds me that the call to
str.strip is just useless here... Pity my poor old neuron :(
 
