Fixed length lists from .split()?

Bob Greschke

I'm reading a file that has lines like

bcsn; 1000000; 1223
bcsn; 1000001; 1456
bcsn; 1000003
bcsn; 1000010; 4567

The problem is the line with only the one semi-colon.
Is there a fancy way to get Parts=Line.split(";") to make Parts always
have three items in it, or do I just have to check the length of Parts
and loop to add the required missing items (this one would just take
Parts+=[""], but there are other types of lines in the file that have
about 10 "fields" that also have this problem)?

Thanks!

Bob
 
Duncan Booth

Bob Greschke said:
Is there a fancy way to get Parts=Line.split(";") to make Parts always
have three items in it, or do I just have to check the length of Parts
and loop to add the required missing items (this one would just take
Parts+=[""], but there are other types of lines in the file that have
about 10 "fields" that also have this problem)?
>>> def nsplit(s, sep, n):
...     return (s.split(sep) + [""]*n)[:n]
...
>>> nsplit("bcsn; 1000001; 1456", ";", 3)
['bcsn', ' 1000001', ' 1456']
>>> nsplit("bcsn; 1000001", ";", 3)
['bcsn', ' 1000001', '']
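
As a rough sketch of how the helper above might be applied to the sample lines from the question: the wrapper name parse_line and the decision to strip whitespace from each field are assumptions made for illustration, not something the post specifies.

```python
def nsplit(s, sep, n):
    # Pad with n empty strings, then keep exactly the first n fields.
    return (s.split(sep) + [""] * n)[:n]


def parse_line(line, n=3, sep=";"):
    # Hypothetical wrapper: also strip the whitespace around each field.
    return [field.strip() for field in nsplit(line, sep, n)]


lines = [
    "bcsn; 1000000; 1223",
    "bcsn; 1000003",
]
rows = [parse_line(line) for line in lines]
print(rows)
```

Every row then has three items, so it can be unpacked without checking its length first.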
 
Bob Greschke

Bob Greschke said:
Is there a fancy way to get Parts=Line.split(";") to make Parts always
have three items in it, or do I just have to check the length of Parts
and loop to add the required missing items (this one would just take
Parts+=[""], but there are other types of lines in the file that have
about 10 "fields" that also have this problem)?
>>> def nsplit(s, sep, n):
...     return (s.split(sep) + [""]*n)[:n]
...
>>> nsplit("bcsn; 1000001; 1456", ";", 3)
['bcsn', ' 1000001', ' 1456']
>>> nsplit("bcsn; 1000001", ";", 3)
['bcsn', ' 1000001', '']

That's fancy enough. :) I didn't know you could do [""]*n. I never
thought about it before.

Thanks!

Bob
 
Dennis Lee Bieber

That's fancy enough. :) I didn't know you could do [""]*n. I never
thought about it before.
My first thought was getting it from the other side...
>>> def nsplit(st, sp, n):
...     return (st + (sp*n)).split(sp)[:n]
...
>>> nsplit("this;is;a;sample", ";", 10)
['this', 'is', 'a', 'sample', '', '', '', '', '', '']

To the string to be split, append enough separators to ensure the
desired number of fields, perform the split, and return the desired
number of resultant parts.

Of course, if the string is longer than "n", it will only return the
leftmost "n" parts.
>>> nsplit("this;is;a;sample", ";", 4)
['this', 'is', 'a', 'sample']
>>> nsplit("this;is;a;sample", ";", 3)
['this', 'is', 'a']
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/
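
The two approaches so far (pad the list after splitting, or pad the string before splitting) can be put side by side. This is only a sketch: the function names nsplit_pad_list and nsplit_pad_string and the sample inputs are made up for the comparison, and it only exercises a single-character separator.

```python
def nsplit_pad_list(s, sep, n):
    # Split first, then pad the resulting list and cut it to n items.
    return (s.split(sep) + [""] * n)[:n]


def nsplit_pad_string(s, sep, n):
    # Append n separators first, then split and cut to n items.
    return (s + sep * n).split(sep)[:n]


samples = ["this;is;a;sample", "bcsn; 1000003", "", "a;b;c;d;e"]
results = []
for s in samples:
    for n in (1, 3, 6):
        a = nsplit_pad_list(s, ";", n)
        b = nsplit_pad_string(s, ";", n)
        results.append(a == b)

# True if both versions agreed on every sample above.
print(all(results))
```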
 
bearophileHUGS

Duncan Booth:
def nsplit(s, sep, n):
    return (s.split(sep) + [""]*n)[:n]

Another version, longer:

from itertools import repeat

def nsplit(text, sep, n):
    """
    >>> nsplit("bcsn; 1000001; 1456", ";", 3)
    ['bcsn', ' 1000001', ' 1456']
    >>> nsplit("bcsn; 1000001", ";", 3)
    ['bcsn', ' 1000001', '']
    >>> nsplit("bcsn", ";", 3)
    ['bcsn', '', '']
    >>> nsplit("", ".", 4)
    ['', '', '', '']
    >>> nsplit("ab.ac.ad.ae", ".", 2)
    ['ab', 'ac', 'ad', 'ae']
    """
    result = text.split(sep)
    nparts = len(result)
    # Pads short results only; longer results are returned untruncated.
    result.extend(repeat("", n-nparts))
    return result

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Bye,
bearophile
 
Steven Bethard

I'm reading a file that has lines like

bcsn; 1000000; 1223
bcsn; 1000001; 1456
bcsn; 1000003
bcsn; 1000010; 4567

The problem is the line with only the one semi-colon.
Is there a fancy way to get Parts=Line.split(";") to make Parts always
have three items in it

In Python 2.5 you can use the .partition() method which always returns
a three item tuple:
>>> # "text" is a stand-in name; the original assignment and for lines did not survive
>>> text = '''\
...  bcsn; 1000000; 1223
...  bcsn; 1000001; 1456
...  bcsn; 1000003
...  bcsn; 1000010; 4567
... '''
>>> for line in text.splitlines():
...     bcsn, _, rest = line.partition(';')
...     num1, _, num2 = rest.partition(';')
...     print (bcsn, num1, num2)
...
(' bcsn', ' 1000000', ' 1223')
(' bcsn', ' 1000001', ' 1456')
(' bcsn', ' 1000003', '')
(' bcsn', ' 1000010', ' 4567')
>>> help(str.partition)
Help on method_descriptor:

partition(...)
    S.partition(sep) -> (head, sep, tail)

    Searches for the separator sep in S, and returns the part before it,
    the separator itself, and the part after it.  If the separator is not
    found, returns S and two empty strings.


STeVe
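
The two chained .partition() calls above could be generalized to any number of fields with a small loop. This is a sketch only; the helper name npartition is invented here, and it is written for Python 3.

```python
def npartition(s, sep, n):
    # Hypothetical helper: peel off one field per iteration with str.partition.
    fields = []
    rest = s
    for _ in range(n):
        head, _sep, rest = rest.partition(sep)
        fields.append(head)
    # Anything still left in `rest` after n fields is dropped.
    return fields


short = npartition("bcsn; 1000003", ";", 3)
full = npartition("bcsn; 1000010; 4567", ";", 3)
print(short)
print(full)
```

Because partition returns empty strings once the separator is no longer found, short lines are padded without any explicit length check.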
 
Bob Greschke

This idiom is what I ended up using (a lot, it turns out!):

Parts = Line.split(";")
Parts += (x-len(Parts))*[""]

where x is the number of fields the line should have. If the line already
has more parts than x (i.e. [""] gets multiplied by a negative number),
the multiplication yields an empty list, so nothing is added, which is
just fine in this program's case.

Bob
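
A short sketch of why the negative case is harmless: in Python, multiplying a list by zero or a negative integer gives an empty list. The function name pad_fields and the sample strings are assumptions for illustration.

```python
def pad_fields(line, expected, sep=";"):
    # Pad with empty strings up to `expected` fields; never truncates.
    parts = line.split(sep)
    parts += (expected - len(parts)) * [""]
    return parts


short = pad_fields("bcsn; 1000003", 3)
long_line = pad_fields("a;b;c;d", 3)
negative = -2 * [""]
print(short)
print(long_line)
print(negative)
```

Unlike the slicing versions earlier in the thread, this keeps any extra fields on a line that is too long.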
 
George Sakkis

This idiom is what I ended up using (a lot it turns out!):

Parts = Line.split(";")
Parts += (x-len(Parts))*[""]

where x knows how long the line should be. If the line already has
more parts than x (i.e. [""] gets multiplied by a negative number)
nothing seems to happen which is just fine in this program's case.

Bob

Here's a more generic padding one liner:

from itertools import chain,repeat

def ipad(seq, minlen, fill=None):
    return chain(seq, repeat(fill, minlen-len(seq)))
['one', 'two', 'three', 'four', '', '', '']
(1, 2, 3, 4, None, None, None)


George
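
The calls that produced the two result lines in the post above did not survive in this copy of the thread. The sketch below shows one way ipad could be called to get those results; the arguments are guesses chosen to match the output, not the original calls.

```python
from itertools import chain, repeat


def ipad(seq, minlen, fill=None):
    # Lazily yield seq, followed by enough fill values to reach minlen items.
    # repeat() with a zero or negative count yields nothing.
    return chain(seq, repeat(fill, minlen - len(seq)))


# ipad returns an iterator, so wrap it in list() or tuple() to see the items.
words = list(ipad("one two three four".split(), 7, ""))
numbers = tuple(ipad((1, 2, 3, 4), 7))
print(words)
print(numbers)
```

Note that seq must support len(), so this works on lists, tuples, and strings but not on arbitrary iterators.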
 
