pythonic way to sort

micklee74

hi
I have a file with columns delimited by '~' like this:

1SOME STRING ~ABC~12311232432D~20060401~00000000
2SOME STRING ~DEF~13534534543C~20060401~00000000
3SOME STRING ~ACD~14353453554G~20060401~00000000

......

What is the Pythonic way to sort this type of structured text file?
Say I want to sort by the 2nd column, i.e. ABC, ACD, DEF, so that it becomes

1SOME STRING ~ABC~12311232432D~20060401~00000000
3SOME STRING ~ACD~14353453554G~20060401~00000000
2SOME STRING ~DEF~13534534543C~20060401~00000000
?
I know, for a start, that I have to split on '~', then append all the
second columns to a list, then sort the list using sort(), but I am
stuck on how to get the rest of the corresponding columns after the
sort.

thanks...
 
Robert Kern

In Python 2.4 and up, you can use the key= keyword to list.sort(). E.g.

In [2]: text = """1SOME STRING ~ABC~12311232432D~20060401~00000000
...: 2SOME STRING ~DEF~13534534543C~20060401~00000000
...: 3SOME STRING ~ACD~14353453554G~20060401~00000000"""

In [3]: lines = text.split('\n')

In [4]: lines
Out[4]:
['1SOME STRING ~ABC~12311232432D~20060401~00000000',
'2SOME STRING ~DEF~13534534543C~20060401~00000000',
'3SOME STRING ~ACD~14353453554G~20060401~00000000']

In [5]: lines.sort(key=lambda x: x.split('~')[1])

In [6]: lines
Out[6]:
['1SOME STRING ~ABC~12311232432D~20060401~00000000',
'3SOME STRING ~ACD~14353453554G~20060401~00000000',
'2SOME STRING ~DEF~13534534543C~20060401~00000000']
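
If the individual columns are needed after the sort, one option is to split each line into a list of fields first and sort those lists, so the remaining columns travel with the sort column. This is only a minimal sketch, assuming Python 3 and the same three sample lines; the names `rows`, `sorted_lines` and `second_column`, and the use of `operator.itemgetter`, are illustrative choices that do not appear in the session above.

```python
from operator import itemgetter

text = """1SOME STRING ~ABC~12311232432D~20060401~00000000
2SOME STRING ~DEF~13534534543C~20060401~00000000
3SOME STRING ~ACD~14353453554G~20060401~00000000"""

# Split every line into its '~'-delimited fields once.
rows = [line.split('~') for line in text.split('\n')]

# Sort whole rows by the second field (index 1); the other fields stay attached.
rows.sort(key=itemgetter(1))

# Rejoin the fields to recover the sorted lines, and pull out the sort column.
sorted_lines = ['~'.join(row) for row in rows]
second_column = [row[1] for row in rows]
```

Sorting lists of fields rather than raw strings is mainly useful when later code wants to work with the columns anyway; for a plain line sort the `key=` lambda above is shorter.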

 
Jay Parlar

A couple of ways. Assume that you have the lines in a list called 'lines',
as follows:

lines = [
"1SOME STRING ~ABC~12311232432D~20060401~00000000",
"3SOME STRING ~ACD~14353453554G~20060401~00000000",
"2SOME STRING ~DEF~13534534543C~20060401~00000000"]


The more traditional way would be to define your own comparison
function:

def my_cmp(x, y):
    # Python 2 only: cmp() and the cmp= argument were removed in Python 3.
    return cmp(x.split("~")[1], y.split("~")[1])

lines.sort(cmp=my_cmp)


The newer, faster way would be to define your own key function:

def my_key(x):
    return x.split("~")[1]

lines.sort(key=my_key)


The key function is faster because you only have to do the
split("~")[1] once for each line, whereas it will be done many times
for each line if you use a comparison function.
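
A rough way to see the difference is to count how often the split runs under each approach. The sketch below assumes Python 3, where `cmp()` and the `cmp=` argument no longer exist, so the comparison function is adapted with `functools.cmp_to_key`; the `split_calls` counter is a hypothetical helper added only for illustration.

```python
from functools import cmp_to_key

lines = [
    "1SOME STRING ~ABC~12311232432D~20060401~00000000",
    "2SOME STRING ~DEF~13534534543C~20060401~00000000",
    "3SOME STRING ~ACD~14353453554G~20060401~00000000",
]

# Hypothetical counter: how many times split("~") runs in each approach.
split_calls = {"key": 0, "cmp": 0}

def my_key(x):
    split_calls["key"] += 1
    return x.split("~")[1]

def my_cmp(x, y):
    split_calls["cmp"] += 2  # one split for each argument
    a, b = x.split("~")[1], y.split("~")[1]
    return (a > b) - (a < b)  # stand-in for Python 2's cmp(a, b)

by_key = sorted(lines, key=my_key)
by_cmp = sorted(lines, key=cmp_to_key(my_cmp))

print(split_calls)
```

The key function runs once per line, while the comparison function runs once per comparison, and the number of comparisons grows faster than the number of lines.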

Jay P.
 
Boris Borcic

Jay said:
On May 4, 2006, at 12:12 AM, (e-mail address removed) wrote:
[...]
Assume that you have the lines in a list called 'lines',
as follows:

lines = [
"1SOME STRING ~ABC~12311232432D~20060401~00000000",
"3SOME STRING ~ACD~14353453554G~20060401~00000000",
"2SOME STRING ~DEF~13534534543C~20060401~00000000"]


The more traditional way would be to define your own comparison function:

def my_cmp(x, y):
    return cmp(x.split("~")[1], y.split("~")[1])

lines.sort(cmp=my_cmp)


The newer, faster way would be to define your own key function:

def my_key(x):
    return x.split("~")[1]

lines.sort(key=my_key)

And if the data is in a file rather than a list, you may write, e.g.

lines = sorted(file("/path/tofile"), key=my_key)

to create it sorted.
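
The `file()` builtin used above exists only in Python 2. A rough Python 3 equivalent, assuming the same `my_key` function, might look like the sketch below. The temporary file is a stand-in so the example runs on its own; it is not part of the original suggestion.

```python
import os
import tempfile

def my_key(x):
    return x.split("~")[1]

# Write the sample data to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(
        "1SOME STRING ~ABC~12311232432D~20060401~00000000\n"
        "2SOME STRING ~DEF~13534534543C~20060401~00000000\n"
        "3SOME STRING ~ACD~14353453554G~20060401~00000000\n"
    )
    path = tmp.name

try:
    # Iterating over an open file yields its lines, so sorted() can consume it directly.
    with open(path) as f:
        lines = sorted(f, key=my_key)
finally:
    os.remove(path)

# Lines read from a file keep their trailing newline; strip it if it is not wanted.
lines = [line.rstrip("\n") for line in lines]
```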
 
