UNIX-style sort in Python?

Kotlin Sam

For a while at least I have to work in Windows rather than UNIX, which
is more familiar. I'm trying to do with Python some of the things that
I've done for years in shell, in particular, sort. The shell sort is
pretty easy to use:
% sort -t, +2 +5 imputfilename <return>

where -t sets the field separator, in this case a comma, and +2 and
+5 are the fields to sort on, in that order. The field numbers are
zero-based, so the sort keys start at the third and sixth fields.

So, is there a module or function already available that does this?

Lance
 
Andrew Dalke

Kotlin said:
% sort -t, +2 +5 imputfilename <return>
So, is there a module or function already available that does this?

In newer Pythons (CVS and beta-1 for 2.4) you can do

def get_fields(line):
    fields = line.split("\t")
    return fields[1], fields[4]

sorted_lines = sorted(open("imputfilename"), key=get_fields)

For older Pythons you'll need to do the "decorate-sort-undecorate"
("DSU") yourself, like this

lines = [get_fields(line), line for line in open("imputfilename")]
lines.sort()
sorted_lines = [x[1] for x in lines]

There is a slight difference between these two. If fields[1]
and fields[4] are the same between two lines in the comparison
then the first of these sorts by position of each line (it's
a "stable sort") while the latter sorts by the content of the
line.
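To make that difference concrete, here is a minimal sketch with made-up sample lines (the data is an assumption for illustration, and the DSU list is written with the tuple in parentheses). Both lines have the same fields[1] and fields[4], so only the tie-breaking differs:

```python
def get_fields(line):
    fields = line.split("\t")
    return fields[1], fields[4]

# Two hypothetical lines with equal sort keys but different first fields.
lines = ["z\tk\t0\t0\tv\n", "a\tk\t0\t0\tv\n"]

# key= version: the sort is stable, so ties keep their input order.
by_key = sorted(lines, key=get_fields)

# DSU version: on a tie the tuple comparison falls through to the line itself.
decorated = [(get_fields(line), line) for line in lines]
decorated.sort()
by_dsu = [pair[1] for pair in decorated]
```

Here by_key keeps the "z" line first, while by_dsu puts the "a" line first.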

Andrew
(e-mail address removed)
 
Grant Edwards

For a while at least I have to work in Windows rather than UNIX, which
is more familiar. I'm trying to do with Python some of the things that
I've done for years in shell, in particular, sort. The shell sort is
pretty easy to use:

Sounds like you need to install Cygwin so you have a real bash
shell and all of the normal shell utilities.
 
Alex Martelli

Andrew Dalke said:
Kotlin said:
% sort -t, +2 +5 imputfilename <return>
So, is there a module or function already available that does this?

In newer Pythons (CVS and beta-1 for 2.4) you can do

def get_fields(line):
    fields = line.split("\t")
    return fields[1], fields[4]

sorted_lines = sorted(open("imputfilename"), key=get_fields)

Quite right -- and, of course, if Kotlin needs get_fields to depend on
the sys.argv parameters that's easy to arrange.
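One way that might be arranged, as a rough sketch: build the key function from a separator and a list of zero-based field indexes. The name make_key and the sample data are assumptions for illustration, not anything from the posts above. It needs sorted(), so Python 2.4 or later.

```python
def make_key(sep, indexes):
    """Return a key function that picks the given zero-based fields."""
    def get_fields(line):
        fields = line.rstrip("\n").split(sep)
        return tuple(fields[i] for i in indexes)
    return get_fields

# Hypothetical comma-separated input; a real script would read a file instead.
lines = ["b,x,2\n", "a,y,1\n", "a,x,3\n"]

# Sort on the first field, then the second (compared as strings).
by_first_then_second = sorted(lines, key=make_key(",", [0, 1]))
```

The separator and indexes could then come from the command line, for example make_key(sys.argv[1], [int(arg) for arg in sys.argv[2:]]) after an import sys.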

For older Pythons you'll need to do the "decorate-sort-undecorate"
("DSU") yourself, like this

lines = [get_fields(line), line for line in open("imputfilename")]

Wrong syntax -- needs to be:

lines = [(get_fields(line), line) for line in open("imputfilename")]
lines.sort()
sorted_lines = [x[1] for x in lines]

There is a slight difference between these two. If fields[1]
and fields[4] are the same between two lines in the comparison
then the first of these sorts by position of each line (it's
a "stable sort") while the latter sorts by the content of the
line.

...and to get exactly the same stable-sort semantics in 2.3, change
the first of the three statements to:

lines = [ (get_fields(line), i, line)
          for i, line in enumerate(open("imputfilename")) ]

and have the last statement pick x[2] rather than x[1], since the line
is now the third item of each tuple.
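A small check of the indexed decoration, using made-up sample lines (an assumption for illustration). The line sits in the third slot of each tuple, so that is the slot the undecorate step reads:

```python
def get_fields(line):
    fields = line.split("\t")
    return fields[1], fields[4]

# Hypothetical input: the first two lines tie on fields[1] and fields[4].
lines = ["z\tk\t0\t0\tv\n", "a\tk\t0\t0\tv\n", "m\tb\t0\t0\tv\n"]

# Decorate with (key, original position, line); the position breaks ties,
# so the line text itself is never compared.
decorated = [(get_fields(line), i, line) for i, line in enumerate(lines)]
decorated.sort()

# Undecorate: the line is the third item of each tuple.
sorted_lines = [item[2] for item in decorated]
```

On Python 2.4 or later, sorted(lines, key=get_fields) gives the same order for this input.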


Alex
 
Alex Martelli

Grant Edwards said:
Sounds like you need to install Cygwin so you have a real bash
shell and all of the normal shell utilities.

An excellent piece of advice. Cygwin has occasionally saved my sanity in
the past when the weakness of Windows' cmd.exe was getting to me...!-)


Alex
 
Tuure Laurinolli

Kotlin said:
For a while at least I have to work in Windows rather than UNIX, which
is more familiar. I'm trying to do with Python some of the things that
I've done for years in shell, in particular, sort. The shell sort is
pretty easy to use:

Why don't you just install the UNIX utils on Windows? There are native
ports of most of them at http://unxutils.sourceforge.net/
 
Andrew Dalke

Alex said:
Wrong syntax -- needs to be:

lines = [(get_fields(line), line) for line in open("imputfilename")]

Bah! I all too often forget that () on the LHS of the list
comprehension. :(

Andrew
(e-mail address removed)
 
Michael Hoffman

Andrew said:
Alex said:
lines = [(get_fields(line), line) for line in open("imputfilename")]

Bah! I all too often forget that () on the LHS of the list
comprehension. :(

Me too. Could the grammar conceivably be changed so that it works
without the parentheses there?
 
Andrew Dalke

Michael said:
Me too. Could the grammar conceivably be changed so that it works
without the parentheses there?

Unlikely. As I recall Python deliberately uses only a
lookahead-1 to resolve ambiguities.


Or see PEP 202

] BDFL Pronouncements
]
] - The form [x, y for ...] is disallowed; one is required to write
] [(x, y) for ...].


It could be made an arbitrary lookahead in theory, but
as I recall Guido has also said he doesn't want that because
it makes human parsing more complex as well.

Can't find a ready citation for that though.
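For anyone who wants to see the rule in action without breaking a script, here is a small sketch that asks the compiler directly. The helper name compiles is made up for illustration:

```python
def compiles(source):
    """Return True if source parses as a Python expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True

# The bare form from PEP 202's pronouncement is rejected...
bare = compiles("[x, x * x for x in range(3)]")

# ...while the parenthesized tuple is accepted.
parenthesized = compiles("[(x, x * x) for x in range(3)]")
```

bare comes out False and parenthesized comes out True.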

Andrew
(e-mail address removed)
 
Bengt Richter

Alex Martelli said:
An excellent piece of advice. Cygwin has occasionally saved my sanity in
the past when the weakness of Windows' cmd.exe was getting to me...!-)

Most of my cmd.exe use is to invoke xxx ..args, where xxx.cmd in a path directory
is one line like @python c:\util\xxx.py %* (I don't like the kludgy Windows
first-line trick that requires xxx.py itself to be named xxx.cmd)
;-)

But, have you tried msys/mingw ? I haven't done a lot with it, but it is nice,
and supports most of the basic utilities including compiler/linker, though
I prefer gvim directly over the vim via msys shell (I probably don't have
the latter configured quite right).

A sampling:

[13:59] ~>ls /
bin doc etc home local m.ico mdk mingw msys.bat msys.ico uninstall
[14:00] ~>ls /bin
awk diff.exe ftp libW11.dll mv.exe sed.exe tr.exe
basename.exe diff3.exe gawk.exe ln.exe od.exe sh.exe true.exe
bunzip2 dirname.exe grep.exe lnkcnv patch.exe sleep.exe uname.exe
bzip2.exe echo gunzip ls.exe printf sort.exe uniq.exe
cat.exe egrep gzip.exe m4.exe ps.exe split.exe vi
chmod.exe env.exe head.exe make.exe pwd start view
cmd ex id.exe makeinfo.exe rm.exe tail.exe vim.exe
cmp.exe expr.exe info.exe md5sum.exe rmdir.exe tar.exe wc.exe
comm.exe false.exe infokey.exe mkdir.exe rvi tee.exe which
cp.exe fgrep install-info.exe mount.exe rview texi2dvi xargs.exe
cut.exe find.exe install.exe msys-1.0.dll rvim texindex.exe
date.exe fold.exe less.exe msysinfo rxvt.exe touch.exe
[14:00] ~>which gcc
/mingw/bin/gcc
[14:00] ~>ls /mingw/bin
a2dll dlltool.exe g77.exe mingw32-c++.exe objdump.exe res2coff.exe
addr2line.exe dllwrap.exe gcc.exe mingw32-g++.exe pexports.exe size.exe
ar.exe dos2unix.exe gccbug mingw32-gcc.exe protoize.exe strings.exe
as.exe drmingw.exe gcov.exe mingw32-make.exe ranlib.exe strip.exe
c++.exe dsw2mak gdb.exe mingwm10.dll readelf.exe unix2dos.exe
c++filt.exe exchndl.dll gprof.exe nm.exe redir.exe unprotoize.exe
cpp.exe g++.exe ld.exe objcopy.exe reimp.exe windres.exe
[14:00] ~>

Regards,
Bengt Richter
 
Alex Martelli

Bengt Richter said:
But, have you tried msys/mingw ? I haven't done a lot with it, but it is nice,
and supports most of the basic utilities including compiler/linker, though
I prefer gvim directly over the vim via msys shell (I probably don't have
the latter configured quite right).

I've used mingw in the past, never tried msys. As for editors, GVIM
does work just fine on Windows, that's the least of the problems...


Alex
 
