Need help sorting fixed length records

lbeckm3

I have a rather large file >350 meg that I need to sort on positions
88-89 (State). I've looked at Unix sort command and File::Sort in perl
but haven't been able to figure out how to do this. All of those sorts
are based on fields and field delimiters which I don't have as the
fields are defined by positions, not spaces or :'s etc.!! Is there a
way to do this? Thanks in advance!!

-Lyle
 
A. Sinan Unur

lbeckm3 wrote:
> I have a rather large file >350 meg that I need to sort on positions
> 88-89 (State). I've looked at Unix sort command and File::Sort in perl
> but haven't been able to figure out how to do this. All of those sorts
> are based on fields and field delimiters which I don't have as the
> fields are defined by positions, not spaces or :'s etc.!! Is there a
> way to do this?

There are many ways. All would involve the use of the substr function:

perldoc -f substr

Make an attempt to solve your problem and post code if you run into
difficulties.

If this is not a one-off job, I would be inclined to write something that
converts the data file to a SQLite database (http://www.sqlite.org/) and
issue queries against that.

Sinan
 
lbeckm3

I think I found a command that appears to do what I need:

sort -o sortfile +0.87 orig

Of course that sorts on everything from position 88 to the end of the line, but I think that will
work for my purposes. I do know how to use the substr function but I
don't know if the system would handle loading the entire file to memory
at one time (thinking I'd load the file into an array and sort from
there). This job is one of those year-end deals, so not often, but...
Thanks for the swift reply!

-Lyle
 
Uri Guttman

l> I have a rather large file >350 meg that I need to sort on positions
l> 88-89 (State). I've looked at Unix sort command and File::Sort in perl
l> but haven't been able to figure out how to do this. All of those sorts
l> are based on fields and field delimiters which I don't have as the
l> fields are defined by positions, not spaces or :'s etc.!! Is there a
l> way to do this? Thanks in advance!!

<untested>

use Sort::Maker ;

# substr offsets are 0-based, so position 88 is offset 87
my $sorter = make_sorter( 'GRT', string => 'substr( $_, 87, 2 )' ) ;

my @sorted = $sorter->( @unsorted ) ;


also unix sort can sort by column positions. rtfm more thoroughly. it
should be faster than any perl sort.

uri
 
lbeckm3

Thanks Uri. The unix sort command did work, but I didn't find the
solution in any of the man pages (yes I did read them!) - found another
usenet post that got me on track. Everything there points to using
fields and delimiting the fields etc. and my records are fixed length.
Thanks for your proposed solution, I'll have a go with that and see if
it works since it may be more clean in the future to only sort on just
those two bytes.

-Lyle
 
Uri Guttman

l> Thanks Uri. The unix sort command did work, but I didn't find the
l> solution in any of the man pages (yes I did read them!) - found another
l> usenet post that got me on track. Everything there points to using
l> fields and delimiting the fields etc. and my records are fixed length.
l> Thanks for your proposed solution, I'll have a go with that and see if
l> it works since it may be more clean in the future to only sort on just
l> those two bytes.

then you didn't rtfm too well. find out what the field number is for the
state and then use that and optionally the character offset and
ending. the +POS -POS or -k POS,POS arguments control that. this is from
the gnu sort man page. the default field separator is between whitespace
and non-whitespace chars. figure out your fields and use the appropriate
POS args.

uri
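
For fixed-length records, one way to apply the -k POS,POS form described above is to treat the whole record as a single field and give character offsets within it. A minimal sketch, assuming the records contain no tab characters (so -t with a tab leaves each record as field 1). The function names sort_by_state and make_record and the sample data are hypothetical, and -s is an extension found in GNU and BSD sort rather than POSIX:

```shell
#!/bin/sh
# Sketch: sort fixed-length records on columns 88-89 with sort's -k F.C form.
# Assumption: no record contains a tab, so with -t set to a tab the whole
# record is field 1 and 1.88,1.89 are plain 1-based character columns.
TAB=$(printf '\t')

sort_by_state() {
    # LC_ALL=C: compare bytes; -s: keep input order for records with equal keys.
    LC_ALL=C sort -s -t "$TAB" -k 1.88,1.89 "$@"
}

# Hypothetical sample record: text padded to 87 columns, then the state code.
make_record() {
    printf '%-87s%s%s\n' "$1" "$2" "$3"
}

{
    make_record 'Smith, John' 'TX' ' 75001'
    make_record 'Doe, Jane' 'AK' ' 99501'
    make_record 'Roe, Richard' 'NY' ' 10001'
} | sort_by_state | cut -c 88-89
# prints AK, NY, TX on separate lines
```

For the real file this would be called as something like `sort_by_state orig > sortfile`. The key stops at column 89, so records with the same state keep their original relative order instead of being ordered by the rest of the line.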
 
Jürgen Exner

> I have a rather large file >350 meg that I need to sort on positions
> 88-89 (State). I've looked at Unix sort command and File::Sort in
> perl but haven't been able to figure out how to do this. All of
> those sorts are based on fields and field delimiters which I don't
> have as the fields are defined by positions, not spaces or :'s etc.!!
> Is there a way to do this? Thanks in advance!!

Rather simple.
Just define your compare function to look (and compare) only those two
characters.
You can use substr(), something along the lines of the following (untested;
substr offsets are 0-based, so position 88 is offset 87):

substr($a, 87, 2) cmp substr($b, 87, 2)

jue
 
Joe Smith

> Thanks Uri. The unix sort command did work, but I didn't find the
> solution in any of the man pages (yes I did read them!)

The Linux man page for sort does a lousy job at explaining that.
I learned it from the BSD/SunOS man pages. The old syntax is

sort +0.87 -0.89 file # Start at column 88, stop before column 90.

The -k option, which counts from 1 instead of from 0, is better.
-Joe
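
The off-by-one relationship between the two syntaxes can be checked mechanically. The sketch below follows the conversion rule given in the GNU coreutils documentation (+a.x -b.y corresponds to -k a+1.x+1,b+1.y, or to -k a+1.x+1,b when y is 0). old_to_k is a hypothetical helper name, and it assumes both positions are written in full F.C form as plain decimal numbers without leading zeros:

```shell
#!/bin/sh
# Sketch: translate an old-style "+F.C -F.C" key into the equivalent -k form.
# Usage: old_to_k START END, where START is the +F.C part and END the -F.C part.
old_to_k() {
    sf=${1%%.*}; sc=${1#*.}   # start field and character offset (0-based)
    ef=${2%%.*}; ec=${2#*.}   # end field and character offset
    if [ "$ec" -eq 0 ]; then
        # "-F.0" means "stop at the end of field F" (1-based field F).
        printf '%s\n' "-k $((sf + 1)).$((sc + 1)),$ef"
    else
        # Otherwise the key ends at character C of field F+1.
        printf '%s\n' "-k $((sf + 1)).$((sc + 1)),$((ef + 1)).$ec"
    fi
}

old_to_k 0.87 0.89   # prints: -k 1.88,1.89
```

So the old `+0.87 -0.89` key covers the same two columns as `-k 1.88,1.89`: the start position shifts by one in both field and character, while the end position shifts only in the field number.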
 
