sort an array of strings

M

Michael Wojcik

My apologies to Eric; I have snipped completely everything.
Now Rainmaker. Let's see.. 2 terabytes is 2.0e12 bytes, right? 100 bytes
is 1.0e2 I think. Dividing file size by line length give 2.0e10 I think.
That's 20 giga-lines, right?

We're trying to get a grip on what you have and what you are trying to
achieve. Your data set seems over large. There aren't 20 giga-lines in
all the books in the Library of Congress.

This problem description looks a great deal like a "rainbow table" -
an offline dictionary of the hash values for various strings, used
for cracking passwords. The OP's use of "Rainmaker" as a nickname
also suggests that.

There's a bunch of literature on constructing and using rainbow
tables, and I'd suggest that someone at the "what sort should I use?"
stage is not going to beat the published approaches. In other words,
some research seems to be the appropriate next step, and I don't mean
asking OT questions on comp.lang.c.
 
P

pete

Keith said:
qsort() should be reasonably efficient. You might get better
performance by rolling your own sorting function (perhaps based on a
qsort() implementation -- which isn't necessarily Quicksort), which
could avoid the overhead caused by the indirect calls to the
comparison function.

Obviously, sorting pointers rather than shuffling the strings around
is going to save you a lot of time. (I'm not even sure how you'd go
about shuffling the strings themselves if they've of variable
lengths.)

Strings of variable lengths
can be contained in objects of same sufficient size.

char string[][sizeof "three"] = {"one","two","three"};
 
P

pete

1. Ordinary quicksort on *average* performs fewer compares and data
moves than either heap-sort or merge-sort.

Fewer data moves, yes.
But the sift up, sift down version of heapsort
and a simple mergesort,
will each generally uses less comparisons than a quicksort.
Even on small arrays

Followup To: comp.programming
 
P

pete

pete said:
Again.

I have to take that back.
I do in fact have a Lomuto style quicksort
which is lighter on comparisons than my best heapsort
for small arrays.

I'll get back to you on my mergesort measurements.

My mergesort claim didn't pan out either.

Above about 500 elements,
is where my siftdown siftup heapsort
does less comparisons than the Lomuto on disordered arrays.
 
M

Mabden

Rainmaker said:
Can anyone tell me an efficient algorithm to sort an array of strings?
Keep in mind that this array is HUGE and so the algorithm should me
efficient enough to deal with it.

If I tell you, can I get the "A"?
 
M

Mabden

Walter Roberson said:
Taking into account the other information you posted, about the
amount of data and about the size of each string, and the lack of
information about what the strings signify (information that might
potentially allow for a better algorithm based upon the internal
structure of the strings, or based upon string probabilities ):

I wonder whether you really need to -sort- the strings, or if you
only need to be able to -locate- a string efficiently?

Often, the most efficient way for locating particular strings is -not-
through sorting them. Instead, the most efficient way might be
through some kind of hash table. The efficiency of the hash table
would depend upon matters such as whether you need to be able to
update the table without rebuilding it; upon the distribution
of the strings; and upon restrictions you may need to impose about
paging data in from disk.

Jesus, Dude, it's just homework. Calm down.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,172
Messages
2,570,933
Members
47,472
Latest member
blackwatermelon

Latest Threads

Top