Rank ordering for nonparametric statistics (Newbie)

Ben Fairbank

I have a matrix with many rows (say 1000, to make this concrete) and a
dozen or so columns. One column has numbers ranging from 1 to several
hundred. I have to create a new column with numbers from 1 to 1000
corresponding to the smallest to largest numbers (don't worry about
ties (yet)) in the column of interest. The new numbers thus indicate
the ordinal or rank order of the values in the given column. I have
been futzing around with argsort, but cannot find an elegant, fast way
to do it. Can a reader suggest one?

Thank you,

BAFairbank
 
Jegenye 2001 Bt

What I could make out of your (not too good) description of the problem is
this:

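# M = [[a1_1, a2_1, ..., a1000_1], [a1_2, a2_2, ...], ...]
# test data
M = [[3, 56, 444, 44, 45], [34, 32, 2, 3, 4]]

res = []
for col in M:
    nc = sorted(col)  # sorted copy; the original list is left untouched
    # rank of each value = its 1-based position in the sorted copy
    # (tied values all get the lowest rank of their group)
    res.append([nc.index(e) + 1 for e in col])
print(res)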

It may not be that elegant, but I think it'll do for 1000 numbers.
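
For longer columns, repeated index lookups get slow, since each one scans the list. A minimal sketch of one alternative, assuming all values in the column are distinct (no ties): build a dictionary from value to sorted position once, then look each value up. The name ranks_without_ties is made up for illustration.

```python
def ranks_without_ties(column):
    """Return 1-based ranks for a list of distinct values."""
    # Map each value to its 1-based position in the sorted order.
    position = {value: i + 1 for i, value in enumerate(sorted(column))}
    # Look each original value up, preserving the original order.
    return [position[value] for value in column]


example_ranks = ranks_without_ties([34, 32, 2, 3, 4])
print(example_ranks)  # [5, 4, 1, 2, 3]
```

With duplicate values the dictionary would keep only the last position of each value, so this sketch is only meant for the no-ties case.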

Best,
Miklós
 
Stephen Horne

I just happen to be looking into stats for psychological studies at
the moment, so I know exactly what you mean. My approach would be...

1. Build a list for the particular column, containing a tuple of
value and subscript (position in the column).

That subscript is there so you can link the results back to the
original column.

2. Sort, giving a list ordered by value.

3. Extend each tuple in the list to add the subscript in the sorted
version, rearranging the tuple so that the subscript from (1) is
now the first item.

4. This would be a good point to handle the ties.

5. Sort again, putting the result back into the same order as the
original column.

So (dropping step 4) this would be...

# 'column' stands for the column of interest, as a list
la = []

for subs, val in enumerate (column) :
    la += [(val, subs)]

la.sort ()

lb = []
for rank, pair in enumerate (la) :
    val, subs = pair
    lb += [(subs, val, rank+1)]  # conventionally, ranks start at '1'

lb.sort ()

# lb is now a list of tuples (subscript, value, rank) in the same
# order as the original column
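
Since the original question asks for the ranks as a new column of the matrix, here is a hedged sketch of that last step, assuming the matrix is a plain list of row lists. The function name append_rank_column and the sample data are hypothetical, and ties are simply ranked in row order because Python's sort is stable.

```python
def append_rank_column(rows, col):
    """Return a copy of rows with the rank of rows[i][col] appended."""
    # Row indices, ordered by the value in the chosen column.
    order = sorted(range(len(rows)), key=lambda i: rows[i][col])
    ranks = [0] * len(rows)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank  # ranks start at 1
    # Build new rows so the input matrix is not modified.
    return [list(row) + [rank] for row, rank in zip(rows, ranks)]


matrix = [[10, 250], [11, 7], [12, 98]]
ranked = append_rank_column(matrix, 1)
print(ranked)  # [[10, 250, 3], [11, 7, 1], [12, 98, 2]]
```

Storing each rank by row index plays the same role as the second sort: it puts the ranks back in the original row order.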


Handling ties needs a little extra processing in that second
loop, giving...

# 'column' stands for the column of interest, as a list
la = []

for subs, val in enumerate (column) :
    la += [(val, subs)]

la.sort ()

lb = []      # to hold final result
lc = []      # to hold val, subs tuples for current tied group
ranksum = 0  # sum of ranks for current tied group, for averaging

for rank, pair in enumerate (la) :
    if len(lc) == 0 :  # this should only happen on the first iteration
        lc = [pair]
        ranksum = (rank + 1)
    else :
        if lc [0] [0] == pair [0] :  # same value so another tie
            lc += [pair]
            ranksum += (rank + 1)
        else :
            # note - this is deliberately not the integer //
            # (on Python 2 it needs 'from __future__ import division')
            rankmean = ranksum / len (lc)

            # Transfer tied group to result
            for val, subs in lc :
                lb += [(subs, val, rankmean)]

            # Start new possibly tied group
            lc = [pair]
            ranksum = rank + 1

# Handle final tied group, if any
# (there always will be unless the original column was empty)
if len(lc) > 0 :
    rankmean = ranksum / len (lc)

    for val, subs in lc :
        lb += [(subs, val, rankmean)]

# back to original column ordering
lb.sort ()


I imagine there are far better examples around, but this should be a
reasonable illustration of the principles involved.
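
As one more compact illustration of the same tie-averaging idea, the sketch below groups equal values with itertools.groupby. It is only an alternative arrangement of the steps, not a replacement for the walk-through above; the name average_ranks is hypothetical, and Python 3 division is assumed.

```python
from itertools import groupby


def average_ranks(column):
    """Return ranks in original order; tied values share their mean rank."""
    # Pair each value with its original position, then sort by value.
    pairs = sorted((val, subs) for subs, val in enumerate(column))
    ranks = [0.0] * len(column)
    next_rank = 1  # rank of the first member of the next group
    for _, group in groupby(pairs, key=lambda pair: pair[0]):
        members = list(group)
        # Mean of the consecutive ranks next_rank .. next_rank + n - 1.
        mean_rank = next_rank + (len(members) - 1) / 2
        for _, subs in members:
            ranks[subs] = mean_rank
        next_rank += len(members)
    return ranks


print(average_ranks([7, 3, 7, 1]))  # [3.5, 2.0, 3.5, 1.0]
```

The two 7s would occupy ranks 3 and 4, so each gets 3.5, which is the usual convention for ranks in nonparametric tests.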

One issue is probably the "if lc [0] [0] == pair [0] :" line. If your
values are floats, this '==' is probably inappropriate - it is
oversensitive to float precision issues. 3.9999999999999... should
probably be treated as equal to 4.0, for instance.
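
A small sketch of what a tolerance-based comparison might look like, using math.isclose from the standard library (Python 3.5 and later). The helper name same_value and the tolerance values are assumptions for illustration; suitable tolerances depend on the data.

```python
import math


def same_value(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Treat two floats as tied if they agree within a tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


print(0.1 + 0.2 == 0.3)            # False
print(same_value(0.1 + 0.2, 0.3))  # True
```

In the tie-handling loop, the test would then read "if same_value (lc [0] [0], pair [0]) :". Note that this compares each value with the first member of the current group only, so a long run of nearly equal values could still be split in a somewhat arbitrary place.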
 
