suitable key for a hash

ccc31807 · Oct 12, 2010

I have a data file to process that consists of about 25K rows and
about 30 columns. This file contains no column with unique values,
that is, every column contains duplicate values. I am placing the data
in a hash to process it (so I can access the data values by name
rather than position), and the only 'key' I can come up with is the $.
variable for the input line numbers.

Surely someone must have dealt with this problem before. Is there a
better solution?

The processing requires dumping the data into discrete categories,
e.g., level, state, person's name, status, for the purpose of
generating reports, e.g., by level, by state, by name, by status, and
not having a unique key isn't an issue.

CC.

RedGrittyBrick · Oct 12, 2010

I have a data file to process that consists of about 25K rows and
about 30 columns. This file contains no column with unique values,
that is, every column contains duplicate values. I am placing the data
in a hash to process it (so I can access the data values by name
rather than position), and the only 'key' I can come up with is the $.
variable for the input line numbers.

Surely someone must have dealt with this problem before. Is there a
better solution?

A better solution than
... $name{$index} ...
must surely be
... $name[$index] ...

I don't see any point using hashes if the key value is an integer in the
range 1..25000 with no gaps.

The processing requires dumping the data into discrete categories,
e.g., level, state, person's name, status, for the purpose of
generating reports, e.g., by level, by state, by name, by status, and
not having a unique key isn't an issue.

An SSCCE would help.

Jim Gibson · Oct 12, 2010

ccc31807 said:
I have a data file to process that consists of about 25K rows and
about 30 columns. This file contains no column with unique values,
that is, every column contains duplicate values. I am placing the data
in a hash to process it (so I can access the data values by name
rather than position), and the only 'key' I can come up with is the $.
variable for the input line numbers.

Surely someone must have dealt with this problem before. Is there a
better solution?

If you have records with duplicate keys and you want to store the data
in a hash for rapid lookup, use array references as hash values
(untested):

my %data;
while (<>) {
    my ( $name, @rest ) = split;
    push @{ $data{$name} }, \@rest;    # rows sharing a name are grouped together
}

The processing requires dumping the data into discrete categories,
e.g., level, state, person's name, status, for the purpose of
generating reports, e.g., by level, by state, by name, by status, and
not having a unique key isn't an issue.

Store the data in an array and create indices for key fields (untested):

my ( @data, %field1_index, %field2_index );
while (<>) {
    my @fields = split;
    push @data, \@fields;
    push @{ $field1_index{ $fields[0] } }, $#data;    # row numbers by field 1
    push @{ $field2_index{ $fields[1] } }, $#data;    # row numbers by field 2
    ...
}
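
For anyone who wants to try the array-plus-indices idea without an input file, here is a minimal sketch. The sample rows, the column order (level, state, name) and the names %by_level and %by_state are assumptions made up for illustration; they are not from the original poster's data.

```perl
use strict;
use warnings;

# Hypothetical sample rows: level, state, name (whitespace separated).
my @lines = (
    "1 GA Smith",
    "2 GA Jones",
    "1 AL Brown",
);

my ( @data, %by_level, %by_state );
for my $line (@lines) {
    my @fields = split ' ', $line;
    push @data, \@fields;                          # row storage
    push @{ $by_level{ $fields[0] } }, $#data;     # index: level -> row numbers
    push @{ $by_state{ $fields[1] } }, $#data;     # index: state -> row numbers
}

# Report: names of everyone in state GA, found through the index.
my @ga_names = map { $data[$_][2] } @{ $by_state{GA} };
print "GA: @ga_names\n";
```

Each index stores row numbers rather than copies of the rows, so the data is held once no matter how many report keys are indexed.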

Xho Jingleheimerschmidt · Oct 13, 2010

ccc31807 said:
I have a data file to process that consists of about 25K rows and
about 30 columns. This file contains no column with unique values,
that is, every column contains duplicate values.

Jointly, or just severally?

I am placing the data
in a hash to process it (so I can access the data values by name
rather than position),

If you wish to access it by name, then you must know what the name is.

and the only 'key' I can come up with is the $.
variable for the input line numbers.

Why not just an array, in that case?

Surely someone must have dealt with this problem before. Is there a
better solution?

The processing requires dumping the data into discrete categories,
e.g., level, state, person's name, status, for the purpose of
generating reports, e.g., by level, by state, by name, by status, and
not having a unique key isn't an issue.

Ok, so just stick it directly into those structures.

Xho

Justin C · Oct 13, 2010

I have a data file to process that consists of about 25K rows and
about 30 columns. This file contains no column with unique values,
that is, every column contains duplicate values. I am placing the data
in a hash to process it (so I can access the data values by name
rather than position), and the only 'key' I can come up with is the $.
variable for the input line numbers.

Surely someone must have dealt with this problem before. Is there a
better solution?

The processing requires dumping the data into discrete categories,
e.g., level, state, person's name, status, for the purpose of
generating reports, e.g., by level, by state, by name, by status, and
not having a unique key isn't an issue.

Instead of sticking it into a hash so that you can go over all of it
again, why not process (or part process) it into the relevant discrete
categories as part of the import?
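
A rough sketch of what sorting into categories during the import could look like. The column names, their order and the sample rows are hypothetical, and %report is just an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical column order and sample rows, for illustration only.
my @columns = qw(level state name status);
my @lines   = (
    "1 GA Smith active",
    "2 GA Jones closed",
    "1 AL Brown active",
);

# One bucket per report category: $report{state}{GA} is a list of row hashes.
my %report;
for my $line (@lines) {
    my %row;
    @row{@columns} = split ' ', $line;    # hash slice: fields by name
    for my $category (qw(level state status)) {
        push @{ $report{$category}{ $row{$category} } }, \%row;
    }
}

# Report by state.
for my $state ( sort keys %{ $report{state} } ) {
    my @names = map { $_->{name} } @{ $report{state}{$state} };
    print "$state: @names\n";
}
```

No per-row key is needed here: every row is filed under the values it will be reported by.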

Justin.

ccc31807 · Oct 13, 2010

Thanks for your reply, and for all the others.

I decided to continue to use $. as the hash key. As it turns out, the
key isn't relevant to my application, as I'm not using the key to look
up the hash values. I'm just iterating through the hash, collecting
certain values, so the key is totally superfluous -- the only reason I
need a key is because of the nature of the hash.

I don't want to use an array because I'm creating a number of
different reports, and it's simply a lot easier to use values like:

$data{$key}{firstname}, $data{$key}{lastname}

than it is to use values like

$data[13456][2], $data[23543][3]

An SSCCE would help.

I'm sorry, but I don't know this. What is an SSCCE?

CC

Dr.Ruud · Oct 13, 2010

I decided to continue to use $. as the hash key.

If it smells like an array index ...

As it turns out, the
key isn't relevant to my application, as I'm not using the key to look
up the hash values. I'm just iterating through the hash, collecting
certain values, so the key is totally superfluous -- the only reason I
need a key is because of the nature of the hash.

I don't want to use an array because I'm creating a number of
different reports, and it's simply a lot easier to use values like:

$data{$key}{firstname}, $data{$key}{lastname}

than it is to use values like

$data[13456][2], $data[23543][3]

That is not the proper comparison.

$data[ $row ]{ firstname }

$data[ $row ][ FIRSTNAME ]

(assumes a numeric constant FIRSTNAME)
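
Both forms can be tried with a short sketch like the following. The two sample rows, the LASTNAME constant and the name @by_name are assumptions for illustration only.

```perl
use strict;
use warnings;
use constant { FIRSTNAME => 0, LASTNAME => 1 };

# Hypothetical rows; a real program would split them from the input file.
my @raw = ( [ 'Ann', 'Smith' ], [ 'Bob', 'Jones' ] );

# Form 1: array of hashes, rows by number and fields by name.
my @by_name;
for my $row (@raw) {
    push @by_name, { firstname => $row->[0], lastname => $row->[1] };
}
print $by_name[1]{firstname}, "\n";

# Form 2: array of arrays, with named constants for the column positions.
print $raw[1][FIRSTNAME], "\n";
```

Either way the row is selected by its position and the field by a readable name, so no invented hash key is required.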

What is an SSCCE?

JFGI

Jürgen Exner · Oct 13, 2010

ccc31807 said:
I don't want to use an array because I'm creating a number of
different reports, and it's simply a lot easier to use values like:

$data{$key}{firstname}, $data{$key}{lastname}

than it is to use values like

$data[13456][2], $data[23543][3]

And why not use values like

$data[$key]{firstname}, $data[$key]{lastname}

jue

ccc31807 · Oct 13, 2010

And why not use values like

$data[$key]{firstname}, $data[$key]{lastname}

Because I wasn't completely truthful about my processing. I have to
break the data apart on various values, some of which are unique keys,
e.g., identification numbers for individual people. The data includes
clients and counselors, and (obviously) clients can have multiple
counselors and counselors can have multiple clients. Other values are
one of a kind, such as a person's address, regardless of the number of
times the particular person appears in the data. I have to cross
reference these values by unique keys, and I use five hashes to sort
out the data.

I see now that I could use an array for the handful of data elements
for each row that are unique.
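
The kind of cross-referencing described above might be sketched as follows. This is only an illustration: the three-column layout (client id, client address, counselor id), the sample rows and the hash names are all hypothetical, and it uses three hashes rather than the five mentioned.

```perl
use strict;
use warnings;

# Hypothetical rows: client id, client address, counselor id.
my @lines = (
    "c1 12-Oak-St k1",
    "c1 12-Oak-St k2",
    "c2 9-Elm-Ave k1",
);

my ( %address_of, %counselors_of, %clients_of );
for my $line (@lines) {
    my ( $client, $address, $counselor ) = split ' ', $line;
    $address_of{$client} = $address;            # one-of-a-kind value, kept once
    $counselors_of{$client}{$counselor} = 1;    # inner hash used as a set
    $clients_of{$counselor}{$client}    = 1;    # the reverse direction
}

# Report: clients per counselor.
for my $counselor ( sort keys %clients_of ) {
    my @clients = sort keys %{ $clients_of{$counselor} };
    print "$counselor: @clients\n";
}
```

Using the identification numbers as keys means a person who appears on many rows is stored once, and the nested hashes absorb repeated client/counselor pairs.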

Thanks, CC.
