differences between hashes and arrays ?

Jack · Sep 22, 2006

Hi folks I am not an expert in perl but correct me if I am wrong -

is it true you can use a hash and work with it just as you would an
array - what are the differences between them (besides in an array you
can have a multidimensional array) ?

why arent folks using hashes instead of arrays since (I believe) they
are faster to access and take up the same or less memory than arrays..

Thanks,

Jack

Mirco Wahab · Sep 22, 2006

Thus spoke Jack (on 2006-09-22 16:55):

is it true you can use a hash and work with it just as you would an
array - what are the differences between them (besides in an array you
can have a multidimensional array) ?

There is a similar interface, but thats almost all.

==> http://en.wikipedia.org/wiki/Hash_table

In short, a 'hash table' is a complicated organism
which allows you to find an item (say: key, stored
in there) in very very short time, which is the point.

An "array" means otherwise "access by an index number",
so array elements must be searched in order to
find a particular item.

compare:

my $item = $hash{ 'ITEM_OF_INTEREST' }; # gets the item

vs.

SEARCH: foreach $element (@array) {
if($element eq 'ITEM_OF_INTEREST') {
$item = $element; # gets the item
last SEARCH;
}
}

why arent folks using hashes instead of arrays since (I believe) they
are faster to access and take up the same or less memory than arrays..

No, folks using hashes if hash tables are appropriate
and arrays if arrays are ...

Say, you have ten million 3D coordinate points
which are in some order already (an order that
means something to you). Why would you bother to
put them in a hash table, if all you need is sub-
sequent access to the 13244'st or 10'th to 1923331'th?

Regards

Mirco

Tad McClellan · Sep 22, 2006

Jack said:
is it true you can use a hash and work with it just as you would an
array -

No.

what are the differences between them

Arrays are ordered and indexed with numbers.

Hashes are UNordered and indexed with strings.

(besides in an array you
can have a multidimensional array) ?

You can also have a multidimensional hash, so that is not
one of the differences.

Paul Lalli · Sep 22, 2006

Jack said:
Hi folks I am not an expert in perl but correct me if I am wrong -

You're wrong.

is it true you can use a hash and work with it just as you would an
array
No.

- what are the differences between them

A hash is by definition unordered. There is no ordering of any kind to
the key/value pairs in a hash. An array is ordered - there is a very
well defined sequence of which item follows which item. Therefore, you
can treat an array like a stack or a queue (using the push/pop
shift/unshift set of functions), while this is not possible with a
hash. With a hash you can associate to distinct pieces of data in one
structure, and can retrieve a specific value in very efficient time.

An array is indexed by positive integers. A hash is keyed by any
string, whether its contents are numeric or not.

(besides in an array you
can have a multidimensional array) ?

Ironically, that is *not* one of the differences. You can have
multidimensional hashes as well. In fact, you can have hashes of
arrays, arrays of hashes, arrays of hashes of hashes, hashes of arrays
of hashes, or any other combination you'd like.

why arent folks using hashes instead of arrays since (I believe) they
are faster to access and take up the same or less memory than arrays..

Faster to access, yes. Same or less memory, no.

These are not apples to apples. You use a hash and an array for
completely different purposes.

If you have an example of a program or algorithm you're working on, we
could help you get rid of some of your misperceptions. . . .

Paul Lalli

xhoster · Sep 22, 2006

Jack said:
Hi folks I am not an expert in perl but correct me if I am wrong -

is it true you can use a hash and work with it just as you would an
array

That would depend on what you are doing. For some things, you could easily
use them interchangably. For most things, you can't. If youre indices are
not integers, you can't use arrays. When your indices are integers but are
not densely packed but need to accessed as though they were, you cannot
readily use hashes.

- what are the differences between them (besides in an array you
can have a multidimensional array) ?

That is not a difference between them (in Perl).

why arent folks using hashes instead of arrays since (I believe) they
are faster to access and take up the same or less memory than arrays..

Wrong on both counts. When arrays and hashes are interchangable, hashes
are much slower and take substantially more memory.

Xho

Jack · Sep 22, 2006

Michele said:
Hi folks I am not an expert in perl but correct me if I am wrong -

Click to expand...

Seeing your question and your considerations below, indeed you seem
not to be an expert in perl. So I guess you're not wrong.

is it true you can use a hash and work with it just as you would an
array - what are the differences between them (besides in an array you
can have a multidimensional array) ?

Click to expand...

You can also have multidimensional hashes. But both are not "real"
multidimentional beasts. However for the moment don't bother...

Really hashes are associative arrays, i.e. they are a finite mapping
between a set of _keys_ and a set _values_. The keys are generic
strings. An array is a mapping between a finite set of natural numbers
of the form {0,... , n} (if you don't set $[ to something other than
0, but again, don't bother!) and a set of values.

why arent folks using hashes instead of arrays since (I believe) they
are faster to access and take up the same or less memory than arrays..

Click to expand...

You will certainly know that in perl conversions between numbers and
strings happen automagically all the time. So indeed you may use a
hash to implement an array. However hashes are called like that
because their implementation is based on hash tables that provide
means to quickly search through the keys. There's no particular
general advantage when the keys are simple numbers instead.

So use an array when you need a mapping from a finite set of numbers
to some values, except possibly when the mapping is very sparse. How
much is "very", may depend on many factors, and I don't know the lower
limit exactly. Surely I would use a hash if I need to *explicitly* set
the mapping amongst say 100 numbers uniformely distribuited in
0..100_000_000 and some values. I don't have the vaguest idea how well
the hashing algorithm would hash such keys, but the thing can be
investigated...

Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
.'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,

Thanks for the comments all. For these applications, which is better
suited, array or hash, and assume a very large volume of data and
consider memory :
1- iterate through a huge list (hash or array?) of values and compare
each value in the list, 1 by 1, to a match condition test like
m/^\d.*?\d$/ (all digits), and then keep the count of the values that
pass the match test
2- compare all values in column 1, file 1 (one value at a time ; assume
10M records ) to determine if the value exists in the target list (hash
or array?) from column 2 file 2 (assume column 2 is 20M records)

Thank you,
Jack

Paul Lalli · Sep 22, 2006

Jack said:
Thanks for the comments all. For these applications, which is better
suited, array or hash, and assume a very large volume of data and
consider memory :
1- iterate through a huge list (hash or array?) of values and compare
each value in the list, 1 by 1, to a match condition test like
m/^\d.*?\d$/ (all digits), and then keep the count of the values that
pass the match test

array. You just want to iterate over all of them, in sequence. No
reason for the overhead of a hash.

my $count = grep /^\d.*?\d$/ @great_big_array;

(by the way, you do know that your pattern does not match "all digits",
right? It matches anything that starts with and ends with a digit, of
at least length two. The "inside" of the value can be literally
anything.)

2- compare all values in column 1, file 1 (one value at a time ; assume
10M records ) to determine if the value exists in the target list (hash
or array?) from column 2 file 2 (assume column 2 is 20M records)

column1, file1 - array. You want to iterate over all of them, one at a
time, in sequence.

column2, file2 - hash. You want to access a specific element of your
set, without knowing or caring where in the set that element falls.

for my $value (@column1){
if (exists $column2{$value}) {
print "$value exists in column2 file2\n";
} else {
print "$vlaue does not exist in column2 file2\n";
}
}

Paul Lalli

Brian McCauley · Sep 22, 2006

Wrong on both counts. When arrays and hashes are interchangable, hashes
are much slower and take substantially more memory.

Except in the rare and special case of a very sparsely populated array.

Henry Law · Sep 22, 2006

Paul said:
You're wrong.

No.

Of course all the material about the difference in the ways that hashes
and arrays is perfectly correct, but there is one sense in which the OP
is correct - or at least has some method in his madness. Look:

# -------------------------------------------
#! /usr/bin/perl

use strict; use warnings;

use Data:

umper;

my @array = ("one",111,"two",22);
# Or we could even have coded my %hash = ("one",111,"two",22);
my %hash = @array;

print Dumper(\%hash);
# --------------------------------------------
F:\>tryit.pl
$VAR1 = {
'one' => 111,
'two' => 22
};

Paul Lalli · Sep 22, 2006

Henry said:
Of course all the material about the difference in the ways that hashes
and arrays is perfectly correct, but there is one sense in which the OP
is correct - or at least has some method in his madness. Look:

# -------------------------------------------
#! /usr/bin/perl

use strict; use warnings;

use Data:umper;

my @array = ("one",111,"two",22);
# Or we could even have coded my %hash = ("one",111,"two",22);
my %hash = @array;

None of this shows "using a hash and array in the same way". This is
populating an array with a list of values, and then populating a hash
with the list of values that are currently contained in an array.
%hash and @array have nothing to do with each other before or after
this statement.

Make sure you read and understand
perldoc -q difference
"What is the difference between a list and an array"

Paul Lalli

Tad McClellan · Sep 22, 2006

Henry Law said:
Of course all the material about the difference in the ways that hashes
and arrays is perfectly correct, but there is one sense in which the OP
is correct - or at least has some method in his madness. Look:

# -------------------------------------------
#! /usr/bin/perl

use strict; use warnings;

use Data:umper;

my @array = ("one",111,"two",22);

Change that line to:

my @array = ("two",22,"one",111);

and run the program again.

# Or we could even have coded my %hash = ("one",111,"two",22);
my %hash = @array;

print Dumper(\%hash);
# --------------------------------------------
F:\>tryit.pl
$VAR1 = {
'one' => 111,
'two' => 22
};

You get the _same_ output!

Uri Guttman · Sep 22, 2006

PL> Faster to access, yes. Same or less memory, no.

hmm, in general arrays have faster lookups than hashes. me thinks you
confused the OP's statement. and arrays definitely use less storage
(even excluding the key itself) for each element.

PL> These are not apples to apples. You use a hash and an array for
PL> completely different purposes.

they are apples and bananas.

PL> If you have an example of a program or algorithm you're working on, we
PL> could help you get rid of some of your misperceptions. . . .

or he should learn more perl before asking such deep questions.

uri

Paul Lalli · Sep 22, 2006

Uri said:
PL> Faster to access, yes. Same or less memory, no.

hmm, in general arrays have faster lookups than hashes. me thinks you
confused the OP's statement.

Or I confused my answer.

What I meant was that:

my $there = $hash{$key} ? 1 : 0;
is faster than:
my $there = 0;
for my $elem (@array) {
if ($elem eq $value) {
$there = 1;
next;
}
}

Paul Lalli

Jürgen Exner · Sep 23, 2006

Jack said:
is it true you can use a hash and work with it just as you would an
array - what are the differences between them (besides in an array you
can have a multidimensional array) ?

Both are mappings. Arrays map from a contiguous list of natural numbers,
starting with 0 (assuming you didn't fool around with $[) into scalars.
Hashes are a generalization and map arbitrary strings to scalars.

why arent folks using hashes instead of arrays since (I believe) they
are faster to access and take up the same or less memory than arrays..

Very often you need organize your data in a sequence. This sequencing is
naturally achieved by the natural order of number when they are used as
domain of your mapping. For hashes it is possible to sort the keys of a
hash and then access the values in sequence, but this operation is much more
expensive.

jue

Jürgen Exner · Sep 23, 2006

Mirco said:
An "array" means otherwise "access by an index number",
so array elements must be searched in order to
find a particular item.

Not really. To access the 23rd element of an array in a typical
implementation you would simply access
StartOfArray + 23 * SizeOfArrayElement
which is just adding a constant that is independant of the size of the
array. No need to search (which would be O(n log n)).

jue

Uri Guttman · Sep 23, 2006

PL> Or I confused my answer.

What I meant was that:

PL> my $there = $hash{$key} ? 1 : 0;
PL> is faster than:
PL> my $there = 0;
PL> for my $elem (@array) {
PL> if ($elem eq $value) {
PL> $there = 1;
PL> next;
PL> }
PL> }

well then, why didn't you say that!! string key lookup is a very diff
beast than element lookup.

uri

Uri Guttman · Sep 23, 2006

CM> That's true in most languages because in most languages arrays
CM> are required to have all elements be the same type, making
CM> looking up an element a simple matter of pointer arithmetic
CM> (you see this in its most naked form in a higher-level language
CM> in C, where arrays and pointers are the same thing, and array
CM> lookups are defined in the language as pointer arithmetic.
CM> This results in a lot of C's less cuddly quirks, but that's
CM> another subject...). But in Perl, each element can be
CM> entirely different and each element has its own size. There
CM> are several different ways Perl can handle this problem; I
CM> don't know Perl internals so I don't know how it does it.
CM> But however it's done, it's going to take more cycles than
CM> a simple homogenous array (and probably more space, too).

perl's arrays ARE homogeneous but at a different level than scalar
values. the SV's of an array are in a real c array and are indexed just
like you say. otherwise how would perl lookup an array element? do you
think it scans the array in some linked list format for the nth element?

but my quoted statement is still true - perl arrays are faster and use
less ram than hashes. this is easily tested with benchmark.pm and
the devel::size (iirc the name correctly) module that shows ram
usage. so you don't need to know perl internals to learn about array vs
hash resource usage.

uri

Peter J. Holzer · Sep 23, 2006

That's true in most languages because in most languages arrays
are required to have all elements be the same type, making
looking up an element a simple matter of pointer arithmetic
(you see this in its most naked form in a higher-level language
in C, where arrays and pointers are the same thing,

It seems people are just as confused about the difference between arrays
and pointers in C as they are about the difference between arrays and
lists in Perl

Arrays and pointers in C are definitely not the same thing, An array is
a group of elements of the same type, while a pointer, well, points to
an element of a certain type. You notice the difference when you try to
assign to an array (doesn't work) or when you use sizeof on a pointer
(probably not what you wanted), and very spectacularly if you define a
variable as "int a[10]" in one source file and declare it as "extern int
*a" in another. However:

and array lookups are defined in the language as pointer arithmetic.

This is true. (Actually it's the other way round: Pointer arithmetic is
defined in terms as accesses to array elements)

When you use an array as an rvalue in an expression, it is converted to
a pointer to its first element. Thus you can use an array as an rvalue
whereever you can use a pointer.

(There is also the syntactic quirk that you cannot use arrays as
parameters, but it's not an error to try - the compiler will just
silently convert the array declaration to a pointer declaration)

hp

Dr.Ruud · Sep 23, 2006

Chris Mattern schreef:

Jack:

Um, no, arrays are much more efficient than hashes.

No, they are and they aren't. It all depends on what you use them for.

<example type="bad">

my %hary = (
0 => q{foo},
1 => q{bar},
) ;

my @ash = (
[q{dgd vggw ggg tgregdtg trgregfdsggt gtdeg trg}, q{foo}] ,
[q{wgdeg ergrewg rewgrew grewg ewg ewgew gewrgewg}, q{bar}] ,
) ;

</example>

Josef Moellers · Sep 25, 2006

Jürgen Exner said:
Not really. To access the 23rd element of an array in a typical
implementation you would simply access
StartOfArray + 23 * SizeOfArrayElement
which is just adding a constant that is independant of the size of the
array. No need to search (which would be O(n log n)).

How then do you know it's item #23?

Iterating hashes	11	May 14, 2013
arrays and hashes	13	Sep 15, 2006
FAQ 4.54 Why does defined() return true on empty arrays and hashes?	0	Apr 21, 2011
FAQ 4.44 How do I test whether two arrays or hashes are equal?	2	Apr 20, 2011
FAQ 4.62 What's the difference between "delete" and "undef" with hashes?	0	Feb 5, 2011
A remark about 'field hashes'	2	Jun 18, 2013
Some sort questions - especially hashes	4	Oct 10, 2013
Differences Between Arrays and Matrices in Numpy	1	Jul 29, 2009

differences between hashes and arrays ?

Jack

Mirco Wahab

Tad McClellan

Paul Lalli

xhoster

Jack

Paul Lalli

Brian McCauley

Henry Law

Paul Lalli

Tad McClellan

Uri Guttman

Paul Lalli

Jürgen Exner

Jürgen Exner

Uri Guttman

Uri Guttman

Peter J. Holzer

Dr.Ruud

Josef Moellers

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads