Arrays Vs Pointers

Nick Keighley

I am currently exploring the world of pointers and have encountered some
inconsistent information regarding the best way to reference an
array: array[1]   OR    *(array +1).

which is shortest and clearest? I'm much more of a fan of array
notation. Simple, clear and less typing. Modern compilers (anything in
the last 20 years or so) generate identical code anyway.

The only time I tend to use pointers is say picking a comms packet
apart.

pkt_ptr = &packet [0];
type = *pkt_ptr++;
length = (*pkt_ptr) * 256 + *pkt_ptr;
pkt_ptr += 2;

Each time you read a byte you advance the pointer you don't have to
keep track and count bytes yourself. So is the CRC field in byte 21 or
byte 22? Well it depends if the Qualifier option was selected.
One book says that you should not use the syntax array[1] because
of performance reasons.

bad reason. It likely makes no difference.

1. write clear, correct code
2. test it thoroughly
3. if it's too slow make it faster

step 3 is not necessary far more often than many people imagine.
People often *start* at step 3. "wow that's /really/ fast, all you
have to do now is make it give the right answer!"
Another book says that the syntax array[1] is
only used by FORTRAN programmers who do not understand c pointers.

burn the book.

blacklist the place you bought it

tell your friends

hell, tell us! What was this book?
So what is the truth, oh wizards? Does it make a difference?

WRITE SIMPLE CODE
 
John Bode

John said:
I am currently exploring the world of pointers and have encountered
some inconsistent information regarding the best way to reference
an array: array[1]   OR    *(array +1).
One book says that you should not use the syntax array[1] because
of performance reasons.
FWIW, subscript notation *may* result in a few more instructions at
the assembly level than pointer notation.  Here are some examples
compiled with gcc -g -Wa,aldh:

[examples snipped]
Hmp.  Subscripting with a variable results in more instructions
compared to the manual dereference, at least under these
circumstances (gcc compiler, debugging turned on, no optimization).

Looking at the code generated with no optimization is pointless--if you
care about performance, you need to turn on optimizations; it's as
simple as that. GCC generates really horrible code with optimizations
off, as exemplified by the *two* "mov eax, eax" instructions in your
sample code.

Right. I was trying to make two different points and wound up munging
them together.

First, just because "a[i]" is defined as "*(a+i)" doesn't mean that a
given compiler will always generate the same machine code for both
under all conditions (your point is valid, but orthogonal to the
argument I'm trying to make).

Second, *even if* array notation results in less efficient code, use
it in preference to pointer notation *unless* you're failing to meet a
hard performance requirement and you've exhausted all other sources of
optimization (including compiler settings).
 
88888 Dihedral

On Thursday, February 9, 2012 at 4:20:23 PM UTC+8, Nick Keighley wrote:
Another book says that the syntax array[1] is
only used by FORTRAN programmers who do not understand c pointers.

burn the book.

blacklist the place you bought it
Don't burn the book. That is not the way to learn the black magic found in
many books at the sophomore level and above.

tell your friends

hell, tell us! What was this book?


WRITE SIMPLE CODE

Learn to write more black spells at the graduate-school level.
Don't be as innocent as a child in elementary school.
 
Philip Lantz

John said:
Looking at the code generated with no optimization is pointless--
if you care about performance, you need to turn on optimizations;
it's as simple as that.

Right. I was trying to make two different points and wound up munging
them together.

First, just because "a[i]" is defined as "*(a+i)" doesn't mean that a
given compiler will always generate the same machine code for both
under all conditions (your point is valid, but orthogonal to the
argument I'm trying to make).

Second, *even if* array notation results in less efficient code, use
it in preference to pointer notation *unless* you're failing to meet a
hard performance requirement and you've exhausted all other sources of
optimization (including compiler settings).


Yup, I totally agree with both of those. Thanks for clarifying.
 
gwowen

WRITE SIMPLE CODE

Amen.

"Programs must be written for people to read, and only incidentally
for machines to execute." -- Abelson & Sussman, Structure and
Interpretation of Computer Programs
 
David Thompson

On Feb 7, 9:28 pm, peter wrote:
The only time I tend to use pointers is say picking a comms packet
apart.

pkt_ptr = &packet [0];
type = *pkt_ptr++;
length = (*pkt_ptr) * 256 + *pkt_ptr;
pkt_ptr += 2;
I really hope you meant
length = pkt_ptr[0] * 256 + pkt_ptr[1]; pkt_ptr += 2;
or one of the several equivalent variants. Or possibly
len = *pkt_ptr++;
len = len * 256 + *pkt_ptr++;
or again variants, but critically with a sequence point somehow.
Each time you read a byte you advance the pointer you don't have to
keep track and count bytes yourself. So is the CRC field in byte 21 or
byte 22? Well it depends if the Qualifier option was selected.
That works with an index variable also.
int i = 0; // or unsigned or size_t
type = pkt[i++];
len = pkt[i] * 256 + pkt[i+1]; i += 2;
or other variants similarly.

I've even been known to write
len = pkt[i+0] * 256 + pkt[i+1]; i += 2;
to show the symmetry, and count on the compiler optimizing +0.

And FW(L)IW it's theoretically safer to check
if( i + len > size_read ) /* invalid */
than
if( ptr + len > buf + size_read ) /* invalid */
since merely forming an invalid pointer, even without dereferencing
it, is Undefined Behavior, while an out-of-expected-range integer is
not, except for signed overflow which is often easier to avoid.
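
A minimal sketch of the index-based check, for illustration only. The packet layout (one type byte, then a big-endian 16-bit length, then the payload) and the names `struct header` and `parse_header` are assumptions made up for this example, not anything from the code quoted above. The comparison is written as a subtraction rather than `i + len > size_read` so that the sum cannot wrap around.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header layout: 1 type byte, then a 2-byte big-endian length. */
struct header {
    unsigned type;
    size_t   len;
};

/* Returns 0 on success, -1 if the buffer is too short for the header
   or for the payload length the header claims. */
static int parse_header(const unsigned char *pkt, size_t size_read,
                        struct header *out)
{
    size_t i = 0;

    assert(pkt != NULL && out != NULL);

    if (size_read < 3)
        return -1;              /* not even a full header */

    out->type = pkt[i++];
    out->len  = (size_t)pkt[i + 0] * 256 + pkt[i + 1];
    i += 2;

    /* Compare sizes, not pointers: no out-of-range pointer is ever formed. */
    if (out->len > size_read - i)
        return -1;

    return 0;
}
```

Because only integers are compared, a bogus length field in the packet can at worst produce a large `size_t`, never an invalid pointer.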

<snip rest>
 
Kaz Kylheku

On Feb 7, 9:28 pm, peter wrote:
The only time I tend to use pointers is say picking a comms packet
apart.

pkt_ptr = &packet [0];
type = *pkt_ptr++;
length = (*pkt_ptr) * 256 + *pkt_ptr;
pkt_ptr += 2;
I really hope you meant
length = pkt_ptr[0] * 256 + pkt_ptr[1]; pkt_ptr += 2;
or one of the several equivalent variants. Or possibly
len = *pkt_ptr++;
len = len * 256 + *pkt_ptr++;
or again variants, but critically with a sequence point somehow.

If we could throw away sequence points and have left-to-right evaluation,
we could simply write:

length = 256 * *pkt_ptr++ + *pkt_ptr++;

Thus:
Each time you read a byte you advance the pointer you don't have to
keep track and count bytes yourself. So is the CRC field in byte 21 or
byte 22? Well it depends if the Qualifier option was selected.
Exactly.

That works with an index variable also.
int i = 0; // or unsigned or size_t
type = pkt[i++];
len = pkt[i] * 256 + pkt[i+1]; i += 2;
or other variants similarly.


When I write this kind of code, I have the sense that I'm being the "human
compiler".

In my head I wrote this:

length = 256 * *pkt_ptr++ + *pkt_ptr++;

But the nice, well-ordered language in my well-ordered mind has to be compiled
by hand into the disordered language in the machine, e.g.:

length = 256 * *pkt_ptr++;
length += *pkt_ptr++;
 
Willem

peter wrote:
) I am currently exploring the world of pointers and have encountered some
) inconsistent information regarding the best way to reference an
) array: array[1] OR *(array +1).
)
) One book says that you should not use the syntax array[1] because
) of performance reasons.

That book is completely and utterly wrong.
In C, "array[index]" is *exactly the same* as "*(array + index)"
I would go so far as to call it syntactic sugar.
Hell, it's even commutative. "array[index]" means the same as "index[array]" !
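
The equivalence is easy to check directly. A small sketch; `same_element` is a name made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if all three spellings name the same element of arr, else 0.
   Requires index < n so the access stays inside the array. */
static int same_element(const int *arr, size_t n, size_t index)
{
    assert(index < n);
    return &arr[index] == &*(arr + index)
        && &arr[index] == &index[arr];
}
```

The `index[arr]` form works because `E1[E2]` is defined as `*((E1)+(E2))` and the addition does not care which operand is the pointer. It is a curiosity, not a style recommendation.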

) Another book says that the syntax array[1] is
) only used by FORTRAN programmers who do not understand c pointers.

That sounds like nothing but evangelism.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Markus Wichmann

If we could throw away sequence points and have left-to-right evaluation,
we could simply write:

length = 256 * *pkt_ptr++ + *pkt_ptr++;

But we don't, so we can't. I don't get what your difficulties are with
that. C is the way it is, Java is different. In case you didn't notice,
Java is built on different principles: C's directive is "trust the
programmer" (though the inclusion of complex and imaginary numbers
somewhat goes against this) and Java's directive is "protect the
programmer from his own mistakes".

Where Java falls short is that the really bad mistakes programmers make
aren't the kind a compiler could find. But OK, there are nearly-as-bad
mistakes that Java takes care of nicely (memory leaks, use-after-free,
access violations with corruption, ...). Only sometimes it is too
obsessive with that. Like, it doesn't let you use variables that may be
uninitialized. Which would be a lot more helpful if the compiler
actually made a proper data-flow analysis instead of the half-assed
stuff it's doing currently. Try the following test class:

public class Test {
    public static void main(String[] args)
    {
        int i;

        if (args.length > 3)
            i = 3;
        if (args.length <= 3)
            i = 2;
        System.out.println(i);
    }
}

and compare the output of the compiler with what you get from a decent C
compiler at highest warning level when fed this source code:

#include <stdio.h>

int main(int argc, char* argv[])
{
    int i;

    (void) argv;
    if (argc > 3)
        i = 3;
    if (argc <= 3)
        i = 2;

    printf("%d\n", i);
    return 0;
}

And I don't get what is so hard about

length = (pkt_ptr[0] << 8) + pkt_ptr[1];
pkt_ptr += 2;

either.
Thus:
Each time you read a byte you advance the pointer you don't have to
keep track and count bytes yourself. So is the CRC field in byte 21 or
byte 22? Well it depends if the Qualifier option was selected.
Exactly.

That works with an index variable also.
int i = 0; // or unsigned or size_t
type = pkt[i++];
len = pkt[i] * 256 + pkt[i+1]; i += 2;
or other variants similarly.


When I write this kind of code, I have the sense that I'm being the "human
compiler".

In my head I wrote this:

length = 256 * *pkt_ptr++ + *pkt_ptr++;


If you want to write a line like this, you can! Just don't feed it to a C
compiler; use Java, if that's what you want.

Otherwise, why don't you just build a C compiler with that rule
implemented? The behaviour is undefined by C which means that your
compiler would still be conforming.
But the nice, well-ordered language in my well-ordered mind has to be compiled
by hand into the disordered language in the machine, e.g.:

length = 256 * *pkt_ptr++;
length += *pkt_ptr++;

You really don't know what compiling by hand is. If you had ever been in a
situation where C++ operator overloading was truly helpful and then had to
rewrite that code in C, you would know. (Yeah, I think the line

y = 2 * x + 15 * a + (b - c) / d;

was better than the block

{
    Rational t1, t2, t3;
    rat_set_i(&t1, 2, 1);
    rat_mul(&t1, &t1, &x);
    rat_set_i(&t2, 15, 1);
    rat_mul(&t2, &t2, &a);
    rat_add(&t1, &t1, &t2);
    rat_sub(&t3, &b, &c);
    rat_div(&t3, &t3, &d);
    rat_add(&y, &t1, &t3);
}

)

Ciao,
Markus
 
Willem

Markus Wichmann wrote:
) You really don't know what compiling by hand is. If you were ever in a
) situation where operator overloading of C++ truly was helpful, then
) you'll know if you had to rewrite that stuff into C. (Yeah, I think the line
)
) y = 2 * x + 15 * a + (b - c) / d;
)
) was better than the block
)
) {
) Rational t1, t2, t3;
) rat_set_i(&t1, 2, 1);
) rat_mul(&t1, &t1, &x);
) rat_set_i(&t2, 15, 1);
) rat_mul(&t2, &t2, &a);
) rat_add(&t1, &t1, &t2);
) rat_sub(&t3, &b, &c);
) rat_div(&t3, &t3, &d);
) rat_add(&y, &t1, &t3);
) }

That's a bit unfair (just a bit though):

y = rat_add(rat_mul(rat_int(2), x),
rat_add(rat_mul(rat_int(15), a),
rat_div(rat_sub(b, c), d)));
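
For that nested form to compile, the rat_ functions have to take and return Rational by value instead of writing through pointers. A rough sketch of what that could look like follows. The thread never shows how Rational is defined, so the struct layout, `rat_make`, and `gcd_long` are assumptions made up here, and overflow is ignored.

```c
#include <assert.h>

/* Hypothetical value type; the thread never shows how Rational is defined.
   Overflow of the long arithmetic is ignored in this sketch. */
typedef struct {
    long num;
    long den;   /* kept positive by rat_make */
} Rational;

static long gcd_long(long a, long b)
{
    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) {
        long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Build num/den reduced to lowest terms with a positive denominator. */
static Rational rat_make(long num, long den)
{
    Rational r;
    long g;

    assert(den != 0);
    if (den < 0) {
        num = -num;
        den = -den;
    }
    g = gcd_long(num, den);     /* nonzero because den is nonzero */
    r.num = num / g;
    r.den = den / g;
    return r;
}

static Rational rat_int(long n) { return rat_make(n, 1); }

static Rational rat_add(Rational a, Rational b)
{
    return rat_make(a.num * b.den + b.num * a.den, a.den * b.den);
}

static Rational rat_sub(Rational a, Rational b)
{
    return rat_make(a.num * b.den - b.num * a.den, a.den * b.den);
}

static Rational rat_mul(Rational a, Rational b)
{
    return rat_make(a.num * b.num, a.den * b.den);
}

/* Division by a zero rational trips the assert in rat_make. */
static Rational rat_div(Rational a, Rational b)
{
    return rat_make(a.num * b.den, a.den * b.num);
}
```

Passing a two-long struct by value is cheap, and it removes the need for the named temporaries t1, t2 and t3 in the pointer-based version.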


SaSW, Willem
 
Stefan Ram

Markus Wichmann said:
if (args.length > 3)
i = 3;
if (args.length <= 3)
i = 2;

But in this special case, an if-then-else would be nicer.
And then you can even declare »i« as constant in Java:

{ final int i; if( args.length > 3 )i = 3; else i = 2; ... }

Though I'd always prefer

final int i = (args.length > 3) ? 3 : 2;
 
Keith Thompson

Willem said:
peter wrote:
) I am currently exploring the world of pointers and have encountered some
) inconsistent information regarding the best way to reference an
) array: array[1] OR *(array +1).
)
) One book says that you should not use the syntax array[1] because
) of performance reasons.

That book is completely and utterly wrong.
In C, "array[index]" is *exactly the same* as "*(array + index)"
I would go so far as to call it syntactic sugar.
Hell, it's even commutative. "array[index]" means the same as "index[array]" !
[...]

Yes, but I doubt that that's what the book actually said. I'm only
guessing, but the book was probably advocating code like this:

void func(char *s) {
    while (*s != '\0') {
        do_something_with(*s);
        s++;
    }
}

over this:

void func(char *s) {
    size_t i = 0;
    while (s[i] != '\0') {
        do_something_with(s[i]);
        i++;
    }
}

The former, by updating the pointer on each iteration, avoids
redoing the index calculation, which might be moderately expensive,
especially for arrays of something bigger than char.

It's a largely obsolete argument, since modern optimizing compilers can
generate the same code for both, but it's not "completely and utterly
wrong".
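
For anyone who wants to compare the two styles on their own compiler, here is a self-contained pair to start from. It is only a sketch: `count_char_ptr` and `count_char_idx` are made-up names, and counting a character stands in for the unspecified do_something_with.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer-walking version: advance s itself on each iteration. */
static size_t count_char_ptr(const char *s, char c)
{
    size_t n = 0;

    assert(s != NULL);
    while (*s != '\0') {
        if (*s == c)
            n++;
        s++;
    }
    return n;
}

/* Index version: s stays put and i advances. */
static size_t count_char_idx(const char *s, char c)
{
    size_t n = 0;
    size_t i = 0;

    assert(s != NULL);
    while (s[i] != '\0') {
        if (s[i] == c)
            n++;
        i++;
    }
    return n;
}
```

Both functions return the same counts by construction. Whether they also compile to the same instructions depends on the compiler and its settings, so the thing to do is build with optimization enabled (for example `gcc -O2 -S`) and compare the generated assembly.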

I'd be interested in knowing which book the previous poster was
referring to, how old it is, and what it actually says.
 
Willem

Keith Thompson wrote:
) Yes, but I doubt that that's what the book actually said.

Granted.

) I'm only
) guessing, but the book was probably advocating code like this:
)
) void func(char *s) {
)     while (*s != '\0') {
)         do_something_with(*s);
)         s++;
)     }
) }
)
) over this:
)
) void func(char *s) {
)     size_t i = 0;
)     while (s[i] != '\0') {
)         do_something_with(s[i]);
)         i++;
)     }
) }
)
) The former, by updating the pointer on each iteration, avoids
) redoing the index calculation, which might be moderately expensive,
) especially for arrays of something bigger than char.

Or which might not be expensive at all, especially given special indexing
instructions such as those on the x86. Also for types bigger than char.

A long time ago, I did some speed tests on an x86 machine, and it turned
out that there was no speed difference whatsoever between an indexed loop
and a pointer loop.


SaSW, Willem
 
Ben Bacarisse

Willem said:
Or which might not be expensive at all, especially given special indexing
instructions such as those on the x86. Also for types bigger than char.

A long time ago, I did some speed tests on an x86 machine, and it turned
out that there was no speed difference whatsoever between an indexed loop
and a pointer loop.


With a relatively recent gcc, there's no need to test, at least for this
code. You get the same instructions for the loop body (with -O2) for
both loops.
 
Markus Wichmann

But in this special case, an if-then-else would be nicer.
And then you can even declare »i« as constant in Java:

{ final int i; if( args.length > 3 )i = 3; else i = 2; ... }

Though I'd always prefer

final int i = (args.length > 3) ? 3 : 2;


The example was chosen deliberately to show the flaws in the Java
data-flow analyzer: With if-else the code compiles, without it, the
compiler aborts on account of "i may not be initialized".

gcc does not generate a "may not be initialized" warning in the C code I
wrote below that (yes, I know that I have to activate the optimizer to
see those). Which means it is entirely possible for a program to see
that there is no code path to a read on i without prior initialization.

And by now we got totally off-topic...

Ciao,
Markus
 
James Kuyper

On 02/20/2012 05:32 PM, pete wrote:
....
Teach Yourself C,
by Herbert Schildt, copyright 1990,
Chapter 6.3, Use Pointers With Arrays,
page 197:

For somewhat complex reasons,
a C compiler will generally create faster executable code
for an expression such as

*(p+3)

than it will for the comparable array index

p[3]

Schildt is a notoriously unreliable source of information about C. It's
possible that what he said was true for some of the compilers he was
most familiar with at the time he wrote that - but I wouldn't recommend
assuming that this was the case. It's certainly not generally true, and
it's far less likely to be the case for modern compilers than it was for
those available in 1990.
 
Keith Thompson

pete said:
Keith said:
Willem said:
peter wrote:
) I am currently exploring the world of pointers and have encountered some
) inconsistent information regarding the best way to reference an
) array: array[1] OR *(array +1).
)
) One book says that you should not use the syntax array[1] because
) of performance reasons.

That book is completely and utterly wrong.
In C, "array[index]" is *exactly the same* as "*(array + index)"
I would go so far as to call it syntactic sugar.
Hell, it's even commutative. "array[index]" means the same as "index[array]" !
[...]

Yes, but I doubt that that's what the book actually said.

Teach Yourself C,
by Herbert Schildt, copyright 1990,
Chapter 6.3, Use Pointers With Arrays,
page 197:

For somewhat complex reasons,
a C compiler will generally create faster executable code
for an expression such as

*(p+3)

than it will for the comparable array index

p[3]

Had I known it was one of Schildt's books, I wouldn't have assumed
that it said something sensible. (Do we know that that's the book to
which peter was referring?)

Willem is right: the book is completely and utterly wrong.
There's probably a valid point somewhere in the vicinity of what
Schildt wrote (as I discussed upthread), but Schildt missed it.
 
Keith Thompson

Let me amend that slightly. It's possible that some compilers might
generate faster code for *(p+3) than for p[3]. There's no good reason
for that, since the two expressions are equivalent by definition, but
it's possible. It's also possible Schildt encountered such a compiler
and reached an overly general conclusion.
 
Keith Thompson

pete said:
I recall a thread about a year or so ago,
where the controlling expression in a loop
was shown to have been rendered
with a register, when {!= 0} was omitted.
That is to say that there was a compiler for which
while (x)
was faster than
while ((x) != 0)
I think it was a gcc compiler
and I remember being disappointed
that the compiler had translated
two semantically equal expressions differently.

That would be relevant only if it behaved that way with optimization
enabled.
 
Richard Sanders

Let me amend that slightly. It's possible that some compilers might
generate faster code for *(p+3) than for p[3]. There's no good reason
for that, since the two expressions are equivalent by definition, but
it's possible. It's also possible Schildt encountered such a compiler
and reached an overly general conclusion.

I had that book at one time. The book was explicitly for Borland C++.
 
