beginner question


Michael

In the past I have developed many micro-controller products using
assembler. I have recently started using C, and so far I love it! I am
very motivated to learn and become accustomed to the language.
I have a couple of simple questions, just to make sure I'm not going
about things the wrong way.

When writing functions that handle four bytes, I use the 'unsigned
long' variable type. At first to access the individual bytes of an
unsigned long, I would create a union as such:

union {
    unsigned long longword;
    char byte[4];
} varname;

Then use varname.longword or varname.byte[0-3] as required.

But while experimenting, I found a better way: (&(char)longword)[0-3]

Is it ok for code to be dependent on big or little endian? Or does
this indicate bad programming?
There are so many ways to do things in C. I'd appreciate any
guidance you might have to offer.


Regards, Michael.
 

Arthur J. O'Dwyer

> In the past I have developed many micro-controller products using
> assembler. I have recently started using C, and so far I love it! I am
> very motivated to learn and become accustomed to the language.

It's fun, isn't it? :)
> I have a couple of simple questions, just to make sure I'm not going
> about things the wrong way.
>
> When writing functions that handle four bytes, I use the 'unsigned
> long' variable type. At first to access the individual bytes of an
> unsigned long, I would create a union as such:
>
> union {
>     unsigned long longword;
>     char byte[4];
> } varname;
>
> Then use varname.longword or varname.byte[0-3] as required.

This works. Of course, you're not guaranteed to be able to put values
in through 'byte' and get anything sensible out through 'longword', but
given that you say you're doing microcontroller stuff, you probably can
find out exactly what happens on your platform when you do that.
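For comparison, here is a minimal sketch of the union approach with two assumptions changed. The byte array is declared unsigned char, and it is sized with sizeof (unsigned long) instead of a hard-coded 4. The names ulong_bytes, ulong_to_bytes and bytes_to_ulong are hypothetical. C99 and later allow reading a member other than the one last written, which reinterprets the stored bytes. However, which element holds which part of the value is still up to the platform. The only portable guarantee shown here is that copying the whole representation out and back in round-trips.

```c
#include <stddef.h>

/* Hypothetical union overlaying an unsigned long with its bytes. */
union ulong_bytes {
    unsigned long longword;
    unsigned char byte[sizeof (unsigned long)];
};

/* Write v through 'longword', read it back through 'byte'.
   The order of bytes in out[] is implementation-specific
   (endianness, possible padding bits). Returns the byte count. */
size_t ulong_to_bytes(unsigned long v, unsigned char out[sizeof (unsigned long)])
{
    union ulong_bytes u;
    size_t k;

    u.longword = v;
    for (k = 0; k < sizeof u.byte; ++k)
        out[k] = u.byte[k];
    return sizeof u.byte;
}

/* The reverse direction: write 'byte', read 'longword'. Only safe
   when in[] holds a representation obtained from a real value. */
unsigned long bytes_to_ulong(const unsigned char in[sizeof (unsigned long)])
{
    union ulong_bytes u;
    size_t k;

    for (k = 0; k < sizeof u.byte; ++k)
        u.byte[k] = in[k];
    return u.longword;
}
```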

The portable way to extract the bytes of an 'unsigned long' is to
look at it as an array of unsigned char, like this:

    unsigned long lu = 42uL;
    unsigned char *lu_bytes = &lu;
    int i;

    puts("The bytes of lu, from low to high address, are:\n");
    for (i=0; i < sizeof lu; ++i)
        printf("0x%x\n", (unsigned) lu_bytes[i]);

Note the use of 'sizeof lu' in place of '4'. On many machines, this
will print 2A 00 00 00 (with proper linebreaks, of course); but it's
not guaranteed to. Other machines will print 00 00 00 2A, or
2A 00 00 00 00 00 00 00, or (on the DS9000, which uses a highly
sophisticated system of padding bits) DEADBEEF DEADBEEF 0 DEADBEEF.
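Here is the same loop as a complete function, as a sketch. It adds the explicit (unsigned char *) cast that converting from unsigned long * requires, and it uses a size_t index so the comparison with sizeof stays unsigned. The name print_ulong_bytes is hypothetical. It returns the number of bytes printed so a caller can compare that count with sizeof lu. The printed order is whatever the platform stores.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: print each byte of lu from the lowest to the
   highest address. Returns the number of bytes printed (sizeof lu). */
size_t print_ulong_bytes(unsigned long lu)
{
    const unsigned char *lu_bytes = (const unsigned char *)&lu;
    size_t i;

    puts("The bytes of lu, from low to high address, are:");
    for (i = 0; i < sizeof lu; ++i)
        printf("0x%x\n", (unsigned) lu_bytes[i]);
    return sizeof lu;
}
```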
> But while experimenting, I found a better way: (&(char)longword)[0-3]

This is wrong. I think you meant to say,

((char *)&longword)[0-3]

Even now that the syntax is correct, I strongly recommend the use of
'unsigned char' instead of 'char' when dealing with bytes. Save plain
'char' for dealing with actual CHARacters, as the name implies.
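A minimal sketch of why unsigned char is safer. Plain char may be signed, and it is on many common compilers, so a byte with its top bit set can come back as a negative number. The helper name get_byte is hypothetical. The comments assume an unsigned long without padding bits.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: return byte k (counting from the lowest
   address) of *p. Reading through unsigned char always yields a
   value in 0..UCHAR_MAX; reading the same byte through plain char
   could yield a negative value where char is signed. */
unsigned get_byte(const unsigned long *p, size_t k)
{
    return ((const unsigned char *)p)[k];
}
```

With unsigned long v = ULONG_MAX;, get_byte(&v, 0) gives UCHAR_MAX. By contrast, ((char *)&v)[0] gives -1 wherever plain char is a signed two's-complement type.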
> Is it ok for code to be dependent on big or little endian? Or does
> this indicate bad programming?

A lot of the time, it indicates non-portable programming. Here in
c.l.c, we strive for portability all the time; so if you want to learn
C targeted at your specific compiler/embedded system, I recommend you
find a different newsgroup. That said, Non-Portable is not Bad,
necessarily; it just means your code would be harder to port to a
new system, should you ever decide to try it.
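If you do decide to depend on byte order, one option is to test it in one place rather than assume it everywhere. Here is a minimal sketch; the function name lowest_byte_is_lsb is hypothetical. It only inspects the byte at the lowest address. That separates the common little-endian layout from everything else, but it does not prove the full byte order of the other bytes.

```c
/* Hypothetical check: returns 1 if, for an unsigned long holding 1,
   the byte at the lowest address is the least significant byte
   (the usual little-endian layout), otherwise 0. */
int lowest_byte_is_lsb(void)
{
    unsigned long one = 1;
    return *(const unsigned char *)&one == 1;
}
```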
> There are so many ways to do things in C. I'd appreciate any
> guidance you might have to offer.

See the welcome messages posted here occasionally, and Google for
'c.l.c FAQ'. The FAQ is pretty long and sometimes dense, but it will
give you a *lot* of "guidance." That's what it's for. :)

HTH,
-Arthur
 

Henri Manson


It's not bad programming, as long as you document that your code depends
on the processor architecture :). If you want the bytes of the longword
to represent the LSB through the MSB in a system-independent way, you
can get them by using the shift operators, e.g.

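#include <stdio.h>

int main(void)
{
    /* unsigned: 0xABCD1234 does not fit in a 32-bit signed long */
    unsigned long x = 0xABCD1234UL;
    unsigned long y;

    /* get bytes out of x. b1 is LSB, b4 is MSB */
    unsigned char b1 = (unsigned char) (x & 0xFF);
    unsigned char b2 = (unsigned char) ((x >> 8) & 0xFF);
    unsigned char b3 = (unsigned char) ((x >> 16) & 0xFF);
    unsigned char b4 = (unsigned char) ((x >> 24) & 0xFF);

    printf("b1 = %02x, b2 = %02x, b3 = %02x, b4 = %02x\n",
           (unsigned) b1, (unsigned) b2, (unsigned) b3, (unsigned) b4);

    /* create a long value out of the bytes: widen each byte to
       unsigned long BEFORE shifting, so the shift cannot overflow int */
    y = (unsigned long) b1 | ((unsigned long) b2 << 8) |
        ((unsigned long) b3 << 16) | ((unsigned long) b4 << 24);
    printf("y = %lx\n", y);
    return 0;
}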

HTH

Henri Manson
 

Jack Klein

> The portable way to extract the bytes of an 'unsigned long' is to
> look at it as an array of unsigned char, like this:
>
>     unsigned long lu = 42uL;
>     unsigned char *lu_bytes = &lu;

ITYM unsigned char *lu_bytes = (unsigned char *)&lu;

...to avoid the constraint violation and the required diagnostic.
 

Jack Klein

> In the past I have developed many micro-controller products using
> assembler. I have recently started using C, and so far I love it! I am
> very motivated to learn and become accustomed to the language.

I've been doing embedded system programming for 25 years, using C in
that type of work for 20.

You are making some non-portable assumptions, and ones that are more
likely to trip you up in embedded systems than they are in desktop-type
programming.
> I have a couple of simple questions, just to make sure I'm not going
> about things the wrong way.
>
> When writing functions that handle four bytes, I use the 'unsigned
> long' variable type. At first to access the individual bytes of an
> unsigned long, I would create a union as such:

The first mistake you are making is assuming that there are always 8
bits in a byte. That is not how C defines a byte, nor how any truly
authoritative source does. A quantity of exactly 8 bits is an "octet". C
defines a byte as the smallest addressable unit of storage, and also
the size of an object that can contain characters. A byte in C must
contain at least 8 bits, but can contain more.
> union {
>     unsigned long longword;
>     char byte[4];
> } varname;

Most of the code I have written over the past few months, and will be
writing for the next few months, is for a Texas Instruments 2812,
sort of a hybrid microcontroller and Digital Signal Processor. It has
a good C compiler, but it doesn't do 8 bits, not at all; the hardware
does not support it.

A byte on this processor has 16 bits. All memory accesses are 16 bits.
In its C compiler, the standard C macro CHAR_BIT, defined in <limits.h>,
is 16, and sizeof(char) == sizeof(short) == sizeof(int) == 1.
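Rather than assuming these values, you can ask the compiler. Here is a minimal sketch that uses the standard CHAR_BIT macro from <limits.h>; the function name report_type_sizes is hypothetical. Note that sizeof multiplied by CHAR_BIT counts storage bits, which may include padding bits. It is therefore an upper bound on the value bits, not an exact count of them.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: print bits per byte and the sizes (in bytes)
   of some integer types. On the TI 2812 described above it should
   report CHAR_BIT = 16; typical desktop compilers report 8.
   Returns the number of storage bits in an unsigned long. */
unsigned long report_type_sizes(void)
{
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("sizeof(short) = %lu, sizeof(int) = %lu, sizeof(long) = %lu\n",
           (unsigned long) sizeof(short), (unsigned long) sizeof(int),
           (unsigned long) sizeof(long));
    return (unsigned long) sizeof(unsigned long) * CHAR_BIT;
}
```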
> Then use varname.longword or varname.byte[0-3] as required.
>
> But while experimenting, I found a better way: (&(char)longword)[0-3]
>
> Is it ok for code to be dependent on big or little endian? Or does
> this indicate bad programming?
> There are so many ways to do things in C. I'd appreciate any
> guidance you might have to offer.
>
>
> Regards, Michael.

In the past I have written C for an Analog Devices SHARC DSP as well.
That is a 32 bit only architecture, where all the integer types, char
through long, have sizeof 1 and 32 bits.

I would suggest following Henri's suggestion and using shift and mask.
Then endianness does not matter.

It is actually not hard in general to write code for a platform where
chars have more than 8 bits.
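Here is a minimal sketch of the shift-and-mask approach. It assumes the goal is to move a 32-bit value to and from a buffer of four 8-bit octets, stored least significant octet first (for example, for a wire or file format). The names put_u32_le and get_u32_le are hypothetical. Every octet is masked with 0xFF, and every byte is widened to unsigned long before it is shifted. As a result, the code does not depend on the host's byte order. It also still works when unsigned char is wider than 8 bits; in that case, each array element simply carries one octet in its low 8 bits.

```c
/* Hypothetical helper: store the low 32 bits of v into out[0..3],
   least significant octet first, regardless of host endianness. */
void put_u32_le(unsigned char out[4], unsigned long v)
{
    out[0] = (unsigned char) (v & 0xFFu);
    out[1] = (unsigned char) ((v >> 8) & 0xFFu);
    out[2] = (unsigned char) ((v >> 16) & 0xFFu);
    out[3] = (unsigned char) ((v >> 24) & 0xFFu);
}

/* Hypothetical helper: rebuild the 32-bit value from in[0..3].
   Each element is masked to one octet and widened to unsigned long
   before shifting, so no shift is ever done in (possibly 16-bit) int. */
unsigned long get_u32_le(const unsigned char in[4])
{
    return  ((unsigned long) (in[0] & 0xFFu))
         | (((unsigned long) (in[1] & 0xFFu)) << 8)
         | (((unsigned long) (in[2] & 0xFFu)) << 16)
         | (((unsigned long) (in[3] & 0xFFu)) << 24);
}
```

For example, put_u32_le(buf, 0xABCD1234UL) fills buf with 0x34, 0x12, 0xCD, 0xAB on any host. Passing that buffer to get_u32_le gives back 0xABCD1234.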
 

Michael

Thanks Arthur!
 
