Strange behaviour of simple code

Dario de Judicibus

I'm going crazy. Look at this code:

#include <string.h>
#include <stdio.h>
#include <iostream>

using namespace std ;

char ini_code[2] = {0xFF, 0xFE} ;
char line_sep[2] = {0x20, 0x28} ;
char para_sep[2] = {0x20, 0x29} ;
char end_code[2] = {0xFF, 0xFF} ;
char tab_code[2] = {0x00, 0x09} ;
char alf_code[2] = {0x00, 0x0A} ;
char acr_code[2] = {0x00, 0x0D} ;

int main ()
{
char code[2] ;
bool gotCR = false ;

cin.read(&code[0], 2) ;

code[0] = ini_code[1] ;
code[1] = ini_code[0] ;
printf("0x%02X%02X\n",code[0],code[1]);
while (cin.read(&code[0], 2))
{
if (code[0] == tab_code[1] && code[1] == tab_code[0])
{
code[0] = line_sep[1] ;
code[1] = line_sep[0] ;
printf("0x%02X%02X\n",code[0],code[1]);
}
else if (code[0] == acr_code[1] && code[1] == acr_code[0])
{
gotCR = true ;
}
else if (code[0] == alf_code[1] && code[1] == alf_code[0])
{
if (gotCR)
{
code[0] = para_sep[1] ;
code[1] = para_sep[0] ;
printf("0x%02X%02X\n",code[0],code[1]);
}
else
{
gotCR = false ;
}
}
else
{
printf("0x%02X%02X\n",code[0],code[1]);
}
}

code[0] = end_code[1] ;
code[1] = end_code[0] ;
printf("0x%02X%02X\n",code[0],code[1]);

return 0 ;
}

I expect a list of

0xNNNN
0xNNNN
....
0xNNNN

Instead I obtain stuff like:

0x004B
0x006F
0x0072
0x0065
0x0061
0x006E
0x0009
0x00FFFFFFC6
0xFFFFFFC5FFFFFFC8
0xFFFFFFC5FFFFFFB5
0xFFFFFFC2FFFFFFC8
0xFFFFFFB2FFFFFFE4
0xFFFFFFB209
0x0009
0x0009
0x0000
0x4800
0x6500
0x6200
0x7200
0x6500

Why??????

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dr. Dario de Judicibus
http://www.dejudicibus.it/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
Julián Albo

Dario de Judicibus wrote:
char ini_code[2] = {0xFF, 0xFE} ;
char line_sep[2] = {0x20, 0x28} ;
char para_sep[2] = {0x20, 0x29} ;
char end_code[2] = {0xFF, 0xFF} ;
char tab_code[2] = {0x00, 0x09} ;
char alf_code[2] = {0x00, 0x0A} ;
char acr_code[2] = {0x00, 0x0D} ;

Use unsigned char.

Regards.
 
Howard

char ini_code[2] = {0xFF, 0xFE} ;
char line_sep[2] = {0x20, 0x28} ;
char para_sep[2] = {0x20, 0x29} ;
char end_code[2] = {0xFF, 0xFF} ; <= what should happen when you detect this?
char tab_code[2] = {0x00, 0x09} ;
char alf_code[2] = {0x00, 0x0A} ;
char acr_code[2] = {0x00, 0x0D} ;

int main ()
{
char code[2] ;
bool gotCR = false ;

cin.read(&code[0], 2) ;

code[0] = ini_code[1] ;
code[1] = ini_code[0] ;
printf("0x%02X%02X\n",code[0],code[1]);
while (cin.read(&code[0], 2))
{
if (code[0] == tab_code[1] && code[1] == tab_code[0])
{
code[0] = line_sep[1] ;
code[1] = line_sep[0] ;
printf("0x%02X%02X\n",code[0],code[1]);
}
else if (code[0] == acr_code[1] && code[1] == acr_code[0])
{
gotCR = true ;
}
else if (code[0] == alf_code[1] && code[1] == alf_code[0])
{
if (gotCR)
{
code[0] = para_sep[1] ;
code[1] = para_sep[0] ;
printf("0x%02X%02X\n",code[0],code[1]);
}
else
{
gotCR = false ; <= this is ALREADY false here!
}
}
else
{
printf("0x%02X%02X\n",code[0],code[1]);
}
}

code[0] = end_code[1] ;
code[1] = end_code[0] ;
printf("0x%02X%02X\n",code[0],code[1]);

return 0 ;
}


Without seeing your input, it's hard to tell. I have a question: why do
you define the codes in the reverse order that you expect to see them? Is
it intentional (for some reason I can't imagine), or is your code doing the
checks wrong?

I do see at least one clear problem: the handling of gotCR is not
correct. You never set it to false after the first time it gets set to
true. Your code to set it to false is in the else of an "if (gotCR)", which
means it only gets set to false when it is ALREADY false!

It also looks like you're getting those "end" codes, which probably
means you have to handle them differently, but there's no code to detect and
handle them. Same with the other codes, like the tab, etc..

But again, with no input to go by, we can't tell how the output gets
generated for sure. Try walking through your app in the debugger and see
what the variable values are at each step. You might also try doing it on
paper to check your design.

-Howard
 
Howard

char ini_code[2] = {0xFF, 0xFE} ;
char line_sep[2] = {0x20, 0x28} ;
char para_sep[2] = {0x20, 0x29} ;
char end_code[2] = {0xFF, 0xFF} ;
char tab_code[2] = {0x00, 0x09} ;
char alf_code[2] = {0x00, 0x0A} ;
char acr_code[2] = {0x00, 0x0D} ;
Use unsigned char.


Why? The char type is neither unsigned nor signed unless explicitly
stated, and he's not doing any math or '>' or '<' comparisons where signed
vs. unsigned might make a difference.

The problems, I think, are that his logic is incorrect and incomplete.
(He's not handling all cases, and he's handling the CR incorrectly.)

-Howard
 
Julián Albo

Howard wrote:
char ini_code[2] = {0xFF, 0xFE} ;
char line_sep[2] = {0x20, 0x28} ;
char para_sep[2] = {0x20, 0x29} ;
char end_code[2] = {0xFF, 0xFF} ;
char tab_code[2] = {0x00, 0x09} ;
char alf_code[2] = {0x00, 0x0A} ;
char acr_code[2] = {0x00, 0x0D} ;
Use unsigned char.
Why? The char type is neither unsigned nor signed unless explicitly
stated, and he's not doing any math or '>' or '<' comparisons where signed
vs. unsigned might make a difference.

The char type is a separate type for many purposes, but it is either signed
or it is not. If it is signed, 0xFF, when converted to int and output in
hex, gives many more Fs, as the OP's output shows. So I suppose that
is the case here.

Regards.
 
Ron Natalie

Howard said:
char ini_code[2] = {0xFF, 0xFE} ;
Use unsigned char.


Why? The char type is neither unsigned nor signed unless explicitly
stated, and he's not doing any math or '>' or '<' comparisons where signed
vs. unsigned might make a difference.

If char is 8 bits and signed, 0xFF isn't a defined initializer.
 
Ron Natalie

Dario de Judicibus said:
0x00FFFFFFC6
0xFFFFFFC5FFFFFFC8
0xFFFFFFC5FFFFFFB5
0xFFFFFFC2FFFFFFC8
0xFFFFFFB2FFFFFFE4
0xFFFFFFB209

Classic sign extension bug. Your signed char gets promoted to int (standard procedure for
a varargs function like printf). For example, 0xFF most likely initialized the char value as -1,
and printf("%X", -1) prints FFFFFFFF when int is 32 bits.

You should either use unsigned char or mask off the sign extension.
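
To make the sign extension concrete, here is a minimal sketch. The helper names are made up for illustration, and the first function assumes a 32-bit int and a two's complement machine. It formats a byte the way the original printf call does, then shows the two fixes mentioned above (going through unsigned char, or masking).

```cpp
#include <cstdio>
#include <string>

// Mimics the original call: the char is promoted to int before printf sees
// it. A negative signed char becomes a negative int, and %X shows every bit.
// (The cast to unsigned int only matches the type that %X expects.)
std::string hex_sign_extended(signed char c) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%02X",
                  static_cast<unsigned int>(static_cast<int>(c)));
    return buf;
}

// Fix 1: convert to unsigned char first, so the promoted value is 0..255.
std::string hex_via_unsigned(char c) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%02X",
                  static_cast<unsigned int>(static_cast<unsigned char>(c)));
    return buf;
}

// Fix 2: keep plain char, but mask off the sign-extended upper bits.
std::string hex_masked(char c) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%02X",
                  static_cast<unsigned int>(c) & 0xFFu);
    return buf;
}
```

With a signed 8-bit char, the byte 0xC6 is stored as -58, which is where the FFFFFFC6 in the posted output comes from. Both fixes give C6 for the same byte.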
 
Howard

Ron Natalie said:
char ini_code[2] = {0xFF, 0xFE} ;
Use unsigned char.


Why? The char type is neither unsigned nor signed unless explicitly
stated, and he's not doing any math or '>' or '<' comparisons where signed
vs. unsigned might make a difference.

If char is 8 bits and signed, 0xFF isn't a defined initializer.

???

But I thought char was *neither* signed nor unsigned, unlike int, which
is signed by default. Are there some implementations that treat assigning
255 to a char as undefined behavior? (That would kind of screw up a lot of
code that uses "extended" ASCII characters, wouldn't it?)

-Howard
 
Howard

Ron Natalie said:
Classic sign extension bug. Your signed char gets promoted to int (standard procedure for
a varargs function like printf). For example, 0xFF most likely initialized the char value as -1,
and printf("%X", -1) prints FFFFFFFF when int is 32 bits.

You should either use unsigned char or mask off the sign extension.

Oh, I see, said the blind man! :) That's pretty poor behavior, in my
opinion. I don't recall ever using unsigned char to store C-style arrays of
characters. I've always used just char. Of course, I don't think I've ever
used printf on such an array either, so I guess I wouldn't have noticed this
strange effect.

-Howard
 
Ron Natalie

Howard said:
But I thought char was *neither* signed nor unsigned,

It is a distinct type from signed char or unsigned char, but it will have
the representation of one of those two (it's clearly signed in the original
poster's case).
Are there some implementations that treat assigning
255 to a char as undefined behavior?

Implementation-defined. Attempting to convert numbers that
are larger than can be represented into signed values is
implementation-defined. Unsigned types, on the other hand, are required to
wrap modulo 2**(number of bits).

(That would kind of screw up a lot of
code that uses "extended" ASCII characters, wouldn't it?)

The problem is not the char representation of "FF" but the fact
that using an integer 0xFF to initialize a signed char may not yield
the right value.
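
A small sketch of the distinction, assuming 8-bit bytes. The names to_byte and is_ff are invented for this example. Conversion to unsigned char is defined for every int value, so storing the codes as unsigned char keeps 0xFF as 255 on any implementation. In C++11 and later, the original braced initializer is additionally a narrowing conversion when char is signed and 8 bits wide, which compilers diagnose.

```cpp
#include <climits>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Conversion to unsigned char is fully defined: the result is the value
// reduced modulo 2^CHAR_BIT (256 here).
constexpr unsigned char to_byte(int value) {
    return static_cast<unsigned char>(value);
}

// With unsigned char, every code from the original post is representable.
const unsigned char ini_code[2] = {0xFF, 0xFE};
const unsigned char end_code[2] = {0xFF, 0xFF};

// Compare a raw byte (as delivered by istream::read) against 0xFF without
// depending on whether plain char is signed.
bool is_ff(char raw) {
    return static_cast<unsigned char>(raw) == ini_code[0];
}
```

Converting each raw byte to unsigned char before comparing means the comparison never depends on how the implementation chose to represent plain char.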
 
tom_usenet

But I thought char was *neither* signed nor unsigned, unlike int, which
is signed by default. Are there some implementations that treat assigning
255 to a char as undefined behavior? (That would kind of screw up a lot of
code that uses "extended" ASCII characters, wouldn't it?)

It isn't undefined behaviour, only implementation-defined (as converting
an out-of-range value to a signed integer type generally is). On ASCII
platforms, it generally "does the right thing".

There are 3 distinct types - char, unsigned char and signed char.
Although char is a separate type, it can take on the same values as
either unsigned char or signed char. Some compilers offer a switch to
choose which you prefer.

Tom
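
The three-types point can be checked at compile time. This is a minimal sketch. The overloaded function kind and the helper plain_char_is_signed are hypothetical names used only here.

```cpp
#include <limits>
#include <string>
#include <type_traits>

// char, signed char and unsigned char are three different types...
static_assert(!std::is_same<char, signed char>::value,
              "char is distinct from signed char");
static_assert(!std::is_same<char, unsigned char>::value,
              "char is distinct from unsigned char");

// ...so all three overloads can coexist without ambiguity.
std::string kind(char) { return "char"; }
std::string kind(signed char) { return "signed char"; }
std::string kind(unsigned char) { return "unsigned char"; }

// Whether plain char behaves as signed is up to the implementation.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}
```

GCC and Clang, for example, accept -fsigned-char and -funsigned-char to change what plain_char_is_signed() reports. The static_asserts hold either way.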
 
Dario de Judicibus

FIRST OF ALL, the problem was UNSIGNED char. Solved.
Without seeing your input, it's hard to tell. I have a question: why do
you define the codes in the reverse order that you expect to see them? Is
it intentional (for some reason I can't imagine), or is your code doing the
checks wrong?

I'm reading a little-endian file. The code is still missing some encoding
that I'll do between reading and writing the files, of course.
I do see at least one clear problem: the handling of gotCR is not
correct. You never set it to false after the first time it gets set to
true. Your code to set it to false is in the else of an "if (gotCR)", which
means it only gets set to false when it is ALREADY false!

First time only. That's just a reset.
It also looks like you're getting those "end" codes, which probably
means you have to handle them differently, but there's no code to detect and
handle them. Same with the other codes, like the tab, etc..

That's just the foundation of the code. I have to add other code, but first I
had to ensure that the basic code works. I'm writing a pipe. The first step is
to ensure that what enters the pipe comes out correctly. Then I'll add some
other logic in the middle.

DdJ
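
For the little-endian reading mentioned above, one alternative to storing every constant byte-swapped is to assemble each pair of bytes into a 16-bit value right after reading. The sketch below is only an illustration under that assumption. The function names are invented, and decode_le exists just to exercise the reader on an in-memory string.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Combine two bytes, low byte first, into one 16-bit code unit.
std::uint16_t make_unit_le(unsigned char low, unsigned char high) {
    return static_cast<std::uint16_t>(low | (high << 8));
}

// Read the next code unit; returns false at end of input or on a short read.
bool read_unit_le(std::istream& in, std::uint16_t& unit) {
    char raw[2];
    if (!in.read(raw, 2)) {
        return false;
    }
    // Go through unsigned char so bytes >= 0x80 are not sign-extended.
    unit = make_unit_le(static_cast<unsigned char>(raw[0]),
                        static_cast<unsigned char>(raw[1]));
    return true;
}

// Helper for experiments: decode a whole in-memory byte string.
std::vector<std::uint16_t> decode_le(const std::string& bytes) {
    std::istringstream in(bytes);
    std::vector<std::uint16_t> units;
    std::uint16_t unit = 0;
    while (read_unit_le(in, unit)) {
        units.push_back(unit);
    }
    return units;
}
```

With this, the comparisons can be written against plain values such as 0x0009 or 0x000D, and the output can be printed with a single "0x%04X" conversion.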
 
Howard

FIRST OF ALL, the problem was UNSIGNED char. Solved.

Cool. I wouldn't have seen that sign problem, either. :)

First time only. That's just a reset.

I don't understand what the first time has to do with it. My point was that
this else block of code accomplishes nothing (ever):

if (gotCR)
{...}
else
{
gotCR = false ;
}

The else block *only* gets executed if gotCR is false, and all it does is set
gotCR to false, but it already *is* false. Therefore, either you don't need
the else statement at all, or else your design is incorrect and you meant to
be doing something different.

Glad you got the real problem solved.

-Howard
 
Dario de Judicibus

I don't understand what the first time has to do with it. My point was that
this else block of code accomplishes nothing (ever):

if (gotCR)
{...}
else
{
gotCR = false ;
}

There is another point previously in code where gotCR is set:

else if (code[0] == acr_code[1] && code[1] == acr_code[0])
{
gotCR = true ;
}

You probably missed it.

DdJ
 
Howard

I don't understand what the first time has to do with it. My point was that
this else block of code accomplishes nothing (ever):

if (gotCR)
{...}
else
{
gotCR = false ;
}

There is another point previously in code where gotCR is set:

else if (code[0] == acr_code[1] && code[1] == acr_code[0])
{
gotCR = true ;
}

You probably missed it.

That's not relevant to my point at all. The else portion of the code I'm
talking about does nothing, ever! That else clause is *only* executed when
gotCR is false already. And all it does is set gotCR to false. What is it
that you think it does, and when? Walk through it. Suppose that gotCR is
false. In that case, the "if (gotCR)" condition will fail, and the "else"
clause will execute. But the else clause only sets gotCR to false. But we
just said that gotCR is false, so what does it mean to set gotCR to false if
it is already false? On the other hand, if gotCR is true, then the else
block will *not* execute. Therefore, the else block of code will either do
nothing (since setting a variable to false when it is already false is the
same as doing nothing), or else it will not execute at all. There's no
other condition here, regardless of any code anywhere else in the program.
In ALL cases, the following two pieces of code do exactly the same thing:

if (boolVariable)
{ doSomething; }
else
{ boolVariable = false; }

- and -

if (boolVariable)
{ doSomething; }


(Try it out if you don't believe me. Put a breakpoint in your else clause
and you'll see that the line "gotCR = false;" is only executed when gotCR is
already false.)

-Howard
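
The equivalence argued above can also be checked mechanically. In this small sketch, with_else and without_else are hypothetical stand-ins for the two fragments, and the calls counter plays the role of doSomething.

```cpp
// The fragment with the redundant else branch.
bool with_else(bool flag, int& calls) {
    if (flag) {
        ++calls;        // stands in for doSomething
    } else {
        flag = false;   // flag is already false whenever this runs
    }
    return flag;
}

// The same fragment with the else branch removed.
bool without_else(bool flag, int& calls) {
    if (flag) {
        ++calls;        // stands in for doSomething
    }
    return flag;
}
```

A bool has only two values, so calling both functions with true and with false covers every case: the returned flag and the call count match each time.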
 
Dario de Judicibus

You are right.

My

if () {
} else if() {
if () {
} else {
}
}

must become

if () {
} else if() {
} else if () {
} else {
}

(some conditions must be modified too).

Thank you

DdJ
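
One possible shape for the flattened chain is sketched below. The thread does not say which conditions were changed, so everything here is an assumption: the function name convert_separators, working on already-decoded 16-bit code units, dropping a lone CR as the original code did, and passing a lone LF through unchanged.

```cpp
#include <cstdint>
#include <vector>

constexpr std::uint16_t TAB = 0x0009;
constexpr std::uint16_t LF = 0x000A;
constexpr std::uint16_t CR = 0x000D;
constexpr std::uint16_t LINE_SEP = 0x2028;
constexpr std::uint16_t PARA_SEP = 0x2029;

// Hypothetical rewrite of the loop body as a flat if / else-if chain.
std::vector<std::uint16_t> convert_separators(
        const std::vector<std::uint16_t>& in) {
    std::vector<std::uint16_t> out;
    bool gotCR = false;
    for (std::uint16_t unit : in) {
        if (unit == TAB) {
            out.push_back(LINE_SEP);   // tab becomes a line separator
            gotCR = false;
        } else if (unit == CR) {
            gotCR = true;              // remember it, emit nothing yet
        } else if (unit == LF && gotCR) {
            out.push_back(PARA_SEP);   // CR LF becomes a paragraph separator
            gotCR = false;
        } else {
            out.push_back(unit);       // everything else passes through
            gotCR = false;             // the reset now actually happens
        }
    }
    return out;
}
```

Unlike the nested version, the flag is cleared on every branch except the one that sets it, so a CR only affects the code unit that immediately follows it.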
 
