Printing a char* which is not a string: I do not understand this

Hendrik Maryns

Hi group,

I am working on a JNI project. However, since things didn’t work the
way I wanted them, I wrote some utility functions. One of them is a
function to print the following struct to stdout:

typedef unsigned mgState;
typedef unsigned mgId;
typedef char *mA;

typedef struct mgTreeNode { /* tree node */
mA a; /* alphabet element */
struct mgTreeNode *left, *right; /* successors */
mgId id; /* state space id */
mgState state; /* automaton state */
} mgTreeNode;

Important to notice here is that, although mA is typedef’ed to char*, it
is not a string, but really an array of char, containing 0 and 1 (so
some primitive form of bit vector). However, when printing it, I want
to see '0' and '1', of course.

My first try was the following:

void printTreeNode(mgTreeNode * node, int labelLength) {
    int i;
    char *label = malloc(labelLength);
    for (i = 0; i < labelLength; ++i) {
        label[i] = node->a[i] + '0';
    }
    printf("a: %s, id: %d, state: %d, left: [", label, node->id, node->state);
    if (node->left) {
        printTreeNode(node->left, labelLength);
    } else {
        printf("nil");
    }
    printf("], right: [");
    if (node->right) {
        printTreeNode(node->right, labelLength);
    } else {
        printf("nil");
    }
    printf("]");
    fflush(stdout);
    free(label);
}

There definitely are more elegant ways to do it, but hey, I’m a real
noob in C.

Now, when trying this out in a C program, it gave the expected results:
a: 000000, id: 0, state: 0, left: [a: 100000, id: 0, state: 0, left: [a:
010001, id: 0, state: 0, … (continued recursively)

However, I wrapped this function with SWIG (http://www.swig.org/), and
when calling it from Java, I get to see the following stuff:
a: 0�\Ӫ*, id: 0, state: 0, left: [a: 0�\Ӫ*, id: 0, state: 0, left: [a:
0�\Ӫ*, …

So some encoding problem is getting in the way. Maybe this is the time
to say that I am on 64-bit Linux, where UTF-8 is the system standard.
Note that those characters are being produced on the C side, the string
is not passed to Java (I tried that as well, with the same result).

I then messed around a bit and came up with the following:

void printTreeNode(mgTreeNode * node, int labelLength) {
    int i;
    printf("a: ");
    for (i = 0; i < labelLength; ++i) {
        printf("%c", node->a[i] + '0');
    }
    printf(", id: %d, state: %d, left: [", node->id, node->state);
    if (node->left) {
        printTreeNode(node->left, labelLength);
    } else {
        printf("nil");
    }
    printf("], right: [");
    if (node->right) {
        printTreeNode(node->right, labelLength);
    } else {
        printf("nil");
    }
    printf("]");
    fflush(stdout);
}

You’ll notice a little difference in that the ‘label’ variable is no
longer used and the chars are printed directly.

My question: what the hell causes this strange stuff?

Grateful for any clarifications, H.
--
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
http://aouw.org
Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html


 
fred.l.kleinschmidt

Hi group,

I am working on a JNI project. However, since things didn't work the
way I wanted them, I wrote some utility functions. One of them is a
function to print the following struct to stdout:

typedef unsigned mgState;
typedef unsigned mgId;
typedef char *mA;

typedef struct mgTreeNode { /* tree node */
mA a; /* alphabet element */
struct mgTreeNode *left, *right; /* successors */
mgId id; /* state space id */
mgState state; /* automaton state */

} mgTreeNode;

Important to notice here is that, although mA is typedef'ed to char*, it
is not a string, but really an array of char, containing 0 and 1 (so
some primitive form of bit vector). However, when printing it, I want
to see '0' and '1', of course.

My first try was the following:

void printTreeNode(mgTreeNode * node, int labelLength) {
int i;
char *label = malloc(labelLength);
for (i = 0; i < labelLength; ++i) {
label[i] = node->a[i] + '0';
}
printf("a: %s, id: %d, state: %d, left: [", label, node->id, node->state);

<snip>

You do not show the code for storing information in node->a,
so we can't tell whether a is even valid.

The constant '0' is the character zero, which for ASCII
is the integer 48. So you are adding 48 to each character??
What is the purpose of this?

Then you print using %s. But that requires a NULL-terminated
array of characters. '\0' and '0' are two very different things.
 
Harald van Dijk

Important to notice here is that, although mA is typedef'ed to char*,
it is not a string, but really an array of char, containing 0 and 1 (so
some primitive form of bit vector). However, when printing it, I want
to see '0' and '1', of course.

My first try was the following:

void printTreeNode(mgTreeNode * node, int labelLength) {
int i;
char *label = malloc(labelLength);
for (i = 0; i < labelLength; ++i) {
label[i] = node->a[i] + '0';
}
printf("a: %s, id: %d, state: %d, left: [", label, node->id,
node->state);

<snip>

You do not show the code for storing information in node->a, so we can't
tell whether a is even valid.

The constant '0' is the character zero, which for ASCII is the integer
48. So you are adding 48 to each character?? What is the purpose of
this?


a[i] is stated to contain 0 or 1, and a[i]+'0' converts it to '0' or '1',
as intended.
Then you print using %s. But that requires a NULL-terminated array of
characters. '\0' and '0' are two very different things.

This is correct, and exactly the problem. The argument to malloc should
be labelLength+1, and label[labelLength] should be set to '\0'.
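
A minimal sketch of that fix, pulled out into a helper so it can be read on its own. The name bitsToString and its signature are made up for illustration and do not appear in the original code; it assumes the input really holds only the values 0 and 1.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original post): convert a vector of
 * 0/1 values into a freshly allocated, null-terminated string of '0' and
 * '1' characters. Returns NULL if allocation fails; the caller must free
 * the result. */
char *bitsToString(const char *bits, int length)
{
    char *label;
    int i;

    assert(bits != NULL && length >= 0);

    label = malloc((size_t)length + 1);   /* one extra byte for '\0' */
    if (label == NULL) {
        return NULL;
    }
    for (i = 0; i < length; ++i) {
        label[i] = (char)(bits[i] + '0'); /* 0 -> '0', 1 -> '1' */
    }
    label[length] = '\0';                 /* so that %s knows where to stop */
    return label;
}
```

Without the terminator, printf("%s", label) keeps reading past the end of the allocation until it happens to hit a zero byte, which would explain why the output looked right in one program and showed garbage in another.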
 
vippstar

Hi group,
I am working on a JNI project. However, since things didn't work the
way I wanted them, I wrote some utility functions. One of them is a
function to print the following struct to stdout:
typedef unsigned mgState;
typedef unsigned mgId;
typedef char *mA;
typedef struct mgTreeNode { /* tree node */
mA a; /* alphabet element */
struct mgTreeNode *left, *right; /* successors */
mgId id; /* state space id */
mgState state; /* automaton state */
} mgTreeNode;
Important to notice here is that, although mA is typedef'ed to char*, it
is not a string, but really an array of char, containing 0 and 1 (so
some primitive form of bit vector). However, when printing it, I want
to see '0' and '1', of course.
My first try was the following:
void printTreeNode(mgTreeNode * node, int labelLength) {
int i;
char *label = malloc(labelLength);
for (i = 0; i < labelLength; ++i) {
label[i] = node->a[i] + '0';
}
printf("a: %s, id: %d, state: %d, left: [", label, node->id, node->state);


<snip>

You do not show the code for storing information in node->a,
so we can't tell whether a is even valid.

The constant '0' is the character zero, which for ASCII
is the integer 48. So you are adding 48 to each character??

'0' to '9' are guaranteed to be sequential.
Therefore, '0' + 2 == '2' et cetera
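
As a small illustration of that guarantee, here is a sketch of the conversion in both directions. The function names digitToChar and charToDigit are invented for this example.

```c
#include <assert.h>

/* Hypothetical helper: map a value in 0..9 to its digit character.
 * This relies only on the C guarantee that '0'..'9' are consecutive,
 * not on the character set being ASCII. */
char digitToChar(int value)
{
    assert(value >= 0 && value <= 9);
    return (char)('0' + value);
}

/* Hypothetical helper: the inverse mapping, digit character to value. */
int charToDigit(char c)
{
    assert(c >= '0' && c <= '9');
    return c - '0';
}
```

Note that the guarantee covers the decimal digits only; letters such as 'a' to 'z' are not required to be contiguous.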
 
Peter Nilsson

<snip>
Then you print using %s. But that requires a NULL-terminated

'null terminated' would be better; 'null byte terminated' is
better still.
array of characters.

Strictly speaking, it is possible to print a non string
sequence of characters with the s conversion specifier,
if you don't exceed the length of it...

#include <stdio.h>

int main(void)
{
    const char huh[4] = { 'H', 'u', 'h', '!' };
    printf("%.4s\n", huh);
    return 0;
}
 
Richard Tobin

Peter Nilsson said:
Strictly speaking, it is possible to print a non string
sequence of characters with the s conversion specifier,
if you don't exceed the length of it...

#include <stdio.h>

int main(void)
{
const char huh[4] = { 'H', 'u', 'h', '!' };
printf("%.4s\n", huh);
return 0;
}

And even more usefully, the length does not need to be constant:

printf("%.*s\n", len, buf);

I have seen code from a certain well-known large software company that
used malloc()/memcpy()/printf()/free() repeatedly to achieve this effect.
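
A sketch of the "%.*s" form, written against snprintf (C99) rather than printf so the result can be inspected. The helper name formatPrefix and its parameters are assumptions made for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format the first len characters of buf into out.
 * buf does not need to be null-terminated, because the precision caps
 * how many characters the s conversion reads. Returns what snprintf
 * returns: the number of characters the full output needs. */
int formatPrefix(char *out, size_t outSize, const char *buf, int len)
{
    assert(out != NULL && buf != NULL && len >= 0);
    /* The argument consumed by '*' must have type int. */
    return snprintf(out, outSize, "%.*s", len, buf);
}
```

For the bit vector in this thread the 0/1 values would still have to be converted to '0'/'1' first; what the precision saves is the extra byte and the explicit terminator on the converted buffer.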

-- Richard
 
Hendrik Maryns

Richard Tobin wrote:
Peter Nilsson said:
Strictly speaking, it is possible to print a non string
sequence of characters with the s conversion specifier,
if you don't exceed the length of it...

#include <stdio.h>

int main(void)
{
const char huh[4] = { 'H', 'u', 'h', '!' };
printf("%.4s\n", huh);
return 0;
}

And even more usefully, the length does not need to be constant:

printf("%.*s\n", len, buf);

I have seen code from a certain well-known large software company that
used malloc()/memcpy()/printf()/free() repeatedly to achieve this effect.

Now that is a nice trick I will remember.

Thank you all for your answers. We’re a little smarter now :)

H.
 
Hendrik Maryns

Harald van Dijk wrote:
Important to notice here is that, although mA is typedef'ed to char*,
it is not a string, but really an array of char, containing 0 and 1 (so
some primitive form of bit vector). However, when printing it, I want
to see '0' and '1', of course.

My first try was the following:

void printTreeNode(mgTreeNode * node, int labelLength) {
int i;
char *label = malloc(labelLength);
for (i = 0; i < labelLength; ++i) {
label[i] = node->a[i] + '0';
}
printf("a: %s, id: %d, state: %d, left: [", label, node->id,
node->state);

<snip>

You do not show the code for storing information in node->a, so we can't
tell whether a is even valid.


Just suppose it is. As I said, there is a way to know the length of the
array.
The constant '0' is the character zero, which for ASCII is the integer
48. So you are adding 48 to each character?? What is the purpose of
this?

a[i] is stated to contain 0 or 1, and a[i]+'0' converts it to '0' or '1',
as intended.
Then you print using %s. But that requires a NULL-terminated array of
characters. '\0' and '0' are two very different things.

This is correct, and exactly the problem. The argument to malloc should
be labelLength+1, and label[labelLength] should be set to '\0'.


I corrected the code as follows:

char *printTreeNode(mgTreeNode * node, int labelLength) {
    int i;
    char *buffer;
    char *leftDaughter = 0;
    char *rightDaughter = 0;
    char *label = malloc(labelLength+1);
    for (i = 0; i < labelLength; ++i) {
        label[i] = node->a[i] + '0';
    }
    label[labelLength] = 0;
    if (node->left) {
        leftDaughter = printTreeNode(node->left, labelLength);
    }
    if (node->right) {
        rightDaughter = printTreeNode(node->right, labelLength);
    }
    buffer = malloc(500);
    sprintf(buffer, "a: %s, id: %d, state: %d, left: [%s], right: [%s]",
            label, node->id, node->state, leftDaughter, rightDaughter);
    free(label);
    if (leftDaughter) {
        free(leftDaughter);
    }
    if (rightDaughter) {
        free(rightDaughter);
    }
    return buffer;
}

Again, this gives correct output if I invoke it as a C program, but now
the output I get when wrapping this function through JNI is the following:

a: 0pš, id: 0, state: 0, left: [a: 0Ϳ, id: 0, state: 0, left:
[(null)], right: [(null)]], right: [(null)

So there is still something going wrong, and the output is chopped off
arbitrarily. Do I have to make sure buffer is null-terminated as well?
Maybe I should use calloc(labelLength + 1, sizeof(char)) for label?

Thanks, H.
 
Hendrik Maryns

Hendrik Maryns wrote:
So there is still something going wrong, and the output is chopped off
arbitrarily. Do I have to make sure buffer is null-terminated as well?
Maybe I should use calloc(labelLength + 1, sizeof(char)) for label?


I am sorry, please ignore this, I was invoking the function with an
incorrect length parameter. Everything works fine now.
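
On the earlier question about the buffer: sprintf always writes a terminating null character, so the result needs no extra termination, but the fixed malloc(500) can overflow on a deep tree. Below is a hedged sketch of sizing the buffer with snprintf (C99) instead. The helper name formatNodeHeader is invented, it formats only the non-recursive part of the line, and it uses %u because id and state are unsigned.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build "a: <label>, id: <id>, state: <state>" in a
 * buffer of exactly the required size. label must be a null-terminated
 * string. Returns NULL on failure; the caller must free the result. */
char *formatNodeHeader(const char *label, unsigned id, unsigned state)
{
    int needed;
    char *buffer;

    assert(label != NULL);

    /* With a NULL buffer and size 0, snprintf writes nothing and only
     * reports how many characters the output needs. */
    needed = snprintf(NULL, 0, "a: %s, id: %u, state: %u", label, id, state);
    if (needed < 0) {
        return NULL;
    }
    buffer = malloc((size_t)needed + 1);  /* +1 for the terminator */
    if (buffer == NULL) {
        return NULL;
    }
    snprintf(buffer, (size_t)needed + 1,
             "a: %s, id: %u, state: %u", label, id, state);
    return buffer;
}
```

Passing a null pointer for %s, as happens for a missing child in the function above, is undefined behaviour in standard C; the "(null)" in the output is a courtesy of the C library in use. Substituting a literal such as "nil" for a missing child avoids relying on that.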

H.
 
