Finding a string in an executable

Angus

I have a very simple program as below:

int main(){
char* mystring = "ABCDEF";
return 0;
}

I have built this program without any debugging symbols included. If
I open the program in a hex editor I cannot find the string ABCDEF.
Should this string not be stored sequentially in some area of the
executable?
 
Angel

That is machine- and compiler-dependent; the C standard says nothing
about it. On some exotic platform, the string might even be compressed,
encrypted or fragmented.

On a 32-bit Linux system with gcc, I can see the string with the
'strings' program.

# strings blah
/lib/ld-linux.so.2
__gmon_start__
libc.so.6
_IO_stdin_used
__libc_start_main
GLIBC_2.0
PTRh
[^_]
ABCDEF
 
Mark Bluemel

You don't use the string, so there's no reason why the compiler
shouldn't optimise it away, is there?
 
Angus

I think the compiler optimised the string away, i.e. the string wasn't
used so it just removed it. If you follow the declaration with
puts(mystring) then you do see the string in the exe.

The reason for the question is to work out why the declaration of a
string seems to show different behaviour (on the MS compiler anyway).

Refined question is:
#include <stdio.h>

int main(){
char* mystring = "ABCDEFGHIJKLMNO";
puts(mystring);

char otherstring[15];
otherstring[0] = 'a';
otherstring[1] = 'b';
otherstring[2] = 'c';
otherstring[3] = 'd';
otherstring[4] = 'e';
otherstring[5] = 'f';
otherstring[6] = 'g';
otherstring[7] = 'h';
otherstring[8] = 'i';
otherstring[9] = 'j';
otherstring[10] = 'k';
otherstring[11] = 'l';
otherstring[12] = 'm';
otherstring[13] = 'n';
otherstring[14] = 'o';
puts(otherstring);

return 0;
}


Compiler was MS VC++.

Whether I build this program with or without optimisations I can find
the string "ABCDEFGHIJKLMNO" in the executable using a hex editor.

However, I cannot find the string "abcdefghijklmno".

What is the compiler doing that is different for otherstring?


The hex editor I used was Hexedit, but I tried others and still
couldn't find otherstring. Does anyone have any idea why not, or how
to find it?

By the way I am not doing this for hacking reasons.
 
Mark Bluemel

What is the compiler doing that is different for otherstring?

What is otherstring initialised to? (Note that I didn't ask what value
is eventually assigned to otherstring.)

Note also that otherstring is not a valid C string, as it is not
(guaranteed to be) null-terminated.
 
Ben Bacarisse

The string is not used, so at least one explanation presents itself: the
optimiser has noticed that there is no need to store the string.

Other explanations are possible. Because the string is ABCDEF I am
worried you might be looking for hexadecimal ABCDEF which is not at all
the same as looking for the hexadecimal representation of the characters
ABCDEF.
 
puppi

The other string, "abcdefghijklmno", simply doesn't exist in the .exe!
You are setting the elements of otherstring[] to the values 'a', 'b',
'c', and so on. That's different from an initialization, as in the case
of mystring. In the latter case, the compiler allocates space for
"ABCDEFGHIJKLMNO" and places it there. In the former, it simply
allocates space for the array and that's it; the characters are
"inserted" into it at runtime. The characters 'a', 'b', 'c', ... are
probably there in the .exe, but they are not pure data. They are part of
the code. That's why you don't simply see a string.

What you will see is truly compiler- and architecture-dependent. On my
PC, with an AMD64 processor, running Linux and compiling with gcc to the
ELF64 executable format, I see this in a hex editor:

E.a.E.b.E.c.E.d.E.e.E.f.E.g.E.h.E.i.E.j.E.k.E.l.E.m.E.n.E.o
(where each dot represents a non-printable character).

Each of those "E.[character]" groups belongs to an operation that sets
one character of otherstring[].

I said these characters are PROBABLY there in the .exe because the
compiler could have used a different approach. It could instead store
the value 'a' somewhere (probably in a register) and increment it at
each assignment to produce the next value (since the letters are
sequential).

Anyway, that's not really a C question. You'd be better off posting
it in comp.lang.asm.x86.
 
Thad Smith

For starters, your program doesn't define a string "abcdefghijklmno". If you
want some insight, I suggest looking at an assembly listing of the generated
code (or disassembly in a debugger) to see what your compiler generates. Other
compilers may generate different code.
 
