How to retrieve the name of the file from a FILE *

CBFalconer

Jarno said:
... snip ...

How do you hash pointers portably?

You cast them into integers (allowable, but not the inverse) and
use hashing methods for integers (see the references in my hashlib
package and its tests). The operation of the hash table only
requires equal/non-equal comparisons of pointers, which is always
allowable.

Please refrain from snipping attributes for material you quote.
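
A minimal sketch of the cast-then-hash idea described above, assuming the implementation provides uintptr_t (an optional typedef in C99). The helper name hash_file_ptr and the mixing constants are illustrative choices, not taken from hashlib. It only hashes the value exactly as fopen returned it, and makes no claim about pointers that compare equal but convert to different integers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: hash a FILE * by converting it to an integer.
   Assumes uintptr_t exists on this implementation (optional in C99). */
static size_t hash_file_ptr(FILE *fp, size_t nbuckets)
{
    uintptr_t v = (uintptr_t)(void *)fp;

    /* Fold high bits into low bits: the low bits of a pointer are
       often constant because of alignment. */
    v ^= v >> 16;
    v *= (uintptr_t)0x9E3779B1u;   /* arbitrary odd multiplier */
    v ^= v >> 13;

    /* nbuckets must be nonzero. */
    return (size_t)(v % nbuckets);
}
```

A table keyed this way still needs only == on the stored FILE * values to resolve collisions, for example when looking up bucket hash_file_ptr(fp, 64).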
 
CBFalconer

Dan said:
Your method doesn't cover the predefined streams, which are the
interesting cases. If I open a file myself, I already know its name,
but it is sometimes helpful to know where your stdin data comes from.

Actually it is worse than that, because any stream may be connected
to one (or more) i/o devices or disk files, which also have names
or designators in most systems. Then that device may be connected
to something else, which in turn has its own cascade of names.

So it is better to simply cut the Gordian knot, and say that the
names are unknown to the program.
 
jacob navia

Under Linux, running

ls -l /proc/self/fd

prints a table showing each integer file descriptor as a link
to the real file it refers to (/dev/pts/1 for a console,
or /home/jacob/getfilename.c for a regular file).

So that's how you do it under Linux.

Maybe other OSes will follow.
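
The same lookup can be done from inside a program. This is a Linux-specific sketch, assuming /proc is mounted and that the POSIX functions fileno and readlink are available; the helper name stream_name is made up for illustration. It is not portable C.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, Linux-specific: copy the target of the symbolic
   link /proc/self/fd/<n> for the stream's descriptor into buf.
   Returns 0 on success, -1 on failure. */
static int stream_name(FILE *fp, char *buf, size_t size)
{
    char link[64];
    ssize_t n;
    int fd;

    if (fp == NULL || size == 0)
        return -1;
    fd = fileno(fp);
    if (fd < 0)
        return -1;

    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    n = readlink(link, buf, size - 1);  /* readlink adds no '\0' */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}
```

Calling stream_name(stdin, buf, sizeof buf) covers the predefined streams as well. The result may be a device, a pipe designator, or a path that has since been renamed or deleted, so it is a hint rather than a guaranteed file name.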
 
Jonathan Adams

jacob navia said:
Under Linux, running

ls -l /proc/self/fd

prints a table showing each integer file descriptor as a link
to the real file it refers to (/dev/pts/1 for a console,
or /home/jacob/getfilename.c for a regular file).

So that's how you do it under Linux.

Maybe other OSes will follow.

<OT>
Solaris 10 has similar functionality:

% ls -l /proc/self/path/[0-9]*
</OT>

Cheers,
- jonathan
 
Jarno A Wuolijoki

CBFalconer said:
You cast them into integers (allowable, but not the inverse) and
use hashing methods for integers (see the references in my hashlib
package and its tests). The operation of the hash table only
requires equal/non-equal comparisons of pointers, which is always
allowable.

Does the standard guarantee that pointers that compare equal convert to
integers that do so as well?
(think of x86 real mode, b000:8000 vs b800:0000)

Please refrain from snipping attributes for material you quote.

Oops. I accidentally followed my own queer 'netiquette' instead of ng's.

(That is, I tend to think that only first level attributions are really
relevant in the context of my reply. I learned this ugly habit in
BBS's where it was typical to nest much farther than here)
 
Keith Thompson

You alias them with an array of unsigned char of size sizeof(FILE *).

That was my thought as well, but I can imagine an implementation in
which two FILE* values have the same value (as pointers) but different
representations (as arrays of unsigned char). Realistically, an
implementation is unlikely to generate two such distinct
representations for the same value, but I think a conforming
implementation could do so.
 
CBFalconer

Jarno said:
Does the standard guarantee that pointers that compare equal convert
to integers that do so as well?
(think of x86 real mode, b000:8000 vs b800:0000)

It doesn't matter. You are working only with the value that was
returned from fopen.
 
Fao, Sean

jacob said:
Yes, it is based on the win32 API. That is why I did not publish the
code here, just giving a pointer to the code

I suppose you think that made it right?
 
goose

jacob navia said:
Recently there was a discussion in this group about
how to retrieve the file name given a FILE *.

The question raised my curiosity, and after some
research I have come up with a good implementation.

The solution is in the tutorial for lcc-win32
(http://www.cs.virginia.edu/~lcc-win32) page
331.

It piqued my curiosity too, but I doubt that
a good implementation of this exists. I've not read
your solution; make it plain text and I'll read it,
I don't download binaries (at all!).

Here is my solution for the group to pick at :)

#include <stdio.h>
#include <stdlib.h>

/* Route every later fopen(...) call through save_name(). */
#define fopen(x,y) (save_name(x, y))

FILE *save_name (char *name, char *mode) {
    /* The parentheses around fopen suppress the macro,
       so this calls the real library function. */
    FILE *t = (fopen) (name, mode);
    if (t) {
        /* here we save t and filename
           and mode somewhere; in an array
           maybe?
        */
        printf ("%s saved\n", name);
    }
    return t;
}

int main (void) {
    FILE *test = fopen ("test.txt", "w");

    if (test==NULL) {
        printf ("failure\n");
    } else {
        printf ("success\n");
        fprintf (test, "success\n");
        fclose (test);
    }

    return EXIT_SUCCESS;
}

goose,
I suspect that the above is not allowed, please comment.
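
One way the "save t and filename in an array" step could look is sketched below. The names open_tracked, tracked_name, close_tracked, and the MAX_TRACKED limit are all hypothetical. The sketch uses an ordinary wrapper function rather than a macro named fopen, and finds entries by comparing FILE * values for equality, which needs no hashing at all.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TRACKED 64   /* arbitrary capacity for this sketch */

static struct { FILE *fp; char *name; } tracked[MAX_TRACKED];

/* Hypothetical wrapper: open a file and remember a copy of its name. */
static FILE *open_tracked(const char *name, const char *mode)
{
    FILE *fp = fopen(name, mode);
    size_t i;

    if (fp == NULL)
        return NULL;
    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].fp == NULL) {          /* first free slot */
            char *copy = malloc(strlen(name) + 1);
            if (copy != NULL) {
                strcpy(copy, name);
                tracked[i].fp = fp;
                tracked[i].name = copy;
            }
            break;
        }
    }
    return fp;   /* the stream is usable even if it was not recorded */
}

/* Return the recorded name, or NULL for streams we did not open. */
static const char *tracked_name(FILE *fp)
{
    size_t i;

    if (fp == NULL)
        return NULL;
    for (i = 0; i < MAX_TRACKED; i++)
        if (tracked[i].fp == fp)
            return tracked[i].name;
    return NULL;
}

/* Forget the name, then close the stream. */
static int close_tracked(FILE *fp)
{
    size_t i;

    if (fp == NULL)
        return EOF;
    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].fp == fp) {
            free(tracked[i].name);
            tracked[i].fp = NULL;
            tracked[i].name = NULL;
            break;
        }
    }
    return fclose(fp);
}
```

As noted elsewhere in the thread, this says nothing about stdin, stdout, or stderr: tracked_name returns NULL for any stream that did not come from open_tracked.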
 
Dan Pop

CBFalconer said:
You cast them into integers (allowable, but not the inverse)

Wrong. The cast is allowed in both directions, but the results are not
guaranteed to be meaningful in any direction.

For maximal portability, you have to use the unsigned char array approach.
Even on C99, intptr_t is an optional typedef.

Dan
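
A sketch of the unsigned char array approach, for illustration only. The helper name hash_ptr_bytes is made up, and the choice of the FNV-1a scheme with its 32-bit parameters is an assumption of this example, not something proposed in the thread. It hashes the object representation of the pointer, so it relies on the stored FILE * keeping the same representation, a point the thread goes on to debate.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: hash the bytes of a FILE * object using the
   FNV-1a scheme. No integer type wide enough to hold a pointer is
   required, only unsigned char access to the representation. */
static unsigned long hash_ptr_bytes(FILE *fp)
{
    unsigned char bytes[sizeof fp];
    unsigned long h = 2166136261UL;       /* FNV offset basis (32-bit) */
    size_t i;

    memcpy(bytes, &fp, sizeof fp);        /* copy the representation */
    for (i = 0; i < sizeof bytes; i++) {
        h ^= bytes[i];
        h *= 16777619UL;                  /* FNV prime (32-bit) */
        h &= 0xFFFFFFFFUL;                /* keep 32 bits if long is wider */
    }
    return h;
}
```

The result can be reduced modulo the table size; collisions are still resolved by comparing the stored FILE * values with ==.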
 
Dan Pop

Keith Thompson said:
That was my thought as well, but I can imagine an implementation in
which two FILE* values have the same value (as pointers) but different
representations (as arrays of unsigned char). Realistically, an
implementation is unlikely to generate two such distinct
representations for the same value, but I think a conforming
implementation could do so.

It doesn't matter: you get only one representation from fopen() and you
keep using it. There is no way for that representation to metamorphose
into the other.

Dan
 
Michael Wojcik

Keith Thompson said:
That was my thought as well, but I can imagine an implementation in
which two FILE* values have the same value (as pointers) but different
representations (as arrays of unsigned char). Realistically, an
implementation is unlikely to generate two such distinct
representations for the same value, but I think a conforming
implementation could do so.

I'm dubious about that, but another option is converting the value
to a string with sprintf and the %p conversion specifier. The
description of fscanf requires that a string produced by a *printf %p
conversion, when read back with a *scanf %p conversion during the same
program execution, yield a void* that compares equal to the original
pointer; thus, %p must produce a distinct string for each distinct
pointer value (during a given execution of that program). (C90 7.9.6.2)
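
A small sketch of that round trip, assuming a C99 library for snprintf (sprintf with a sufficiently large buffer would serve in C90). The helper name roundtrips_via_p and the buffer size are illustrative; the text of a %p conversion is implementation-defined, so its maximum length is an assumption here.

```c
#include <stdio.h>

/* Hypothetical helper: format a pointer with %p, read the text back
   with %p, and report whether the result compares equal to the
   original. The intermediate string could serve as a hash key. */
static int roundtrips_via_p(void *ptr)
{
    char text[128];      /* assumed large enough for %p output */
    void *back = NULL;
    int n;

    n = snprintf(text, sizeof text, "%p", ptr);
    if (n < 0 || n >= (int)sizeof text)
        return 0;        /* formatting failed or was truncated */
    if (sscanf(text, "%p", &back) != 1)
        return 0;        /* could not parse the text back */
    return back == ptr;
}
```

Since the argument is a void *, a FILE * converts implicitly at the call, as in roundtrips_via_p(stdout).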
 
Keith Thompson

Dan Pop said:
It doesn't matter: you get only one representation from fopen() and you
keep using it. There is no way for that representation to metamorphose
into the other.

You may be right, but I'm still not quite sure of that. Could a
pointer assignment change its representation? Similarly, can a
floating-point assignment change the representation (without changing
the represented value)? I *think* it can; for example, loading a
floating-point value into a register might automatically normalize it.
The same thing could happen with an address register. As long as the
before and after values compare equal, I don't see a problem.

Realistically, though, if automatic pointer normalization happens so
easily, it's unlikely that a non-normalized pointer could survive long
enough to be returned from fopen().

If my guess is right, hashing pointers by converting them to arrays of
unsigned char will probably work reliably on every system other than
the DS9000.
 
CBFalconer

Dan said:
CBFalconer said:
You cast them into integers (allowable, but not the inverse)

Wrong. The cast is allowed in both directions, but the results are
not guaranteed to be meaningful in any direction.

For maximal portability, you have to use the unsigned char array
approach. Even on C99, intptr_t is an optional typedef.


I have my doubts. Consider that the representation of a pointer
may contain trap bits, which are accessed by the unsigned char
attack. There is no guarantee that those trap bits do not change
with time and/or actual storage location (of the pointer). The
cast technique eliminates those trap bits. If it doesn't convert
back to the pointer, so what, it is just one phase of the hashing
mechanism.

So I claim that the cast makes the pointer to hash function single
valued, while the unsigned char approach does not. I would be hard
put to find a system where the unsigned char method would not work,
but it is not guaranteed.
 
Keith Thompson

CBFalconer said:
Dan Pop wrote: [...]
For maximal portability, you have to use the unsigned char array
approach. Even on C99, intptr_t is an optional typedef.


I have my doubts. Consider that the representation of a pointer
may contain trap bits, which are accessed by the unsigned char
attack. There is no guarantee that those trap bits do not change
with time and/or actual storage location (of the pointer). The
cast technique eliminates those trap bits. If it doesn't convert
back to the pointer, so what, it is just one phase of the hashing
mechanism.


Did you mean padding bits rather than trap bits? A type can have trap
*representations*, but a valid pointer value (of the kind that we're
interested in hashing) won't be one of them.

I don't believe that the cast necessarily eliminates padding bits.

Assume the following:

void *p1 = foo();
void *p2 = bar();
uintptr_t u1 = uintptr_t(p1);
uintptr_t u2 = uintptr_t(p1);

Assume that p1 == p2 (they point to the same address), but that they
have different internal representations (perhaps one is normalized and
the other is not).

We know from C99 7.18.1.4 that (void*)u1 == (void*)u2, but we don't
know that u1 == u2. For example, if the cast simply copies the bits,
the values of u1 and u2 would reflect the difference in
representations of the two pointer values; converting back to void*
yields two pointers that have different representations, but compare
equal to each other.

The cast *might* normalize the representation, but it doesn't have to.
 
Chris Torek

[regarding hashing pointers by first converting them to uintptr_t]

Assume the following:

void *p1 = foo();
void *p2 = bar();
uintptr_t u1 = uintptr_t(p1);
uintptr_t u2 = uintptr_t(p1);

Minor nit: this is (old) C++ syntax; you mean:

uintptr_t u1 = (uintptr_t)p1;

and so on.

Assume that p1 == p2 (they point to the same address), but that they
have different internal representations (perhaps one is normalized and
the other is not).

We know from C99 7.18.1.4 that (void*)u1 == (void*)u2, but we don't
know that u1 == u2. For example, if the cast simply copies the bits,
the values of u1 and u2 would reflect the difference in
representations of the two pointer values; converting back to void*
yields two pointers that have different representations, but compare
equal to each other.

The cast *might* normalize the representation, but it doesn't have to.

Indeed, consider the historical implementations that are the very
reason the C standards are full of this kind of weirdness with
pointer arithmetic. In other words, think back to the 1980s and
C compilers for the IBM PC that ran under MS-DOS with its various
"extender" schemes to access more than 64K and 640K of memory.

One of the models under which code ran had 20-bit pointers, so that
uintptr_t would have to be defined as "unsigned long" ("int" being
only 16 bits on these compilers). If functions foo() and bar()
returned "un-normalized" pointers, and you assigned these to u1 and
u2 via casts, you get -- unnormalized integers. The "normalization"
operation was done by the "==" operators (only). Relational
comparisons ("<" and ">", and their "<=" and ">=" variants) compared
only offsets. This led to the peculiar case that:

printf("p1 is %sequal to p2\n", p1 == p2 ? "" : "not ");
printf("p1 is %sless than p2\n", p1 < p2 ? "" : "not ");

would sometimes print:

p1 is equal to p2
p1 is less than p2

In other words, p1 < p2 && p1 == p2, both at the same time.
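
The behaviour described above can be imitated on a modern machine with a toy model. This is only a simulation under stated assumptions: struct far_ptr and the far_* helpers are invented for illustration, equality and ordering follow the description in this post, and the b000:8000 / b800:0000 pair is the example raised earlier in the thread.

```c
/* Toy model of a real-mode segmented pointer: 16-bit segment and
   16-bit offset, held in unsigned ints for simplicity. */
struct far_ptr {
    unsigned seg;
    unsigned off;
};

/* Linear address on the 8086: segment * 16 + offset. */
static unsigned long far_linear(struct far_ptr p)
{
    return ((unsigned long)p.seg << 4) + p.off;
}

/* "==" as described in the post: compare normalized addresses. */
static int far_equal(struct far_ptr a, struct far_ptr b)
{
    return far_linear(a) == far_linear(b);
}

/* "<" as described in the post: compare offsets only. */
static int far_less(struct far_ptr a, struct far_ptr b)
{
    return a.off < b.off;
}

/* A conversion to integer that simply copies the bits keeps the
   unnormalized segment:offset form. */
static unsigned long far_to_integer(struct far_ptr p)
{
    return ((unsigned long)p.seg << 16) | p.off;
}
```

With a = {0xB000, 0x8000} and b = {0xB800, 0x0000}, both linear addresses are 0xB8000, so far_equal(a, b) is true while far_less(b, a) is also true, and far_to_integer gives two different integers. In this model, hashing the converted integers would put two equal pointers in different buckets.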

(The only things that behave this way on modern CPUs are floating
point numbers. :) If x is set to NaN, a surprising number of
comparisons all produce "false" as their result.)
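
The NaN remark can be checked with a few lines, assuming IEEE 754 floating point and a compiler that is not run in a "fast math" mode that ignores NaNs. The helper name true_comparisons is made up for this sketch.

```c
#include <math.h>

/* Count how many of the six comparisons of x with itself are true.
   Under IEEE 754 an ordinary value satisfies ==, <= and >=, while a
   NaN satisfies only !=. */
static int true_comparisons(double x)
{
    return (x == x) + (x != x) + (x < x) + (x > x)
         + (x <= x) + (x >= x);
}
```

Under those assumptions true_comparisons(1.0) is 3 and true_comparisons(NAN) is 1, using the NAN macro from <math.h> (C99, defined where quiet NaNs are supported).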
 
Keith Thompson

Chris Torek said:
[regarding hashing pointers by first converting them to uintptr_t]

Assume the following:

void *p1 = foo();
void *p2 = bar();
uintptr_t u1 = uintptr_t(p1);
uintptr_t u2 = uintptr_t(p1);

Minor nit: this is (old) C++ syntax; you mean:

uintptr_t u1 = (uintptr_t)p1;

and so on.

D'oh! (It wasn't (deliberately) C++ syntax, it was just a mistake;
I'm not going to admit to the thought process that led to it.) And I
used the wrong variable on the last line. What I meant, of course,
was:

void *p1 = foo();
void *p2 = bar();
uintptr_t u1 = (uintptr_t)p1;
uintptr_t u2 = (uintptr_t)p2;

[snip]

Thanks for confirming (somewhat to my surprise) that there are
real-world examples of what I was talking about.
 
Mark McIntyre

Obviously I have just written it, so it is not well tested. Can you give
any examples for your assertion? In which circumstances it doesn't work?

It fails when I run it through my C interpreter on my Palmpilot, and when I
compile it on my Vax 8800 or on my IBM S/360. It also fails on my spare PC,
on my Mac, on my Atari, on my Symbian phone, etc etc....

But I think you probably knew that !
 
