this function being a great example of CLC pedantry...
personally, I would just say "to hell with it" and use fseek/ftell,
since there is almost no way that a text file is going to be larger than
2GB
and also expected to just be read into memory like this (it wouldn't fit on a
32-bit system anyway...).
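something like this is what I have in mind (a quick, untested sketch;
"read_whole_file" is just a name I'm making up here):

#include <stdio.h>
#include <stdlib.h>

/* read an entire file into a malloc'd buffer via fseek/ftell.
 * caveat: ftell returns a long, so this breaks (size comes back
 * negative or otherwise wrong) for 2GB+ files where long is 32 bits. */
char *read_whole_file(const char *path, long *size_out)
{
    FILE *fp = fopen(path, "rb");
    long size;
    char *buf;

    if (!fp)
        return NULL;
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return NULL; }
    size = ftell(fp);
    if (size < 0) { fclose(fp); return NULL; }
    rewind(fp);

    buf = malloc((size_t)size + 1);  /* +1 for a NUL terminator */
    if (!buf) { fclose(fp); return NULL; }
    if (fread(buf, 1, (size_t)size, fp) != (size_t)size) {
        free(buf); fclose(fp); return NULL;
    }
    buf[size] = '\0';
    fclose(fp);
    if (size_out)
        *size_out = size;
    return buf;
}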
<--
Unless you have a data file that's over 2GB and you convert it to an
ASCII representation for 'grep-ability' and now it's ballooned to over
9GB. Not saying that it's a good thing, but I have done it before
when I've been in a pinch.
-->
note: by "and" I meant T&T=T, not T|T=T.
but, otherwise, potentially...
this depends on the app then.
I suspect this is likely to be a very rare case though, where most text one
deals with is typically in the kB or low MB range...
typically, on a 32-bit system, even 500MB mallocs are prone to fail,
since
often some random DLL will end up mapped right into the middle of an
otherwise clean 1GB span of memory, out of the 2GB or 3GB or so
theoretically available to the app (for those few apps which try to
allocate memory in such large chunks...).
<--
I experience similar issues with 500MB mallocs failing, and I've not
been able to determine why. I don't allocate memory that large, but I
have some test programs that try to blow up memory and noticed the max
was around 500MB.
-->
this is due mostly to address-space fragmentation...
as the requested size gets larger, the probability of there being an unbroken
run of that size becomes increasingly smaller (although the specifics depend
on the exact memory layout and allocation behavior).
but, 500MB seems to be a good cutoff, as:
it represents, typically, around 1/4 of the total app-usable address space;
the OS will often load libraries, ... in ways which are not
convenient for maintaining an unbroken space.
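a quick way to see this in practice is a probe program along these lines
(a hypothetical sketch; it binary-searches for the largest single malloc
that succeeds, which approximates the biggest unbroken run of address
space rather than the total free memory):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t lo = 0, hi = (size_t)1 << 31;  /* probe up to 2GB */

    /* invariant: an allocation of 'lo' bytes succeeded; 'hi' is
     * assumed to fail. narrow the gap down to ~1MB resolution. */
    while (hi - lo > ((size_t)1 << 20)) {
        size_t mid = lo + (hi - lo) / 2;
        void *p = malloc(mid);
        if (p) { free(p); lo = mid; }
        else   { hi = mid; }
    }
    printf("largest single malloc: ~%lu MB\n", (unsigned long)(lo >> 20));
    return 0;
}

(results will vary a lot by OS and allocator; on systems with overcommit,
malloc may succeed optimistically, so this mostly shows the 32-bit
fragmentation effect.)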
there is much less of an issue on 64-bit systems, but there are still
problems: trying to allocate a huge chunk of memory (several GB or
more) will often horridly bog down the system IME (at least on Windows).
on Windows, a better strategy was to use reserved/uncommitted memory (via
VirtualAlloc), and then to commit it piecewise...
in this particular case, I had reserved a 4GB chunk of memory, mostly as I
needed a region of memory with a "window" where any point in the space could
be reached via a 32-bit offset; with the wider (64-bit) address
space, I couldn't make any guarantees about distance (2GB would maybe be
better though, as 2GB allows any point to be reached from any other via a
signed 32-bit offset).
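the basic pattern looks about like this (a Windows-only sketch from memory;
error handling trimmed):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    SIZE_T reserve_size = (SIZE_T)4 << 30;  /* 4GB region, 64-bit build */
    SIZE_T commit_chunk = (SIZE_T)1 << 20;  /* commit 1MB at a time */

    /* reserve: claims address space only, no physical memory or
     * pagefile backing yet */
    char *base = VirtualAlloc(NULL, reserve_size, MEM_RESERVE, PAGE_NOACCESS);
    if (!base) { fprintf(stderr, "reserve failed\n"); return 1; }

    /* commit the first chunk; repeat at other offsets as needed */
    char *p = VirtualAlloc(base, commit_chunk, MEM_COMMIT, PAGE_READWRITE);
    if (!p) { fprintf(stderr, "commit failed\n"); return 1; }
    p[0] = 42;  /* committed pages are now usable */

    VirtualFree(base, 0, MEM_RELEASE);
    return 0;
}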
I haven't tested on Linux to see what happens with large malloc or mmap
calls.
but, memory usage patterns are their own complicated set of issues.
however, given that the average size of most "sane" text files is measured
in kB, with a few larger ones running to a few MB, usually there is little
risk of size overflow just using the fseek/ftell strategy...
(although, yes, if the file *is* larger than 2GB or so, then the size may
come back negative or otherwise completely wrong...).
but, oh well, whatever...
<--
As soon as you say getting the size of a file, afaik it's outside the
scope of the standard. I know in my environment I can't use ftell to
get a 2GB+ NTFS file size since sizeof( long int ) is 4, and thus I
have to resort to a system-specific method using uint64_t and
GetFileAttributesExA. Unfortunately there is no completely portable
method to get a file size. The best thing in my opinion is to make a
wrapper function that encapsulates the non-portable behavior so that
the non-portability is centralized in one spot if at all possible.
-->
agreed...
IME, one typically ends up with a lot of OS-specific wrappers, usually kept
in their own source files (i.e., having various OS-specific source
files, and swapping them out for the OS in question).
this is typically much cleaner than using fine-grained ifdefs
(although sometimes I have used coarse-grained ifdefs in these files,
mostly so that I don't have to mess with the Makefile so much, and so the
entire file contents disappear if not on the OS in question).
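a wrapper along the lines described above might look like this (sketch
only; "my_file_size" is a made-up name, with one branch per OS):

#include <stdint.h>

#ifdef _WIN32
#include <windows.h>

/* Win32 branch: GetFileAttributesExA gives the 64-bit size as a
 * high/low DWORD pair */
int64_t my_file_size(const char *path)
{
    WIN32_FILE_ATTRIBUTE_DATA fad;
    if (!GetFileAttributesExA(path, GetFileExInfoStandard, &fad))
        return -1;
    return ((int64_t)fad.nFileSizeHigh << 32) | fad.nFileSizeLow;
}

#else
#include <sys/stat.h>

/* POSIX fallback: stat reports the size directly */
int64_t my_file_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (int64_t)st.st_size;
}
#endif

callers then only ever see my_file_size(), and the non-portable bits stay
in one spot.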
<--
Of course, I wouldn't fault someone for using the ftell/fseek combo if
that is all that is needed to handle his range of input file sizes as
long as that person doesn't claim it's portable.
-->
one can claim it is portable (since it *is* part of the standard), so long
as they don't also claim it will work with larger files...