C++ programming challenge

Chris M. Thomasson · Jun 11, 2009

blargg said:
[...]

#define GET_COUNT(mp_self, mp_index) ( \
(mp_self)[(mp_index) + 'a'] + \
(mp_self)[(mp_index) + 'A'] \
)


[...]

Why do you keep posting non-portable programs that assume ASCII?

I have no good reason why.

toupper/tolower are your friends; use them! If you think they pose a
performance problem, you're doing the algorithm wrong, as you should only
combine lower and uppercase counts AFTER you've counted frequencies, not
during.

Actually, I am combining the counts after processing the data:

/* processing phase */
while ((size = fread(buf, 1, BUFFER, file))) {
size_t tmpsize = size;

while (tmpsize) {
++counts[buf[--tmpsize]];
}

if (size < BUFFER) break;
}

/* combine to acquire total */
for (i = 0; i < 26; ++i) {
total += GET_COUNT(counts, i);
}

Perhaps I will code something up that will handle something other than
ASCII.
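
A minimal sketch of the "combine after counting" idea without the 'a'/'A' arithmetic, assuming the per-byte counts already sit in a table indexed by unsigned char value. The helper name fold_case_counts is hypothetical and does not appear in the thread; it relies only on isupper, tolower and isalpha, so it makes no assumption about ASCII or about letters being contiguous.

```cpp
#include <cctype>
#include <climits>

// Hypothetical helper: merges each uppercase letter's count into its
// lowercase counterpart as reported by the current C locale, then returns
// the total number of letters counted.
unsigned long fold_case_counts(unsigned long (&counts)[UCHAR_MAX + 1]) {
    for (int c = 0; c <= UCHAR_MAX; ++c) {
        if (std::isupper(c)) {
            const int lower = std::tolower(c);
            if (lower != c) {            // skip letters with no lowercase form
                counts[lower] += counts[c];
                counts[c] = 0;
            }
        }
    }

    unsigned long total = 0;
    for (int c = 0; c <= UCHAR_MAX; ++c) {
        if (std::isalpha(c)) total += counts[c];
    }
    return total;
}
```

The folding runs once over at most UCHAR_MAX + 1 entries, so its cost is independent of the file size.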

Ioannis Vranos · Jun 11, 2009

Peter said:
On Thu, 11 Jun 2009, Alf P. Steinbach wrote:

/.../
/.../

You can download the file via a web based torrent relay, e.g.
http://www.torrentrelay.com

Sincerely,
Peter Jansson

Alf is a troll with persistent storage (for years).

--
Ioannis A. Vranos

C95 / C++03 Developer

http://www.cpp-software.net

Alf P. Steinbach · Jun 11, 2009

* Ioannis Vranos:

Alf is a troll with persistent storage (for years).

As I read it you're trying to make a personal attack.

That reflects badly on you.

- Alf

Thomas J. Gritzan · Jun 11, 2009

Ioannis said:
Thomas said:

Ioannis said:

Thomas J. Gritzan wrote:
Ioannis Vranos wrote:
do
{
inputFile.read(buffer, sizeof(buffer));

for(streamsize i= 0; i< inputFile.gcount(); ++i)
++characterFrequencies[ buffer[i] ];

}while(not inputFile.eof());


By the way, this "not" instead of "!" is horrible. I would write:


Horrible? "not" is a *built in* keyword, as "and" and "or". They are
more user readable.

You don't have to use it just because it's a keyword. register also is a
keyword, and you shouldn't use it, because it doesn't have any meaning
with current compilers.

About "not" or "!", C++ programmers are more used to the symbol. I
rarely see the use of not/and/or in this newsgroup. It might be more
readable to you, but it's not more readable in general.

My code doesn't take into account hardware failures. Also if I am not
wrong, while(inputFile) does not check for EOF, neither while(cin>>
something) that the FAQ uses.


while(inputFile) doesn't check for EOF, and that's necessary.
For example:

int i;
while (file >> i)
{
// process i
}

If the extraction reads a number at the end of the file, the EOF bit
will be set. If the condition would check EOF, the last number wouldn't
be processed. But the next time you try to extract a number, failbit (or
badbit?) will be set and the loop ends correctly.

If there's an idiomatic usage, use it.
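
The behaviour described above can be observed with a string stream standing in for the file. This is only a sketch: the names ReadResult and count_ints are hypothetical. It records whether eofbit was already set right after the last successful extraction. The loop ends on the following extraction, which fails and sets failbit.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type: what the idiomatic loop saw.
struct ReadResult {
    int processed;              // number of integers handled by the loop body
    bool eof_after_last_value;  // was eofbit set after the last good read?
};

ReadResult count_ints(const std::string& text) {
    std::istringstream in(text);
    ReadResult r = {0, false};
    int value;
    while (in >> value) {       // tests the stream state after the extraction
        ++r.processed;
        r.eof_after_last_value = in.eof();
    }
    return r;
}
```

For the input "1 2 3" the loop body runs three times and eofbit is already set while the 3 is being processed. For "1 2 3\n" it also runs three times, but eofbit only appears on the failed fourth extraction.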

Ioannis Vranos · Jun 12, 2009

Thomas said:
About "not" or "!", C++ programmers are more used to the symbol. I
rarely see the use of not/and/or in this newsgroup. It might be more
readable to you, but it's not more readable in general.

It is more human readable. Also things eventually progress. Since it is not bad style to use "and", "or", and
"not" for conditions, but instead it makes the code more readable, it is a good style to use them.

while(inputFile) doesn't check for EOF, and that's necessary.
For example:

int i;
while (file >> i)
{
// process i
}

If the extraction reads a number at the end of the file, the EOF bit
will be set. If the condition would check EOF, the last number wouldn't
be processed. But the next time you try to extract a number, failbit (or
badbit?) will be set and the loop ends correctly.

If there's an idiomatic usage, use it.

If the extraction reads the last number stored in the file, the EOF bit will not be set. At the next read, EOF
will be set, however the condition will continue to be true. That means that although EOF has been reached, i
will still be processed. At the next read, EOF will be read again, and the condition will fail.

So I think the EOF check should be placed in the condition.

--
Ioannis A. Vranos

C95 / C++03 Developer

http://www.cpp-software.net

Chris M. Thomasson · Jun 12, 2009

Peter Jansson said:
Dear news group,

I have created a small programming challenge for those of you who are
interested in challenging your Standard C++ programming skills. The
challenge is about counting character frequency in large texts,
perhaps useful for spam filtering or classical crypto analysis. You
can read more about it here:

http://blog.p-jansson.com/2009/06/programming-challenge-letter-frequency.html

Here is what I think will be my final submission... It should please some
people simply because it does not rely on pure ASCII. It can operate on
other character sets as well (e.g., EBCDIC). It should maintain its speed
wrt my other submission that is currently getting the best timing results.

One note: this version calculates two types of frequencies. It gives output
like:

287924224 | 226951168
___________________________________________________
a %5.454 | %6.920
b %0.916 | %1.162
c %3.315 | %4.205
d %2.615 | %3.317
e %9.179 | %11.645
f %2.017 | %2.559
g %1.494 | %1.895
h %3.013 | %3.823
i %6.163 | %7.818
j %0.080 | %0.101
k %0.504 | %0.639
l %2.677 | %3.397
m %1.866 | %2.368
n %5.412 | %6.865
o %7.395 | %9.381
p %2.208 | %2.801
q %0.100 | %0.126
r %6.200 | %7.865
s %4.780 | %6.064
t %6.954 | %8.822
u %2.344 | %2.974
v %0.930 | %1.180
w %1.181 | %1.498
x %0.159 | %0.202
y %1.838 | %2.332
z %0.031 | %0.040

This was computed using the text file I downloaded from the torrent on Peter
Jansson's blog. The first column displays the total number of
characters in the file, and the relative frequencies for that number. The
second column displays the total number of "valid" characters, and the
relative frequencies for that particular number. A valid character passes
the following constraint:

bool is_valid(int c) {
return (isalpha(c) && isprint(c));
}

Okay, here is the actual code:
_____________________________________________________________________
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#include <assert.h>
#include <limits.h>
#include <ctype.h>

#if ! defined (FILE_MODE)
# define FILE_MODE "rb"
#endif

#define BUFFER 65536U

#define CALC_FREQ(mp_count, mp_total) ( \
(mp_total) ? ((double)(mp_count) / \
(double)(mp_total)) * 100.0 : 0.0 \
)

int main(int argc, char** argv) {
if (argc > 1) {
clock_t const start = clock();
FILE* file = fopen(argv[1], FILE_MODE);

if (file) {
size_t size;
int status = EXIT_SUCCESS;
unsigned long total_chars = 0;
unsigned long total_valid_chars = 0;
unsigned long counts[UCHAR_MAX + 1] = { 0 };
static unsigned char buf[BUFFER];

while ((size = fread(buf, 1, BUFFER, file))) {
size_t tmpsize = size;

while (tmpsize) {
++counts[buf[--tmpsize]];
}

if (size < BUFFER) break;
}

if (ferror(file)) status = EXIT_FAILURE;
if (fclose(file)) status = EXIT_FAILURE;

if (status == EXIT_SUCCESS) {
size_t i;
clock_t stop;

for (i = 0; i <= UCHAR_MAX; ++i) {
total_chars += counts[i];
if (counts[i] && isalpha(i) && isprint(i)) {
total_valid_chars += counts[i];
if (isupper(i)) {
counts[(unsigned char)tolower(i)] += counts[i];
if ((unsigned char)tolower(i) > i) {
total_chars -= counts[i];
total_valid_chars -= counts[i];
}
}
}
}

printf("%lu | %lu\n"
"___________________________________________________\n",
total_chars, total_valid_chars);

for (i = 0; i <= UCHAR_MAX; ++i) {
if (isalpha(i) && isprint(i) && islower(i)) {
printf("%c %%%.3f | %%%.3f\n",
(char)tolower(i), CALC_FREQ(counts[i], total_chars),
CALC_FREQ(counts[i], total_valid_chars));
}
}

stop = clock();

printf("\nelapsed time: %lums\n",
(unsigned long)((((double)stop - (double)start)
/ CLOCKS_PER_SEC) * 1000.0));

return EXIT_SUCCESS;
}
}
}

fprintf(stderr, "file error...");

assert(0);

return EXIT_FAILURE;
}
_____________________________________________________________________

Any thoughts?

Chris M. Thomasson · Jun 12, 2009

[...]
(Also, the proposal says nothing about the encoding of the
input file. If he's running under Ubuntu, the default
encoding is UTF-8, and properly handling case folding for
UTF-8 could be significantly more time consuming than a
naïve implementation which ignores the reality of accented
characters.)


Well, then perhaps one can attempt to create a local table and
map in unsigned shorts/UNICODE values directly, or whatever as
concrete indexes into said table?


How to fold UTF-8 input rapidly would be the challenge. There
are over a million code points (and UTF-8 actually supports over
4 billion), so a Unicode character doesn't fit into a short on
most machines, and the tables would be very, very big. And very
sparse, since very few characters (proportionally) actually have
a case difference.

[...]

Keeping in line with my previous attitude, if the requirements of the test
dictate that I should provide support for UNICODE, then I would make a quick
and dirty example which works on some platforms as a premature optimization
for said platforms wrt future "portable" implementation.

Yikes!

Chris M. Thomasson · Jun 12, 2009

Chris M. Thomasson said:
[...]
(Also, the proposal says nothing about the encoding of the
input file. If he's running under Ubuntu, the default
encoding is UTF-8, and properly handling case folding for
UTF-8 could be significantly more time consuming than a
naïve implementation which ignores the reality of accented
characters.)
Well, then perhaps one can attempt to create a local table and
map in unsigned shorts/UNICODE values directly, or whatever as
concrete indexes into said table?



How to fold UTF-8 input rapidly would be the challenge. There
are over a million code points (and UTF-8 actually supports over
4 billion), so a Unicode character doesn't fit into a short on
most machines, and the tables would be very, very big. And very
sparse, since very few characters (proportionally) actually have
a case difference.

[...]


Keeping in line with my previous attitude, if the requirements of the test
dictate that I should provide support for UNICODE, then I would make a quick
and dirty example which works on some platforms as a premature
optimization for said platforms wrt future "portable" implementation.

Yikes!

Well, if I knew that an optimization would apply to most customers, then why
not go ahead and make it for them in a non-portable extension?

fifoforlifo · Jun 12, 2009

I thought this competition sounded fun, so I gave it a shot. The
following is a 2-threaded program that overlaps reads and
computation. I had to use boost::thread for this, but this should all
be doable in C++0x -- but I'll let you be the final judge of whether
it's an acceptable entry. As many pointed out, the problem is I/O
bound, so that's what I tackled first.
The computation part is nearly identical to Chris' code; I could not
beat the simple accumulate-into-256-wide-table. So cheers for finding
the best solution there.

FYI timings on my machine were:
Read the entire 280MB file, do no processing : 0.25 seconds
Chris' program : 0.6 seconds
This program : 0.4 seconds

/// This program counts plain ol' alphabet characters' frequency in a text file.
/// Text file's encoding is assumed to be a superset of ASCII
/// (for example SHIFT-JIS or UTF-8 would work).

/*
CStopWatch s;
s.startTimer();
The_Stuff_You_Want_To_Time_Executes_Here();
s.stopTimer();
You may then get the elapsed time in seconds
via the method getElapsedTime.
*/
#ifdef WIN32
#include <windows.h>
class CStopWatch
{
private:
typedef struct {
LARGE_INTEGER start;
LARGE_INTEGER stop;
} stopWatch;
stopWatch timer;
LARGE_INTEGER frequency;
double LIToSecs( LARGE_INTEGER & L)
{
return ((double)L.QuadPart/(double)frequency.QuadPart);
}
public:
CStopWatch()
{
timer.start.QuadPart=0;
timer.stop.QuadPart=0;
QueryPerformanceFrequency( &frequency );
}
void startTimer() { QueryPerformanceCounter(&timer.start); }
void stopTimer() { QueryPerformanceCounter(&timer.stop); }
double getElapsedTime()
{
LARGE_INTEGER time;
time.QuadPart=timer.stop.QuadPart-timer.start.QuadPart;
return LIToSecs( time );
}
};
#else
#include <sys/time.h>
class CStopWatch
{
private:
typedef struct {
timeval start;
timeval stop;
} stopWatch;
stopWatch timer;
public:
void startTimer( ) { gettimeofday(&(timer.start),0); }
void stopTimer( ) { gettimeofday(&(timer.stop),0); }
double getElapsedTime()
{
timeval res;
timersub(&(timer.stop),&(timer.start),&res);
return res.tv_sec + res.tv_usec/1000000.0;
}
};
#endif

#include <stdio.h>
#include <boost/cstdint.hpp>
#include <boost/thread.hpp>
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition.hpp>

#define BLOCK_SIZE 0x10000u
/// BLOCK_COUNT must be >= 3 for the FIFO to work properly
#define BLOCK_COUNT 0x10u
#define INBUF_SIZE (BLOCK_SIZE * BLOCK_COUNT)

static unsigned char inbuf[INBUF_SIZE];
static volatile int inbufAmt[BLOCK_COUNT];
static volatile size_t readCursor = 0;
static volatile size_t writeCursor = 0;
static size_t totalRead = 0;
static CStopWatch stopWatch;
/// fileReadDone is protected by ioMutex -- otherwise you get a race condition at EOF
static volatile int fileReadDone = 0;

static boost::mutex ioMutex;
static boost::condition readAvailable;
static boost::condition writeAvailable;

static int totals[256] = {0};

struct Reader {
const char* const pFileName;

Reader(const char* pFileName_) : pFileName(pFileName_) {}

void operator()() {
stopWatch.startTimer();
FILE* pFile = fopen(pFileName, "rb");
if (!pFile)
return;

while (!feof(pFile) && !ferror(pFile)) {
inbufAmt[writeCursor] = fread((void*)&inbuf[BLOCK_SIZE *
writeCursor],
1, BLOCK_SIZE, pFile);
totalRead += inbufAmt[writeCursor];

const size_t nextWriteCursor = (writeCursor + 1) %
BLOCK_COUNT;
while (nextWriteCursor == readCursor) {
boost::mutex::scoped_lock lk(ioMutex);
readAvailable.notify_one();
writeAvailable.wait(lk);
}
writeCursor = nextWriteCursor;
readAvailable.notify_one();
}

{
boost::mutex::scoped_lock lk(ioMutex);
fileReadDone = 1;
readAvailable.notify_one();
}

fclose(pFile);
}
};

static void AccumulateTotals(const unsigned char* pBuffer, size_t
size) {
const unsigned char* pc = pBuffer;
const unsigned char* const pcEnd = pc + size;
for (; pc != pcEnd; ++pc) {
const unsigned char c = *pc;
++totals[c];
}
}

int main(int argc, char** argv) {
if (argc < 2) {
printf("\nusage:\n\t%s <text_file_name>\n", argv[0]);
return -1;
}

// launch a reader thread
Reader reader(argv[1]);
boost::thread readerThread(reader);

// accumulate totals from buffers as they are
while (!fileReadDone) {
while (writeCursor == readCursor) {
boost::mutex::scoped_lock lk(ioMutex);
if (fileReadDone)
break;
writeAvailable.notify_one();
readAvailable.wait(lk);
}
if (fileReadDone)
break;

AccumulateTotals(&inbuf[BLOCK_SIZE * readCursor], inbufAmt
[readCursor]);
readCursor = (readCursor + 1) % BLOCK_COUNT;
writeAvailable.notify_one();
}

long totalAlphaChars = 0;
for (size_t u = 0; u < 26; ++u) {
totalAlphaChars += totals['A' + u] + totals['a' + u];
}

for (size_t u = 0; u < 26; ++u) {
double result = (totals['A' + u] + totals['a' + u]) * 100.0 /
totalAlphaChars;
printf("%c %%%.3f\n", 'a' + u, result);
}

stopWatch.stopTimer();

const double elapsed = stopWatch.getElapsedTime();
printf("\nelapsed time: %f [seconds]\n", elapsed);

return 0;
}
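
The post notes that the same overlap of reads and computation should be doable in C++0x. Below is a rough sketch of how that might look with the standard thread facilities that were eventually standardized in C++11. It is not a port of the program above: the function name count_bytes_two_threads, the bounded queue of blocks, and the single condition variable are all assumptions made for illustration, and the per-block vector allocation is simpler than, and probably slower than, the preallocated ring buffer used in the post.

```cpp
#include <array>
#include <condition_variable>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: byte histogram of a file, with one thread reading
// blocks and the calling thread counting them. Returns false if the file
// cannot be opened or a read error occurs.
bool count_bytes_two_threads(const char* path,
                             std::array<unsigned long, 256>& totals) {
    const std::size_t kBlockSize = 0x10000;  // same block size as the post
    const std::size_t kMaxQueued = 16;       // bound on blocks in flight

    std::FILE* file = std::fopen(path, "rb");
    if (!file) return false;

    std::mutex m;
    std::condition_variable cv;
    std::queue<std::vector<unsigned char> > blocks;  // guarded by m
    bool done = false;                                // guarded by m
    bool ok = true;                                   // guarded by m

    std::thread reader([&] {
        for (;;) {
            std::vector<unsigned char> block(kBlockSize);
            const std::size_t n =
                std::fread(block.data(), 1, block.size(), file);
            if (n == 0) break;               // end of file or read error
            block.resize(n);
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return blocks.size() < kMaxQueued; });
            blocks.push(std::move(block));
            cv.notify_all();
        }
        std::lock_guard<std::mutex> lk(m);
        ok = !std::ferror(file);
        done = true;
        cv.notify_all();
    });

    totals.fill(0);
    for (;;) {
        std::vector<unsigned char> block;
        {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return !blocks.empty() || done; });
            if (blocks.empty()) break;       // reader finished, queue drained
            block = std::move(blocks.front());
            blocks.pop();
            cv.notify_all();                 // wake a reader waiting for room
        }
        for (unsigned char c : block) ++totals[c];  // count outside the lock
    }

    reader.join();
    std::fclose(file);
    return ok;
}
```

All shared state is touched only while holding the mutex, so the sketch does not rely on volatile for synchronization. On some toolchains it needs to be built with thread support enabled (for example -pthread with g++).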

Bart van Ingen Schenau · Jun 12, 2009

Ioannis said:
If the extraction reads the last number stored in the file, the EOF
bit will not be set.

That is not true. If the very last character in the stream is a digit,
then ios_base::eofbit will be set when the last number is successfully
read from the stream.

At the next read, EOF will be set, however the
condition will continue to be true. That means that although EOF has
been reached, i will still be processed. At the next read, EOF will be
read again, and the condition will fail.

So I think the EOF check should be placed in the condition.

That will give you a loop that has an off-by-one error under certain
conditions. Depending on where the test is located, you either process
one item too few or one too many.

Bart v Ingen Schenau
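
A small sketch of the off-by-one being described, using a string stream in place of a file. Both function names are hypothetical. The first loop tests eof() before reading, the second tests the result of the extraction itself.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: tests eof() before each read, so the body also runs
// for the read that fails.
int count_with_eof_test(const std::string& text) {
    std::istringstream in(text);
    int processed = 0;
    int value = 0;
    while (!in.eof()) {
        in >> value;   // may fail, but the body has already been entered
        ++processed;   // counts the failed read as an item as well
    }
    return processed;
}

// Hypothetical helper: the body only runs after a successful extraction.
int count_with_extraction_test(const std::string& text) {
    std::istringstream in(text);
    int processed = 0;
    int value = 0;
    while (in >> value) {
        ++processed;
    }
    return processed;
}
```

With the input "1 2 3" both loops process three items, because reading the final 3 already sets eofbit. With "1 2 3\n" the eof-tested loop processes four, since the trailing newline hides the end of the stream until one more extraction has failed. That input dependence is the "under certain conditions" part.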

Ioannis Vranos · Jun 12, 2009

Bart said:
That is not true. If the very last character in the stream is a digit,
then ios_base::eofbit will be set when the last number is successfully
read from the stream.

That will give you a loop that has an off-by-one error under certain
conditions. Depending on where the test is located, you either process
one item too few or one too many.

Bart v Ingen Schenau

Here is a test you may do:

#include <iostream>
#include <fstream>

int main(int argc, char **argv)
{
using namespace std;

char c;

ifstream inputFile(argv[argc- 1]);

while(inputFile.read(&c, 1))
{
static int i= 0;

cout<< ++i<< endl;
}

}

Create a text file with one character of the basic character set, and run the program in the style:

./myprogram test.txt

This will print

1
2

and will terminate.

The failbit is not set before EOF is read twice.

--
Ioannis A. Vranos

C95 / C++03 Developer

http://www.cpp-software.net

Alf P. Steinbach · Jun 12, 2009

* Ioannis Vranos:

#include <iostream>
#include <fstream>

int main(int argc, char **argv)
{
using namespace std;

char c;

ifstream inputFile(argv[argc- 1]);

while(inputFile.read(&c, 1))
{
static int i= 0;

cout<< ++i<< endl;
}

}

Create a text file with one character of the basic character set, and
run the program in the style:

./myprogram test.txt

This will print

1
2

and will terminate.

I get the output

1

with g++ and MSVC.

Have you checked that the size of your data file is exactly 1 byte?

If the file really is 1 byte, then what is your compiler and system?

The failbit is not set before EOF is read twice.

§27.6.1.3/27 "[when] end-of-file occurs on the input sequence ... calls
setstate(failbit|eofbit)"

Cheers & hth.,

- Alf
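
The same check can be reproduced without a data file by reading from a string stream. This is a sketch under that assumption, with hypothetical names ByteLoopResult and read_bytes. It runs the one-byte read loop and reports the stream flags once the loop has ended.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper type: outcome of the one-byte read loop.
struct ByteLoopResult {
    int iterations;  // how many times the loop body ran
    bool eof;        // eofbit after the loop
    bool fail;       // failbit after the loop
};

ByteLoopResult read_bytes(const std::string& data) {
    std::istringstream in(data);
    ByteLoopResult r = {0, false, false};
    char c;
    while (in.read(&c, 1)) {  // false as soon as a read cannot deliver 1 byte
        ++r.iterations;
    }
    r.eof = in.eof();
    r.fail = in.fail();
    return r;
}
```

For a one-character input the body runs once. The second read() finds no data and sets both eofbit and failbit in the same call, which matches the single line of output reported above.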

Ioannis Vranos · Jun 12, 2009

Ioannis said:
Bart said:

That is not true. If the very last character in the stream is a digit,
then ios_base::eofbit will be set when the last number is successfully
read from the stream.

That will give you a loop that has an off-by-one error under certain
conditions. Depending on where the test is located, you either process
one item too few or one too many.

Bart v Ingen Schenau


Here is a test you may do:

#include <iostream>
#include <fstream>

int main(int argc, char **argv)
{
using namespace std;

char c;

ifstream inputFile(argv[argc- 1]);

while(inputFile.read(&c, 1))
{
static int i= 0;

cout<< ++i<< endl;
}

}

Create a text file with one character of the basic character set, and
run the program in the style:

./myprogram test.txt

This will print

1
2

and will terminate.

The failbit is not set before EOF is read twice.

I am wrong.

--
Ioannis A. Vranos

C95 / C++03 Developer

http://www.cpp-software.net

Ioannis Vranos · Jun 12, 2009

Based on the input of other people I corrected the loop condition. Also I improved the timing mechanism to be
more accurate:

#include <valarray>
#include <fstream>
#include <cstdlib>
#include <cstdio>
#include <iostream>
#include <string>
#include <cctype>
#include <ctime>

int main(int argc, char **argv)
{
using namespace std;

// Warning: long double has problems with MINGW compiler for Windows.
//
// The C++ basic character set is using the value range [0, 127].
// If we used vector<long double>, it would not have any run-time difference in any modern compiler.
valarray<long double> characterFrequencies(128);

// The array where the read characters will be stored.
char buffer[BUFSIZ];

// If argc!= 2, then either the number of arguments is not correct, or the platform does not
// support arguments.
if(argc!= 2)
{
cerr<< "\nUsage: "<< argv[0]<< " fileNameToRead\n\n";

return EXIT_FAILURE;
}

// We disable synchronisation with stdio, to speed up C++ I/O.
ios_base::sync_with_stdio(false);

string characters= "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

clock_t time1, time2;

// We start timing.
time1= clock();

// We open the file
ifstream inputFile(argv[argc -1]);

// An error happened
if(not inputFile.good())
{
cerr<< "\nCould not open file for reading, exiting...\n\n";

return EXIT_FAILURE;
}

do
{
inputFile.read(buffer, sizeof(buffer));

for(streamsize i= 0; i< inputFile.gcount(); ++i)
++characterFrequencies[ buffer[i] ];

}while(inputFile);

// Since rule 1 is: "Your program should be case insensitive when it counts letters",
// we add the results of lowercase characters and their equivalent uppercase letters together.
cout<<fixed<< "\n\n\nThe letter frequencies are:\n";

long double totalcharacterFrequencies= 0;

for(string::size_type i= 0; i< characters.size(); ++i)
totalcharacterFrequencies+= characterFrequencies[ characters[i] ]+ characterFrequencies[
tolower(characters[i]) ];

for(string::size_type i= 0; i< characters.size(); ++i)
cout<< characters[i]<< ": "<< (characterFrequencies[ characters[i] ]+ characterFrequencies[
tolower(characters[i]) ])/ totalcharacterFrequencies* 100<< "%\n";

// We "stop" timing.
time2= clock();

// We convert the timing to seconds.
double totalTimeInSeconds= static_cast<double>(time2- time1)/ CLOCKS_PER_SEC;

cout<<"\n\nThe whole process took "<< totalTimeInSeconds<< " seconds.\n";

cout<<"\n\nHave a nice day!\n";
}

In my machine, the program produces:

john@john-laptop:~/Projects/anjuta/cpp/src$ g++ -ansi -pedantic-errors -Wall -O3 main.cc -o foobar
john@john-laptop:~/Projects/anjuta/cpp/src$ ./foobar LetterFrequencyInput.txt

The letter frequencies are:
A: 6.919578%
B: 1.162287%
C: 4.205169%
D: 3.317211%
E: 11.644528%
F: 2.559197%
G: 1.895033%
H: 3.822553%
I: 7.818366%
J: 0.101068%
K: 0.638897%
L: 3.396621%
M: 2.367889%
N: 6.865435%
O: 9.381317%
P: 2.801040%
Q: 0.126336%
R: 7.865290%
S: 6.064106%
T: 8.821831%
U: 2.974300%
V: 1.180335%
W: 1.497979%
X: 0.202137%
Y: 2.331793%
Z: 0.039705%

The whole process took 1.080000 seconds.

Have a nice day!
john@john-laptop:~/Projects/anjuta/cpp/src$

Ioannis Vranos · Jun 12, 2009

Peter said:
/.../

Hi,

I have included your code in the list of results at
http://blog.p-jansson.com/2009/06/programming-challenge-letter-frequency.html

My Ubuntu machine had Boost 1.37 so that is what was used during
compilation.

Cheers,
Peter Jansson

I think you are relaxing the rules. I know Qt programming, and most (if not all) Linux distributions provide
the Qt libraries.

May I use Qt?

--
Ioannis A. Vranos

C95 / C++03 Developer

http://www.cpp-software.net

Chris M. Thomasson · Jun 13, 2009

blargg said:
Chris M. Thomasson wrote:
[...]

unsigned long counts[UCHAR_MAX + 1] = { 0 };
static unsigned char buf[BUFFER];

while ((size = fread(buf, 1, BUFFER, file))) {
size_t tmpsize = size;

while (tmpsize) {
++counts[buf[--tmpsize]];


I wouldn't be surprised if this were slower than going forward, as
backwards might foil a processor's read-ahead of memory into the cache.
You can preserve your loop's style by having your index count from -size
to 0:

unsigned char* const buf_end = buf + size;
ptrdiff_t i = -(ptrdiff_t) size;
do {
++counts[buf_end[i]];
}
while ( ++i );

Yeah. You're right. I don't really know why I did it that way. I think I would
just go with something like:

while ((size = fread(buf, 1, BUFFER, file))) {
size_t tmpsize;

for (tmpsize = 0; tmpsize < size; ++tmpsize) {
++counts[buf[tmpsize]];
}

if (size < BUFFER) break;
}

Jonathan Lee · Jun 13, 2009

I have created a small programming challenge for those of you who are

interested in challenging your Standard C++ programming skills.

If this has already been brought up then please ignore me, but how are
you ensuring the programs are reading from disk? For example, the
current program listed as fastest on your blog clocks in at .13s. So I
downloaded the source, downloaded the text file and unzipped it, and
ran it. Result: 0.17s. Fine, but I notice that my hard disk light
doesn't flash. So I restart the computer and run it again. Result:
6.1s.

I suspect that some of the times you've posted may be benefiting from
the fact that the file is in memory, and others are genuinely reading
it from disk.

--Jonathan

Jonathan Lee · Jun 13, 2009

Here's a submission in C++. Nothing that interesting, but to illustrate a
point: the speed of this program is dominated by the printing of the
floating point value representing percent, i.e., this line:

cout << ("abcdefghijklmnopqrstuvwxyz")[i] << ' ' << p <<"%\n";

If you remove "<< p" it will only take 0.158s on my computer. With that in,
it takes 0.514s. Perhaps it's just g++'s implementation of cout's float to
text processing. In any event, I think we're at the mercy of the compiler's
libs.

For what it's worth I also coded up a version that read longs instead of
chars, and accumulated 16 bits at a time in the "count" array (fixing the
calculation at the end). This didn't make much of a difference, though, in
light of the cout penalty noted above.

//-----------------------------------------------------------------------------------

#include <cstdio>
#include <iostream>
using ::std::cout;
using ::std::endl;

#define BUFFSIZE 4096

int main(int argc, char* argv[]) {
if( argc <= 1 ) return (printf("Please provide a file name\n"), 1);
CStopWatch s;
s.startTimer();

if (FILE* f = fopen(argv[1], "r")) {
unsigned char buff[BUFFSIZE];
unsigned long count[256] = {0};
unsigned long total_alpha[26] = {0};

size_t bytesread;

while ((bytesread = fread(buff, 1, BUFFSIZE, f)) > 0) {
for (size_t i = 0; i < bytesread; ++i) {
++count[buff[i]];
}
}

unsigned long total = 0;
for (int i = 0; i < 26; ++i) {
unsigned long x;
x = count[("abcdefghijklmnopqrstuvwxyz")[i]]
+ count[("ABCDEFGHIJKLMNOPQRSTUVWXYZ")[i]];
total_alpha[i] = x;
total += x;
}
float p2 = 100.0f / total;
for (int i = 0; i < 26; ++i) {
float p = p2 * total_alpha[i];
cout << ("abcdefghijklmnopqrstuvwxyz")[i] << ' ' << p <<"%\n";
}

cout << endl;
fclose(f);
}
s.stopTimer();
cout << "Time elapsed: " << s.getElapsedTime() << endl;

return 0;
}
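
The 16-bit variant mentioned in the post is not shown, so its exact design is unknown. One possible reading, offered purely as an assumption, is to histogram pairs of bytes into a 65536-entry table and then fold that table into per-byte counts at the end. The function name count_bytes_by_pairs is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts byte values by first building a histogram of
// 16-bit byte pairs, then folding it into 256 per-byte counts.
void count_bytes_by_pairs(const unsigned char* data, std::size_t size,
                          unsigned long (&count)[256]) {
    std::vector<unsigned long> pairs(65536, 0);

    // One table update per two input bytes.
    std::size_t pos = 0;
    for (; pos + 1 < size; pos += 2) {
        const unsigned key =
            (static_cast<unsigned>(data[pos]) << 8) | data[pos + 1];
        ++pairs[key];
    }

    for (unsigned b = 0; b < 256; ++b) count[b] = 0;
    if (pos < size) ++count[data[pos]];  // odd trailing byte, if any

    // "Fixing the calculation at the end": each pair contributes to the
    // count of its high byte and to the count of its low byte.
    for (unsigned key = 0; key < 65536; ++key) {
        count[key >> 8] += pairs[key];
        count[key & 0xFFu] += pairs[key];
    }
}
```

The pair table is far larger than the 256-entry table and may not stay in the fastest cache, which would be consistent with the post's observation that the variant made little difference.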

Keith H Duggar · Jun 13, 2009

http://blog.p-jansson.com/2009/06/programming-challenge-letter-freque...

Why is that link programmed to refresh every three seconds? Can you
please change that to something more reasonable (15 seconds at least)
as it is very annoying.

Ioannis Vranos · Jun 13, 2009

Peter said:
/.../

Hi, you are quite right about the unreliable outcomes we see when
counting wall clock time during I/O. I have, however, included your
program at the comparison page:

http://blog.p-jansson.com/2009/06/programming-challenge-letter-frequency.html

Sincerely,
Peter Jansson

Opening the text file in binary mode is a mistake, so I think you should remove such answers as valid answers.

--
Ioannis A. Vranos

C95 / C++03 Developer

http://www.cpp-software.net
