Standard C++ Library

R

Razii

I'll go out on a limb here and guess that you're self taught. You might
want to consider a course in algorithms if you're really interested in
programming.

No thanks.
You clearly seem concerned with appearances as evidenced by
your taunts. You can't even imagine how much it would improve your
standing with programmers if you could replace "doubt" and "guess"
with a clear, correct, and concise analysis of an algorithm.

I already posted the reason why Hasmap is faster than sorted map in
this situation. I have tested both version.
And your reading skills have already been clearly demonstrated.

By the way, I checked your posting history for this newsgroup. Other
than one post in "Best book" thread, all your posts are in the threads
started by me :) How come? Did I inspire you so much that you
started posting to this newsgroup regularly? Do you have a crush on
me?
 
Y

Yannick Tremblay

I doubt it's that easy to trigger worse case scenario. You will need
to read hashing algorithm and then write a bible size book with
specifically chosen data.

Regardless if it is easy or hard, if you are going to release software
in the wild, then you must make sure that a nasty character can't
destroy your application simply by looking up the Java Hash algorithm
and feeding it hand prepared data. Read the history of buffer
overflow exploit to learn to what extend peoples will go to break
programs.

A lot of search/sort/hash algorithm have worse case scenario that are
maybe unlikely to be found naturally but rather easy to generate.
Some examples would be: already sorted data, almost already sorted
data, data sorted in reverse order. Although random data is unlikely
to trigger any of the above, it is not that uncommonly found in the
wild and a good programmer must be aware of these potential problems
and protect his software against it.

So if you any real interest in algorithms, try pre-sorting your text
before attempting your test.

By the way, IMO, the best solution to your problem is:

$ sort bible.txt > sortedbible.txt

Using such an utility, generating somewhat sorted data is trivial. e,g:

$ sort -r bible.txt > reversesortedbible.txt
$ sort bible.txt > doublesortedbible.txt && sort bible.txt >>
doublesortedbible.txt

Very very simple.
 
M

Mirek Fidler

Might be in the general case. Howver, if you write a word sorter
using a hash table, it is possible that you would encounter data that
would trigger a worse case scenario performance. So you could write
your program, claim it is very fast and I could make it incredibly
slow simply by feeding it specifically chosen data. Not really
something you'd want to expose to the world.

Well, but exactly the same holds for the better part of all current
computational power.

Admit it, everything in modern CPU is basically based on designs
basically equal to hashing (cache, paging, branch prediction). Chances
that your hashmap will fail due to worst case scenario is as likely as
regular map failing because of cache/paging issues.

BTW, speaking about it, one reason why good hashmap always outperforms
binary tree on modern CPUs is branch misprediction - that central
compare and jump is totally random in the loop, mispredicted in 50% of
cases.

Choosing binary tree map before hash map is 1990 thinking. A lot has
changed since then.

Mirek
 
R

red floyd

Razii said:
How is it standard when VC++ doesn't have it?

Simple. Microsoft doesn't define Standard C++, ISO does (though given
what happened with DIS29500...)
 
R

Razii

$ sort bible.txt > sortedbible.txt
Using such an utility, generating somewhat sorted data is trivial. e,g:

I think you are a little confused. This is not the sorting bible
problem. This one counts up the frequency of each word, and it makes
no difference if the data is already sorted. The sort is done to print
output (.e words) in nice format. Here, I wilt post it to a blogpage
that explains it..

Why is C++ slower than Java?

http://razi2.blogspot.com/2008/04/why-is-c-slower-than-java.html
 
R

Razii

The hint was that if you just cannot wait for the next compiler
release from MS, you can buy your own copy of the library from the
same place they are getting it from.

I downloaded MinGW that uses GCC compiler. Here is C++ version that
uses map

http://www.pastebin.ca/965243

and I changed it to

std::tr1::unordered_map< std::string, int > dictionary;

The time improved from 5600 ms to 3500 ms (which is still slower than
java version 2300 ms). However, the result is not valid since java
version sorts it by transferring the data back to TreeMap in the end.

Can anyone post the most efficient way how would I transfer data that
is in std::tr1::unordered_map< std::string, int > dictionary to a
sorted map

Or is there any other way to sort unordered_map?
 
P

Paul Brettschneider

Razii said:
I think you are a little confused. This is not the sorting bible
problem. This one counts up the frequency of each word, and it makes
no difference if the data is already sorted. The sort is done to print
output (.e words) in nice format. Here, I wilt post it to a blogpage
that explains it..

Why is C++ slower than Java?

http://razi2.blogspot.com/2008/04/why-is-c-slower-than-java.html

Yawn. What a lame attempt at trolling.

Anyway, a few changes and down to 750 ms from 6500 ms:

#include <iostream>
#include <fstream>
#include <sstream>
#include <ctime>
#include <cstring>

static const int num_morons = ('z' - 'a' + 1) * 2;

template <typename T> class MoronicArray {
T arr[num_morons];
public:
T &operator[](unsigned char c) {
int n;
if(c >= 'a' && c <= 'z')
n = c - 'a';
else if(c >= 'A' && c <= 'Z')
n = c- 'A' + num_morons/2;
else
n = 0;
return arr[n];
}
MoronicArray() { memset(this, '\0', sizeof(*this)); }
};

class Moron {
MoronicArray<int> counts;
MoronicArray<Moron *> morons;
public:
Moron *operator[](unsigned char c) {
Moron *&m = morons[c];
if(m != NULL)
return m;
return m = new Moron();
};
void inc(unsigned char c) {
++counts[c];
};
void moronificator(const std::string &s) {
for(unsigned char c = 'a'; c <= 'z'; ++c) {
if(counts[c] > 0)
std::cout << s << c << " " << counts[c] <<
std::endl;
if(morons[c])
morons[c]->moronificator(s + std::string(1,
c));
}
for(unsigned char c = 'A'; c <= 'Z'; ++c) {
if(counts[c] > 0)
std::cout << s << c << " " << counts[c] <<
std::endl;
if(morons[c])
morons[c]->moronificator(s + std::string(1,
c));
}
}
};

int main(int argc, char **argv)
{
int w_total = 0, l_total = 0, c_total = 0;
Moron m;

printf(" lines words bytes file\n" );
clock_t start=clock();

for(int i = 1; i <argc; ++i) {
std::ifstream input_file(argv);
std::eek:stringstream buffer;
buffer << input_file.rdbuf();
const std::string &s = buffer.str();
int l = s.size();

int w_cnt = 0, l_cnt = 0, c_cnt = 0;
Moron *act = &m;
unsigned char last = '\0';
for(int j = 0; j < l; ++j) {
unsigned char c = s[j];
if(c == '\n') {
++l_cnt;
}
if(c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z') {
if(last)
act = (*act)[last];
last = c;
} else if(last) {
act->inc(last);
act = &m;
last = '\0';
} else {
last = '\0';
}
++c_cnt;
}

printf("%d\t%d\t%d\t %s\n", l_cnt, w_cnt, c_cnt, argv);
l_total += l_cnt;
w_total += w_cnt;
c_total += c_cnt;
}

clock_t end=clock();

if(argc > 2) {
printf("--------------------------------------\n%d\t%d\t%d\t
total",
l_total, w_total, c_total);
}

printf("--------------------------------------\n");

m.moronificator(std::string());

std::cout <<"Time: " << double(end-start)/CLOCKS_PER_SEC * 1000 << "
ms\n";
}

Lrf, guvf boivbhfyl jnf n wbxr :C.
 
P

Paul Brettschneider

Razii said:
lol at "few" changes. I haven't tested it yet but I don't see <map> in
your version.

Sheesh... I'd thought it would be obvious that the code was a joke. :(

And of course you can write the same silly code with std::map. It's slower
then, but still 4-5 times faster than the original.

#include <iostream>
#include <fstream>
#include <ctime>
#include <map>

class Moron {
int count;
std::map<char, Moron *> morons;
public:
Moron() : count(0) {};
Moron *operator[](char c) {
std::map<char, Moron *>::iterator it = morons.find(c);
if(it != morons.end())
return it->second;
Moron *res = new Moron();
morons[c] = res;
return res;
};
void inc() {
++count;
};
void moronificator(const std::string &s) {
if(count)
std::cout << s << " " << count << std::endl;
for(std::map<char, Moron *>::iterator it = morons.begin();
it != morons.end(); ++it)
it->second->moronificator(s + std::string(1,
it->first));
};
};

int main(int argc, char **argv)
{
int w_total = 0, l_total = 0, c_total = 0;
const size_t bufsiz = 1024 * 1024;
char buf[bufsiz];
Moron m;

printf(" lines words bytes file\n" );
clock_t start=clock();

for(int i = 1; i <argc; ++i) {
std::ifstream input_file(argv);
int w_cnt = 0, l_cnt = 0, c_cnt = 0;
Moron *act = &m;

for(;;) {
if(input_file.read(buf, bufsiz) &&
input_file.fail())
break;
int l = input_file.gcount();
if(!l)
break;
for(int j = 0; j < l; ++j) {
char c = buf[j];
if(c == '\n') {
++l_cnt;
}
if(c >= 'a' && c <= 'z' || c >= 'A' && c
<= 'Z') {
act = (*act)[c];
} else if(act != &m) {
++w_cnt;
act->inc();
act = &m;
}
++c_cnt;
}
}
if(act != &m) {
++w_cnt;
act->inc();
}

printf("%d\t%d\t%d\t %s\n", l_cnt, w_cnt, c_cnt, argv);
l_total += l_cnt;
w_total += w_cnt;
c_total += c_cnt;
}
clock_t end=clock();

if(argc > 2) {
printf("--------------------------------------\n%d\t%d\t%d\t
total",
l_total, w_total, c_total);
}

printf("--------------------------------------\n");

m.moronificator(std::string());

std::cout <<"Time: " << double(end-start)/CLOCKS_PER_SEC * 1000 << "
ms\n";
}
 
P

Paul Brettschneider

Razii said:
I didn't compile but doesn't matter.

Then fix it. You might even learn something, though you seem to be genuinely
not interested in learning. Your loss.
You are still using trie data
structure.

So what? You wanted a faster program which uses std::map and I gave you a
400-500% speed increase. Not too shabby, IMHO. BTW: I get another 20% speed
improvement by using non-buffered IO, but leave that to you as an exercise.
 
P

Paul Brettschneider

Razii said:
LOL. You post code that doesn't compile or is missing things (like
total words in the last versions) and then, you ask people to fix it.
I am not here to learn. I am here for having fun. If wanted to learn,
I would go to school or buy books. Why would I come to USENET?

This might be news to you, but for many people learning is actually fun. And
I know Usenet is a great pool of knowledge. You seem to gain fun from being
and staying a brat. Again: your loss.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,176
Messages
2,570,949
Members
47,500
Latest member
ArianneJsb

Latest Threads

Top