C++ solution for K & R(2nd Ed) Ex.6-4 - better solution needed

S

subramanian100in

As a beginner in C++, I have attempted the C++ solution for the
following:

Consider the Ex. 6-4 in K & R(ANSI C) 2nd Edition in Page 143 :

Write a program that prints the distinct words in its input sorted
into descending order of frequency of occurrence. Precede each word by
its count.

(Here I assume that we should NOT sort the words first. Instead sort
in decreasing order as per frequency.)

Following is my C++ solution which works fine. Kindly suggest better
way of doing it.

#include <iostream>
#include <vector>
#include <string>
#include <utility>
#include <algorithm>

using namespace std;

typedef pair<int, string> is;

bool cmp_fn(is arg1, is arg2)
{
return (arg2.first < arg1.first) ? true : false;
}

void print(const is & arg)
{
cout << arg.second << ": " << arg.first << endl;
}

int main()
{
vector<string> unique_words;
vector<is> v;

string word;

while (cin >> word)
{
if (find(unique_words.begin(), unique_words.end(),
word) == unique_words.end())
{
unique_words.push_back(word);
v.push_back(make_pair(1, word));
}
else
{
for (vector<is>::iterator i = v.begin(); i !=
v.end(); ++i)
if (i->second == word)
{
++i->first;
break;
}

}
}

sort(v.begin(), v.end(), cmp_fn);

for_each(v.begin(), v.end(), print);

return 0;
}

Kindly help.

Thanks
V.Subramanian
 
B

Barry

As a beginner in C++, I have attempted the C++ solution for the
following:

Consider the Ex. 6-4 in K & R(ANSI C) 2nd Edition in Page 143 :

Write a program that prints the distinct words in its input sorted
into descending order of frequency of occurrence. Precede each word by
its count.

(Here I assume that we should NOT sort the words first. Instead sort
in decreasing order as per frequency.)

Following is my C++ solution which works fine. Kindly suggest better
way of doing it.

#include <iostream>
#include <vector>
#include <string>
#include <utility>
#include <algorithm>

using namespace std;

typedef pair<int, string> is;

bool cmp_fn(is arg1, is arg2)
{
return (arg2.first < arg1.first) ? true : false;
}

void print(const is & arg)
{
cout << arg.second << ": " << arg.first << endl;
}

int main()
{
vector<string> unique_words;
vector<is> v;

string word;

while (cin >> word)
{
if (find(unique_words.begin(), unique_words.end(),
word) == unique_words.end())
{
unique_words.push_back(word);
v.push_back(make_pair(1, word));
}
else
{
for (vector<is>::iterator i = v.begin(); i !=
v.end(); ++i)
if (i->second == word)
{
++i->first;
break;
}

}
}

sort(v.begin(), v.end(), cmp_fn);

for_each(v.begin(), v.end(), print);

return 0;
}

Kindly help.

I see that the problem needs mutli_index, where we need string as key,
but sorted by another key of int.

I think Boost has something for you.

Anyway, if you don't want to use big tool to deal with small thing,
I suggest you use std::vector<pair<int, string> > as your associative
container, you wrap this as your member, when you add a key word, find a
right place to insert.

In this case, insertion, searching, is of O(N), not considering the cost
of caused by insertion.
 
G

Guest

As a beginner in C++, I have attempted the C++ solution for the
following:

Consider the Ex. 6-4 in K & R(ANSI C) 2nd Edition in Page 143 :

Write a program that prints the distinct words in its input sorted
into descending order of frequency of occurrence. Precede each word by
its count.

(Here I assume that we should NOT sort the words first. Instead sort
in decreasing order as per frequency.)

Following is my C++ solution which works fine. Kindly suggest better
way of doing it.

#include <iostream>
#include <vector>
#include <string>
#include <utility>
#include <algorithm>

I have not looked at your code, but for this you should only need to
include <iostream>, <string> and <map>.

Some hints:

Use std::map to store the word as a key and the number of occurrences.

The [] operator of std::map can be used like this:

map[key] = value;

One nice feature of the [] operator is that if there is no key/value
pair already in the map it will be created, and the value will be
default initialised (meaning that if the value is an int it will be set
to 0).
 
K

Kai-Uwe Bux

Erik said:
I have not looked at your code, but for this you should only need to
include <iostream>, <string> and <map>.

Really? You are probably thinking of using

std::map< std::string, unsigned >

But that will not magially sort the strings by frequency.

It think, I still would have <vector> and <algorithm>. (Of course, you could
first build the map and then use an inverse multimap, but I think that
would be more obscure than just sorting a vector.)
Some hints:

Use std::map to store the word as a key and the number of occurrences.

The [] operator of std::map can be used like this:

map[key] = value;

One nice feature of the [] operator is that if there is no key/value
pair already in the map it will be created, and the value will be
default initialised (meaning that if the value is an int it will be set
to 0).


Best

Kai-Uwe Bux
 
R

red floyd

Kai-Uwe Bux said:
Really? You are probably thinking of using

std::map< std::string, unsigned >

// comparator for *DECREASING* order
struct compare {
bool operator()(const std::pair<string, unsigned>& lhs,
const std::pair<string, unsigned& rhs) const
{
return lhs.second > rhs.second;
}
};

std::map<std::string, unsigned> words_and_freqs;

// fill map, then do this. Creates a set sorted by frequency

std::set<std::pair<std::string, unsigned> >
freqs_first(words_and_freqs.begin(), words_and_freqs.end(),
compare());

the set is sorted by the frequency in descending order.
 
K

Kai-Uwe Bux

red said:
// comparator for *DECREASING* order
struct compare {
bool operator()(const std::pair<string, unsigned>& lhs,
const std::pair<string, unsigned& rhs) const
{
return lhs.second > rhs.second;
}
};

std::map<std::string, unsigned> words_and_freqs;

// fill map, then do this. Creates a set sorted by frequency

std::set<std::pair<std::string, unsigned> >
freqs_first(words_and_freqs.begin(), words_and_freqs.end(),
compare());

the set is sorted by the frequency in descending order.

And, if I see this correctly, it will contain exactly one word for each
frequency, because two pairs with equal frequency counts will be treated as
comparing equal by std::set<>.

However, that can be fixed, in which case you would throw in the header
<set> instead of <vector> and <algorithm>. Well, that's fine, too.


Best

Kai-Uwe Bux
 
S

subramanian100in

First I express my thanks to all of you for replying.

Looks like, some of you have suggested map data structure. This cannot
be used because it sorts the input based on string which is the key.
But the input words should not be sorted. Their order should be
retained as they appear in the input. Later, they should be sorted
based on frequency of their occurrence.

Let me put the logic that I have used, in words.
Instead of going through the code, kindly go through this and give me
your feedback as help.

I create "vector<string> unique_words;" to store each input word as it
arrives(after checking if it is already not found in this vector). I
also create "vector< pair<int, string> > v;" along with the above
vector<string>. Whenever a word arrives, first it is stored in
vector<string> if it is a new word and in this case, make_pair(1,
word) is stored in vector<pair<int,string>>. If the word has been
previously stored, then its count is incremented in
vector<pair<int,string>>. After reading all words, I do

sort(v.begin(), v.end(), cmp_fn);

for_each(v.begin(), v.end(), print);

The disadvantage is the addition of two global functions cmp_fn and
print. There can be other disadvantages also.

Kindly give your feedback.

Thanks
V.Subramanian
 
K

Kai-Uwe Bux

First I express my thanks to all of you for replying.

Looks like, some of you have suggested map data structure. This cannot
be used because it sorts the input based on string which is the key.

It still can (and should!) be used to compile the frequency data. You'r just
not done, yet :)
But the input words should not be sorted. Their order should be
retained as they appear in the input. Later, they should be sorted
based on frequency of their occurrence.

So, there needs to be a second step in the process. Assume you had a map

std::map< std::string, unsigned int >

with frequency data. How would you go about sorting them by frequency. There
are several options:

a) convert the list into a vector of pairs. (Easy since vector has a
constructor that takes a pair of iterators.) Then sort by frequency. Then
write to screen.

b) convert the list into a std::multimap said:
Let me put the logic that I have used, in words.
Instead of going through the code, kindly go through this and give me
your feedback as help.

I create "vector<string> unique_words;" to store each input word as it
arrives(after checking if it is already not found in this vector). I
also create "vector< pair<int, string> > v;" along with the above
vector<string>. Whenever a word arrives, first it is stored in
vector<string> if it is a new word and in this case, make_pair(1,
word) is stored in vector<pair<int,string>>. If the word has been
previously stored, then its count is incremented in
vector<pair<int,string>>. After reading all words, I do

sort(v.begin(), v.end(), cmp_fn);

good thinking.

for_each(v.begin(), v.end(), print);

You might want to have a look into ostream_iterator. Then you can use

std::copy( v.begin(),v.end(), ... );

The disadvantage is the addition of two global functions cmp_fn and
print. There can be other disadvantages also.

Kindly give your feedback.

You'r on the right track.



Best

Kai-Uwe Bux
 
B

Barry

First I express my thanks to all of you for replying.

Looks like, some of you have suggested map data structure. This cannot
be used because it sorts the input based on string which is the key.
But the input words should not be sorted. Their order should be
retained as they appear in the input. Later, they should be sorted
based on frequency of their occurrence.

Let me put the logic that I have used, in words.
Instead of going through the code, kindly go through this and give me
your feedback as help.

I create "vector<string> unique_words;" to store each input word as it
arrives(after checking if it is already not found in this vector). I
also create "vector< pair<int, string> > v;" along with the above
vector<string>. Whenever a word arrives, first it is stored in
vector<string> if it is a new word and in this case, make_pair(1,
word) is stored in vector<pair<int,string>>. If the word has been
previously stored, then its count is incremented in
vector<pair<int,string>>. After reading all words, I do

sort(v.begin(), v.end(), cmp_fn);

for_each(v.begin(), v.end(), print);

The disadvantage is the addition of two global functions cmp_fn and
print. There can be other disadvantages also.

I think you can wrap them up,

You can check this out:

#include <string>
#include <iostream>
#include <vector>
#include <algorithm>

class AssocVector
{
public:
typedef std::vector<std::pair<int, std::string> > ContainerType;
typedef ContainerType::iterator Iterator;
typedef ContainerType::const_iterator ConstIterator;
public:
void Insert(std::string const& word)
{
bool found = false;
Iterator iter = c.begin();
for (; iter != c.end(); ++iter)
{
if (iter->second == word)
{
++(iter->first);
found = true;
break;
}
}

if (found && iter != c.begin())
{
for (; iter != c.begin(); --iter)
{
Iterator prev = iter;
--prev;
if (prev->first < iter->first)
std::iter_swap(prev, iter);
else
break;
}
}

if (!found)
c.push_back(std::make_pair(1, word));
}

Iterator begin()
{
return c.begin();
}

Iterator end()
{
return c.end();
}

ConstIterator begin() const
{
return c.begin();
}

ConstIterator end() const
{
return c.end();
}
protected:
ContainerType c;
};

struct Printer
{
void operator() (std::pair<int, std::string> const& p) const
{
std::cout << p.first << ' ' << p.second << std::endl;
}

};

int main()
{
AssocVector assocVec;

std::string word;
while (std::cin >> word)
assocVec.Insert(word);

std::for_each (assocVec.begin(), assocVec.end(), (Printer()));
}
 
K

Kai-Uwe Bux

Barry said:
I think you can wrap them up,

You can check this out:

#include <string>
#include <iostream>
#include <vector>
#include <algorithm>

class AssocVector
{
public:
typedef std::vector<std::pair<int, std::string> > ContainerType;
typedef ContainerType::iterator Iterator;
typedef ContainerType::const_iterator ConstIterator;
public:
void Insert(std::string const& word)
{
bool found = false;
Iterator iter = c.begin();
for (; iter != c.end(); ++iter)
{
if (iter->second == word)
{
++(iter->first);
found = true;
break;
}
}

if (found && iter != c.begin())
{
for (; iter != c.begin(); --iter)
{
Iterator prev = iter;
--prev;
if (prev->first < iter->first)
std::iter_swap(prev, iter);
else
break;
}
}

if (!found)
c.push_back(std::make_pair(1, word));
}

Wow: is that bubble sort on the fly?

Anyway, I don't like the mix of responsibilities that issues from putting
the increment of the frequency count within the search loop. Consider:


void Insert(std::string const& word)
{
Iterator iter = c.begin();
while ( iter != c.end() && iter->second != word ) {
++ iter;
}
if ( iter == c.end() ) {
c.push_back(std::make_pair(1, word));
} else {
++ iter->first;
while ( iter != c.begin() ) {
Iterator next = iter;
--iter;
if (iter->first < next->first) {
std::iter_swap(next, iter);
} else {
return;
}
}
}
}

Iterator begin()
{
return c.begin();
}

Iterator end()
{
return c.end();
}

ConstIterator begin() const
{
return c.begin();
}

ConstIterator end() const
{
return c.end();
}
protected:
ContainerType c;
};

struct Printer
{
void operator() (std::pair<int, std::string> const& p) const
{
std::cout << p.first << ' ' << p.second << std::endl;
}

};

int main()
{
AssocVector assocVec;

std::string word;
while (std::cin >> word)
assocVec.Insert(word);

std::for_each (assocVec.begin(), assocVec.end(), (Printer()));
}

I think this is overkill (and inefficient due to the use of bubble sort :)

Instead, let me suggest some spagetty code (all goes into main):


#include <iostream>
#include <string>
#include <map>

int main ( void ) {
std::string word;
typedef std::map< std::string, unsigned long >
frequency_table;
frequency_table frequency;

// reading and counting:
while ( std::cin >> word ) {
++ frequency[ word ];
}

// reverting and resorting:
typedef std::multimap< unsigned long, std::string >
inverse_table;
inverse_table inverse;
for ( frequency_table::const_iterator iter
= frequency.begin();
iter != frequency.end(); ++ iter ) {
inverse.insert
( inverse_table::value_type
( iter->second, iter->first ) );
}

// output:
for ( inverse_table::const_reverse_iterator iter
= inverse.rbegin();
iter != inverse.rend(); ++ iter ) {
std::cout << iter->second << " " << iter->first << '\n';
}
}


Best

Kai-Uwe Bux
 
B

Barry

Kai-Uwe Bux said:
Wow: is that bubble sort on the fly?

Nah,
Actually, this is more like /insertion sort/

I shouldn't keep swapping, I just need swap once

if (found && iter != c.begin())
{
Iterator hole = iter;

for (; iter != c.begin(); --iter)
{
Iterator prev = iter;
--prev;
if (prev->first < iter->first)
//std::iter_swap(prev, iter);
;
else
break;
}
if (iter != hole)
std::iter_swap(iter, hole);
}
Anyway, I don't like the mix of responsibilities that issues from putting
the increment of the frequency count within the search loop. Consider:


void Insert(std::string const& word)
{
Iterator iter = c.begin();
while ( iter != c.end() && iter->second != word ) {
++ iter;
}
if ( iter == c.end() ) {
c.push_back(std::make_pair(1, word));
} else {
++ iter->first;
while ( iter != c.begin() ) {
Iterator next = iter;
--iter;
if (iter->first < next->first) {
std::iter_swap(next, iter);
} else {
return;
}
}
}
}

Nah,
*found* variable is not needed.
Iterator begin()
{
return c.begin();
}

Iterator end()
{
return c.end();
}

ConstIterator begin() const
{
return c.begin();
}

ConstIterator end() const
{
return c.end();
}
protected:
ContainerType c;
};

struct Printer
{
void operator() (std::pair<int, std::string> const& p) const
{
std::cout << p.first << ' ' << p.second << std::endl;
}

};

int main()
{
AssocVector assocVec;

std::string word;
while (std::cin >> word)
assocVec.Insert(word);

std::for_each (assocVec.begin(), assocVec.end(), (Printer()));
}

I think this is overkill (and inefficient due to the use of bubble sort :)

Instead, let me suggest some spagetty code (all goes into main):


#include <iostream>
#include <string>
#include <map>

int main ( void ) {
std::string word;
typedef std::map< std::string, unsigned long >
frequency_table;
frequency_table frequency;

// reading and counting:
while ( std::cin >> word ) {
++ frequency[ word ];
}

// reverting and resorting:
typedef std::multimap< unsigned long, std::string >
inverse_table;
inverse_table inverse;
for ( frequency_table::const_iterator iter
= frequency.begin();
iter != frequency.end(); ++ iter ) {
inverse.insert
( inverse_table::value_type
( iter->second, iter->first ) );
}

// output:
for ( inverse_table::const_reverse_iterator iter
= inverse.rbegin();
iter != inverse.rend(); ++ iter ) {
std::cout << iter->second << " " << iter->first << '\n';
}
}

yeh, using map for sorting is better, but have to buy another one for
indexing word while increment the frequency of a certain word.

So, this is Space VS. Time problem
 
K

Kai-Uwe Bux

Barry said:
Nah,
Actually, this is more like /insertion sort/

I shouldn't keep swapping, I just need swap once

if (found && iter != c.begin())
{
Iterator hole = iter;

for (; iter != c.begin(); --iter)
{
Iterator prev = iter;
--prev;
if (prev->first < iter->first)
//std::iter_swap(prev, iter);
;
else
break;
}
if (iter != hole)
std::iter_swap(iter, hole);
}

a) The code obscures the logic. Why are you using a break here?

b) The code is incorrect. You keep changing iter, but leaving out the swap
means that the value is not moving along with the --iter moves. Thus you
are comparing the wrong values.

Nah, *found* variable is not needed.

That's part of what I was trying to point out. The code I posted does not
use it.

Iterator begin()
{
return c.begin();
}

Iterator end()
{
return c.end();
}

ConstIterator begin() const
{
return c.begin();
}

ConstIterator end() const
{
return c.end();
}
protected:
ContainerType c;
};

struct Printer
{
void operator() (std::pair<int, std::string> const& p) const
{
std::cout << p.first << ' ' << p.second << std::endl;
}

};

int main()
{
AssocVector assocVec;

std::string word;
while (std::cin >> word)
assocVec.Insert(word);

std::for_each (assocVec.begin(), assocVec.end(), (Printer()));
}

I think this is overkill (and inefficient due to the use of bubble sort
:)

Instead, let me suggest some spagetty code (all goes into main):


#include <iostream>
#include <string>
#include <map>

int main ( void ) {
std::string word;
typedef std::map< std::string, unsigned long >
frequency_table;
frequency_table frequency;

// reading and counting:
while ( std::cin >> word ) {
++ frequency[ word ];
}

// reverting and resorting:
typedef std::multimap< unsigned long, std::string >
inverse_table;
inverse_table inverse;
for ( frequency_table::const_iterator iter
= frequency.begin();
iter != frequency.end(); ++ iter ) {
inverse.insert
( inverse_table::value_type
( iter->second, iter->first ) );
}

// output:
for ( inverse_table::const_reverse_iterator iter
= inverse.rbegin();
iter != inverse.rend(); ++ iter ) {
std::cout << iter->second << " " << iter->first << '\n';
}
}

yeh, using map for sorting is better, but have to buy another one for
indexing word while increment the frequency of a certain word.

So, this is Space VS. Time problem

Not really: you can dismantle the map while building the multimap. That way,
you keep the space roughly constant. On the other hand, map and multimap
will probably both be larger than the vector you are using.


#include <iostream>
#include <string>
#include <map>

int main ( void ) {
std::string word;
typedef std::map< std::string, unsigned long >
frequency_table;
frequency_table frequency;

// reading and counting:
while ( std::cin >> word ) {
++ frequency[ word ];
}

// reverting and resorting:
typedef std::multimap< unsigned long, std::string >
inverse_table;
inverse_table inverse;
while ( ! frequency.empty() ) {
frequency_table::iterator start = frequency.begin();
inverse.insert
( inverse_table::value_type
( start->second, start->first ) );
frequency.erase ( start );
}

// output:
for ( inverse_table::const_reverse_iterator iter
= inverse.rbegin();
iter != inverse.rend(); ++ iter ) {
std::cout << iter->second << " " << iter->first << '\n';
}
}


Best

Kai-Uwe Bux
 
B

Barry

Kai-Uwe Bux said:
a) The code obscures the logic. Why are you using a break here?

I rewrote the code without thinking
actually,
for (; iter != c.begin() && iter->first < hole->first;
--iter)
;

the loop is try to the word with frequency that is less than the current
word, the case when we need swapping is that the current word frequency
has left neighbor whose frequency is the same, as Insert only add 1 to
the frequency of the current word.

say the frequency sequence is like the following:

3 2 2 2 ...
^
current word frequency

now we add 1 to the current word frequency, then it's 3, we only have to
swap the second and the current slots. The sequency now becomes

3 3 2 2
^

right?
b) The code is incorrect. You keep changing iter, but leaving out the swap
means that the value is not moving along with the --iter moves. Thus you
are comparing the wrong values.

As I explained, I did a careless mistake.
 
B

Barry

forget what I posted, still wrong
So careless.
Good luck to myself when I do job interview
 
G

Guest

First I express my thanks to all of you for replying.

Looks like, some of you have suggested map data structure. This cannot
be used because it sorts the input based on string which is the key.
But the input words should not be sorted. Their order should be
retained as they appear in the input. Later, they should be sorted
based on frequency of their occurrence.

Let me put the logic that I have used, in words.
Instead of going through the code, kindly go through this and give me
your feedback as help.

I create "vector<string> unique_words;" to store each input word as it
arrives(after checking if it is already not found in this vector). I
also create "vector< pair<int, string> > v;" along with the above
vector<string>. Whenever a word arrives, first it is stored in
vector<string> if it is a new word and in this case, make_pair(1,
word) is stored in vector<pair<int,string>>. If the word has been
previously stored, then its count is incremented in
vector<pair<int,string>>. After reading all words, I do

sort(v.begin(), v.end(), cmp_fn);

for_each(v.begin(), v.end(), print);

The disadvantage is the addition of two global functions cmp_fn and
print. There can be other disadvantages also.

It is a good solution, however it does not take full advantage of the
containers provided in the standard library. For example unique_words
should probably be a set instead of a vector. Using the vector to store
both frequency and word have some advantages over a map, since it can be
sorted by both frequency and word making it useful in both stages of the
program. My only complaint is that your solution requires you to write
more code than one that takes better advantage of the standard
containers, it will however probably perform better and sometimes that
is more important.

My suggestion would have been to use first a map to read in the words
and counting their frequency, and then construct a multimap sorted by
frequency which is then printed:

#include <iostream>
#include <string>
#include <map>

int main()
{
std::map<std::string, size_t> words;

// Read words and count frequency
std::string w;
while (std::cin >> w)
{
++words[w];
}

// Construct multimap, notice the use of greater to sort it in
// decending order instead of ascending
std::multimap<size_t, std::string, std::greater<size_t> > freq;
std::map<std::string, size_t>::iterator i;
for (i = words.begin(); i != words.end(); ++i)
{
freq.insert(std::make_pair(i->second, i->first));
}

// Print, someone can probably come up with a way to use std::copy
// to print it instead, but I prefer an explicit loop
std::multimap<size_t, std::string, std::greater<size_t>>::iterator j;
for (j = freq.begin(); j != freq.end(); ++j)
{
std::cout << j->first << "\t" << j->second << "\n";
}
}
 
S

subramanian100in

It is a good solution, however it does not take full advantage of the
containers provided in the standard library. For example unique_words
should probably be a set instead of a vector. Using the vector to store
both frequency and word have some advantages over a map, since it can be
sorted by both frequency and word making it useful in both stages of the
program. My only complaint is that your solution requires you to write
more code than one that takes better advantage of the standard
containers, it will however probably perform better and sometimes that
is more important.

My suggestion would have been to use first a map to read in the words
and counting their frequency, and then construct a multimap sorted by
frequency which is then printed:

#include <iostream>
#include <string>
#include <map>

int main()
{
std::map<std::string, size_t> words;

// Read words and count frequency
std::string w;
while (std::cin >> w)
{
++words[w];
}

// Construct multimap, notice the use of greater to sort it in
// decending order instead of ascending
std::multimap<size_t, std::string, std::greater<size_t> > freq;
std::map<std::string, size_t>::iterator i;
for (i = words.begin(); i != words.end(); ++i)
{
freq.insert(std::make_pair(i->second, i->first));
}

// Print, someone can probably come up with a way to use std::copy
// to print it instead, but I prefer an explicit loop
std::multimap<size_t, std::string, std::greater<size_t>>::iterator j;
for (j = freq.begin(); j != freq.end(); ++j)
{
std::cout << j->first << "\t" << j->second << "\n";
}

}

Let me restate the exercise:

Consider the Ex. 6-4 in K & R(ANSI C) 2nd Edition in Page 143 :

Write a program that prints the distinct words in its input sorted
into descending order of frequency of occurrence. Precede each word by
its count.

(Here I assumed that we should NOT sort the words first. Instead sort
in decreasing order as per frequency.)

If we use a set or a map, won't it store input words in sorted
manner ? If so, it is not accepted in this problem. Order of input
words can be changed only for sorting by their frequency. That is why
I had to use vector. Correct me if I am wrong.

However thanks for giving a simple solution and I will use it if there
is no condition imposed on the sorting of words(ie the words may or
may not maintain their input order).

Thanks
V.Subramanian
 
G

Guest

It is a good solution, however it does not take full advantage of the
containers provided in the standard library. For example unique_words
should probably be a set instead of a vector. Using the vector to store
both frequency and word have some advantages over a map, since it can be
sorted by both frequency and word making it useful in both stages of the
program. My only complaint is that your solution requires you to write
more code than one that takes better advantage of the standard
containers, it will however probably perform better and sometimes that
is more important.

My suggestion would have been to use first a map to read in the words
and counting their frequency, and then construct a multimap sorted by
frequency which is then printed:

#include <iostream>
#include <string>
#include <map>

int main()
{
std::map<std::string, size_t> words;

// Read words and count frequency
std::string w;
while (std::cin >> w)
{
++words[w];
}

// Construct multimap, notice the use of greater to sort it in
// decending order instead of ascending
std::multimap<size_t, std::string, std::greater<size_t> > freq;
std::map<std::string, size_t>::iterator i;
for (i = words.begin(); i != words.end(); ++i)
{
freq.insert(std::make_pair(i->second, i->first));
}

// Print, someone can probably come up with a way to use std::copy
// to print it instead, but I prefer an explicit loop
std::multimap<size_t, std::string, std::greater<size_t>>::iterator j;
for (j = freq.begin(); j != freq.end(); ++j)
{
std::cout << j->first << "\t" << j->second << "\n";
}

}

Let me restate the exercise:

Consider the Ex. 6-4 in K & R(ANSI C) 2nd Edition in Page 143 :

Write a program that prints the distinct words in its input sorted
into descending order of frequency of occurrence. Precede each word by
its count.

(Here I assumed that we should NOT sort the words first. Instead sort
in decreasing order as per frequency.)

If we use a set or a map, won't it store input words in sorted
manner ? If so, it is not accepted in this problem. Order of input
words can be changed only for sorting by their frequency. That is why
I had to use vector. Correct me if I am wrong.

I see nothing in the requirements that tells you how to implement it,
just on the observable behaviour, so anything should be allowed as long
as the results are correct.
However thanks for giving a simple solution and I will use it if there
is no condition imposed on the sorting of words(ie the words may or
may not maintain their input order).

While reading the words into the map they will be sorted in whatever
manner strings are sorted, then I copy the contents of the map over to
the multimap but this time they are stored sorted by frequency. So when
you iterate over the multimap you will go through the words in the order
of most used word to least used word (I do not know how two words which
have the same frequency are sorted internally, but I imagine that they
are sorted by comparing the words).
 
J

James Kanze

On 2007-09-30 16:03, (e-mail address removed), India wrote: [...]
Let me restate the exercise:
Consider the Ex. 6-4 in K & R(ANSI C) 2nd Edition in Page 143 :
Write a program that prints the distinct words in its input sorted
into descending order of frequency of occurrence. Precede each word by
its count.
(Here I assumed that we should NOT sort the words first. Instead sort
in decreasing order as per frequency.)
If we use a set or a map, won't it store input words in sorted
manner ? If so, it is not accepted in this problem. Order of input
words can be changed only for sorting by their frequency. That is why
I had to use vector. Correct me if I am wrong.
I see nothing in the requirements that tells you how to implement it,
just on the observable behaviour, so anything should be allowed as long
as the results are correct.
While reading the words into the map they will be sorted in whatever
manner strings are sorted, then I copy the contents of the map over to
the multimap but this time they are stored sorted by frequency.

I rather suspect that it would be preferrable to copy to an
std::vector, and then sort that, rather than use multimap. Both
simpler to program, and faster.

It's also possible to maintain everything in a vector, keeping
the vector sorted and using std::lower_bound for the search.
Doing this, insertion will be significantly slower, but you save
the copy at the end. Off hand, I couldn't say which would be
fastest, but using the map, then copying to the vector, would
certainly be the easiest to write and maintain. (One might also
consider using hash_map once it's available.)
So when you iterate over the multimap you will go through the
words in the order of most used word to least used word (I do
not know how two words which have the same frequency are
sorted internally, but I imagine that they are sorted by
comparing the words).

In multimap, no, or at least not directly. Multimap and
multiset preserve the relative order of equivalent elements.
But if you're copying from a container which was sorted on the
words, of course, the relative order of equivalent elements will
be the order of the words.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,968
Messages
2,570,150
Members
46,697
Latest member
AugustNabo

Latest Threads

Top