frequency list

jason_box · Apr 25, 2006

I'm alittle new at C and I'm trying to write a simple program that will
record the frequency of words and just print it out. It is suppose to
take stdin and I heard it's only a few lines but I'm not sure where to
start.

The only example I have to work with is if you ran the program say:

File list contains:
This is a test.

% test < list

10 1
32 3
46 1
84 1
97 1
101 1
104 1
105 2
115 3
116 2

It just checks for the characters and converts it to the ascci value
and number of occurrences and just uses a simple space or tab. I think
someone said that it shouldn't take more than 10-15 lines but I haven't
tried yet. Thank you time.

Richard Heathfield · Apr 25, 2006

jason_box said:

I'm alittle new at C and I'm trying to write a simple program that will
record the frequency of words and just print it out. It is suppose to
take stdin and I heard it's only a few lines but I'm not sure where to
start.

If you just want to know how many words there are in all, it's pretty easy,
yes. If you wish to record the frequency of each distinct word, it's not as
simple as it sounds. I don't see anyone doing it in fifteen lines, without
using either extremely long lines or, perhaps, non-standard language
extensions (such as, say, C++).

I recommend that you do /not/ try, just yet, to try the latter. Instead,
just write a program to count words. This is easy to do. Start a counter at
0, and record the current "state" of your input - which should be "not in a
word" - i.e. 0. Whenever you hit a non-whitespace character, check the
state. If it's 0 (i.e. not in a word), change it to 1 (i.e. in a word) and
add 1 to the counter. Otherwise, do nothing. Whenever you hit a whitespace
character, change the state to 0.

#include <ctype.h> to get the prototype for isspace().

Once you've dealt with all the input, print the counter.

Al Balmer · Apr 25, 2006

I'm alittle new at C and I'm trying to write a simple program that will
record the frequency of words and just print it out. It is suppose to
take stdin and I heard it's only a few lines but I'm not sure where to
start.

The only example I have to work with is if you ran the program say:

File list contains:
This is a test.

% test < list

10 1
32 3
46 1
84 1
97 1
101 1
104 1
105 2
115 3
116 2

It just checks for the characters and converts it to the ascci value
and number of occurrences and just uses a simple space or tab.

Your first line says "frequency of words", but your example looks like
frequency of characters. Which is it?

I think
someone said that it shouldn't take more than 10-15 lines but I haven't
tried yet.

Do try. Then you can come back for help and critique.

jason_box · Apr 25, 2006

Oh sorry about that, yes it is a frequency of characters in the input
file.

Richard Heathfield · Apr 25, 2006

jason_box said:

Oh sorry about that, yes it is a frequency of characters in the input
file.

#include <limits.h>
#include <stdio.h>
int main(void){
unsigned long c[UCHAR_MAX + 1]={0};int ch;while((ch=getchar())!=EOF)++c[ch];
for(ch=0;ch<256;ch++)if(c[ch]>0)printf("%d:%lu\n",ch,c[ch]);return 0;}

jason_box · Apr 25, 2006

What is the lenght of an unsigned long versus using a standard long for
the character array?

Richard said:
jason_box said:

Oh sorry about that, yes it is a frequency of characters in the input
file.

Click to expand...

#include <limits.h>
#include <stdio.h>
int main(void){
unsigned long c[UCHAR_MAX + 1]={0};int ch;while((ch=getchar())!=EOF)++c[ch];
for(ch=0;ch<256;ch++)if(c[ch]>0)printf("%d:%lu\n",ch,c[ch]);return 0;}

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Al Balmer · Apr 25, 2006

Oh sorry about that, yes it is a frequency of characters in the input
file.

Richard Heathfield · Apr 25, 2006

jason_box said:

What is the lenght of an unsigned long versus using a standard long

An standard long is sizeof(long) bytes in size, whereas an unsigned long,
which is just as standard as a standard long, is sizeof(unsigned long)
bytes in size.

for the character array?

What character array?

jason_box · Apr 25, 2006

Oh nevermind, I understand it now. From this list that is produced from
the frequency encoding, I would need to generate the huffman encoding
for the list. I read that there can be many ways of doing this and even
as simple as not using any crazy data structures to get the encoding. I
know for the huffman tree, it uses a tree structure and generates a
node and eventually combines it with a root and left traversal would
denote a '0' and right traversal would denote '1' and with the
traversal to a specific node, aka a character is your encoding. Guess
now I have to find a smart way of encoding the list that is given into
the huffman codes.

Andrew Poelstra · Apr 25, 2006

jason_box said:
I'm alittle new at C and I'm trying to write a simple program that will
record the frequency of words and just print it out. It is suppose to
take stdin and I heard it's only a few lines but I'm not sure where to
start.

The only example I have to work with is if you ran the program say:

File list contains:
This is a test.

% test < list

10 1
32 3
46 1
84 1
97 1
101 1
104 1
105 2
115 3
116 2

It just checks for the characters and converts it to the ascci value
and number of occurrences and just uses a simple space or tab. I think
someone said that it shouldn't take more than 10-15 lines but I haven't
tried yet. Thank you time.

Assuming you meant 'frequency of characters':

I wrote a program like that for help with cryptograms... unfortunately,
it's on my laptop, so I can't simply copy-and-paste it onto here. Also,
it had a curses interface, and so I'd have to modify the display module
to post it.

Basically, you create a
struct letter {
int chr;
int freq;
}

Then make an array of those, and each time you encounter a character,
increase the list[x].freq. chr should be set when you initialize the array.

Richard Heathfield · Apr 25, 2006

jason_box said:

Oh nevermind, I understand it now. From this list that is produced from
the frequency encoding, I would need to generate the huffman encoding
for the list.

That's pretty trivial; and having seen you display such mastery over the
frequency list program, I don't think you'll have any trouble doing it
yourself.

Just as a brief hint, though, Huffman encoding works like this: if you have
n bytes in your data and a particular character c occurs f times, then
character c should be represented in the Huffman tree by a bit sequence
log2(n / f) bits long (except that you might want to reserve one relatively
short bit sequence as a "quote" sequence for characters that are very rare
but nevertheless present).

jason_box · Apr 25, 2006

I guess the generation of the huffman code is where it'll get a little
complicated. Depending on what sort of structure that I used. I just
read from other forums that it does have to be done in a BST structure
but perhaps something simplier. I guess I'll haver to jot this on paper
and see what I can come up with.

jason_box · Apr 25, 2006

I guess the generation of the huffman code is where it'll get a little
complicated. Depending on what sort of structure that I used. I just
read from other forums that it does have to be done in a BST structure
but perhaps something simplier. I guess I'll haver to jot this on paper
and see what I can come up with.

Richard Heathfield · Apr 25, 2006

jason_box said:

I guess the generation of the huffman code is where it'll get a little
complicated.

No, not really.

Depending on what sort of structure that I used. I just
read from other forums that it does have to be done in a BST structure
but perhaps something simplier.

Look up "trie".

jason_box · Apr 25, 2006

I've worked with Trie's before and I didn't have a good time with them,
though that is one good structure to use since you can traverse down
the tree and each node holds a character and there can be extra param
to encode 0 or 1. I'm going to look for a more basic structure, maybe
priority queue would work. Thanks for your help.

Keith Thompson · Apr 25, 2006

jason_box said:
I guess the generation of the huffman code is where it'll get a little
complicated. Depending on what sort of structure that I used. I just
read from other forums that it does have to be done in a BST structure
but perhaps something simplier. I guess I'll haver to jot this on paper
and see what I can come up with.

You posted two copies of the same article, less than a minute apart.
Apparently there's some flaw in Google Groups that encourages this
kind of mistake.

Also, please don't top-post. Your response goes after, or mixed with,
any quoted text, and the quoted text should be trimmed to just what's
relevant to your response. In particular, don't quote the signature
unless you're commenting on it. See most of the articles in this
newsgroup for examples; see also <http://www.caliburn.nl/topposting.html>.

Al Balmer · Apr 26, 2006

Assuming you meant 'frequency of characters':

I wrote a program like that for help with cryptograms... unfortunately,
it's on my laptop, so I can't simply copy-and-paste it onto here. Also,
it had a curses interface, and so I'd have to modify the display module
to post it.

Basically, you create a
struct letter {
int chr;
int freq;
}

Then make an array of those, and each time you encounter a character,
increase the list[x].freq. chr should be set when you initialize the array.

Or, considering that there are a limited number of possible
characters, just declare a frequency array of UCHAR_MAX + 1 elements
and use the character to index into it.

Andrew Poelstra · Apr 26, 2006

Al said:
Assuming you meant 'frequency of characters':

I wrote a program like that for help with cryptograms... unfortunately,
it's on my laptop, so I can't simply copy-and-paste it onto here. Also,
it had a curses interface, and so I'd have to modify the display module
to post it.

Basically, you create a
struct letter {
int chr;
int freq;
}

Then make an array of those, and each time you encounter a character,
increase the list[x].freq. chr should be set when you initialize the array.

Click to expand...

Or, considering that there are a limited number of possible
characters, just declare a frequency array of UCHAR_MAX + 1 elements
and use the character to index into it.

That's true; however, when sorting you run into problems. So,

Z - 5
X - 4
B - 3
C - 9
H - 11

will turn into
Z - 11
X - 9
B - 5
C - 4
H - 3

when sorted, because obviously the indices are not changed.

jason_box · Apr 26, 2006

From reading some more in our book, I think a min-heap would be the
best to use? or would ti be a max heap since at the end of the
encoding, the max value will be computed from the two previous
children? Also I had a question on how i could store the values
generated by the frequency list that is read in through stdin. I have
the first column as the characters in ascii value adn the second column
thhe frequency and I would liek to store it somehow that they would
always link to each other. What would be teh best method to do the
follwing?

Thank you.

Andrew said:
Al said:

Assuming you meant 'frequency of characters':

I wrote a program like that for help with cryptograms... unfortunately,
it's on my laptop, so I can't simply copy-and-paste it onto here. Also,
it had a curses interface, and so I'd have to modify the display module
to post it.

Basically, you create a
struct letter {
int chr;
int freq;
}

Then make an array of those, and each time you encounter a character,
increase the list[x].freq. chr should be set when you initialize the array.

Click to expand...

Or, considering that there are a limited number of possible
characters, just declare a frequency array of UCHAR_MAX + 1 elements
and use the character to index into it.

Click to expand...

That's true; however, when sorting you run into problems. So,

Z - 5
X - 4
B - 3
C - 9
H - 11

will turn into
Z - 11
X - 9
B - 5
C - 4
H - 3

when sorted, because obviously the indices are not changed.

jason_box · Apr 26, 2006

From reading some more in our book, I think a min-heap would be the
best to use? or would ti be a max heap since at the end of the
encoding, the max value will be computed from the two previous
children? Also I had a question on how i could store the values
generated by the frequency list that is read in through stdin. I have
the first column as the characters in ascii value adn the second column
thhe frequency and I would liek to store it somehow that they would
always link to each other. What would be teh best method to do the
follwing?

Thank you.

Andrew said:
Al said:

Assuming you meant 'frequency of characters':

I wrote a program like that for help with cryptograms... unfortunately,
it's on my laptop, so I can't simply copy-and-paste it onto here. Also,
it had a curses interface, and so I'd have to modify the display module
to post it.

Basically, you create a
struct letter {
int chr;
int freq;
}

Then make an array of those, and each time you encounter a character,
increase the list[x].freq. chr should be set when you initialize the array.

Click to expand...

Or, considering that there are a limited number of possible
characters, just declare a frequency array of UCHAR_MAX + 1 elements
and use the character to index into it.

Click to expand...

That's true; however, when sorting you run into problems. So,

Z - 5
X - 4
B - 3
C - 9
H - 11

will turn into
Z - 11
X - 9
B - 5
C - 4
H - 3

when sorted, because obviously the indices are not changed.

Solve this riddle	1	Oct 10, 2023
Minimum Total Difficulty	0	Nov 15, 2023
Range / empty list issues??	1	Dec 11, 2023
C program: memory leak/ segmentation fault/ memory limit exceeded	0	Nov 12, 2022
Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
Dictionary and List	1	Apr 26, 2021
Help with frequency divider	2	Aug 4, 2009
Turtle program error. Please help!	1	Dec 10, 2023

frequency list

jason_box

Richard Heathfield

Al Balmer

jason_box

Richard Heathfield

jason_box

Al Balmer

Richard Heathfield

jason_box

Andrew Poelstra

Richard Heathfield

jason_box

jason_box

Richard Heathfield

jason_box

Keith Thompson

Al Balmer

Andrew Poelstra

jason_box

jason_box

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads