C++ code for parsing syllables?

me · Mar 18, 2010

I'm pulling my hair out trying to figure out code for
parsing and counting syllables in simple English
sentences.

Can someone throw the dog a bone on where to start?

Michael Angelo Ravera · Mar 18, 2010

This isn't really a C++ question, but a Computational Linguistics
question.

The first step is recognizing vowel groups. Once you recognize
vowel groups, you can try to determine whether each group forms one or
more syllables.
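
A minimal sketch of that first step, counting maximal runs of consecutive vowels in a single word. The function names `is_vowel` and `count_vowel_groups` are made up for illustration, and treating every y as a vowel is an assumption. It does not attempt the second step (deciding whether a group forms one or more syllables), so it is only a starting point.

```cpp
#include <cctype>
#include <string>

// Assumption: a, e, i, o, u and y all count as vowels.
// In real English, y is sometimes a consonant.
bool is_vowel(char c) {
    switch (std::tolower(static_cast<unsigned char>(c))) {
        case 'a': case 'e': case 'i': case 'o': case 'u': case 'y':
            return true;
        default:
            return false;
    }
}

// Count maximal runs of consecutive vowels in one word.
// This is a rough first estimate of the syllable count, nothing more:
// silent e's and vowel pairs that split into two syllables are not handled.
int count_vowel_groups(const std::string& word) {
    int groups = 0;
    bool in_group = false;
    for (char c : word) {
        const bool v = is_vowel(c);
        if (v && !in_group) {
            ++groups;  // a new vowel group starts here
        }
        in_group = v;
    }
    return groups;
}
```

With this sketch, "dog" gives 1, but "bone" gives 2 (the silent e is counted), "foretold" gives 3, and "rhythm" gives 1, so the raw group count clearly needs further rules or exceptions on top.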

Kai-Uwe Bux · Mar 18, 2010

Daniel said:

Google is your friend:
http://english.glendale.cc.ca.us/phonics.rules.html
<snip>

Andy said:

[..]
This web site

http://www.wordcalc.com/

seems to do what you want. Except... "The rhythm of life" contains two
syllables. Half a syllable per word.

Good luck. This is a hard problem.

Maybe from a linguistic point of view, it is hard. But algorithmically, it
seems somewhat easy: English has about 1,000,000 words (with very inclusive
counting) and the number of syllables in each of them is known. So just do a
table look-up. This algorithm also has the advantage of being applicable to
any language (and it will be easier as English has a huge vocabulary).

It's a finite problem and in fact smaller than, say, the problem of finding
phone numbers based on name and address. The interesting part would be to
use frequency information about words to make the look-up fast; or to find a
good data structure to reduce memory consumption.

Of course, there is the issue of words being added to the language. However,
a rule based algorithm should not be expected to cope with the new words
either: its rules are just designed to deal with the known words.

Best

Kai-Uwe Bux

Daniel Pitts · Mar 18, 2010

Kai-Uwe Bux said:
Maybe from a linguistic point of view, it is hard. But algorithmically, it
seems somewhat easy: English has about 1,000,000 words (with very inclusive
counting) and the number of syllables in each of them is known. So just do a
table look-up. This algorithm also has the advantage of being applicable to
any language (and it will be easier as English has a huge vocabulary).

It's a finite problem and in fact smaller than, say, the problem of finding
phone numbers based on name and address. The interesting part would be to
use frequency information about words to make the look-up fast; or to find a
good data structure to reduce memory consumption.

How about a hash map for both of those?

Actually, with only 1 million words, the entirety of the data structure
can easily fit in memory on even the cheapest of today's desktop/server
machines (mobile/embedded are a different story), which makes look-up
extremely fast.
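
A rough sketch of the table look-up with a hash map, using `std::unordered_map`. The six-entry table, the helper names and the "-1 means unknown" convention are all made up for illustration; a real table would be loaded from a word list with known syllable counts. Words are split on whitespace only, so punctuation handling is left out.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <unordered_map>

using SyllableTable = std::unordered_map<std::string, int>;

// Hypothetical sample data standing in for a full dictionary.
SyllableTable make_sample_table() {
    return {
        {"the", 1}, {"rhythm", 2}, {"of", 1}, {"life", 1},
        {"foretold", 2}, {"cooperation", 5},
    };
}

// Case-insensitive look-up. Returns -1 if the word is not in the table.
int lookup_syllables(const SyllableTable& table, std::string word) {
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    const auto it = table.find(word);
    return it == table.end() ? -1 : it->second;
}

// Sum the syllables of all whitespace-separated words in a sentence.
// Words missing from the table are skipped and counted in unknown_words.
int count_sentence(const SyllableTable& table, const std::string& sentence,
                   int& unknown_words) {
    std::istringstream in(sentence);
    std::string word;
    int total = 0;
    unknown_words = 0;
    while (in >> word) {
        const int n = lookup_syllables(table, word);
        if (n < 0) {
            ++unknown_words;
        } else {
            total += n;
        }
    }
    return total;
}
```

With the sample table, `count_sentence(table, "The rhythm of life", unknown)` returns 5 and leaves `unknown` at 0. Reporting unknown words separately keeps the new-words problem visible instead of silently guessing.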

me · Mar 19, 2010

Daniel T. said:
The exceptions remind me of a joke by Emo Phillips.

Most states do not end in the letter "a." The only ones that do are
Alabama, Georgia, Florida, Louisiana, Oklahoma, Arizona, California,
Nevada, Alaska, Montana, Nebraska, South Dakota, North Dakota,
Minnesota, Iowa, Indiana, Pennsylvania, North Carolina, South
Carolina, West Virginia, east Virginia, and Missouri.

That's funny!

I live in MissourA as well!!

James Kanze · Mar 19, 2010

Pay special attention to rule 1.

The rhythm can be foretold by looking at where the vowels are,
right? So "rhythm" has ... err... two syllables, because it's
split by the Y which counts as a vowel,

The y is the only possible vowel, so rhythm can't have more than
one syllable. Except that as I hear it (and according to
dictionaries), it has two: in this case, the m acts as a
syllable.

whereas "foretold" obviously has three syllables, centred
around the three vowels.

Or is that centered?

Rule 7 and the second point under 1 in the Basic Syllable Rules
do imply that silent e's don't count. (Of course, they don't
give any hint as to how a program is to determine whether an e
is silent or not.)

This web site

http://www.wordcalc.com/

seems to do what you want. Except... "The rhythm of life"
contains two syllables. Half a syllable per word.

Good luck. This is a hard problem.

To put it mildly. Compare "cooper" with the beginning of
"cooperation".

And that's without internationalization: the rules will be
distinctly different in French or in German than in English.

For starters, you'll probably want to see
http://tug.org/docs/liang/. To my knowledge, no one has done
better since (and it works for all, or at least most, languages,
with a simple replacement of machine-generated tables).
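
A toy sketch of the pattern idea behind that work, as I understand it: each pattern is a letter sequence with digits between the letters, every pattern that matches the word (wrapped in '.' markers) contributes its digits, the highest digit wins at each position, and an odd result allows a break while an even one forbids it. The three patterns below are invented for this example and are not taken from any real table. Note also that hyphenation points only approximate syllable boundaries.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Pattern {
    std::string letters;      // pattern text with the digits removed
    std::vector<int> values;  // values[i] sits just before letters[i]
};

// Parse text such as "co2oper." into letters and inter-letter values.
Pattern parse_pattern(const std::string& text) {
    Pattern p;
    p.values.push_back(0);
    for (char c : text) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            p.values.back() = c - '0';
        } else {
            p.letters.push_back(c);
            p.values.push_back(0);
        }
    }
    return p;
}

// Invented patterns: "o1o" allows a break between two o's,
// "co2oper." forbids it in a word ending in "cooper",
// "a1tion" allows a break before "tion" after an a.
std::vector<Pattern> toy_patterns() {
    std::vector<Pattern> patterns;
    for (const char* text : {"o1o", "co2oper.", "a1tion"}) {
        patterns.push_back(parse_pattern(text));
    }
    return patterns;
}

// Split a lowercase word at every position whose winning value is odd.
std::vector<std::string> split_word(const std::string& word,
                                    const std::vector<Pattern>& patterns) {
    const std::string dotted = "." + word + ".";
    std::vector<int> scores(dotted.size() + 1, 0);  // scores[i] sits before dotted[i]
    for (const Pattern& p : patterns) {
        for (std::size_t pos = dotted.find(p.letters); pos != std::string::npos;
             pos = dotted.find(p.letters, pos + 1)) {
            for (std::size_t k = 0; k < p.values.size(); ++k) {
                scores[pos + k] = std::max(scores[pos + k], p.values[k]);
            }
        }
    }
    std::vector<std::string> parts(1);
    for (std::size_t j = 0; j < word.size(); ++j) {
        if (j > 0 && scores[j + 1] % 2 == 1) {
            parts.emplace_back();  // odd value: start a new piece here
        }
        parts.back().push_back(word[j]);
    }
    return parts;
}
```

With these toy patterns, "cooper" stays in one piece because the 2 from "co2oper." outranks the 1 from "o1o", while "cooperation" is split into "co", "opera" and "tion". Real tables hold thousands of machine-generated patterns, which is where the per-language replacement comes in.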
