Determining Syllables

pemo

Does anyone know of an algorithm that can accurately determine the number of
syllables in a given English word - esp. if that word isn't already 'known'
by such an algorithm?

FYI, there are two approaches I'm currently considering.

One is to reduce/convert (somehow) the word into its IPA equivalent, e.g.,
'parliament' becomes 'pɑːləm(ə)nt' (apologies if that doesn't come out too
right!) - parsing that is, I believe, straightforward. However, I can't
find a program to convert English words into their IPA equivalents, and so
I'm currently stuck with using a dictionary - not so bad, until a word isn't
in the dictionary I'm using!
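
If a transcription is available, the counting step really is simple, because every syllable has exactly one vowel nucleus. The following is a minimal sketch in C, assuming a broad IPA transcription encoded as UTF-8. The vowel list, the function names `ipa_syllables` and `utf8_next`, and the handling of diacritics are assumptions made for illustration, not taken from any dictionary format. Two vowels in hiatus that are not separated by a stress mark or a '.' will be merged into a single syllable.

```c
#include <stddef.h>

/* Vowel symbols treated as syllable nuclei: an illustrative subset of the
   IPA chosen for English, not a complete inventory. */
static const unsigned long ipa_vowels[] = {
    'a', 'e', 'i', 'o', 'u',
    0x00E6 /* æ */, 0x0250 /* ɐ */, 0x0251 /* ɑ */, 0x0252 /* ɒ */,
    0x0254 /* ɔ */, 0x0259 /* ə */, 0x025A /* ɚ */, 0x025B /* ɛ */,
    0x025C /* ɜ */, 0x025D /* ɝ */, 0x026A /* ɪ */, 0x028A /* ʊ */,
    0x028C /* ʌ */
};

static int is_ipa_vowel(unsigned long cp)
{
    size_t i;
    for (i = 0; i < sizeof ipa_vowels / sizeof ipa_vowels[0]; i++)
        if (ipa_vowels[i] == cp)
            return 1;
    return 0;
}

/* Decode one UTF-8 sequence (1 to 3 bytes, enough for the IPA blocks) into
   *cp and return the number of bytes consumed. Anything else is consumed
   one byte at a time and reported as U+FFFD. */
static size_t utf8_next(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
        && (s[2] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x0F) << 12)
            | ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    *cp = 0xFFFD;
    return 1;
}

/* Count syllables as vowel nuclei: each maximal run of vowel symbols (so a
   diphthong such as "eɪ" counts once), plus each consonant marked syllabic
   with U+0329 or U+030D. Stress marks, '.', '(' and ')' end a run. */
int ipa_syllables(const char *ipa)
{
    const unsigned char *s = (const unsigned char *)ipa;
    unsigned long cp;
    int count = 0, in_vowel = 0;

    while (*s) {
        s += utf8_next(s, &cp);
        if (is_ipa_vowel(cp)) {
            if (!in_vowel)
                count++;        /* a new run of vowels is a new nucleus */
            in_vowel = 1;
        } else if (cp == 0x0329 || cp == 0x030D) {
            count++;            /* syllabic consonant, e.g. the n in "button" */
            in_vowel = 0;
        } else if (cp == 0x02D0 || cp == 0x02D1
                   || (cp >= 0x0300 && cp <= 0x036F)) {
            /* length marks and other diacritics: keep the current state */
        } else {
            in_vowel = 0;       /* consonant, stress mark, '.', '(' ... */
        }
    }
    return count;
}
```

With this sketch, `ipa_syllables("ˈpɑːləm(ə)nt")` returns 3. Optional sounds written in parentheses, as in "(ə)", are counted as present; a caller that wants the shorter pronunciation would have to strip them first.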

Another approach might be to modify a good hyphenating algorithm, as I'm
led to believe that these usually insert a hyphen at a syllable boundary.
However, how they do that (determine the point), and whether it's even true,
I just don't know.

I've also had a look at the Flesch readability stuff - but it's probably not
going to be accurate enough for what I need it for.
 
Giannis Papadopoulos

pemo said:
> Does anyone know of an algorithm that can accurately determine the number of
> syllables in a given English word - esp. if that word isn't already 'known'
> by such an algorithm?
>
> FYI, there are two approaches I'm currently considering.
>
> One is to reduce/convert (somehow) the word into its IPA equivalent, e.g.,
> 'parliament' becomes 'pɑːləm(ə)nt' (apologies if that doesn't come out too
> right!) - parsing that is, I believe, straightforward. However, I can't
> find a program to convert English words into their IPA equivalents, and so
> I'm currently stuck with using a dictionary - not so bad, until a word isn't
> in the dictionary I'm using!
>
> Another approach might be to modify a good hyphenating algorithm, as I'm
> led to believe that these usually insert a hyphen at a syllable boundary.
> However, how they do that (determine the point), and whether it's even true,
> I just don't know.
>
> I've also had a look at the Flesch readability stuff - but it's probably not
> going to be accurate enough for what I need it for.

This is OT. It has nothing to do with C...


--
one's freedom stops where other's begin

Giannis Papadopoulos
http://dop.users.uth.gr/
University of Thessaly
Computer & Communications Engineering dept.
 
Rouben Rostamian

> Does anyone know of an algorithm that can accurately determine the number of
> syllables in a given English word - esp. if that word isn't already 'known'
> by such an algorithm?

I don't know about comp.programming, but this topic certainly
does not belong to comp.lang.c.

Perhaps the information on the following page may be of some help:

http://www.tex.ac.uk/cgi-bin/texfaq2html?label=hyphen

Read that then do a web search on "hyphenation Frank Liang".
 
Arthur J. O'Dwyer

[followups set to c.p, since this is not a C question]
> Does anyone know of an algorithm that can accurately determine the number of
> syllables in a given English word - esp. if that word isn't already 'known'
> by such an algorithm?
>
> FYI, there are two approaches I'm currently considering.
>
> One is to reduce/convert (somehow) the word into its IPA equivalent, e.g.,
> 'parliament' becomes 'pɑːləm(ə)nt' (apologies if that doesn't come out too
> right!) - parsing that is, I believe, straightforward. However, I can't
> find a program to convert English words into their IPA equivalents,

Not surprising, since that would be equivalent to counting the syllables
in English words, and that's not an algorithmic problem.
English doesn't follow strictly algorithmic rules, because it's not
strictly phonetic. I could come along tomorrow and make up a word, like
"Worcestershire," and make up a pronunciation for it, like "wooster," and
any computer program in the world wouldn't be able to figure that out from
the spelling. Heck, most /humans/ don't know how every English word is
pronounced, and we have many, many man-years to study the problem!
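
The point is easy to demonstrate with the kind of spelling-only rule of thumb often suggested for this job: count groups of consecutive vowel letters, ignore a silent final "e", and never return less than one. A minimal sketch of that heuristic follows; the exact rules and the name `naive_syllables` are illustrative assumptions, not a standard algorithm.

```c
#include <ctype.h>
#include <string.h>

/* Treat a, e, i, o, u and y as vowel letters (a simplification). */
static int is_vowel_letter(char ch)
{
    int c = tolower((unsigned char)ch);
    return c != '\0' && strchr("aeiouy", c) != NULL;
}

/* Naive spelling-only syllable estimate: count runs of vowel letters,
   drop a final silent 'e' ("make") but not in consonant + "le" ("table"),
   and never return less than 1. */
int naive_syllables(const char *word)
{
    size_t len = strlen(word), i;
    int count = 0, in_vowel = 0;

    for (i = 0; i < len; i++) {
        if (is_vowel_letter(word[i])) {
            if (!in_vowel)
                count++;        /* start of a new vowel group */
            in_vowel = 1;
        } else {
            in_vowel = 0;
        }
    }

    /* Final 'e' after a consonant is assumed silent, except in "-Cle". */
    if (len >= 3 && tolower((unsigned char)word[len - 1]) == 'e'
        && !is_vowel_letter(word[len - 2])
        && !(tolower((unsigned char)word[len - 2]) == 'l'
             && !is_vowel_letter(word[len - 3])))
        count--;

    return count < 1 ? 1 : count;
}
```

`naive_syllables("Worcester")` returns 3, although the usual pronunciation is the two-syllable "wooster", and `naive_syllables("create")` returns 1 because "ea" is taken as a single vowel group. No amount of extra spelling rules removes this class of error entirely.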

[...]
> Another approach might be to modify a good hyphenating algorithm, as I'm
> led to believe that these usually insert a hyphen at a syllable boundary.
> However, how they do that (determine the point), and whether it's even true,
> I just don't know.

Yes, a good hyphenation algorithm can be /very/ good. The basic rule of
good hyphenation is to come up with sets of English words that all have a
hyphenation point in the same general context, and then remember the
context. For example, if you see a word ending in -ible, you can hyphenate
it just before the -ible, unless the -ible follows a c or g (c-ible, g-ible), in which case you can't. You
can generally hyphenate before -str, or after hy-. And so on.
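
Rules like these are what Liang's pattern technique for TeX encodes. Each pattern is a string of letters with digits between them, and '.' marks a word boundary; wherever a pattern matches, its digits are laid over the word, the highest digit at each position wins, and an odd value permits a break while an even value forbids one. Below is a toy version in C. The five patterns are made up to mirror the rules above (they are not taken from TeX's actual pattern file), `hyphenate` is a hypothetical name, and TeX's minimum fragment lengths at the ends of a word are not enforced.

```c
#include <ctype.h>
#include <string.h>

#define MAX_WORD 64

/* Hypothetical toy patterns: break before "ible" (1ible) but not after c
   or g (c2ible, g2ible), break before "str", break after a word-initial
   "hy". Each pattern is assumed to be shorter than MAX_WORD. */
static const char *const patterns[] = { "1ible", "c2ible", "g2ible", "1str", ".hy1" };

/* Write word into out with '-' at every break the patterns allow.
   Returns 0 on success, -1 if the word or the buffer is too small. */
int hyphenate(const char *word, char *out, size_t outsz)
{
    char w[MAX_WORD + 3];
    int levels[MAX_WORD + 3] = {0};   /* levels[i]: value before w[i] */
    size_t len = strlen(word), n, p, s, k, o = 0;

    if (len > MAX_WORD || outsz < 2 * len + 1)
        return -1;
    w[0] = '.';                       /* word-boundary markers, as in TeX */
    for (k = 0; k < len; k++)
        w[k + 1] = (char)tolower((unsigned char)word[k]);
    w[len + 1] = '.';
    w[len + 2] = '\0';
    n = len + 2;

    for (p = 0; p < sizeof patterns / sizeof patterns[0]; p++) {
        char letters[MAX_WORD + 1];
        int digits[MAX_WORD + 2] = {0};
        size_t m = 0;
        const char *c;

        /* Split e.g. "c2ible" into letters "cible", digits {0,2,0,0,0,0}. */
        for (c = patterns[p]; *c; c++) {
            if (isdigit((unsigned char)*c))
                digits[m] = *c - '0';
            else
                letters[m++] = *c;
        }
        letters[m] = '\0';

        /* Wherever the letters occur, keep the maximum value per position. */
        for (s = 0; s + m <= n; s++)
            if (strncmp(w + s, letters, m) == 0)
                for (k = 0; k <= m; k++)
                    if (digits[k] > levels[s + k])
                        levels[s + k] = digits[k];
    }

    /* An odd value before word[k] (stored at levels[k + 1]) allows a break. */
    for (k = 0; k < len; k++) {
        if (k > 0 && levels[k + 1] % 2 == 1)
            out[o++] = '-';
        out[o++] = word[k];
    }
    out[o] = '\0';
    return 0;
}
```

With these patterns, "possible" becomes "poss-ible", "monstrous" becomes "mon-strous" and "hyphen" becomes "hy-phen", while "forcible" and "eligible" are left alone because the even 2 in c2ible and g2ible outranks the 1 in 1ible.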

The basic research for hyphenation patterns in English has already been
done several times, e.g. by Frank Liang for TeX, but I don't know of anywhere
you could get patterns for syllable counting. Still, I'd start by
downloading the TeX hyphenation patterns, and using them to find every
single hyphenation point in your word. Then it would probably be a good
idea to discard any segments that don't contain any vowels (but I'm sure
there are exceptions, and not just "nth" and "ssh").
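
Applied to a word whose hyphenation points have already been found, the counting step can be sketched as follows. The function name is hypothetical and treating 'y' as a vowel is an assumption. Note also that TeX by default refuses breaks that leave fewer than two letters at the start of a word or three at the end (\lefthyphenmin and \righthyphenmin), so a break such as a-bout would never be reported and such words would come out one syllable short unless those limits are relaxed.

```c
#include <ctype.h>
#include <string.h>

/* Estimate syllables from a hyphenated word such as "hy-phen-a-tion":
   count the '-'-separated segments, skipping any segment that contains
   no vowel letter (a, e, i, o, u or y). */
int syllables_from_hyphenation(const char *hyphenated)
{
    int count = 0, seg_has_vowel = 0;
    const char *c;

    for (c = hyphenated; ; c++) {
        if (*c == '-' || *c == '\0') {
            if (seg_has_vowel)
                count++;        /* segment ended and had a vowel */
            seg_has_vowel = 0;
            if (*c == '\0')
                break;
        } else if (strchr("aeiouy", tolower((unsigned char)*c)) != NULL) {
            seg_has_vowel = 1;
        }
    }
    return count;
}
```

`syllables_from_hyphenation("hy-phen-a-tion")` gives 4, while "nth" gives 0, which is exactly the kind of exception mentioned above.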

> I've also had a look at the Flesch readability stuff - but it's probably not
> going to be accurate enough for what I need it for.

Really? One of the inputs to the Flesch readability formula /is/ the
number of syllables in the text. So if you can find a program that claims
to accurately compute Flesch scores, go with it! (I doubt such programs
exist, though. A Google search turned up Flesh,
http://jack.gravco.com/flesh.html, but it thinks "birthday" has one
syllable, so I didn't bother investigating any further.)
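
For reference, the Flesch Reading Ease score is 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words), so the syllable count feeds straight into the result. A minimal C version that takes the three counts as given follows. The function name is hypothetical, and counting words, sentences and syllables (the hard part) is left to the caller.

```c
/* Flesch Reading Ease from raw counts; higher scores mean easier text. */
double flesch_reading_ease(long words, long sentences, long syllables)
{
    if (words <= 0 || sentences <= 0)
        return 0.0;   /* undefined; returning 0 is an arbitrary choice */
    return 206.835
           - 1.015 * ((double)words / sentences)
           - 84.6 * ((double)syllables / words);
}
```

For 100 words in 5 sentences with 150 syllables this gives 206.835 - 20.3 - 126.9 = 59.635. Each syllable miscounted in a 100-word sample moves the score by 0.846 points.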

Actually, given the application to Flesch readability computations, I
might be interested in the syllable-counting problem. If you get anything
working, would you let me know? And I'll post here if I find anything
clever --- but don't hold your breath.

-Arthur
 
Walter Roberson

> Does anyone know of an algorithm that can accurately determine the number of
> syllables in a given English word - esp. if that word isn't already 'known'
> by such an algorithm?

It cannot be done on a stand-alone basis. The same character string
might be multiple words with different pronunciations and
different syllable boundaries, so one would have to be able to
deduce which of the words was meant by examining surrounding context.
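
One way to keep that ambiguity visible instead of hiding it is to let each dictionary entry hold every plausible count, for example "learned" (one syllable in "she learned", two in "a learned man"). A minimal sketch follows; the table is a tiny hypothetical sample and `syllable_candidates` is a made-up name. Choosing among the candidates still needs the surrounding context, which this lookup does not attempt.

```c
#include <stddef.h>
#include <string.h>

/* A spelling with several pronunciations lists several counts. */
struct entry {
    const char *word;
    int counts[3];                   /* candidate counts, 0-terminated */
};

/* Tiny hypothetical sample; a real table would come from a pronouncing
   dictionary. */
static const struct entry dict[] = {
    { "aged",    { 1, 2, 0 } },      /* "aged in oak" vs "an aged man" */
    { "blessed", { 1, 2, 0 } },      /* "blessed it" vs "a blessed event" */
    { "learned", { 1, 2, 0 } },      /* "she learned" vs "a learned man" */
    { "table",   { 2, 0, 0 } },
};

/* Return the 0-terminated list of candidate counts for word, or NULL if
   the word is unknown (the caller then needs some fallback). */
const int *syllable_candidates(const char *word)
{
    size_t i;
    for (i = 0; i < sizeof dict / sizeof dict[0]; i++)
        if (strcmp(dict[i].word, word) == 0)
            return dict[i].counts;
    return NULL;
}
```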

I played around informally with syllabification a few decades ago,
and eventually realized that in English (or Canadian English anyhow)
the proper syllabification depended upon the stress points ("accents"),
and was also tied in with whether particular vowels were long or short.
You have to "look ahead": syllables can change depending upon the
suffixes one adds... and if one then adds further suffixes,
they can change again.
 
Malcolm

pemo said:
> Does anyone know of an algorithm that can accurately determine the number
> of syllables in a given English word - esp. if that word isn't already
> 'known' by such an algorithm?
>
> FYI, there are two approaches I'm currently considering.

It's a machine learning problem. Try Hidden Markov Models or neural
networks. However, the language you choose to implement such an algorithm in
will be the least of your problems, so comp.lang.c isn't very relevant.

Look at the NETtalk program. That used a neural network to convert text to
speech (phonemes) and could easily be modified to count syllables per word,
I would imagine.
 
Chris Croughton

> Does anyone know of an algorithm that can accurately determine the number of
> syllables in a given English word - esp. if that word isn't already 'known'
> by such an algorithm?
>
> FYI, there are two approaches I'm currently considering.
>
> One is to reduce/convert (somehow) the word into its IPA equivalent, e.g.,
> 'parliament' becomes 'pɑːləm(ə)nt' (apologies if that doesn't come out too
> right!) - parsing that is, I believe, straightforward.

Well, you've got a problem right there -- some people say par'-luh-munt
and others say par'-li-a-ment. Lots of proper names (places as well
as people) have that effect -- is Worcester three syllables or two?
Aylesbury (ails'-bri or ails'-buh-ri)? Chol-mon-de-ly or Chum-ley?
Tall-i-a-fe-ro or Tol-i-ver? Michael pronounced mikh'-ah-el or mI'-kel?
Is Catherine kath-uh-rin or kath-rin? Con-sid-er-ing or con-sid-ring?
Equiv-a-lent or equ-v-lent? Al-go-rithm or Al-go-rith-um?
Dic-shun-ar-y or dik-shun-ry?

> However, I can't find a program to convert English words into their
> IPA equivalents, and so I'm currently stuck with using a dictionary -
> not so bad, until a word isn't in the dictionary I'm using!

Since in a lot of cases there is no fixed rendering of English words into
phonetic form (even the OED often gives several different pronunciations),
it's not surprising that you can't find a program which works well.

And if you want words which aren't 'known' all bets are off, since that
includes technical and foreign words 'imported' into the language...

(Followups to comp.programming)

Chris C
 
