STRING MANIPULATION: Marking syllables

basi

Hello,
How might one express the following syllabification rules:

1. A c(c)v sequence is a syllable if followed by a cv sequence

pe.so (the dot indicates a syllable break)
pra.da

2. A c(c)vc sequence is a syllable if followed by a c

fal.do
bran.di

where:
c = consonant
v = vowel
(c) = optional consonant

Thank you for your help.
basi
 
William James


CC = {}
CC['v'] = '[aeiou]'
CC['c'] = '[^aeiou]'

class String
  # Convert a rule like "c(c)vc.c" to a regular expression.
  def to_syllrule
    re = "^("
    scan(/\(?([cv.])(\)?)/) { |x|
      if "." == x[0]
        re << ")("
      else
        re << CC[x[0]]
        re << "?" if ")" == x[1]
      end
    }
    Regexp.new(re + ".*)")
  end
end

# Make a list of the rules as regular expressions.
rules = %w( c(c)v.cv c(c)vc.c ).map { |s| s.to_syllrule }

%w( peso prada faldo brandi ).each { |word|
  rules.each { |re|
    if word =~ re
      puts $~.captures.join('.')
      break
    end
  }
}

-----------------
Output:

pe.so
pra.da
fal.do
bran.di
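The loop above inserts only the first break, so a longer word such as "banana" comes out as "ba.nana". A minimal sketch of one way to keep going, assuming the two rules are simply re-applied, left to right, to whatever remains after each break (`SYLL_RULES` and `syllabify` are illustrative names, not part of any library):

```ruby
# Vowel and consonant classes, as in the CC hash above.
VOW = '[aeiou]'
CON = '[^aeiou]'

# Each rule captures one syllable at the start of the string; the
# lookahead checks what must follow without consuming it.
SYLL_RULES = [
  /\A(#{CON}#{CON}?#{VOW})(?=#{CON}#{VOW})/, # rule 1: c(c)v before cv
  /\A(#{CON}#{CON}?#{VOW}#{CON})(?=#{CON})/  # rule 2: c(c)vc before c
]

# Hypothetical helper: peel off syllables until no rule applies,
# then keep whatever is left as the final syllable.
def syllabify(word)
  syllables = []
  rest = word
  loop do
    m = SYLL_RULES.map { |re| re.match(rest) }.compact.first
    break unless m
    syllables << m[1]
    rest = m.post_match # text after the captured syllable
  end
  syllables << rest unless rest.empty?
  syllables.join('.')
end

%w(peso prada faldo brandi banana caldera).each { |w| puts syllabify(w) }
```

This prints pe.so, pra.da, fal.do, bran.di, ba.na.na and cal.de.ra. Since both rules start with a consonant, a vowel-initial word such as "opera" is left unsplit.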
 
Austin Ziegler

How might one express the following syllabification rules:
1. A c(c)v sequence is a syllable if followed by a cv sequence [...]
2. A c(c)vc sequence is a syllable if followed by a c

There are a couple of natural-language syntax libraries listed on RAA
(the Ruby Application Archive); I don't remember their names offhand.
For hyphenation (not *quite* the same, mind you), you can always use
Text::Hyphen.

With a regexp, I'd do those as:

# Note: this counts y as both vowel and consonant. This may not
# always result in correct syllable identification.
VOWELS = V = %r{[aieouy]}i
CONSONANTS = C = %r{[b-df-hj-np-tv-z]}i
c_opc_v = %r{#{C}#{C}?#{V}}
c_opc_v_c = %r{#{C}#{C}?#{V}#{C}}
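As a sketch of how these patterns might be wired into the two rules, assuming each rule is anchored at the start of the word with a lookahead for what must follow (`RULE1`, `RULE2` and `mark_first_break` are illustrative names). Because the classes carry the `i` flag, capitalised words work too:

```ruby
# Austin's classes; y counts as both vowel and consonant.
VOWELS = V = %r{[aieouy]}i
CONSONANTS = C = %r{[b-df-hj-np-tv-z]}i
c_opc_v = %r{#{C}#{C}?#{V}}
c_opc_v_c = %r{#{C}#{C}?#{V}#{C}}

# Interpolating a Regexp embeds it as a group that keeps its own flags,
# so the ignore-case option survives inside the larger pattern.
RULE1 = /\A(#{c_opc_v})(?=#{C}#{V})/ # c(c)v followed by cv
RULE2 = /\A(#{c_opc_v_c})(?=#{C})/   # c(c)vc followed by c

# Insert a dot after the first syllable found by either rule.
def mark_first_break(word)
  [RULE1, RULE2].each do |re|
    return word.sub(re, '\1.') if re.match?(word)
  end
  word
end

puts mark_first_break('Prada') # Pra.da
puts mark_first_break('faldo') # fal.do
```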

There are other rules, I'm sure, because these two rules could be,
at least theoretically, converted into c(c)v(c).

-austin
--
Austin Ziegler * (e-mail address removed)
* Alternate: (e-mail address removed)
 
Austin Ziegler

CC = {}
CC['v'] = '[aeiou]'
CC['c'] = '[^aeiou]'

Note that your CC['c'] will catch 0-9 and punctuation as well.
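A quick way to see the difference, assuming the letter range from Austin's CONSONANTS pattern as the stricter alternative (`LOOSE_C`, `STRICT_C` and `consonant?` are illustrative names). Note also that `[^aeiou]` is case-sensitive, so it treats an uppercase vowel such as `E` as a consonant:

```ruby
LOOSE_C  = /[^aeiou]/           # anything except a lowercase vowel
STRICT_C = /[b-df-hj-np-tv-z]/i # consonant letters only, either case

# True when the single character ch is matched by the given class.
def consonant?(ch, klass)
  ch.match?(/\A#{klass}\z/)
end

%w(b B E 7 -).each do |ch|
  puts "#{ch}: loose=#{consonant?(ch, LOOSE_C)} strict=#{consonant?(ch, STRICT_C)}"
end
```

The loose class accepts all five characters; the strict one accepts only `b` and `B`.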

-austin
 
William James

Austin said:
There are other rules, I'm sure, because these two rules could be,
at least theoretically, converted into c(c)v(c).

After replacing the two rules with the single merged rule

rules = %w( c(c)v(c). ).map { |s| s.to_syllrule }

the output becomes

pes.o
prad.a
fal.do
bran.di
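The merged rule goes wrong because the optional final consonant is greedy: it is taken even when a vowel follows, so "peso" splits as "pes.o". A sketch of a single pattern that does behave like the two original rules, assuming the final consonant is allowed only when another consonant follows it, and that otherwise a cv sequence must come next (`MERGED` is an illustrative name, built from the same classes as the CC hash):

```ruby
VOW = '[aeiou]'
CON = '[^aeiou]'

# c(c)v, then either a consonant that is itself followed by a consonant
# (rule 2: c(c)vc.c) or nothing, provided cv comes next (rule 1: c(c)v.cv).
MERGED = /\A(#{CON}#{CON}?#{VOW}(?:#{CON}(?=#{CON})|(?=#{CON}#{VOW})))(.*)/

%w(peso prada faldo brandi).each do |word|
  m = MERGED.match(word)
  puts m ? m.captures.join('.') : word
end
```

This prints pe.so, pra.da, fal.do and bran.di, so the two rules can share one pattern as long as the final consonant stays conditional.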
 
basi

Hi,
Yes, I did mean to ask for references to human-language parsing
libraries, but forgot to in my initial email. Thank you for pointing me
to RAA; I will visit it right away.
basi
 
