Replacing diacritics with plain characters

Une Bévue

Do you know of a way to replace diacritics with plain characters (e.g. é -> e)?

The same for ligatures (e.g. Æ -> AE)?

Using tables?
 
F. Senault

On 25 September at 18:25, Une Bévue wrote:

(Hello again... :) )
Do you know of a way to replace diacritics with plain characters (e.g. é -> e)?

The same for ligatures (e.g. Æ -> AE)?

Using tables?

Iconv can do that for you:

require "iconv"
# => true
i = Iconv.new("ASCII//TRANSLIT", "ISO-8859-15")
i.iconv("aéouï Æ")
# => "a'eou\"i AE"
i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
# => "aeoui AE"
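Since the question asks about tables, here is a minimal sketch of a pure lookup-table approach that does not rely on Iconv at all. The names `FOLD`, `FOLD_RE` and `fold` are hypothetical, and the table covers only a handful of characters for illustration; a real table would need many more entries.

```ruby
# Hypothetical, deliberately tiny table: each accented letter or ligature
# maps to its plain ASCII replacement.
FOLD = {
  "é" => "e", "ï" => "i", "Ê" => "E", "ê" => "e",
  "Æ" => "AE", "æ" => "ae", "Œ" => "OE", "œ" => "oe", "ß" => "ss"
}.freeze

# One regexp that matches any key of the table.
FOLD_RE = Regexp.union(FOLD.keys)

# Replace every character found in the table; leave everything else alone.
def fold(text)
  text.gsub(FOLD_RE, FOLD)
end

puts fold("aéouï Æ")
```

Characters missing from the table pass through unchanged, so the result is only as complete as the table.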

Fred
 
Michal Suchanek

--
I've found an axe can do a lot for a paper-mangling printer. Especially
if you shout for one at the top of your voice, and then a cow orker
-------------------------------------------------------------^
???
brings you said instrument. Suddenly, no more paper jams.
(Kai Henningsen in the SDM)

:D
 
PA

IConv can do that for you :

An alternative approach is something like Sean M. Burke's
Text::Unidecode:

http://interglacial.com/~sburke/tpj/as_html/tpj22.html
http://search.cpan.org/~sburke/Text-Unidecode-0.04/lib/Text/Unidecode.pm


Here is an example of an implementation of Unidecode in Lua [1]:

local Unidecode = require( 'Unidecode' )

print( Unidecode( 'Москва́' ) )
print( Unidecode( '北京' ) )
print( Unidecode( 'Ἀθηνᾶ' ) )
print( Unidecode( '서울' ) )
print( Unidecode( '東京' ) )
print( Unidecode( '京都市' ) )
print( Unidecode( 'नेपाल' ) )
print( Unidecode( 'תֵּל־אָבִיב-יָפוֹ' ) )
print( Unidecode( 'تَلْ أَبِيبْ يَافَا' ) )
print( Unidecode( 'تهران' ) )
print( Unidecode( 'Géometrie Différentielle' ) )

which prints:

Moskva
beijing
Athena
seoul
dongjing
jingdushi
nepaal
te'labiyb-yapvo
tal 'abiyb yaafaa
thran
Geometrie Differentielle
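For readers who want the same idea in Ruby, here is a rough sketch of how a Unidecode-style lookup can be organised: one table per block of 256 code points, selected by the high byte, with the low byte as the index inside the block. The `BLOCKS` data and the `unidecode_sketch` name are hypothetical and cover just enough characters for one word; this is not the actual Text::Unidecode data.

```ruby
# Hypothetical sample tables, keyed by the high byte of the code point.
BLOCKS = {
  0x03 => { 0x01 => "" },  # U+0301 COMBINING ACUTE ACCENT is dropped
  0x04 => {                # a few Cyrillic letters
    0x1C => "M", 0x3E => "o", 0x41 => "s",
    0x3A => "k", 0x32 => "v", 0x30 => "a"
  }
}.freeze

# Map each character through the block tables; ASCII passes through,
# anything not covered becomes the `unknown` placeholder.
def unidecode_sketch(text, unknown = "?")
  text.each_char.map do |ch|
    cp = ch.ord
    next ch if cp < 0x80
    block = BLOCKS[cp >> 8]
    (block && block[cp & 0xFF]) || unknown
  end.join
end

puts unidecode_sketch("Москва\u0301")
```

Splitting the data by block keeps each table small and lets an implementation load only the blocks a given text actually touches.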

Cheers,

PA.

[1] http://dev.alt.textdrive.com/browser/HTTP/Unidecode.lua
 
F. Senault

On 25 September at 20:12, Michal Suchanek wrote:
--------------------------------------------------------------------------------------^
???

It's intentional. "Cow orker" was probably a typo in the olden times, but
it has entered the mainstream since then. Just ask Google: "Results 1 -
10 of about 37,200 for "cow orker". (0.19 seconds)" :)

Fred
 
Une Bévue

Daniel DeLorme said:
That doesn't work on all platforms. For me:

=> "a?ou? AE"

:-(

Are you sure about the encoding of "aéouï Æ"?

I did it with UTF-8 and it works:

-- the script ----------------------------------------------------------
#! /usr/bin/env ruby

require "iconv"

i = Iconv.new("ASCII//TRANSLIT", "UTF-8")

p i.iconv("aéouï Æ")
# => "a'eou\"i AE"

p i.iconv("aéouï Æ").gsub(/[^a-zA-Z0-9 ]/, '')
# => "aeoui AE"

p i.iconv("Être ou ne pas être, c'est la question. aéouï Æ, wie heiß du ?").
  gsub(/[^a-zA-Z0-9' ]/, '').gsub(/[' ]/, '_').gsub(/(.*)_$/, '\1')
# => "Etre_ou_ne_pas_etre_c_est_la_question_a_eoui_AE_wie_heiss_du"

p i.iconv("Être ou ne pas être, c'est la question. aéouï Æ, wie heiß du?").
  gsub(/[^a-zA-Z0-9' ]/, '').gsub(/[' ]/, '_').gsub(/(.*)_$/, '\1')
# => "Etre_ou_ne_pas_etre_c_est_la_question_a_eoui_AE_wie_heiss_du"
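The three chained gsub calls above can be pulled into a small helper so that the clean-up step is reusable and independent of how the text was transliterated. This is only a sketch: the name `slugify` is hypothetical, it expects input that is already ASCII, and unlike the script it collapses runs of quotes and spaces into a single underscore.

```ruby
# Turn already-transliterated ASCII text into an underscore-separated slug.
def slugify(ascii)
  ascii.gsub(/[^a-zA-Z0-9' ]/, "")  # drop punctuation, keep quotes and spaces
       .gsub(/[' ]+/, "_")          # quotes and spaces become separators
       .sub(/_+\z/, "")             # trim trailing separators
end

# Input shaped like the TRANSLIT output quoted earlier in the thread.
puts slugify("a'eou\"i AE, wie heiss du ?")
# => a_eoui_AE_wie_heiss_du
```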
 
Daniel DeLorme

Une said:
Are u sure about the encoding of "aéouï Æ" ?
Yep.
=> "a?ou? AE"

But like I said, TRANSLIT doesn't work the same on all platforms (I'm on
Ubuntu, by the way).
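Because the TRANSLIT result depends on the platform's iconv, and because Iconv was removed from Ruby's standard library in 2.0, a more portable route is to decompose characters with Unicode normalization and strip the combining marks. The sketch below assumes Ruby 2.2 or later (for `String#unicode_normalize`) and UTF-8 input; `LIGATURES` and `ascii_fold` are hypothetical names, and the ligature table is intentionally small.

```ruby
# Characters with no Unicode decomposition need an explicit replacement.
LIGATURES = {
  "Æ" => "AE", "æ" => "ae", "Œ" => "OE", "œ" => "oe", "ß" => "ss"
}.freeze

def ascii_fold(text)
  text.unicode_normalize(:nfd)   # "é" becomes "e" + combining acute accent
      .gsub(/\p{Mn}/, "")        # remove the combining marks
      .encode("US-ASCII",        # anything still non-ASCII goes through the
              fallback: ->(ch) { LIGATURES.fetch(ch, "?") })  # table, else "?"
end

puts ascii_fold("aéouï Æ")
```

The output here does not depend on the installed iconv library, only on Ruby's own Unicode tables.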

Daniel
 
