Removing accents from Spanish characters

Duke of Hazard

If a user types in a Spanish word with accents on some characters
(e.g. ~ on top of a), how can I remove just the accents and keep the
word?

Thanks,

Faraz
 
Steve (another one)

Duke said:
If a user types in a Spanish word with accents on some characters
(e.g. ~ on top of a), how can I remove just the accents and keep the
word?

Thanks,

Faraz

tipex
 
Gunnar Hjalmarsson

Duke said:
If a user types in a Spanish word with accents on some characters
(e.g. ~ on top of a), how can I remove just the accents and keep
the word?

Why would you like to do that?
 
Jürgen Exner

Duke said:
If a user types in a Spanish word with accents on some characters
(e.g. ~ on top of a), how can I remove just the accents and keep the
word?

That's a contradiction in terms. If you change a letter then you have a
different word or more likely no word.

Or do you believe "in" and "ln" are the same word? After all "l" and "i" are
the same character, just that the "l" does not have the funny dot on top?

jue
 
Helgi Briem

That's a contradiction in terms. If you change a letter then you have a
different word or more likely no word.

Or do you believe "in" and "ln" are the same word? After all "l" and "i" are
the same character, just that the "l" does not have the funny dot on top?

It works fine in context, usually. We often need to write Icelandic
words without all the special Icelandic letters (ÁáÐðÉéÍíÓóÚúÝýÞþÆæÖö,
I have no idea if you can read these or not). It is quite
understandable 99.9% of the time.

I often need to use the following subroutine. The OP can modify
it for his own purposes.

sub fix_Icelandic_letters
{
    for (@_)
    {
        tr/ÁÐÉÍÓÚÝÖáðéíóúýö/ADEIOUYOadeiouyo/;
        s/Þ/Th/;
        s/Æ/Ae/;
        s/þ/th/;
        s/æ/ae/;
        return $_;
    }
}
 
Duke of Hazard

Gunnar Hjalmarsson said:
Why would you like to do that?


I have a searchable database of cities that is used by people from all
over the world. So some people may type Sao Paulo with or without the
accent and I want my search function to treat both queries the same.
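
One way to make both spellings hit the same record is to fold the stored names and the incoming query to a common key before comparing them. A minimal sketch, assuming a hand-written tr table for a few Spanish and Portuguese letters; fold_name, %city_by_key and the sample cities are made up for illustration and are not the actual database:

```perl
use strict;
use warnings;
use utf8;    # this source file contains literal accented characters

# Hypothetical helper: map a few accented letters to plain ASCII
# and lowercase the result so that case does not matter either.
sub fold_name {
    my ($name) = @_;
    $name =~ tr/áéíóúüñãõçÁÉÍÓÚÜÑÃÕÇ/aeiouunaocAEIOUUNAOC/;
    return lc $name;
}

# Index the stored names by their folded form (sample data only).
my %city_by_key;
$city_by_key{ fold_name($_) } = $_ for ('São Paulo', 'Bogotá', 'Málaga');

binmode STDOUT, ':encoding(UTF-8)';
for my $query ('Sao Paulo', 'são paulo', 'BOGOTA') {
    my $hit = $city_by_key{ fold_name($query) };
    print "$query => ", defined $hit ? $hit : 'no match', "\n";
}
```

The stored name keeps its accents for display; only the lookup key is folded. The table covers just the letters listed in it and would need extending for other languages.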

Faraz
 
Ben Morrow

If a user types in a Spanish word with accents on some characters
(e.g. ~ on top of a), how can I remove just the accents and keep the
word?

With 5.8, you could try decomposing with Unicode::Normalize::NFD or NFKD
and then ditching all combining accent characters.
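
A minimal sketch of that approach, assuming the input is already a decoded Perl character string (not raw bytes); strip_accents is a name chosen here for illustration:

```perl
use strict;
use warnings;
use utf8;                          # literal accented characters in this file
use Unicode::Normalize qw(NFD);

# Decompose each accented letter into base letter + combining mark,
# then delete everything in the Unicode "Mark" category.
sub strip_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{M}//g;
    return $decomposed;
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_accents('São Paulo'), "\n";
```

Letters that have no decomposition, such as ð, þ, æ or ø, pass through unchanged, so they would still need an explicit mapping.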

Ben
 
Gunnar Hjalmarsson

Duke said:
I have a searchable database of cities that is used by people from
all over the world. So some people may type Sao Paulo with or
without the accent and I want my search function to treat both
queries the same.

I see. Think I have seen a module that does that, but I'm not able to
find it now. Suppose the approach that Helgi suggested is sufficient.
 
gnari

Helgi Briem said:
sub fix_Icelandic_letters
{
    for (@_)
    {
        tr/ÁÐÉÍÓÚÝÖáðéíóúýö/ADEIOUYOadeiouyo/;
        s/Þ/Th/;
        s/Æ/Ae/;
        s/þ/th/;
        s/æ/ae/;
        return $_;
    }
}

what's the deal with the return within the loop?
you realize this will just return the first argument (fixed),
so the loop has no real purpose, other than a fancy way
to do $_=$_[0]

gnari
 
Jürgen Exner

Gunnar said:
I see. Think I have seen a module that does that, but I'm not able to
find it now. Suppose the approach that Helgi suggested is sufficient.

Maybe String::Approx is a better approach?

jue
 
Helgi Briem

what's the deal with the return within the loop?
you realize this will just return the first argument (fixed),
so the loop has no real purpose, other than a fancy way
to do $_=$_[0]

No, gnari, you are wrong. It will return the substituted text.

Try this with and without the return. You can feed
the subroutine either an array or a scalar. It will
work with either.

#!perl
use warnings;
use strict;

my $text = 'Þjóðlegir þýskir ferðamenn líða ekki fúlar fréttir';
print "$text\n";
$text = fix_Icelandic_letters($text);
print "$text\n";

sub fix_Icelandic_letters
{
    for (@_)
    {
        tr/ÁÐÉÍÓÚÝÖáðéíóúýö/ADEIOUYOadeiouyo/;
        s/Þ/Th/;
        s/Æ/Ae/;
        s/þ/th/;
        s/æ/ae/;
        return $_;
    }
}
 
Gunnar Hjalmarsson

Helgi said:
gnari said:
what's the deal with the return within the loop? you realize this
will just return the first argument (fixed), so the loop has no
real purpose, other than a fancy way to do $_=$_[0]

No, gnari, you are wrong. It will return the substituted text.

Try this with and without the return. You can feed the subroutine
either an array or a scalar. It will work with either.

No, it won't. If you feed it with an array (or a list), it will only
return the first element (as expected, since the return() function
interrupts the whole subroutine).

my @text = ('Þjóðlegir þýskir ferðamenn',
'líða ekki fúlar fréttir');
@text = fix_Icelandic_letters(@text);
print "@text\n";

Outputs:
Thjodlegir thyskir ferdamenn

But it can be fixed, of course:

sub fix_Icelandic_letters
{
    for (@_)
    {
        tr/ÁÐÉÍÓÚÝÖáðéíóúýö/ADEIOUYOadeiouyo/;
        s/Þ/Th/;
        s/Æ/Ae/;
        s/þ/th/;
        s/æ/ae/;
    }
    wantarray ? @_ : $_[0];
}
 
Walter Roberson

: s/Þ/Th/;
: s/Æ/Ae/;
: s/þ/th/;
: s/æ/ae/;

What if there's more than one thorn in the text? Should those s//'s not
have /g ?
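
A small sketch of the difference, using a made-up two-word sample that contains two thorns:

```perl
use strict;
use warnings;
use utf8;    # literal Icelandic characters in this file

my $sample = 'þetta þing';    # sample text, not from the original post

(my $first_only = $sample) =~ s/þ/th/;     # without /g: first match only
(my $every_one  = $sample) =~ s/þ/th/g;    # with /g: every match

binmode STDOUT, ':encoding(UTF-8)';
print "$first_only\n";
print "$every_one\n";
```

Without /g the second thorn is left as it was; tr/// needs no such flag because it always works on the whole string.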
 
Joe Smith

Gunnar said:
my @text = ('Þjóðlegir þýskir ferðamenn',
'líða ekki fúlar fréttir');
@text = fix_Icelandic_letters(@text);
print "@text\n";

What about

my @orignal = ('Þjóðlegir þýskir ferðamenn',
'líða ekki fúlar fréttir');
my @modified = fix_Icelandic_letters(@original);
print "original = @original\n";
print "modified = @modified\n";

The posted solution modifies the originals.
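
The reason is that the elements of @_ are aliases for the caller's variables, and a for loop aliases $_ to each of them in turn. A minimal sketch of the effect, using plain ASCII upcasing as a stand-in for the letter fixing; upcase_in_place and upcase_copy are hypothetical names:

```perl
use strict;
use warnings;

# Works on @_ directly, so the caller's variables are changed too.
sub upcase_in_place {
    tr/a-z/A-Z/ for @_;
    return @_;
}

# Copies the arguments first, so the caller's variables are left alone.
sub upcase_copy {
    my @copy = @_;
    tr/a-z/A-Z/ for @copy;
    return @copy;
}

my @original = ('sao', 'paulo');

my @modified = upcase_copy(@original);
print "after copy:     @original / @modified\n";

@modified = upcase_in_place(@original);
print "after in place: @original / @modified\n";
```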
-Joe
 
gnari

Joe Smith said:
What about

my @orignal = ('Þjóðlegir þýskir ferðamenn',
'líða ekki fúlar fréttir');
my @modified = fix_Icelandic_letters(@original);
print "original = @original\n";
print "modified = @modified\n";

The posted solution modifies the originals.
-Joe

I guess you are missing the point.
the problem with the (original) fix_Icelandic_letters()
is that it does not return
a fix for the second argument
(in addition to the missing /g in the s///'s)

or I am missing your point perhaps.
(apart from the typo in your code, of course)

gnari
 
Gunnar Hjalmarsson

Gunnar said:
sub fix_Icelandic_letters
{
    for (@_)
    {
        tr/ÁÐÉÍÓÚÝÖáðéíóúýö/ADEIOUYOadeiouyo/;
        s/Þ/Th/;
        s/Æ/Ae/;
        s/þ/th/;
        s/æ/ae/;
    }
    wantarray ? @_ : $_[0];
}

Walter Roberson replied:
What if there's more than one thorn in the text?
Should those s//'s not have /g ?

Yes, they should. See new attempt below.

Joe Smith replied:
What about

my @orignal = ('Þjóðlegir þýskir ferðamenn',
'líða ekki fúlar fréttir');
my @modified = fix_Icelandic_letters(@original);
print "original = @original\n";
print "modified = @modified\n";

The posted solution modifies the originals.

That's undoubtedly a point, which I took care of in the new attempt
below: Now the original is modified only when the function is called
in void context, which is an approach that I personally like.

sub fix_Icelandic_letters {
    my $ref = defined wantarray ? [ @_ ] : \@_;
    for ( grep defined, @$ref ) {
        tr/ÁÐÉÍÓÚÝÖáðéíóúýö/ADEIOUYOadeiouyo/;
        s/Þ/Th/g;
        s/Æ/Ae/g;
        s/þ/th/g;
        s/æ/ae/g;
    }
    @$ref > 1 ? @$ref : $$ref[0]
}
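
To illustrate the two calling styles, here is a small usage sketch. The function is repeated so that the example runs on its own; the sample names are made up:

```perl
use strict;
use warnings;
use utf8;    # literal Icelandic characters in this file

# Same function as in this post, copied so the example is self-contained.
sub fix_Icelandic_letters {
    my $ref = defined wantarray ? [ @_ ] : \@_;
    for ( grep defined, @$ref ) {
        tr/ÁÐÉÍÓÚÝÖáðéíóúýö/ADEIOUYOadeiouyo/;
        s/Þ/Th/g;
        s/Æ/Ae/g;
        s/þ/th/g;
        s/æ/ae/g;
    }
    @$ref > 1 ? @$ref : $$ref[0]
}

my $name  = 'Þór';
my $ascii = fix_Icelandic_letters($name);    # scalar context: works on a copy

my @names = ('Ævar', 'Þór');
fix_Icelandic_letters(@names);               # void context: @names changed in place

binmode STDOUT, ':encoding(UTF-8)';
print "$name -> $ascii\n";
print "@names\n";
```

One thing to watch: called in scalar context with more than one argument, the last expression evaluates @$ref in scalar context, so the caller gets the number of elements rather than a string.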

What do you say, Helgi?
 
Henry Law

my $text = 'Þjóðlegir þýskir ferðamenn líða ekki fúlar fréttir';

unless ($worried_about_flames) {
    print "What a wonderful looking language\n",
          "What does that mean in English?";
}


Henry Law <>< Manchester, England
 
gnari

Henry Law said:
unless ($worried_about_flames) {
print "What a wonderful looking language\n",
"What does that mean in English?";

Something like
Traditional German tourists do not tolerate bad news

gnari
 
