replace words in string with hash values

wana

foreach (keys %h)
{
$a =~ s/\b$_\b/$h{$_}/g;
}

I want to replace matches in string to hash key with hash value. I am
replacing acronyms with phrases where acronym is hash key. Is there a
better or different way?

thanks!

wana (on pda)
 
Gunnar Hjalmarsson

wana said:
foreach (keys %h)
{
$a =~ s/\b$_\b/$h{$_}/g;
}

I want to replace matches in string to hash key with hash value. I am
replacing acronyms with phrases where acronym is hash key. Is there a
better or different way?

You have Perl search the whole string as many times as there are keys in
the hash. With this approach, searching the string once is sufficient:

my $keys = join '|', keys %h;
$a =~ s/($keys)/$h{$1}/g;
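
For illustration, here is a minimal, self-contained sketch of that single-pass idea. The hash contents and the sample sentence are made up for the example (the thread never shows the real data), and the variable is called $text rather than $a.

```perl
use strict;
use warnings;

# Hypothetical sample data; not taken from the original post.
my %h = (
    FAQ => 'frequently asked questions',
    PDA => 'personal digital assistant',
);
my $text = 'Read the FAQ on your PDA.';

# Join all keys into one alternation, then substitute in a single pass.
my $keys = join '|', keys %h;
$text =~ s/($keys)/$h{$1}/g;

print "$text\n";
```

The string is scanned once, and the captured key in $1 selects the replacement from the hash.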
 
Paul Lalli

wana said:
foreach (keys %h)
{
$a =~ s/\b$_\b/$h{$_}/g;
}

I want to replace matches in string to hash key with hash value. I am
replacing acronyms with phrases where acronym is hash key. Is there a
better or different way?

Well, here's one that's different, though not necessarily better (in
fact, quite likely worse)

$a =~ s/\b(\w+)\b/$h{$1} or $1/ge;

Rather than searching the string for each hash key, this one would
search each word in the string to determine if it is in the hash (more
correctly stated: if it has a true value in the hash). If so, replace
it with the hash value, if not, leave it as is.

Benchmarking is left as an exercise to the OP. ;-)

One comment, however: Be careful about the use of \b. While \b does
mean 'word boundary', it means Perl's definition of a 'word', which is:
[0-9a-zA-Z_]+. That means that "don't" is two words: "don" and "t".
This may or may not be what you actually want.
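
A quick way to see this for yourself is to list what Perl treats as words in a sample string. The string below is just an example, not data from the original post.

```perl
use strict;
use warnings;

my $text = "don't panic";   # example input only

# Collect every run of word characters that sits between \b anchors.
my @words = $text =~ /\b(\w+)\b/g;

print join(', ', @words), "\n";
```

The apostrophe is not a word character, so "don" and "t" come back as separate words.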

Paul Lalli
 
Michele Dondi

foreach (keys %h)
{
$a =~ s/\b$_\b/$h{$_}/g;
}

I want to replace matches in string to hash key with hash value. I am
replacing acronyms with phrases where acronym is hash key. Is there a
better or different way?

Don't know if it's "better" (depends on far too many different
things!), but

s/\b\w+\b/$h{$&}||$&/ge;

should do the job. Of course if your acronyms follow some convention
(e.g. 2 to 4 uppercase letters only) it could be improved
performance-wise:

s/\b[A-Z]{2,4}\b/$h{$&}||$&/ge;
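
A runnable sketch of the same idea follows, with two changes that are my own choices rather than part of the suggestion above: it uses a capture group instead of $&, and it tests with exists so that a key whose value is false (such as "0") is still replaced. The hash and the sentence are hypothetical sample data.

```perl
use strict;
use warnings;

# Hypothetical sample data; not taken from the original post.
my %h = (
    FAQ => 'frequently asked questions',
    PDA => 'personal digital assistant',
);
my $text = 'The FAQ covers PDA and USB setup.';

# Visit only 2-4 letter uppercase words. A word that is a hash key is
# replaced by its value; any other word (USB here) is put back unchanged.
$text =~ s/\b([A-Z]{2,4})\b/exists($h{$1}) ? $h{$1} : $1/ge;

print "$text\n";
```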


HTH,
Michele
 
Tad McClellan

Gunnar Hjalmarsson said:
You have Perl search the whole string as many times as there are keys in
the hash. With this approach, searching the string once is sufficient:

my $keys = join '|', keys %h;
$a =~ s/($keys)/$h{$1}/g;


But you better put the word boundaries back in though!
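
To show why the boundaries matter, here is a small sketch comparing the two substitutions on the same input. The key, its expansion and the sentence are invented for the example.

```perl
use strict;
use warnings;

my %h    = ( PC => 'personal computer' );   # hypothetical sample data
my $keys = join '|', keys %h;

my $without = 'The PC ran on UPC codes.';
my $with    = $without;

$without =~ s/($keys)/$h{$1}/g;       # also rewrites the "PC" inside "UPC"
$with    =~ s/\b($keys)\b/$h{$1}/g;   # rewrites only the standalone word

print "$without\n$with\n";
```

Without \b the acronym is also replaced where it appears inside a longer word; with \b only the whole word is touched.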
 
Tad McClellan

The only thing I've got to add to what's already been
said is to watch out for regexp metacharacters in your
hash keys. I have no idea what you might count as an
"acronym". If any metacharacters do appear, you can
use \b\Q$_\E\b as your regexp to quote them.


But if a metachar could be the first or last char, then
the word boundary probably won't match where you want it to...
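
One possible workaround, which nobody in this thread proposed, is to replace \b with lookarounds that only require that no word character touches the match. The sketch below assumes keys such as "C++" and ".NET" (invented for the example) and combines quotemeta with the single-pass alternation.

```perl
use strict;
use warnings;

# Hypothetical keys that begin or end with non-word characters.
my %h    = ( 'C++' => 'C plus plus', '.NET' => 'dot NET' );
my $text = 'We use C++ and .NET daily.';

# quotemeta escapes regex metacharacters; longest keys go first so a
# longer key is tried before any shorter key that is a prefix of it.
my $keys = join '|',
           map  { quotemeta($_) }
           sort { length($b) <=> length($a) }
           keys %h;

# Lookarounds stand in for \b: no word character may touch the match.
$text =~ s/(?<!\w)($keys)(?!\w)/$h{$1}/g;

print "$text\n";
```

With \b around the alternation, "C++" followed by a space would not match, because there is no word boundary between "+" and the space.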
 
