Substitution with Hash

Lee Jarvis

OK, I'll try to explain what I mean as well as I can.

Let's say I have a hash like this:

hash = { 'a' => '1' } # just an example, it's actually far bigger

and if a user inputs abcdabcd, I want it to sub all of the a's with 1's.

As I said, the hash is far larger, which is why I can't just do it with
gsub.

Any ideas?

Thanks in advance.

Lee
 
Lionel Bouton

In reply to Lee Jarvis's post of 11.09.2007 12:41:

yourstring.split(//).map{|c| hash[c] || c}.join
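
For anyone reading along, here is a minimal runnable sketch of that one-liner. The method name `substitute_chars` and the sample hash are made up for illustration, and it uses `each_char` rather than `split(//)`, which should behave the same on a current Ruby.

```ruby
# Per-character substitution: look each character up in the hash and
# fall back to the character itself when there is no mapping.
# `substitute_chars` and the sample data are hypothetical names.
def substitute_chars(str, mapping)
  str.each_char.map { |c| mapping.fetch(c, c) }.join
end

mapping = { 'a' => '1', 'b' => '2' }
puts substitute_chars('abcdabcd', mapping)  # prints 12cd12cd
```

This only works while every key is a single character, since the string is examined one character at a time.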
 
Lionel Bouton

Lionel Bouton wrote the following on 11.09.2007 12:48:
yourstring.split(//).map{|c| hash[c] || c}.join

Note that if your hash only maps single characters to single characters,
you can use String#tr (or tr!). String#tr is faster in itself, but you
must first build its two argument strings from your hash, so if you are
after performance you'll have to benchmark it to see whether it's worth
it in your use case.

If you are processing UTF-8 content, String#tr is probably not safe
(there are libraries out there for fixing this though, IIRC), but my
first answer probably is (assuming $KCODE='u'; require 'jcode'...), as
the regexp processing is UTF-8 aware, so the String#split should be safe.
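
A rough sketch of the String#tr route, assuming every key and value is exactly one character and none of them is a tr metacharacter ('-', '^' or a backslash). The helper name `tr_from_hash` is hypothetical.

```ruby
# Build the two String#tr arguments from a hash of single-character pairs.
# Assumption: keys and values are one character each and are not tr
# metacharacters ('-', '^', '\'); anything else is rejected up front.
def tr_from_hash(str, mapping)
  bad = (mapping.keys + mapping.values).reject do |s|
    s.length == 1 && !'-^\\'.include?(s)
  end
  raise ArgumentError, "unsupported entries: #{bad.inspect}" unless bad.empty?

  from = mapping.keys.join    # e.g. "ab"
  to   = mapping.values.join  # e.g. "12"
  str.tr(from, to)
end

puts tr_from_hash('abcdabcd', { 'a' => '1', 'b' => '2' })  # prints 12cd12cd
```

If the same hash is reused for many strings, building `from` and `to` once and keeping them around avoids paying the preparation cost on every call.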

Lionel
 
Lee Jarvis

Thanks, that worked well. And no, it's not single chars, which is the only
reason I'm doing it this way.

I have to split on whitespace (/ /) because splitting on characters would
obviously split the text I want to transform, which means it won't match
if the characters are trailing another word. HTML special chars, for
example:

h = {"&#126;" => "~"}

"hmm &#126;".split(/ /).map{|c| h[c] || c}.join(' ')

Outputs hmm ~, but obviously doing things like question marks won't work.
Maybe I'll have to use loops and String#tr.
 
Robert Klemme

I'd rather not do the split step; IMHO direct replacement will be faster:

h = {"#126" => "~"}
# $1 is the text between & and ;, $& is the whole match (kept when unknown)
s.gsub(/&([^;]+);/) { h[$1] || $& }

Btw, I believe there are standard classes that do this type of
replacement (entities in HTML documents) - maybe it's in CGI.
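
To make the direct-replacement idea concrete, here is a small sketch that can be run as-is. The table `ENTITIES` and the method `replace_entities` are hypothetical names, and the table is only a tiny sample, not a complete entity list.

```ruby
# Replace "&name;" style entities via a hash keyed by the text between
# '&' and ';'. The table below is a hypothetical two-entry sample.
ENTITIES = { '#126' => '~', 'amp' => '&' }.freeze

def replace_entities(str, table = ENTITIES)
  # $1 is the captured entity name, $& is the whole match; entities
  # missing from the table are left untouched.
  str.gsub(/&([^;]+);/) { table[$1] || $& }
end

puts replace_entities('hmm &#126;? &foo;')  # prints hmm ~? &foo;
```

Because the regexp finds the entity wherever it sits, a trailing question mark or an adjoining word no longer gets in the way, which was the problem with splitting on spaces.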

Kind regards

robert
 
Lionel Bouton

Robert said:
I'd rather not do the split step, IMHO direct replacement will be faster:

If it's all for HTML entities, yes. I'm not sure what the actual use
case is though.

Robert said:
Btw, I believe there are standard classes that do this type of
replacement (entities in HTML documents) - maybe it's in CGI.

The htmlentities gem (more robust than CGI with UTF-8...) is quite good.
 
Daniel DeLorme

If you're just trying to translate numeric HTML entities, it's easy:

str.gsub(/&#(\d+);/){ [$1.to_i].pack('U') }

If you also want named entities, I suggest the htmlentities gem.
If it's for a more general case, how about:

rx = Regexp.new(hash.keys.map{|k|Regexp.escape(k)}.join("|"))
str.gsub(rx){ hash[$&] }
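
Both suggestions can be wrapped up as a short sketch. The method names are hypothetical, `Regexp.union` stands in for the escape-and-join step, and sorting the keys longest-first is an extra assumption on my part so that a longer key wins when one key is a prefix of another.

```ruby
# Decode numeric entities such as "&#126;" into the matching character.
def decode_numeric_entities(str)
  str.gsub(/&#(\d+);/) { [$1.to_i].pack('U') }
end

# General case: build one alternation from the hash keys and let gsub
# look up each match. Regexp.union escapes the keys for us.
def substitute_keys(str, mapping)
  return str.dup if mapping.empty?

  # Longest keys first, so "ab" is tried before "a" at the same position.
  keys = mapping.keys.sort_by { |k| -k.length }
  rx = Regexp.union(keys)
  str.gsub(rx) { mapping[$&] }
end

puts decode_numeric_entities('hmm &#126;')                            # prints hmm ~
puts substitute_keys('abca?', { 'a' => '1', 'ab' => 'X', '?' => '!' })  # prints Xc1!
```

Keys that contain regexp metacharacters, like the '?' above, are safe here because they are escaped before being joined into the pattern.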

Daniel
 
