nasty regexp problem

Francis Hwang

Hi all,

I discovered a strange bug in Lafcadio which I'm trying to work out,
but the regexp is getting really nasty. Basically, to commit text
values to the database I need to double any apostrophe, except those
that are preceded by a backslash. This has to account for apostrophes
at the beginning of the string, and multiline strings where the
apostrophe is at the start of a new line. Examples would be:

"I can't drive 55" -> "I can''t drive 55"
"Don\'t escape here" -> "Don\'t escape here"
"'" -> "''"
"line 1\n' line 2" -> "line 1\n'' line 2"

So here's the regexp that does it:

value = value.gsub( /(^|[^\\\n])'/ ) { $& + "'" }

which works fine, except I just discovered that it fails in one odd
case:

"'''" -> "'''''"

(That is, a single line of three apostrophes should become six
apostrophes, but instead it becomes five.)

Any idea why this is doing it? I suspect I'm being too clever by
including the beginning of line in a grouping, and maybe that affects
how the regexp is processing the string?

F.
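You can reproduce the five-apostrophe result by listing the matches that gsub finds. This is a minimal sketch, and the helper names `original_escape`, `show_matches` and `lookbehind_escape` are hypothetical. The first match is the lone quote at offset 0, found through `^`. The next search starts at offset 1 and matches two characters: it uses the second quote as the `[^\\\n]` character in front of the third. Because the second quote is consumed as context, it is never doubled. The lookbehind variant assumes Ruby 1.9 or later, where `(?<!\\)` tests the previous character without consuming it.

```ruby
# Francis's original pattern: the group consumes the character before each quote.
ORIGINAL = /(^|[^\\\n])'/

def original_escape(value)
  value.gsub(ORIGINAL) { $& + "'" }
end

# Hypothetical helper: list each match with its start offset to see how
# gsub walks the string (scan finds the same non-overlapping matches).
def show_matches(value)
  found = []
  value.scan(ORIGINAL) { found << [$~.begin(0), $&] }
  found
end

p show_matches("'''")     # => [[0, "'"], [1, "''"]]
p original_escape("'''")  # => "'''''"  (five apostrophes)

# Sketch of a fix, assuming Ruby 1.9+ (lookbehind support): (?<!\\) only
# checks the previous character, so adjacent quotes are each doubled.
def lookbehind_escape(value)
  value.gsub(/(?<!\\)'/, "''")
end

p lookbehind_escape("'''")                 # => "''''''"
p lookbehind_escape("Don\\'t escape here") # => "Don\\'t escape here"
```

Like the original, the lookbehind version still treats a quote that follows an escaped backslash (`\\'`) as escaped. Ara's odd/even rule later in the thread covers that case.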
 
Mike Stok

Well you can be crude and say

value.gsub(/(\\?')/) { |m| m.length == 1 ? "''" : m }

or some moral equivalent.

But I'm sure there's a nicer way.

Mike
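A quick way to confirm Mike's one-liner is to run it over Francis's examples. In this sketch the only addition is the method name `mike_escape`. The regexp and the block are Mike's. Every match is either a bare quote or a backslash-quote pair, and only bare quotes (length 1) are doubled. As a result, adjacent apostrophes no longer steal each other's context.

```ruby
# Mike's one-liner wrapped in a method (the method name is hypothetical).
# Each match is either a lone quote or a backslash followed by a quote;
# only the lone quotes (length 1) are doubled.
def mike_escape(value)
  value.gsub(/(\\?')/) { |m| m.length == 1 ? "''" : m }
end

# Francis's test cases, plus the "'''" case that broke the original regexp.
cases = {
  "I can't drive 55"    => "I can''t drive 55",
  "Don\\'t escape here" => "Don\\'t escape here",
  "'"                   => "''",
  "line 1\n' line 2"    => "line 1\n'' line 2",
  "'''"                 => "''''''",
}

cases.each do |input, expected|
  actual = mike_escape(input)
  puts "#{input.inspect} -> #{actual.inspect} #{actual == expected ? 'ok' : 'FAIL'}"
end

# Caveat: in the string a\\'b (written "a\\\\'b" below) the quote follows an
# escaped backslash, but it still counts as escaped here and is left undoubled.
p mike_escape("a\\\\'b")  # => "a\\\\'b"
```

The last line shows the limitation that Ara's reply addresses: a quote preceded by an escaped backslash is still left alone.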
 
Simon Strandgaard

irb(main):001:0> "''\\\\'z\\'\\z''".gsub(/ ( (?: [^\\] | \\ . )*? ) ' /x, '\1\'\'')
=> "''''\\\\''z\\'\\z''''"
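Simon's pattern handles his own test string, but it is worth trying on Francis's second example too. In this sketch the method names `simon_escape` and `anchored_escape` are hypothetical. On `Don\'t escape here`, the unanchored pattern finds no match starting at positions 0 through 3, so gsub keeps scanning. At position 4, the apostrophe itself, the lazy group matches nothing and the escaped quote gets doubled. The anchored variant is a suggested fix and is not part of Simon's post. Its `\G` forces each match to begin where the previous one ended, so the engine cannot restart in the middle of a `\'` pair.

```ruby
# Simon's regexp and replacement, wrapped in a (hypothetical) method.
def simon_escape(value)
  value.gsub(/ ( (?: [^\\] | \\ . )*? ) ' /x, '\1\'\'')
end

p simon_escape("''\\\\'z\\'\\z''")     # => "''''\\\\''z\\'\\z''''"
p simon_escape("Don\\'t escape here")  # => "Don\\''t escape here"

# Sketch: \G pins each match to the end of the previous one, so the engine
# cannot restart in the middle of a \' pair. [^\\'] stops the group at the
# first unescaped quote; /m lets "." consume an escaped newline as well.
def anchored_escape(value)
  value.gsub(/\G((?:[^\\']|\\.)*)'/m, "\\1''")
end

p anchored_escape("''\\\\'z\\'\\z''")     # => "''''\\\\''z\\'\\z''''"
p anchored_escape("Don\\'t escape here")  # => "Don\\'t escape here"
p anchored_escape("'''")                  # => "''''''"
```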
 
David Alan Black

Hi --

Simon Strandgaard said:

irb(main):001:0> "''\\\\'z\\'\\z''".gsub(/ ( (?: [^\\] | \\ . )*? ) ' /x, '\1\'\'')
=> "''''\\\\''z\\'\\z''''"

The ' before the first z isn't supposed to get doubled though, is it?


David
 
Kaspar Schiess

Hello Francis,

I have nothing to contribute to your problem, except that it looks
rather nasty. I would suggest at least two better ways of handling your
original problem in Lafcadio (if I may):

a) If dbh is your DBI::DatabaseHandle, use dbh.quote to quote strings in
a database-independent fashion. However, this has failed on me on several
occasions, which is why I propose

b) Use ? placeholders and bind the values (much as I proposed in my patch
to Lafcadio for BLOBs). That always works nicely, except in a few special
places in the SQL statement.

String quoting is very database-dependent.

Sorry for not answering any of your questions directly; you probably did
not want a roundabout answer. :(

best regards,


--
kaspar

semantics & semiotics
code manufacture

www.tua.ch/ruby
 
Simon Strandgaard

David said:
Simon Strandgaard said:

irb(main):001:0> "''\\\\'z\\'\\z''".gsub(/ ( (?: [^\\] | \\ . )*? ) ' /x, '\1\'\'')
=> "''''\\\\''z\\'\\z''''"

The ' before the first z isn't supposed to get doubled though, is it?

The behavior is correct, because the quote is preceded by an escaped
backslash (two backslashes in the actual string). Yes, that is easy to
mistake for a single backslash.
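Printing the test string with `puts` instead of reading its inspected form makes Simon's point easier to see. In this sketch the helper `escaped_quote?` is hypothetical. It applies the rule Ara states later in the thread: a quote is escaped only when an odd number of backslashes immediately precede it. The quote before the first `z` (index 4) follows two backslashes, so it is not escaped.

```ruby
# Hypothetical helper: the quote at index i is escaped iff an odd number of
# backslashes immediately precede it.
def escaped_quote?(str, i)
  str[0...i][/\\*\z/].length.odd?
end

# Simon's test string, written as the same double-quoted literal he used.
source = "''\\\\'z\\'\\z''"
puts source  # prints: ''\\'z\'\z''

# Every apostrophe's index, paired with whether it counts as escaped.
quotes = (0...source.length).select { |i| source[i] == "'" }
p quotes.map { |i| [i, escaped_quote?(source, i)] }
# => [[0, false], [1, false], [4, false], [7, true], [10, false], [11, false]]
```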
 
Ara.T.Howard

a "'" is escaped iff it is preceded by an odd number of "\"s:


~ > cat a.rb
strings = <<-'txt'
I can't drive 55
Don\'t escape here
line 1\n' line 2
1 '
2 ''
3 '''
4 ''''
these are escaped \' \\\' \\\\\'
these are not escaped \\' \\\\' \\\\\\'
txt


pat = %r/([\\]*)'/o

# each_line (rather than each) also works on Ruby 1.9+, where String#each is gone
strings.each_line do |string|
  string.strip!
  escaped =
    string.gsub(pat) do
      # an even run of backslashes (including zero) leaves the quote unescaped
      if $1.size % 2 == 0
        "#{ $1 }''"
      else
        $&
      end
    end
  puts "<#{ string }> -> <#{ escaped }>"
end


~ > ruby a.rb
<I can't drive 55> -> <I can''t drive 55>
<Don\'t escape here> -> <Don\'t escape here>
<line 1\n' line 2> -> <line 1\n'' line 2>
<1 '> -> <1 ''>
<2 ''> -> <2 ''''>
<3 '''> -> <3 ''''''>
<4 ''''> -> <4 ''''''''>
<these are escaped \' \\\' \\\\\'> -> <these are escaped \' \\\' \\\\\'>
<these are not escaped \\' \\\\' \\\\\\'> -> <these are not escaped \\'' \\\\'' \\\\\\''>


i have done this with a simple gsub, but it's __really__ ugly... i think you
may be able to find it on the list if you look. in the end it's easier to
use a block and determine whether you are dealing with an even or odd number
of '\'s.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
| URL :: http://www.ngdc.noaa.gov/stp/
| TRY :: for l in ruby perl;do $l -e "print \"\x3a\x2d\x29\x0a\"";done
===============================================================================
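For comparison with the block version, the odd/even rule can also be squeezed into a single gsub on Ruby 1.9 or later, which supports lookbehind. This is a sketch, not the regexp Ara mentions finding on the list, and the names `EVEN_RUN_QUOTE` and `even_escape` are hypothetical. The lookbehind stops a match from starting partway through a run of backslashes, and the group accepts only an even number of them before the quote.

```ruby
# Hypothetical single-gsub version of Ara's rule (needs lookbehind, Ruby 1.9+).
#   (?<!\\)      the match may not start right after a backslash, so the run
#                of backslashes below is the complete run before the quote
#   ((?:\\\\)*)  an even number of backslashes (possibly zero), kept as-is
#   '            the quote, which is therefore unescaped and gets doubled
EVEN_RUN_QUOTE = /(?<!\\)((?:\\\\)*)'/

def even_escape(value)
  value.gsub(EVEN_RUN_QUOTE, "\\1''")
end

# Some of Ara's sample lines; the single-quoted heredoc keeps every backslash.
samples = <<-'TXT'.lines.map(&:strip)
I can't drive 55
Don\'t escape here
3 '''
these are escaped \' \\\' \\\\\'
these are not escaped \\' \\\\' \\\\\\'
TXT

samples.each { |s| puts "<#{s}> -> <#{even_escape(s)}>" }
```

Ara's block version is easier to read. It also runs on Ruby 1.8, whose default regexp engine did not support lookbehind.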
 
Francis Hwang

Mike's recommendation ended up satisfying all my test cases, so that's
good enough for me. I have no problem with the solution being ugly,
particularly since in a difficult problem like this, ugly becomes
fairly relative.

Thanks for your notes on using DBI for this, Kaspar. Such changes seem
like they're not necessary now but will definitely be important at
some point, sooner or later ...

Francis
 
