Method to groom a string to floating point representation

Alex DeCaria

I have a program that asks the user to enter a string that
represents a floating point number. Every time a new character is typed
I want a method that checks that the string makes sense as a
floating point number and, if not, deletes any bad characters. For
instance, if the user enters '4.5e+6.7' I want the method to delete the
extra decimal point and return '4.5e+67'. Or, if the user enters
something like '4.5+e7', it deletes the misplaced plus sign and returns
'4.5e7'. In short, I want the method to allow only correct
representations of floating point numbers, but I want the result to remain a
string. Anything other than a digit, +, -, ., e or E should be
deleted.

I wrote a method that works like I want (attached), but it is long and
cumbersome. I'm wondering if anyone has a shorter, better way to do
this.

--Alex

Attachments:
http://www.ruby-forum.com/attachment/4653/clean_string_lite.rb
 
Josh Cheek

It would probably be easier if you provided a set of tests we could check
our function against, where we could be confident our function was correct
once it passed all the tests.
 
Alex DeCaria

Josh said:
It would probably be easier if you provided a set of tests we could check
our function against, where we could be confident our function was correct
once it passed all the tests.

Here are some examples of what it should do:

Delete any characters other than digits, +, -, e, E, or .:
'-24.5fge4x5' => '-24.5e45'

Delete any extra decimals:
'2.4.5' => '2.45'
'2..45' => '2.45'

Delete any decimals in an exponent:
'245e7.6' => '2.45e76'

Delete any extra or misplaced + or - signs:
'+45-68+e+45-' => '4568e+45'

Delete any extra or misplaced 'e' or 'E' characters (first occurrence of
'e' or 'E' has precedence unless it doesn't make sense):
'4.67e6e-7' => '4.67e67'
'+e4.67e-7' => '+4.67e-7'

The motivation for this is for a GUI input textbox, so that if the user
enters a bad string it automatically corrects it to a valid
floating-point representation in string form before converting to a
floating-point for calculations. I toyed with just doing
str = str.to_f.to_s
and letting Ruby figure out the floating point representation, but I'd
like more control over how the string is converted to floating point
representation. For example, I want
'2..45e9' => '2.45e9', whereas '2..45e9'.to_f.to_s => '2.0'

--Alex
 
Josh Cheek

On Tue, Apr 13, 2010 at 6:20 AM, Alex DeCaria said:
Delete any decimals in an exponent:
'245e7.6' => '2.45e76'

Where did the dot in between 2 and 4 come from? Am I interpreting the String
or just cleaning it?

Delete any extra or misplaced + or - signs:
'+45-68+e+45-' => '4568e+45'

Delete any extra or misplaced 'e' or 'E' characters:
'+e4.67e-7' => '+4.67e-7'

Why does the plus in front of 45 in the first one go away, but the plus in
front of the e in the second one stays?

-----

This is what I have so far, please check and correct any tests that should
be different

def clean_string( str , options = Hash.new )
  str =~ /\A([-+]?)([^eE.]*\.?)([^eE]*)((?:[eE][+-]?)?)([^Z]*)\Z/
  posneg , prepre , postpre , e , post = $1 , $2 , $3 , $4 , $5
  posneg + prepre + postpre.gsub(/[^0-9]/,'') + e + post.gsub(/[^0-9]/,'')
end

require 'test/unit'
class TestCleanString < Test::Unit::TestCase
  def test_delete_chars
    assert_equal '-24.5e45' , clean_string('-24.5fge4x5')
  end
  def test_delete_extra_decimal
    assert_equal '2.45' , clean_string('2.4.5')
    assert_equal '2.45' , clean_string('2..45')
    assert_equal '2.45' , clean_string('2...45')
  end
  def test_delete_extra_decimal_in_exponent
    # you said this should be '2.45e76' , but where did first dot come from?
    assert_equal '245e76' , clean_string('245e7.6')
  end
  def test_delete_extra_or_misplaced_pos_and_neg_signs
    assert_equal '4568e+45' , clean_string('+45-68+e+45-')
  end
  def test_delete_extra_or_misplaced_e_or_E
    assert_equal '4.67e67' , clean_string('4.67e6e-7')
    assert_equal '+4.67e-7' , clean_string('+e4.67e-7')
  end
end
 
Jean-Julien Fleck

Hello,

2010/4/14 Josh Cheek said:
Where did the dot in between 2 and 4 come from? Am I interpreting the String
or just cleaning it?

As Josh said, here you are interpreting the string rather than
cleaning it. 245e76 is a valid float, just not in the usual 2.45e78
form.

BTW, I would rather not do any cleaning under the hood: let the user
correct the input himself. For example, give the input to Float() and
if an error is raised (which Float() does, as opposed to to_f, which never
raises an error), rescue it by giving feedback to the user (where you
could use your method to propose an alternative if you want), but do
not continue without letting the user know he has made a mistake and
giving him the ability to change his mind.
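
To make the Float() versus to_f difference concrete, here is a minimal sketch. The helper name parse_float_strict and the wording of the message are made up for illustration; only Float() and String#to_f are real Ruby calls.

```ruby
# Hypothetical helper (not from the thread): returns [value, nil] on success
# or [nil, message] when Float() rejects the input.
def parse_float_strict(input)
  [Float(input), nil]
rescue ArgumentError, TypeError => e
  [nil, "Not a valid number: #{input.inspect} (#{e.message})"]
end

# to_f never raises: it parses as much as it can and ignores the rest.
lenient = '2..45e9'.to_f

# Float() raises ArgumentError on the same string, so the caller can react.
value, error = parse_float_strict('2..45e9')

# A well-formed string goes through without an error message.
ok_value, ok_error = parse_float_strict('4.5e+67')
```

The error message could be shown next to the textbox instead of silently changing what the user typed.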

Cheers,

--
JJ Fleck
PCSI1 Lycée Kléber
 
Alex DeCaria

Where did the dot in between 2 and 4 come from? Am I interpreting the
String or just cleaning it?

This was a typo on my part. It should have read:
'245e7.6' => '245e76'

Why does the plus in front of 45 in the first one go away, but the plus
in front of the e in the second one stays?

Again, a typo on my part. It should have been:
'+45-68+e+45-' => '+4568e+45'

This is what I have so far, please check and correct any tests that
should be different

Thanks! I'll check the code you gave me and see how it does.

--Alex
 
Alex DeCaria

I didn't realize the difference between Float() and .to_f. Thanks for
the suggestion.

The user is still aware if they entered an incorrect string, since they
are entering it into a GUI textbox, and the string cleaning is done
after each character is entered. Thus, if they try to enter a misplaced
+ sign or another bad character, they won't see it appear in the
textbox, which should cause them to notice it.

--Alex
 
Alex DeCaria

Josh said:
This is what I have so far, please check and correct any tests that
should be different

Josh,

Your code works great! I knew there had to be a more elegant way to do
this rather than my brute force method.

The only test it didn't seem to work on was eliminating extra + or -
signs, such as '+45-2+8' => '+4528', but now that I see what you are
doing I can probably figure out how to do that. I definitely need to
learn more about regular expressions!

Thanks for your time and effort.

--Alex
 
Jean-Julien Fleck

Hello Alex,
The user is still aware if they entered an incorrect string, since they
are entering it into a GUI textbox, and the string cleaning is done
after each character is entered. Thus, if they try to enter a misplaced
+ sign or another bad character, they won't see it appear in the
textbox, which should cause them to notice it.

Well, then you can't use the Float() trick, because 1.0e3 is valid
but 1.0e is not.
Then there will be a lot of strings your user won't be able to type
even if they would be valid in the end.

Cheers,

--
JJ Fleck
PCSI1 Lycée Kléber
 
Alex DeCaria

Jean-Julien Fleck said:
Well, then you can't use the Float() trick, because 1.0e3 is valid
but 1.0e is not.
Then there will be a lot of strings your user won't be able to type
even if they would be valid in the end.

Yes, there has to be some additional logic to allow a trailing 'e' with
the assumption that the user will next enter a valid character
afterward. That's what makes it a little complicated (and fun) to
figure out. The goal is, as the user is entering data, to not allow
them to enter anything that is obviously not going to work as a floating
point representation.
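
One way to sketch the "allow a trailing 'e'" logic is to test whether the text typed so far could still grow into a valid float, instead of testing whether it already is one. The constant PARTIAL_FLOAT and the method could_become_float? are hypothetical names, and the rule that an exponent needs at least one mantissa digit before it is an assumption based on the examples earlier in the thread.

```ruby
# Hypothetical pattern: matches every prefix of a well-formed float.
# Either a mantissa that is still being typed (possibly empty), or a
# mantissa with at least one digit followed by a possibly unfinished exponent.
PARTIAL_FLOAT = /\A[-+]?(?:\d*\.?\d*|(?:\d+\.?\d*|\.\d+)[eE][-+]?\d*)\z/

# True when +text+ is either a float or could become one with more typing.
def could_become_float?(text)
  PARTIAL_FLOAT.match?(text)
end

still_typing = ['', '-', '4.', '1.0e', '1.0e-', '4.5e+67']
rejected     = ['4.5+', '+e', '1.0e3.', '1e5e', '2..4']

still_typing_ok = still_typing.all? { |s| could_become_float?(s) }
rejected_ok     = rejected.none?    { |s| could_become_float?(s) }
```

A textbox callback could then refuse a keystroke whenever the new text fails this check, and run Float() only once the user leaves the field.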

--Alex
 
Jean-Julien Fleck

Hello Alex,
Yes, there has to be some additional logic to allow a trailing 'e' with
the assumption that the user will next enter a valid character
afterward. That's what makes it a little complicated (and fun) to
figure out. The goal is, as the user is entering data, to not allow
them to enter anything that is obviously not going to work as a floating
point representation.

Sure, fun it is :o)
But that's exactly the kind of software that could drive me mad (as a
user). You assume that your user is making a typo, but what if he is
not? What if he truly believes what he is writing is a perfectly
correct float? He will retry again and again until he
decides that the whole software is just a fraud :o) So IMHO, it is more
efficient to let your user know what kind of error he is (possibly
repeatedly) making and propose an alternative rather than erase what
he believes could be right.

Cheers,

--
JJ Fleck
PCSI1 Lycée Kléber
 
Josh Cheek

It wasn't done, because I wanted clarification on the tests first.

Anyway, this one passes all tests.

def clean_string(str)
  str =~ /\A([-+]?)([eE]?)([^eE.]*\.?)([^eE]*)((?:[eE][+-]?)?)([^Z]*)\Z/
  posneg , misplaced_e , before_dec , after_dec , e , exponent =
    $1 , $2 , $3 , $4 , $5 , $6
  posneg + before_dec.gsub(/[^0-9.]/,'') + after_dec.gsub(/[^0-9]/,'') + e +
    exponent.gsub(/[^0-9]/,'')
end

require 'test/unit'
class TestCleanString < Test::Unit::TestCase
  def test_delete_chars
    assert_equal '-24.5e45' , clean_string('-24.5fge4x5')
  end
  def test_delete_extra_decimal
    assert_equal '2.45' , clean_string('2.4.5')
    assert_equal '2.45' , clean_string('2..45')
    assert_equal '2.45' , clean_string('2...45')
  end
  def test_delete_extra_decimal_in_exponent
    assert_equal '245e76' , clean_string('245e7.6')
  end
  def test_delete_extra_or_misplaced_pos_and_neg_signs
    assert_equal '+4568e+45' , clean_string('+45-68+e+45-')
  end
  def test_delete_extra_or_misplaced_e_or_E
    assert_equal '4.67e67' , clean_string('4.67e6e-7')
    assert_equal '+4.67e-7' , clean_string('+e4.67e-7')
  end
end
 
Alex DeCaria

Josh said:
It wasn't done, because I wanted clarification on the tests first.

Anyway, this one passes all tests.

Thanks again, Josh! May I use your code in my (non-commercial,
educational-use-only) app?

--Alex
 
Alex DeCaria

Jean-Julien Fleck said:
Sure, fun it is :o)
But that's exactly the kind of software that could drive me mad (as a
user). You assume that your user is making a typo, but what if he is
not? What if he truly believes what he is writing is a perfectly
correct float? He will retry again and again until he
decides that the whole software is just a fraud :o) So IMHO, it is more
efficient to let your user know what kind of error he is (possibly
repeatedly) making and propose an alternative rather than erase what
he believes could be right.

I can't argue with the point you are making. I will continue to use the
automatic string grooming, but will probably include a message to the
user letting them know why what they are typing isn't showing up in the
textbox.

--Alex
 
Josh Cheek

On Wed, Apr 14, 2010 at 8:33 AM, Alex DeCaria said:
Thanks again, Josh! May I use your code in my (non-commercial,
educational-use-only) app?

Sure, go ahead and throw the wtfpl on there, if you feel more comfortable
with that. http://sam.zoy.org/wtfpl/

And I guarantee that it does nothing other than pass the set of tests it was
posted with, on my machine, with the settings that were used at the time of
testing. So no warranty of any kind.

Have fun :p
 
Josh Cheek

Found a bug: the [^Z] in the last capture group should be a [^\Z] (or, if you
prefer, you could just swap it out with .* instead. I don't know if it makes a
difference; I just usually try to match based on the next thing I want to
hit, which in this case is the end of the string).
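
For the .* alternative mentioned above, a minimal sketch follows. It rests on an assumption worth checking in irb: inside a character class, \Z is probably not treated as an end-of-string anchor, so [^\Z] may still just exclude a literal Z. Capturing the tail with .* under the /m flag avoids the question entirely. The method name clean_string_any_tail is made up so it does not clash with the posted clean_string.

```ruby
def digits_only(str)
  str.gsub(/[^0-9]/, '')
end

# Same capture layout as the version that passes the tests, except that the
# exponent tail is captured with .* (the /m flag lets . span newlines too)
# and the pattern is anchored with \z. Every group may be empty, so the
# pattern matches any input and m is never nil.
def clean_string_any_tail(str)
  m = str.match(/\A([-+]?)([eE]?)([^eE.]*\.?)([^eE]*)((?:[eE][+-]?)?)(.*)\z/m)
  m[1] + m[3].gsub(/[^0-9.]/, '') + digits_only(m[4]) + m[5] + digits_only(m[6])
end

with_z   = clean_string_any_tail('1e5Z6')        # a stray Z in the exponent
posted_1 = clean_string_any_tail('+45-68+e+45-') # from the posted tests
posted_2 = clean_string_any_tail('4.67e6e-7')    # from the posted tests
```

Using a local MatchData object instead of $1..$6 is only a style choice here; the captures are the same.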



Here is another version, it does the same thing, but I think it's prettier.
I swapped out the plusses for << because they're much quicker when you don't
need a new object.

def digits_only(str)
  str.gsub(/[^0-9]/, '')
end

def clean_string(str)
  str =~ /\A([-+]?)([eE]?)([^eE.]*)(\.?)([^eE]*)((?:[eE][+-]?)?)([^\Z]*)\Z/
  $1 << digits_only($3) << $4 << digits_only($5) << $6 << digits_only($7)
end
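
The point about << versus + can be seen in a small sketch (the variable names are only for illustration): + builds a new String and leaves the receiver alone, while << appends to the receiver in place and returns that same object.

```ruby
base = String.new('4.5')       # an unfrozen string that may be mutated

joined = base + 'e7'           # + allocates a new String
plus_made_new_object = !joined.equal?(base)
base_after_plus = base.dup     # snapshot: base is still '4.5' at this point

appended = base << 'e7'        # << mutates base and returns it
append_reused_object = appended.equal?(base)
```

Because << mutates its receiver, it is only safe when nothing else still needs the original string.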




And here is the same thing, but it assigns them to variables first. It's
uglier, but if you have to sort through it later, it can be nice to know
what the regex is supposed to be capturing.

def digits_only(str)
  str.gsub(/[^0-9]/, '')
end

def clean_string(str)
  str =~ /\A([-+]?)([eE]?)([^eE.]*)(\.?)([^eE]*)((?:[eE][+-]?)?)([^\Z]*)\Z/
  posneg , misplaced_e , before_dec , dec , after_dec , e , exponent =
    $1 , $2 , digits_only($3) , $4 , digits_only($5) , $6 , digits_only($7)
  # misplaced_e is captured only so that it can be left out of the result
  posneg << before_dec << dec << after_dec << e << exponent
end
 
