data type for a string value

Jack Su

When humans look at a string, they know whether it is a date, a
float, a percentage, a currency, or simply a string.

For example, when you see 3,232, you know it is meant to be a
number.

I wonder if there is an easy way in Ruby to figure out the data type of a
string. I guess I could run the string through a list of regular
expressions, but if I have a lot of strings to process, that is likely to
be costly.

Any ideas on a good solution?
 
Phillip Gawlowski

When humans look at a string, they know whether it is a date, a
float, a percentage, a currency, or simply a string.

For example, when you see 3,232, you know it is meant to be a
number.

While true, what sort of number? Is it "three point two three two", or
"three thousand two hundred thirty-two"?

(Germany uses "." to format large numbers into three digit chunks,
while the UK and US use the ",". Floats use the "," in Germany, while
the US and UK use the ".".)
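
A minimal sketch of that ambiguity, assuming the convention is supplied by the caller rather than guessed. The `SEPARATORS` table and the `parse_number` helper are hypothetical names for illustration, not part of Ruby or any localization library:

```ruby
# Hypothetical table of number-formatting conventions (illustrative only).
SEPARATORS = {
  us: { group: ",", decimal: "." },
  de: { group: ".", decimal: "," }
}.freeze

# Interpret a numeric string under an explicitly named convention.
def parse_number(text, convention)
  sep = SEPARATORS.fetch(convention)
  # Drop the grouping character, then normalize the decimal mark to ".".
  normalized = text.delete(sep[:group]).sub(sep[:decimal], ".")
  Float(normalized)
end

parse_number("3,232", :us) # => 3232.0
parse_number("3,232", :de) # => 3.232
```

The same input yields two different values, so the convention has to come from context; the string alone does not carry it.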

The type of data we see gets interpreted depending on context. In the
US, a 2x4 has different dimensions than in almost every other nation in
the world.

So, you are already stumbling into localization issues, which can be
compounded by standards, the use of SI units vs. Imperial vs.
colloquial units, and so on, and so forth. ;)

Parsing your date, number, etc. data with the help of a localization
library would get you quite a ways already.

I wonder if there is an easy way in Ruby to figure out the data type of a
string. I guess I could run the string through a list of regular
expressions, but if I have a lot of strings to process, that is likely to
be costly.

It also breaks if you get data in formats that you didn't anticipate.

And I don't have a good solution for this, apart from enforcing a
particular data-entry format, but then you already know which field
contains which data type.

 
Xavier Noria

This problem is too generic. Are we talking only numbers vs strings or
are there more data types involved? If the former, do you need to tell
integers from floats? Are the conventions on number formatting known
beforehand?
 
Jack Su

Xavier Noria wrote in post #955025:
This problem is too generic. Are we talking only numbers vs strings or
are there more data types involved?

As I mentioned, all data types, including dates, datetimes, percentages,
currency, etc. For example, there can be "$ 1.02". Basically, any value you
can see in a spreadsheet.

If the former, do you need to tell
integers from floats?

No.

Are the conventions on number formatting known
beforehand?

No and yes. There is a set of them, such as '1,234' and '2010-09-09'
and '09/09/2010'. That is the point: a human can see it right away, but
a computer will need to figure this out.
 
Joel VanderWerf

No and yes. There is a set of them, such as '1,234' and '2010-09-09'
and '09/09/2010'. That is the point: a human can see it right away, but
a computer will need to figure this out.

Regexes may not be as costly as you assume for simple formats like
these. Or did you run benchmarks?
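
One way to check is to time it. The sketch below uses `Benchmark.realtime` from the standard library; the `FORMAT_PATTERNS` table and the `classify` helper are hypothetical names, and the three patterns only cover the formats mentioned in this thread:

```ruby
require "benchmark"

# Hypothetical patterns for the formats mentioned above (illustrative only).
FORMAT_PATTERNS = {
  integer:  /\A-?(?:\d{1,3}(?:,\d{3})+|\d+)\z/,
  iso_date: /\A\d{4}-\d{2}-\d{2}\z/,
  us_date:  /\A\d{2}\/\d{2}\/\d{4}\z/
}.freeze

# Return the name of the first pattern that matches, or :string otherwise.
def classify(text)
  FORMAT_PATTERNS.each { |type, pattern| return type if pattern.match?(text) }
  :string
end

# 100,000 sample strings; the elapsed time depends on the machine.
samples = ["1,234", "2010-09-09", "09/09/2010", "hello"] * 25_000
elapsed = Benchmark.realtime { samples.each { |s| classify(s) } }
puts format("%d strings classified in %.3f s", samples.size, elapsed)
```

`Regexp#match?` avoids building a `MatchData` object, which helps when only a yes/no answer is needed.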
 
Robert Klemme

No and yes. There is a set of them, such as '1,234' and '2010-09-09'
and '09/09/2010'. That is the point: a human can see it right away, but
a computer will need to figure this out.

The usual solution to this is to expect a number of types (possibly only
one), parse them, and deal with them as needed. Even if there were a
magical mechanism that determined arbitrary types, you would still
need code to process each type properly.
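
A rough sketch of that approach, assuming US-style formatting and a small fixed set of expected types. `VALUE_PARSERS` and `parse_value` are hypothetical names, and treating "32.1%" as the fraction 0.321 is an assumption rather than something settled in this thread:

```ruby
require "date"

# Hypothetical ordered list of [pattern, converter] pairs; the first match wins.
VALUE_PARSERS = [
  [/\A-?\d+\z/,                ->(s) { Integer(s, 10) }],
  [/\A-?\d{1,3}(?:,\d{3})+\z/, ->(s) { Integer(s.delete(","), 10) }],
  [/\A-?\d+\.\d+\z/,           ->(s) { Float(s) }],
  [/\A\$\s*\d+(?:\.\d+)?\z/,   ->(s) { Float(s.delete("$ ")) }],
  [/\A\d+(?:\.\d+)?%\z/,       ->(s) { Float(s.delete("%")) / 100 }],
  [/\A\d{4}-\d{2}-\d{2}\z/,    ->(s) { Date.strptime(s, "%Y-%m-%d") }],
  [/\A\d{2}\/\d{2}\/\d{4}\z/,  ->(s) { Date.strptime(s, "%m/%d/%Y") }]
].freeze

# Convert a string to a typed value, or return it unchanged if nothing fits.
def parse_value(text)
  stripped = text.strip
  VALUE_PARSERS.each do |pattern, converter|
    return converter.call(stripped) if pattern.match?(stripped)
  end
  text
rescue ArgumentError
  # Looked like a known format but was invalid (e.g. month 13): keep the string.
  text
end

parse_value("1,234")      # => 1234
parse_value("$ 1.02")     # => 1.02
parse_value("2010-09-09") # => a Date for 9 September 2010
parse_value("hello")      # => "hello"
```

Each caller still has to decide what to do with the resulting Integer, Float, Date, or String, which is the point made above.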

What kind of problem are you really trying to solve?

Kind regards

robert
 
Jack Su

The usual solution to this is to expect a number of types (possibly only
one), parse them, and deal with them as needed. Even if there were a
magical mechanism that determined arbitrary types, you would still
need code to process each type properly.

It would be nice to have a method String#parse:

'12/31/1999'.parse() => a date object
'$1.23'.parse() => a float object
'32.1%'.parse() => a float object

What kind of problem are you really trying to solve?

Parsing data from spreadsheets (or CSVs, for that matter) in order to sort
the data based on its type, e.g. 21 > 3, not '21' < '3'.
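
The difference can be sketched as follows, assuming US-style comma grouping. The `sort_key` helper is a hypothetical name; it sorts numeric cells before non-numeric ones so that mixed values never get compared directly:

```ruby
# Hypothetical sort key: numeric cells first (by value), everything else
# afterwards (as plain strings).
def sort_key(cell)
  [0, Integer(cell.delete(","), 10)]
rescue ArgumentError
  [1, cell]
end

cells = ["21", "abc", "3", "1,234", "100"]

cells.sort                        # => ["1,234", "100", "21", "3", "abc"]
cells.sort_by { |c| sort_key(c) } # => ["3", "21", "100", "1,234", "abc"]
```

Note that this simply strips commas, so a malformed cell such as "1,2,3" would be read as 123; a stricter pattern would be needed to reject it.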
 
Robert Klemme

It would be nice to have a method String#parse:

'12/31/1999'.parse() => a date object
'$1.23'.parse() => a float object
'32.1%'.parse() => a float object

No, that would be a catastrophe! Class String is responsible for
string handling in general. Everybody has different requirements for
parsing strings (if any), so there would be no way to implement this
in the standard library in any reasonable way. Just think about types
that have several representations (e.g. Time and DateTime, Float
and BigDecimal), let alone the question of how many types are detected.
As Xavier said already: the problem is too generic.

Parsing data from spreadsheets (or CSVs, for that matter) in order to sort
the data based on its type, e.g. 21 > 3, not '21' < '3'.

If you pull it directly from a spreadsheet, there is at least a small
chance that you can obtain type information from it. If you pull it
from CSV, you need to define which types you want to recognize and code
accordingly.
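
For the CSV case, Ruby's `csv` library accepts a `converters:` option that takes built-in converter names and custom lambdas. The sketch below is only one possible setup: the sample data and the `iso_date` lambda are made up for illustration, and on recent Ruby versions `csv` may need to be installed as a gem.

```ruby
require "csv"
require "date"

# Made-up sample data for illustration.
data = <<~DATA
  name,qty,when
  widget,21,2010-09-09
  gadget,3,2010-01-15
DATA

# Custom converter: turn ISO-formatted fields into Date objects and
# leave every other field untouched.
iso_date = lambda do |field|
  field.match?(/\A\d{4}-\d{2}-\d{2}\z/) ? Date.strptime(field, "%Y-%m-%d") : field
end

# :integer is a built-in converter; converters are tried in order.
rows = CSV.parse(data, headers: true, converters: [:integer, iso_date])

sorted = rows.sort_by { |row| row["qty"] }
sorted.map { |row| row["name"] } # => ["gadget", "widget"]
```

Because "qty" is converted to an Integer, the rows sort as 3 before 21 instead of '21' before '3'.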

Kind regards

robert

 
