The context here is I need to create a script that validates data in
fields in plain text files where fields may be surrounded by double
quotes and may be separated by commas or tabs. In fact, one supplier
of a data feed we use has been known to switch between comma-separated
values and tab-delimited values, often without warning.
In one of the FAQs, I found the following regular expressions, but I
have some questions.
if (/\D/) { print "has nondigits\n" }
if (/^\d+$/) { print "is a whole number\n" }
if (/^-?\d+$/) { print "is an integer\n" }
if (/^[+-]?\d+$/) { print "is a +/- integer\n" }
if (/^-?\d+\.?\d*$/) { print "is a real number\n" }
if (/^-?(?:\d+(?:\.\d*)?|\.\d+)$/) { print "is a decimal number\n" }
if (/^([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?$/)
{ print "a C float\n" }
The first question is "What string is the regular expression applied
to?"
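As a minimal sketch of the answer to this first question: in Perl, a bare /.../ match is applied to the default variable $_, and the =~ operator binds a pattern to any other string. The helper name is_integer and the sample values below are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: test an explicit string with =~ instead of
# relying on the default variable $_.
sub is_integer {
    my ($value) = @_;
    return $value =~ /^-?\d+$/ ? 1 : 0;
}

# A bare /.../ is matched against $_, which "for" sets to each element in turn.
my @whole;
for ('42', '4x2', '007') {
    push @whole, $_ if /^\d+$/;
}

print "whole numbers: @whole\n";                        # whole numbers: 42 007
print "is_integer('-17'): ", is_integer('-17'), "\n";   # is_integer('-17'): 1
```

So the FAQ snippets, as written, assume the value to test is already in $_; with a named variable the test would read something like $field =~ /^\d+$/.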
I can recognize '\d+' as representing an arbitrary number of digits,
but what are '^' and '$' for?
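A short sketch of what the anchors do, using a made-up sample field: '^' ties the match to the start of the string and '$' to the end, so together they force the whole string to fit the pattern rather than just some part of it.

```perl
use strict;
use warnings;

# Hypothetical sample field: digits followed by letters.
my $field = '123abc';

my $unanchored = $field =~ /\d+/   ? 1 : 0;   # finds "123" somewhere inside
my $anchored   = $field =~ /^\d+$/ ? 1 : 0;   # the whole string must be digits

# '$' also matches just before a trailing newline; '\z' matches only at the very end.
my $dollar = "123\n" =~ /^\d+$/  ? 1 : 0;
my $strict = "123\n" =~ /^\d+\z/ ? 1 : 0;

# prints: unanchored=1 anchored=0 dollar=1 strict=0
print "unanchored=$unanchored anchored=$anchored dollar=$dollar strict=$strict\n";
```

For field validation, '\z' (or a chomp before matching) may be the safer choice, since '$' tolerates one trailing newline.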
I don't care about distinctions between float, decimal, and real numbers.
However, I may have a need to distinguish between float and double
precision numbers. If that need materializes, how might I modify one
of the regular expressions above to allow me to determine if the value
in a given variable is necessarily a double? (I assume that any
single-precision number can be treated as a double-precision number
for the purpose of converting strings from a text file into an
appropriate number.)
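A regular expression only sees the text of a number, not its binary precision, so any such test can only be a heuristic. The sketch below is one possible approach, not a definitive method: it counts significant decimal digits and checks the exponent size. The helper name probably_needs_double and both thresholds (7 digits, exponent magnitude 37) are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Heuristic only: returns 1 if the text looks like it carries more precision
# or range than single precision usually holds, 0 if not, undef if it is
# not a number at all. Both limits are assumptions.
sub probably_needs_double {
    my ($text) = @_;
    my ($mantissa, $exponent) =
        $text =~ /^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)(?:[Ee]([+-]?[0-9]+))?$/
        or return;
    (my $digits = $mantissa) =~ tr/.//d;    # drop the decimal point
    $digits =~ s/^0+//;                     # leading zeros are not significant
    $digits =~ s/0+$//;                     # treat trailing zeros as padding
    return 1 if length($digits) > 7;        # assumed single-precision digit limit
    return 1 if abs($exponent // 0) > 37;   # assumed single-precision exponent range
    return 0;
}

print probably_needs_double('3.14'), "\n";                # 0
print probably_needs_double('3.141592653589793'), "\n";   # 1
```

If every value ends up stored as a double anyway, this test may not be needed at all.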
From what I have read, I expect I can use '\w' to test whether or not a
variable contains a string consisting only of alphanumeric characters.
Is that right? What would I use to test, using a regular expression,
whether a given string contains only alphanumeric characters, and that
the total number of characters is less than or equal to 8? What about
testing for a string containing precisely 4 letters and 3 digits?
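A sketch of how these three tests might look. Note that '\w' also matches the underscore, so an explicit character class is used here. The function names are hypothetical, and since "precisely 4 letters and 3 digits" can be read two ways, both readings are shown.

```perl
use strict;
use warnings;

# Only ASCII letters and digits, 1 to 8 characters.
# Use {0,8} instead if an empty string should also pass.
sub is_alnum_max8 {
    my ($s) = @_;
    return $s =~ /^[A-Za-z0-9]{1,8}\z/ ? 1 : 0;
}

# Reading 1: four letters followed by three digits, in that order.
sub is_4letters_then_3digits {
    my ($s) = @_;
    return $s =~ /^[A-Za-z]{4}[0-9]{3}\z/ ? 1 : 0;
}

# Reading 2: seven characters in total, four letters and three digits in any order.
sub is_4letters_3digits_any_order {
    my ($s) = @_;
    my $letters = ($s =~ tr/A-Za-z//);   # tr with an empty replacement just counts
    my $digits  = ($s =~ tr/0-9//);
    return (length($s) == 7 && $letters == 4 && $digits == 3) ? 1 : 0;
}

print is_alnum_max8('abc_123'), "\n";                   # 0 (underscore rejected)
print is_4letters_3digits_any_order('ab12cd3'), "\n";   # 1
```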
I will also need to be able to check to see whether or not a given
string represents a valid date or timestamp.
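A regex can check the shape of a date, but the calendar rules (month lengths, leap years) are easier to check in plain code afterwards. The sketch below assumes ISO-style text, YYYY-MM-DD with an optional HH:MM:SS after a single space. That format is an assumption, since the feed's real format is not stated, and the function names are hypothetical.

```perl
use strict;
use warnings;

# Assumes dates written as YYYY-MM-DD.
sub is_valid_date {
    my ($text) = @_;
    my ($y, $m, $d) = $text =~ /^([0-9]{4})-([0-9]{2})-([0-9]{2})\z/ or return 0;
    return 0 if $m < 1 || $m > 12 || $d < 1;
    my @days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);
    my $leap = ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
    my $max  = $days[$m - 1] + ($m == 2 && $leap ? 1 : 0);
    return $d <= $max ? 1 : 0;
}

# Assumes timestamps written as "YYYY-MM-DD HH:MM:SS".
sub is_valid_timestamp {
    my ($text) = @_;
    my ($date, $h, $min, $s) =
        $text =~ /^(\S+) ([0-9]{2}):([0-9]{2}):([0-9]{2})\z/ or return 0;
    return (is_valid_date($date) && $h < 24 && $min < 60 && $s < 60) ? 1 : 0;
}

print is_valid_date('2024-02-29'), "\n";   # 1 (leap year)
print is_valid_date('2023-02-29'), "\n";   # 0
```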
To put this back into my context, I'd be reading in the text file,
splitting each record into its fields. I'd also read in, from a
different file, information regarding the number of fields and the type
of each field. I'd then verify that there is the correct number of
fields and that each field has a valid string that contains the right
kind of data for that field. I still haven't decided how to handle the
fact that one of our suppliers sometimes switches between commas and
tabs, sometimes without warning. Suggestions are welcome, though.
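One possible way to cope with the changing separator is to guess it per line, by counting tabs and commas that fall outside double quotes, and then validate each field against a table of patterns. This is a rough sketch under those assumptions: the type names (integer, decimal, code), the helper names, and the sample records are all made up, and the field splitting relies on parse_line from the core Text::ParseWords module.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical type names mapped to validation patterns.
my %check = (
    integer => qr/^[+-]?[0-9]+\z/,
    decimal => qr/^-?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)\z/,
    code    => qr/^[A-Za-z0-9]{1,8}\z/,
);

# Guess the separator by counting tabs and commas outside double quotes.
sub guess_delimiter {
    my ($line) = @_;
    (my $bare = $line) =~ s/"[^"]*"//g;   # ignore quoted sections
    my $tabs   = ($bare =~ tr/\t//);
    my $commas = ($bare =~ tr/,//);
    return $tabs > $commas ? "\t" : ',';
}

# Return a list of problems for one (already chomped) record;
# an empty list means the record passed.
sub validate_record {
    my ($line, @types) = @_;
    my $delim  = guess_delimiter($line);
    my @fields = parse_line(quotemeta($delim), 0, $line);
    return ("expected " . @types . " fields, got " . @fields)
        if @fields != @types;
    my @errors;
    for my $i (0 .. $#types) {
        my $value = $fields[$i] // '';
        push @errors, "field " . ($i + 1) . " is not a valid $types[$i]"
            unless $value =~ $check{ $types[$i] };
    }
    return @errors;
}

my @problems = validate_record('AB12,4x2,3.14', qw(code integer decimal));
print "$_\n" for @problems;   # field 2 is not a valid integer
```

Text::ParseWords also treats single quotes as quoting characters, so a field containing an apostrophe may split badly. A dedicated CSV parser may be a better fit for such data.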
Sorry if this seems basic, but it has been eons since I last looked at
regular expressions, and the documentation I have found does not go
into sufficient detail.
Thanks,
Ted