Best way to extract numeric values from a report?

B

bobmct

I'm reasonably familiar with perl as I have been programming with it
for a number of years.

I am trying to extract information from a text based report file and
create a file of data records for use as input to some other perl
programs.

The report file in question has about 13 columns of financial amounts
such as -1,234,567.89 and I wish to extract each column and output to
a record in a signed, efficient format for later use by these other
perl programs.

I am currently using unpack and pack but I am not having luck with the
floating point values.

What are some suggestions as to how to extract and export these
values? Any good suggestions/recommendations/examples greatly
appreciated.

signed: just another perl monger
 
J

J. Gleixner

bobmct said:
I'm reasonably familiar with perl as I have been programming with it
for a number of years.

I am trying to extract information from a text based report file and
create a file of data records for use as input to some other perl
programs.

The report file in question has about 13 columns of financial amounts
such as -1,234,567.89 and I wish to extract each column and output to
a record in a signed, efficient format for later use by these other
perl programs.

What separates the columns?

Use whatever that is, with split, and you should be on your way.
I am currently using unpack and pack but I am not having luck with the
floating point values.

Show us what you've tried.
What are some suggestions as to how to extract and export these
values? Any good suggestions/recommendations/examples greatly
appreciated.


Treating it as a string, using a regular expression, would
be much easier than pack, IMHO.
 
J

Jürgen Exner

bobmct said:
I am trying to extract information from a text based report file and
create a file of data records for use as input to some other perl
programs.

The report file in question has about 13 columns of financial amounts
such as -1,234,567.89 and I wish to extract each column and output to
a record in a signed, efficient format for later use by these other
perl programs.

How are the columns identified/separated? By fixed position within the
line (i.e. characters 12-18 are always column 4) or by a known
separator, e.g. a comma, a semicolon, or a space character?

If the former then unpack is the tool of choice, if the latter then
split() will work nicely.
And if it's a CSV-format then one of the CSV-modules will work best,
because they will deal with all the oddities like quoted values or
escaped characters automatically.

jue
 
C

ccc31807

I am trying to extract information from a text based report file and
create a file of data records for use as input to some other perl
programs.

First, I assume that you are dealing with some kind of delimited
format. The algorithm is easy -- here is some pseudo-pseudo code:

open INFILE, '<', 'text_based_report_file.txt';
while(<INFILE>)
{
chomp;
my ($c1, $c2, etc. all the way to $c13) = split /delimiter/, $_;
#you can now access your values like $c1, $c2, ... $c13)
}
close INFILE;
The report file in question has about 13 columns of financial amounts
such as -1,234,567.89 and I wish to extract each column and output to
a record in a signed, efficient format for later use by these other
perl programs.

It's not clear to me exactly how you want your output, but assuming
that it's a signed float and all you need to do is delete the commas,
this will do it:
$c1 =~ s/,//g;
What are some suggestions as to how to extract and export these
values?  Any good suggestions/recommendations/examples greatly
appreciated.

How do you want to export the values? Seems to me that if you input
them in a delimited file your work is done, that you don't need to
export the values. If you need to 'export' the values into some data
structure in memory, use an array of array references. You can then
access each individual value like this:
$arr_ref->[0][0] #gets the first column in the first row
$arr_ref->[1][12] # gets the last column in the second row

CC
 
P

Peter J. Holzer

It's not clear to me exactly how you want your output, but assuming
that it's a signed float and all you need to do is delete the commas,
this will do it:
$c1 =~ s/,//g;

For monetary values, you often want to store the amount in the smallest
denomination. So -1,234,567.89 should be stored as -123456789 cents, not
as -1234567.89 Dollars. The reason for this is that decimal fractions
cannot be represented exactly in binary, so the amount is rounded to the
nearest multiple of a power of two (which happens to be
-1234567.88999999989755451679229736328125 in this case).

If the amounts are always given with two digits after the decimal point,
you can just strip the decimal point, too:

$c1 =~ s/[,.]//g;
or
$c1 =~ tr/.,//d;

If the number of digits after the decimal point varies, you need to
normalize them first. Be careful!

hp
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,969
Messages
2,570,161
Members
46,705
Latest member
Stefkari24

Latest Threads

Top