complex regular expression question

markpark

I know Perl is good for what I need to do, but I find regular
expressions very difficult,
and that is what I think is needed.

IBM2006-09-29 09:30:03.00000N7800081.90000000N003398C
IBM2006-09-29 09:30:04.00006N70081.90000000N003412C

I have lines in a file such as the above, and all I want to pull are the
fields

date and time ( 1 field )

2006-09-29 09:30:03

and the price

81.9

and write them out to a file with a comma in between the two fields.

I don't even want the letters IBM in the output.

An additional complication is that there are cases where the price
could be in the hundreds, in
which case it has to be taken out to 3 digits before the decimal
rather than just 2.

Another complication is that the stock could be MSFT, in
which
case there are 4 characters first instead of 3.

So, essentially, the price is always the four digits before the
second dot in the line,
but the 2 digits after the second dot are part of the price also.

If that helps?

This problem seems like a mess to me.

I have read about regular expressions and I've tried, and I think I
might be able
to figure the above out after a couple of weeks. But right now I
don't have a couple of weeks to spare. I promise that, if someone is
kind enough to give the answer,
I will grind through it, no matter how long it takes me, to make sure I
understand every piece.
Thanks a lot.
 

Martijn Lievaart

Easy.

if (/(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\..*(\d{4}\.\d{2})/) {
    my $date  = $1;        # e.g. 2006-09-29 09:30:03
    my $price = $2 + 0;    # adding 0 strips leading zeros: 0081.90 -> 81.9
    ....
}

M4
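
For readers who want to try the single-regex approach end to end, here is a minimal sketch, not a definitive implementation. It assumes the price is the four digits before the second dot plus the two digits after it, as the question describes. The helper name extract_quote and the MSFT sample line are made up for illustration; the two IBM lines are from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns "date time,price"
# for one input line, or undef when the line does not match.
sub extract_quote {
    my ($line) = @_;
    # $1: timestamp up to the seconds
    # $2: four digits before the second dot plus two digits after it
    if ($line =~ /(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\..*(\d{4}\.\d{2})/) {
        my ($date, $price) = ($1, $2 + 0);   # + 0 drops leading zeros
        return "$date,$price";
    }
    return;
}

# The MSFT line is invented to show a four-letter ticker and a
# price in the hundreds.
my @samples = (
    'IBM2006-09-29 09:30:03.00000N7800081.90000000N003398C',
    'IBM2006-09-29 09:30:04.00006N70081.90000000N003412C',
    'MSFT2006-09-29 09:30:05.00000N7800127.35000000N003398C',
);

for my $line (@samples) {
    my $csv = extract_quote($line);
    print "$csv\n" if defined $csv;
}
```

Because the regex starts at the four-digit year, the ticker length does not matter, and the first sample prints as 2006-09-29 09:30:03,81.9.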
 

markpark

Thank you to everyone who replied. For what I am doing, I don't
have to worry about Berkshire Hathaway (I know which stocks I
am pulling data for). I really appreciate everyone's
responses. I am keying on the one that breaks it up into
bucks and cents because that seems like
the one I have the greatest chance of understanding.
Thanks again.



Thus spoke Mirco Wahab (on 2006-10-26 19:18):
my ($d,$t) = $fields[0] =~ /\D+([-\d]+)\s+(.+?)$/;
my $buck = ($fields[1] =~ /(\d{4})$/)[0] + 0;
my ($cent) = $fields[2] =~ /^(\d{2})/;

According to Steven's remark, one would better
write the extraction like:

...
while(<DATA>) {
    if( (my @fields = split /\./) > 2 ) {
        my ($d,$t) = $fields[0] =~ /(\d{4}-[-\d]+)\s+(.+?)$/;
        my $buck = ($fields[1] =~ /(\d{4})$/)[0] + 0;
        my ($cent) = $fields[2] =~ /^(\d{2})/;
        printf "$d $t, %7.2f \$\n", $buck+$cent/100;
    }
}
...

which would handle things like 4-digit stock prices
and company names like T3332006-09-29-...
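
The split-based extraction above can be sketched as a self-contained script that writes the comma-separated form the original question asked for. This is only an illustration under the same assumptions as the thread (four digits of dollars before the second dot, two digits of cents after it). The helper name split_quote and the T333 sample line are made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split the line on dots, then
# take dollars and cents from the fields on either side of the second dot.
sub split_quote {
    my ($line) = @_;
    my @fields = split /\./, $line;
    return unless @fields > 2;

    # Anchor on the four-digit year so digits in the ticker are skipped.
    my ($d, $t) = $fields[0] =~ /(\d{4}-[-\d]+)\s+(.+?)$/ or return;
    my ($buck)  = $fields[1] =~ /(\d{4})$/ or return;   # last four digits
    my ($cent)  = $fields[2] =~ /^(\d{2})/ or return;   # first two digits

    return sprintf '%s %s,%.2f', $d, $t, $buck + $cent / 100;
}

# The T333 line is invented: a ticker containing digits and a
# four-digit price.
my @samples = (
    'IBM2006-09-29 09:30:03.00000N7800081.90000000N003398C',
    'T3332006-09-29 09:30:06.00000N7801234.50000000N003398C',
);

for my $line (@samples) {
    my $csv = split_quote($line);
    print "$csv\n" if defined $csv;
}
```

Note that %.2f always prints two decimals, so the first sample comes out as 2006-09-29 09:30:03,81.90 rather than 81.9; drop the format and print the plain sum if the shorter form is wanted.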

Regards

Mirco
 
