thisismyidentity
Hi all,
I am writing a Perl script that should parse each line of a file (which
unfortunately I can't modify) and split the line into fields. The main
problem is that the lines (nearly 10000 of them) are not uniform, so
there doesn't seem to be a pattern or a delimiter on which I can simply
split each line in a loop over the whole file.
Here is an example:
========================
A    B   C      D   E
d32  ab  ae99   WB  89
d33  cd  e787   WC  78
d34  ef         WD
d35  gh  ancjd  WT  100
d36  ij         WP
..
..
========================
My main intention is to extract the values in columns A, B, C, ... into
an array. But since some lines have no value under some columns, I am
unable to find a single regex on which I can split all lines in a loop.
I tried the (obvious) \s+ regex for splitting, but since the empty
columns are filled with spaces, a given column ends up at a different
array index on different lines. I am especially interested in two of the
columns that are guaranteed to be non-empty on every line (such as A, B
and D), but because of the other, empty columns I can't get them at a
fixed index of the array returned by split().
Please give suggestions on the following:
What regex could I use that would solve my problem?
Is there any way other than split by which I could achieve this
(assuming that there is no single regex to split on)?
Any approach is fine as long as I can run it in a loop, since the
number of lines is huge.
Thanks.
Greg
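
One possible approach, sketched under a big assumption: the file is
fixed-width, i.e. every value starts directly under its column header
and empty cells are padded with spaces. If that holds, the header line
itself gives the column offsets, and each line can be cut with substr()
instead of split(). The sample data, the parse_line() helper and the
variable names below are made up for illustration; they are not from
the real file.

```perl
use strict;
use warnings;

# Hypothetical sample data. ASSUMPTION: the real file pads empty cells
# with spaces so that every value starts under its column header.
my @lines = (
    'A    B   C      D   E',
    'd32  ab  ae99   WB  89',
    'd33  cd  e787   WC  78',
    'd34  ef         WD',
    'd35  gh  ancjd  WT  100',
    'd36  ij         WP',
);

# Record the name and start offset of each header word.
my $header = shift @lines;
my (@names, @starts);
while ($header =~ /(\S+)/g) {
    push @names,  $1;
    push @starts, $-[1];    # offset where capture group 1 began
}

# Cut one line at the header offsets; the last column runs to end of line.
# Columns that lie beyond the end of a short line come back as ''.
sub parse_line {
    my ($line) = @_;
    my %row;
    for my $i (0 .. $#starts) {
        my $from  = $starts[$i];
        my $field = '';
        if ($from < length $line) {
            $field = $i < $#starts
                ? substr($line, $from, $starts[$i + 1] - $from)
                : substr($line, $from);
        }
        $field =~ s/^\s+|\s+$//g;    # trim the padding
        $row{ $names[$i] } = $field;
    }
    return \%row;
}

my @rows = map { parse_line($_) } @lines;

# Columns A, B and D now sit under fixed keys on every line.
printf "%s %s %s\n", @{$_}{qw(A B D)} for @rows;
```

If the column widths are known in advance, unpack() with a template
such as 'A5 A4 A7 A4 A*' should do the same job in one call. Neither
variant helps if the file is not really column-aligned (for example if
empty cells are simply missing), so it is worth checking a few raw
lines first.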