Tuxedo
I have a plain text file in which each line has the following format:
a string of one or more characters starting at the beginning of the line,
a single whitespace character, another string, and a newline.
-------- file.txt -------
SOMESTRING XXX
SOMESTRING ZZZ
SOMEOTHERSTRING YYYZZ23
DIFFERENTSTRING HELLO
-----------
I would like to output each line, but skip any line whose first string
has already appeared as the first string of an earlier line in the file.
The output of running a Perl procedure against the above file would then
be:
SOMESTRING XXX
SOMEOTHERSTRING YYYZZ23
DIFFERENTSTRING HELLO
In other words, no first word should be repeated in the output: if the
same sequence reappears on a later line as the first word (the characters
before that line's first whitespace), that later line is dropped.
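
For what it's worth, the de-duplication described above can be sketched with a hash keyed on the first word. This is only a minimal sketch, not a definitive solution: the subroutine name first_per_key is hypothetical, the sample lines are copied from file.txt above, and it assumes that blank lines or lines starting with whitespace can simply be skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return only those lines whose first
# whitespace-delimited word has not been seen on an earlier line.
sub first_per_key {
    my @lines = @_;
    my (%seen, @out);
    for my $line (@lines) {
        # Capture everything before the first whitespace; skip lines
        # with no leading word (assumption: such lines can be ignored).
        my ($key) = $line =~ /^(\S+)/ or next;
        # $seen{$key}++ is false only the first time a key is met.
        push @out, $line unless $seen{$key}++;
    }
    return @out;
}

# Sample data copied from file.txt in the post.
my @input = (
    "SOMESTRING XXX\n",
    "SOMESTRING ZZZ\n",
    "SOMEOTHERSTRING YYYZZ23\n",
    "DIFFERENTSTRING HELLO\n",
);

print first_per_key(@input);
```

Against a real file, the same idea fits in a one-liner, since the hash lookup is constant time and the file is read only once:
perl -ne 'print if /^(\S+)/ and !$seen{$1}++' file.txt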
Alternatively, if given a parameter such as '^SOMESTRING' the output
against the file would be narrowed down to:
SOMESTRING XXX
The second string (XXX), after the whitespace, is arbitrary and should
not affect the result, but it can be included in the output even if it
happens to match SOMESTRING. So the output becomes the first occurrence
of ^SOMESTRING plus the remaining characters on the same line up to the
newline.
Or if for example '^SOME' is passed as a parameter, the result would be:
SOMESTRING XXX
SOMEOTHERSTRING YYYZZ23
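
The parameterised variant could be sketched along the same lines. Again this is only a sketch under stated assumptions: the name first_per_key_matching is hypothetical, the parameter is treated as a Perl regular expression, and it is tested against the first word only, so that the second string can never influence the result.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: like a plain first-word de-duplication, but keep
# only lines whose first word matches the given regular expression.
sub first_per_key_matching {
    my ($pattern, @lines) = @_;
    my $re = qr/$pattern/;            # compile the parameter once
    my (%seen, @out);
    for my $line (@lines) {
        my ($key) = $line =~ /^(\S+)/ or next;
        next unless $key =~ $re;      # match against the first word only
        push @out, $line unless $seen{$key}++;
    }
    return @out;
}

# Sample data copied from file.txt in the post.
my @input = (
    "SOMESTRING XXX\n",
    "SOMESTRING ZZZ\n",
    "SOMEOTHERSTRING YYYZZ23\n",
    "DIFFERENTSTRING HELLO\n",
);

# Prints the SOMESTRING XXX and SOMEOTHERSTRING YYYZZ23 lines.
print first_per_key_matching('^SOME', @input);
```

Passing '^SOMESTRING' instead narrows the output to the single line SOMESTRING XXX. Passing a pattern that matches everything, such as '.', gives the same result as de-duplicating without a parameter.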
In which ways can this be done efficiently in Perl?
Many thanks for any ideas.
Tuxedo