Sarah Tanembaum
Pointers, please...
I have text in a comma-delimited file with the following
characteristics:
ccc-123456, <multiline data>,
Field number:
1a - it always begins with 1 to 3 characters followed by
a dash, e.g. JKL-, A-, NM-, PQ-
1b - after the dash, it is followed by a number from
1 to 999999 (up to six digits)
2 - multiline data containing newline chars (\n),
carriage-return chars (\r), or both (\r\n). This field
might include special characters such as a
single (') or double (") quote, a space, or characters
with an ASCII number > 127 (accented characters,
umlauts, etc.)
3 - this field contains at least 2 and at most 5 lines of
data, where each line might
begin with 2-3 chars,
followed by an "@", 1-7 chars, and then
1-4 numbers, e.g. GH@OPRJGPF1234
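
To make the layout above concrete, here is a minimal Python sketch of the patterns for field 1 and for one line of field 3. It assumes the "characters" are uppercase ASCII letters, as in the examples; the names FIELD1_RE, FIELD3_LINE_RE, split_field1 and split_field3 are made up for illustration.

```python
import re

# Assumption: the "characters" are uppercase ASCII letters, as in the examples.
FIELD1_RE = re.compile(r"([A-Z]{1,3})-(\d+)")
FIELD3_LINE_RE = re.compile(r"([A-Z]{2,3})@([A-Z]{1,7})(\d{1,4})")


def split_field1(text):
    """Return (label, number) for a value such as 'JKL-1494'."""
    m = FIELD1_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a field 1 value: {text!r}")
    return m.group(1), int(m.group(2))


def split_field3(text):
    """Return one (prefix, name, digits) tuple per non-empty line."""
    # splitlines() treats \n, \r and \r\n as line breaks.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not 2 <= len(lines) <= 5:
        raise ValueError(f"expected 2 to 5 lines, got {len(lines)}")
    parts = []
    for ln in lines:
        m = FIELD3_LINE_RE.fullmatch(ln)
        if m is None:
            raise ValueError(f"not a field 3 line: {ln!r}")
        parts.append(m.groups())
    return parts
```

For example, split_field1("JKL-1494") gives ("JKL", 1494), and split_field3 on two lines such as GH@OPRJGPF1234 and ABC@X1 gives one tuple per line.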
My questions are:
1a. How do I parse the first part of field 1 (field 1a) so I can manipulate/rename it to
a new label depending on what label it currently has?
1b. In field 1b, instead of just the bare number, I'd like to pad
it with leading zeros, so 1 -> 000001,
1494 -> 001494, 560987 -> 560987 (no change).
2. How do I capture the 2nd field and escape the special characters using their ASCII numbers?
3. How do I capture the 3rd field and parse it as well, just like field 1?
Thanks
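
A rough Python sketch of the kind of result asked for in 1a, 1b and 2, under some stated assumptions: the RENAME table is hypothetical (the real mapping is not given here), "escape with ASCII number" is read as replacing a character with a decimal reference such as &#39;, and spaces are left alone by default. The function names are made up for illustration.

```python
# Hypothetical rename table; the real old-label -> new-label mapping is not given.
RENAME = {"JKL": "XYZ", "A": "AA"}


def relabel(field1):
    """Rename the label (1a) and zero-pad the number to 6 digits (1b)."""
    label, _, number = field1.strip().partition("-")
    new_label = RENAME.get(label, label)  # unknown labels are kept as-is
    return f"{new_label}-{int(number):06d}"


def escape_special(text, extra="'\"\r\n"):
    """Replace quotes, CR, LF and anything above 127 with &#<code>; (2).

    Assumption: this decimal form is the escape wanted; add " " to
    `extra` if spaces should be escaped as well.
    """
    return "".join(
        f"&#{ord(ch)};" if ch in extra or ord(ch) > 127 else ch
        for ch in text
    )
```

With this table, relabel("JKL-1") returns "XYZ-000001" and relabel("PQ-560987") returns "PQ-560987", while escape_special("it's") returns "it&#39;s".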