Regex question (how easy/hard is it to do in Ruby?)

  • Thread starter Sarah Tanembaum

Sarah Tanembaum

Pointers, please...

I have this text in a comma-delimited file with the following
characteristics:

ccc-123456, <multiline data>,

Field number:

1a - it always begins with 1 to 3 characters followed by
a dash, e.g. JKL-, A-, NM-, PQ-

1b - after the dash, it is followed by a number from
1 to 99999

2 - multiline data with newline chars (\n),
carriage-return chars (\r), or both (\r\n). This field
might include special characters such as a
single (') or double (") quote, a space, or characters
with ASCII number > 127 (accented characters,
umlauts, etc.)

3 - this field contains at least 2 and at most 5 lines of
data, where each line might begin with 2-3 chars,
followed by an "@", 1-7 chars, and then
1-4 numbers, e.g. GH@OPRJGPF1234

My questions are:

1a. how to parse the first field (field 1a) so I can manipulate/rename it to
a new label depending on what label it currently has

1b. in field 1b, instead of just the bare number, I'd like to pad
it with leading zeros, so 1 -> 000001,
1494 -> 001494, 560987 -> 560987 (no change).

2. capture the 2nd field and escape the special characters with their ASCII number

3. capture the 3rd field and parse it as well, just like field 1.

Thanks
 

Ara.T.Howard

> 1a. how to parse the first field (field 1a) so I can manipulate/rename it to
> a new label depending on what label it currently has

what exactly do you mean by this? if you want to parse the fields themselves
out use the 'csv' module included with ruby...
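
A minimal sketch of the 'csv' suggestion, under one big assumption that the thread does not confirm: the multiline field is wrapped in double quotes, which is what lets a CSV parser keep embedded newlines and commas in one field. The sample row below is made up for illustration.

```ruby
require 'csv'

# Hypothetical sample row. The second and third fields are double-quoted,
# so their embedded newlines do not end the record; "" is an escaped quote.
sample = %Q(JKL-42,"line one\nline ""two""","GH@OPRJGPF1234\nAB@XY12"\n)

rows = CSV.parse(sample)
label_and_number, text, codes = rows.first

p label_and_number
p text
p codes
```

If the real file does not quote the multiline field, a CSV parser cannot tell where it ends, and the record boundaries would have to be found some other way (for example by matching the start of field 1).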

> 1b. in field 1b, instead of just the bare number, I'd like to pad
> it with leading zeros, so 1 -> 000001,
> 1494 -> 001494, 560987 -> 560987 (no change).

~ > ruby -e 'p(sprintf("%06.6d", 42))'
"000042"

~ > man 3 printf
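
Putting 1a and 1b together, something like the following might do. It is only a sketch: the RENAMES table and the normalize_id helper are hypothetical names, and it assumes the 1 to 3 "characters" before the dash are letters.

```ruby
# Hypothetical mapping from current labels to new ones.
RENAMES = { 'JKL' => 'XYZ', 'A' => 'ALPHA' }.freeze

# Splits a field such as "JKL-42" into label and number, renames the label
# when a mapping exists, and zero-pads the number to six digits.
def normalize_id(field)
  m = /\A([A-Za-z]{1,3})-(\d+)\z/.match(field.strip)
  return nil unless m # not in the expected shape

  label = RENAMES.fetch(m[1], m[1]) # keep the old label if unmapped
  format('%s-%06d', label, m[2].to_i)
end

p normalize_id('JKL-1')
p normalize_id('NM-1494')
p normalize_id('PQ-560987')
```

Labels without an entry in the table pass through unchanged, and a number that already has six digits is left as it is.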

> 2. capture 2nd field and escape the special characters with ascii number

esc = '\\'            # backslash, used as the escape prefix
munged = String.new   # empty binary string, so raw bytes append unchanged
field_2.each_byte{|c| munged << esc if c > 127; munged << c}
field_2 = munged

you could also use a regex to do this...

special = %r/([\x80-\xFF])/n   # any byte above 127; assumes field_2 is a binary string
field_2.gsub!(special){|match| "\\#{ match }"}
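
If "escape with ascii number" means replacing each special character with its numeric value, which is only one reading of the question, a variation along these lines could work. The escape_high_bytes name and the backslash-plus-decimal output format are assumptions for illustration.

```ruby
# Replaces every byte above 127 with a backslash followed by its decimal
# byte value, and leaves plain ASCII bytes untouched.
def escape_high_bytes(text)
  text.each_byte.map { |b| b > 127 ? "\\#{b}" : b.chr }.join
end

puts escape_high_bytes("caf\xC3\xA9 \"ok\"")
```

This works byte by byte, so a multi-byte UTF-8 character turns into several escapes, one per byte.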

> 3. capture 3rd field and parse them as well just as field 1.
>
> Thanks


can you post some sample data? we could probably say more then...
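
Without sample data, field 3 can only be sketched from the description in the question: 2 to 5 lines, each with 2-3 chars, an "@", 1-7 chars, and 1-4 digits. The names CODE_LINE and parse_codes are hypothetical, and the "chars" are assumed to be letters.

```ruby
# One line of field 3: 2-3 letters, "@", 1-7 letters, 1-4 digits.
CODE_LINE = /\A([A-Za-z]{2,3})@([A-Za-z]{1,7})(\d{1,4})\z/

# Returns an array of hashes, one per line, or nil when the field does not
# have 2 to 5 lines or any line fails to match the expected shape.
def parse_codes(field)
  lines = field.split(/\r\n|\r|\n/).map(&:strip).reject(&:empty?)
  return nil unless (2..5).cover?(lines.length)

  lines.map do |line|
    m = CODE_LINE.match(line)
    return nil unless m

    { prefix: m[1], name: m[2], number: m[3].to_i }
  end
end

p parse_codes("GH@OPRJGPF1234\r\nAB@XY12")
```

The split pattern accepts \n, \r, and \r\n, since the question says any of them may appear.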


-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| ADDRESS :: E/GC2 325 Broadway, Boulder, CO 80305-3328
| URL :: http://www.ngdc.noaa.gov/stp/
| TRY :: for l in ruby perl;do $l -e "print \"\x3a\x2d\x29\x0a\"";done
===============================================================================
 
