convert non-delimited to delimited

R

RyanL

I'm a newbie! I have a non-delimited data file that I'd like to
convert to delimited.

Example...
Line in non-delimited file:
0139725635999992000010100534+42050-102800FM-15+1198KAIA

Should be:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

What is the best way to go about this? I've looked all over for
examples, help, suggestions, but have not found much. CSV module
doesn't seem to do exactly what I want. Maybe I'm just missing
something or not using the correct terminology in my searches. Any
assistance is greatly appreaciated! Using Python 2.4
 
M

Mark Elston

* RyanL wrote (on 8/27/2007 10:59 AM):
I'm a newbie! I have a non-delimited data file that I'd like to
convert to delimited.

Example...
Line in non-delimited file:
0139725635999992000010100534+42050-102800FM-15+1198KAIA

Should be:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

What is the best way to go about this? I've looked all over for
examples, help, suggestions, but have not found much. CSV module
doesn't seem to do exactly what I want. Maybe I'm just missing
something or not using the correct terminology in my searches. Any
assistance is greatly appreaciated! Using Python 2.4

Since you have to know, a priori, how to break the input string I
assume that these fields are of fixed length. You can use the following
to do what you want:
(a[0:4],a[4:10],a[10:15],a[15:19],a[19:21],a[21:23],a[23:25],
a[25:27],a[27],a[28:34],a[34:41],a[41:46],a[46:51],a[51:])

which results in the following output:

0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

Mark
 
M

mensanator

* RyanL wrote (on 8/27/2007 10:59 AM):




I'm a newbie! I have a non-delimited data file that I'd like to
convert to delimited.
Example...
Line in non-delimited file:
0139725635999992000010100534+42050-102800FM-15+1198KAIA
Should be:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
What is the best way to go about this? I've looked all over for
examples, help, suggestions, but have not found much. CSV module
doesn't seem to do exactly what I want. Maybe I'm just missing
something or not using the correct terminology in my searches. Any
assistance is greatly appreaciated! Using Python 2.4

Since you have to know, a priori, how to break the input string I
assume that these fields are of fixed length. You can use the following
to do what you want:
(a[0:4],a[4:10],a[10:15],a[15:19],a[19:21],a[21:23],a[23:25],
a[25:27],a[27],a[28:34],a[34:41],a[41:46],a[46:51],a[51:])

which results in the following output:

0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

Mark

Or try this:

import struct
test = '0139725635999992000010100534+42050-102800FM-15+1198KAIA'
template = '4s6s5s4s2s2s2s2s1s6s7s5s5s4s'
the_line = struct.unpack(template,test)
print the_line
print ','.join(the_line)

## ('0139', '725635', '99999', '2000', '01', '01', '00', '53', '4',
'+42050', '-102800', 'FM-15', '+1198', 'KAIA')
##
##
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA
 
M

Matimus

I'm a newbie! I have a non-delimited data file that I'd like to
convert to delimited.

Example...
Line in non-delimited file:
0139725635999992000010100534+42050-102800FM-15+1198KAIA

Should be:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

What is the best way to go about this? I've looked all over for
examples, help, suggestions, but have not found much. CSV module
doesn't seem to do exactly what I want. Maybe I'm just missing
something or not using the correct terminology in my searches. Any
assistance is greatly appreaciated! Using Python 2.4

I don't think you are going to find anything that will just do this
for you. You are going to have read the file, figure out where to
split the string, and reprint it delimited with commas. As for
suggesting code... I can't tell how you actually want to delimit the
stuff from the above example? Are the fields always a fixed number of
characters? If they aren't then is there some other method for
determining how many characters to group into a field? From the looks
of it you could split that string any way you want and get something
that looks right, but isn't.

Matt
 
M

Michael Bentley

I'm a newbie! I have a non-delimited data file that I'd like to
convert to delimited.

Example...
Line in non-delimited file:
0139725635999992000010100534+42050-102800FM-15+1198KAIA

Should be:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

What is the best way to go about this? I've looked all over for
examples, help, suggestions, but have not found much. CSV module
doesn't seem to do exactly what I want. Maybe I'm just missing
something or not using the correct terminology in my searches. Any
assistance is greatly appreaciated! Using Python 2.4

Is each data element a fixed size?
 
N

Neil Cerutti

I'm a newbie! I have a non-delimited data file that I'd like to
convert to delimited.

Example...
Line in non-delimited file:
0139725635999992000010100534+42050-102800FM-15+1198KAIA

Should be:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

It looks like a fixed format data file, also called flat-record
files, in which fields are of fixed lengths, and records are
separated by newlines.
What is the best way to go about this?

Check out chapter 2 of _Text Processing in Python_ for one
solution.

http://gnosis.cx/TPiP/chap2.txt
 
P

Paul McGuire

I'm a newbie! I have a non-delimited data file that I'd like to
convert to delimited.

Example...
Line in non-delimited file:
0139725635999992000010100534+42050-102800FM-15+1198KAIA

Should be:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA

What is the best way to go about this? I've looked all over for
examples, help, suggestions, but have not found much. CSV module
doesn't seem to do exactly what I want. Maybe I'm just missing
something or not using the correct terminology in my searches. Any
assistance is greatly appreaciated! Using Python 2.4

I'm guessing that these lines *aren't* fixed-length, especially those
signed integer fields. I used the patented Paul McGuire CrystalBall
module to come up with this pyparsing rendition. (OP may adjust to
suit.)

-- Paul

data = "0139725635999992000010100534+42050-102800FM-15+1198KAIA"
"""to be parsed as:
0139,725635,99999,2000,01,01,00,53,4,+42050,-102800,FM-15,+1198,KAIA"""

from pyparsing import *
import time
def convertTimeStamp(t):
t["date"] = map(int,t.date)
t["time"] = map(int,t.time)
return time.strftime("%Y-%m-%dT%H:%M",
tuple(t.date)+tuple(t.time)+(0,0,0,0))

yearMonthDay = Word(nums,exact=4) + Word(nums,exact=2) +
Word(nums,exact=2)
hourMinuteSecond = Word(nums,exact=2) + Word(nums,exact=2)
timestamp = ( yearMonthDay("date") + hourMinuteSecond("time") )
timestamp.setParseAction(convertTimeStamp)
signedInteger = Word("+-",nums)

fieldA = Word(nums,exact=4)("A")
fieldB = Word(nums,exact=6)("B")
fieldC = Word(nums,exact=5)("C")
fieldD = timestamp("timestamp")
fieldE = Word(nums)("E")
fieldF = signedInteger("latitude").setParseAction(lambda t : int(t[0])/
1000.0)
fieldG = signedInteger("longitude").setParseAction(lambda t :
int(t[0])/1000.0)
fieldH = Combine(Word(alphas,exact=2) + "-" + Word(nums,exact=2))("H")
fieldI = signedInteger("I")
fieldJ = Word(alphas)("J")
dataFields = fieldA + fieldB + fieldC + fieldD + fieldE + \
fieldF + fieldG + fieldH + fieldI + fieldJ

res = dataFields.parseString(data)
print res.dump()

prints:

['0139', '725635', '99999', '2000-01-01T00:53', '4',
42.049999999999997, -102.8, 'FM-15', '+1198', 'KAIA']
- A: 0139
- B: 725635
- C: 99999
- E: 4
- H: FM-15
- I: +1198
- J: KAIA
- latitude: 42.05
- longitude: -102.8
- timestamp: 2000-01-01T00:53
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,228
Members
46,818
Latest member
SapanaCarpetStudio

Latest Threads

Top