file io (lagged values) newbie question

hiro

Hey there, I'm currently doing data preprocessing (generating lagged
values for a time series) and I'm having some difficulties trying to
write a file to disk. A friend of mine wrote this quick example for
me:
-----------------------------------------------------------------------------------------------------------
array = ['1','2','3','4','5','6','7']
lineSize = 4
skip = 4
condition = 1
startIndex = 0

for letter in array:
    line = []
    startIndex = array.index(letter)

    for indexNum in range(startIndex, startIndex + (skip - 1), 1):
        #print "first loop"
        #print
        if indexNum > (len(array) - 1):
            break
        else:
            line.append(array[indexNum])

    #print "startIndex"
    #print startIndex + skip

    for indexNum in range(startIndex + skip, (startIndex + lineSize) + 1, 1):
        #print "second loop"
        #print
        if indexNum > (len(array) - 1):
            break
        else:
            line.append(array[indexNum])
    print line

----------------------------------------------------------------------------------------------------------------------
which outputs to the console:

['1', '2', '3', '5']
['2', '3', '4', '6']
['3', '4', '5', '7']
['4', '5', '6']
['5', '6', '7']
['6', '7']
['7']

This is exactly what I want and need, but when modified to read and
write files from/to disk, I run into problems.
example text file for reading:

C:\>more kaka.txt
1
2
3
4
5
6
7

tweaked code:
-------------------------------------------------------------------------------------------------------------------
f=open('c:/kaka.txt','r')
array=f.readlines()
f.close()
f=open('c:/kakaDump.txt','w')
lineSize = 4
skip = 4
condition = 1
startIndex = 0

for letter in array:
    line = []
    startIndex = array.index(letter)

    for indexNum in range(startIndex, startIndex + (skip - 1), 1):
        if indexNum > (len(array) - 1):
            break
        else:
            line.append(array[indexNum])

    for indexNum in range(startIndex + skip, (startIndex + lineSize) + 1, 1):
        if indexNum > (len(array) - 1):
            break
        else:
            line.append(array[indexNum])

    f.writelines(line)

-------------------------------------------------------------------------------------------------------------------------------
C:\>more kakaDump.txt
1
2
3
5
2
3
4
6
3
4
5
74
5
6
5
6
76
77

For those familiar with neural networks, the input file is a time
series and the output file needs to have 3 lagged variables for
training and a (two time steps ahead) variable for the target, i.e.:

input file
1
2
3
4
5
6
7

output file
1 2 3 5
2 3 4 6
3 4 5 7
4 5 6
5 6 7
6 7
7

Thanks in advance,


D.
 
Steven D'Aprano

hiro wrote:
Hey there, I'm currently doing data preprocessing (generating lagged
values for a time series) and I'm having some difficulties trying to
write a file to disk. A friend of mine wrote this quick example for
me:

If that's a quick example (well over 100 lines), I'd hate to see your
idea of a long example.

Can you cut out all the irrelevant cruft and just show:

(1) the actual error you are getting
(2) the SMALLEST amount of code that demonstrates the error

Try to isolate if the problem is in *writing* the file or *generating*
the time series.

Hint: instead of one great big lump of code doing everything, write AT
LEAST two functions: one to read values from a file and generate a time
series, and a second to write it to a file.

That exercise will probably help you discover what the problem is, and
even if it doesn't, you'll have narrowed it down from one great big lump
of code to a small function.
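
The first of those two functions might look something like the sketch below. It is only an illustration under stated assumptions: the name lagged_rows and its lags and ahead parameters are hypothetical (they do not appear in the original code), and it is written to run on Python 3. It walks the series by position rather than calling array.index(), so repeated values in the series are not all mapped back to their first occurrence.

```python
def lagged_rows(values, lags=3, ahead=2):
    """Return one row per starting position in values.

    Each row holds up to `lags` consecutive values, followed by the
    value `ahead` steps after the last lag, when that value exists.
    (Hypothetical helper; names and defaults are assumptions.)
    """
    rows = []
    for start in range(len(values)):
        # Slicing past the end of a list simply yields a shorter row.
        row = list(values[start:start + lags])
        target = start + lags - 1 + ahead
        if target < len(values):
            row.append(values[target])
        rows.append(row)
    return rows


series = ['1', '2', '3', '4', '5', '6', '7']
for row in lagged_rows(series):
    print(' '.join(row))
```

With the seven-value series from the question, the first printed row is "1 2 3 5" and the last is "7", which matches the requested output file.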

To get you started, here's my idea for the second function:
(warning: untested)

def write_series(data, f):
    """Write time series data to file f.

    data should be a list of integers.
    f should be an already opened file-like object.
    """
    # Convert data into a string for writing.
    s = str(data)
    s = s[1:-1] # strip the leading and trailing [] delimiters
    s = s.replace(',', '') # delete the commas
    # Now write it to the file object
    f.write(s)
    f.write('\n')


And this is how you would use it:

f = file('myfile.txt', 'w')
# get some time series data somehow...
data = [1, 2, 3, 4, 5] # or something else
write_series(data, f)
# get some more data
data = [2, 3, 4, 5, 6]
write_series(data, f)
# and now we're done
f.close()


Hope this helps.
 
Jussi Salmela

hiro wrote:
Hey there, I'm currently doing data preprocessing (generating lagged
values for a time series) and I'm having some difficulties trying to
write a file to disk. A friend of mine wrote this quick example for
me:
[snip tweaked code and its output]

I think your file kaka.txt lacks a CR-LF i.e. '\n' i.e. "Enter" after
the last line.
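
The stray 74, 76 and 77 in kakaDump.txt are consistent with that diagnosis. readlines() keeps each line's trailing '\n', and writelines() writes the strings back to back without adding any separator, so a final '7' with no newline runs straight into whatever is written next. A small sketch, assuming Python 3 and using io.StringIO in place of the real files so that the example is self-contained:

```python
import io

# Stand-in for kaka.txt, with no newline after the final line
# (an assumption about the poster's file).
source = io.StringIO("1\n2\n3\n4\n5\n6\n7")
lines = source.readlines()
print(lines)  # every element except the last ends in '\n'

# Stand-in for kakaDump.txt.
out = io.StringIO()
out.writelines([lines[2], lines[3], lines[4], lines[6]])  # row "3 4 5 7"
out.writelines([lines[3], lines[4], lines[5]])            # row "4 5 6"
dump = out.getvalue()
print(repr(dump))  # the '7' and the following '4' end up on one line
```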

To get the desired output format you also need to drop the CR-LF:s after
each line to have the required values printed on the same line. Here's
my version of your code with a couple remarks added:

#---------------------------------------------------------
f = open('kaka.txt','r')
# The Windows root directory C:\ is a special directory
# designed to be used by Windows itself. To put it
# politely: it's unwise to do program development in
# that directory
array = f.readlines()
f.close()
# This drops the '\n' from each line (and leaves a last
# line without one untouched):
array = [x.rstrip('\n') for x in array]
#print array
f = open('kakaDump.txt','w')
lineSize = 4
skip = 4
condition = 1
startIndex = 0

for letter in array:
    line = []
    startIndex = array.index(letter)

    for indexNum in range(startIndex, startIndex + (skip - 1), 1):
        if indexNum > (len(array) - 1):
            break
        else:
            line.append(array[indexNum])
            # This adds a space between each item in a row
            # and after the last item, but it's "only" a space:
            line.append(' ')

    for indexNum in range(startIndex + skip, (startIndex + lineSize) + 1, 1):
        if indexNum > (len(array) - 1):
            break
        else:
            line.append(array[indexNum])
    # This completes the line:
    line.append('\n')

    f.writelines(line)
f.close()
#---------------------------------------------------------

I also have my own completely different version which to me looks
cleaner than yours but as they say: "Beauty is in the eye of the beholder"

#---------------------------------------------------------
lineSize = 4
lsm1 = lineSize - 1

f = open('kaka.txt','r')
inData = f.read()
f.close()

inLst = inData.split()
inCount = len(inLst)
inLst += [' ']*lineSize
print inLst

f = open('kakaDump.txt','w')
for ind,elem in enumerate(inLst):
    if ind == inCount: break
    for i in range(lsm1): f.write('%s ' % inLst[ind + i])
    f.write('%s\n' % inLst[ind + lineSize])
f.close()
#---------------------------------------------------------

HTH,
Jussi
 
John Machin

Steven D'Aprano wrote:
[snip]

def write_series(data, f):
    """Write time series data to file f.

    data should be a list of integers.
    f should be an already opened file-like object.
    """
    # Convert data into a string for writing.
    s = str(data)
    s = s[1:-1] # strip the leading and trailing [] delimiters
    s = s.replace(',', '') # delete the commas
    # Now write it to the file object
    f.write(s)
    f.write('\n')

And that's not cruft?

Try this: f.write(' '.join(str(x) for x in data) + '\n')

Or for a more general solution, use the csv module:

C:\junk>\python25\python
Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> wtr = csv.writer(open('fubar.txt', 'wb'), delimiter=' ')
>>> data = [0, 1, 42, 666]
>>> wtr.writerow(data)
>>> wtr.writerow([9, 8, 7, 6])
>>> ^Z

C:\junk>type fubar.txt
0 1 42 666
9 8 7 6
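
On Python 3 the same idea needs a file opened in text mode with newline='' rather than 'wb'. The sketch below assumes Python 3 and writes to an io.StringIO buffer instead of fubar.txt so that it is self-contained. The lineterminator argument is set explicitly because csv.writer otherwise ends each row with '\r\n'.

```python
import csv
import io

# With a real file on Python 3 this would be, for example:
#   open('fubar.txt', 'w', newline='')
buf = io.StringIO()

wtr = csv.writer(buf, delimiter=' ', lineterminator='\n')
wtr.writerow([1, 2, 3, 5])   # a full row: three lags plus the target
wtr.writerow([4, 5, 6])      # a short row near the end of the series

result = buf.getvalue()
print(result, end='')
```

Rows of different lengths are written as they are, which suits the shorter rows at the end of the lagged output.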

HTH,
John
 
Steven D'Aprano

def write_series(data, f):
    """Write time series data to file f.

    data should be a list of integers.
    f should be an already opened file-like object.
    """
    # Convert data into a string for writing.
    s = str(data)
    s = s[1:-1] # strip the leading and trailing [] delimiters
    s = s.replace(',', '') # delete the commas
    # Now write it to the file object
    f.write(s)
    f.write('\n')

And that's not cruft?

No. Why do you think it is crufty?

Would it be less crufty if I wrote it as a cryptic one liner without
comments?

f.write(str(data)[1:-1].replace(',', '') + '\n')

Okay, it depends on the string conversion of a list. But that's not going
to change any time soon.

John Machin wrote:
Try this: f.write(' '.join(str(x) for x in data) + '\n')

That will only work in Python 2.5 or better.
 
Paul Rubin

Steven D'Aprano said:
Would it be less crufty if I wrote it as a cryptic one liner without
comments?

f.write(str(data)[1:-1].replace(',', '') + '\n')

That doesn't look terribly cryptic to me, but maybe I'm used to it.

Steven D'Aprano said:
That will only work in Python 2.5 or better.

It should work in 2.4, I think. Though, I haven't tried 2.5 yet so I
can't say from experience whether 2.4 is better than 2.5. Anyway,

f.write(' '.join(map(str, data)) + '\n')

should work in the whole 2.x series, if I'm not mistaken.
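
For a list of integers, the three one-liners discussed so far produce the same text. A quick check, written as a sketch for Python 3 (the variable names are illustrative only):

```python
data = [1, 2, 3, 5]

# Convert the whole list at once, then strip brackets and commas.
via_str = str(data)[1:-1].replace(',', '') + '\n'

# Convert each item, then join with spaces (generator expression).
via_genexp = ' '.join(str(x) for x in data) + '\n'

# Same idea using map() instead of a generator expression.
via_map = ' '.join(map(str, data)) + '\n'

print(repr(via_str))
print(via_str == via_genexp == via_map)
```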
 
Bruno Desthuilliers

Steven D'Aprano wrote:
def write_series(data, f):
    """Write time series data to file f.

    data should be a list of integers.
    f should be an already opened file-like object.
    """
    # Convert data into a string for writing.
    s = str(data)
    s = s[1:-1] # strip the leading and trailing [] delimiters
    s = s.replace(',', '') # delete the commas
    # Now write it to the file object
    f.write(s)
    f.write('\n')

And that's not cruft?


No. Why do you think it is crufty?

Because it is?

Would it be less crufty if I wrote it as a cryptic one liner without
comments?

f.write(str(data)[1:-1].replace(',', '') + '\n')

Nope. It's still a WTF.

Okay, it depends on the string conversion of a list.

Nope. It depends on the *representation* of a list.

But that's not going
to change any time soon.

That will only work in Python 2.5 or better.

Really?

Python 2.4.1 (#1, Jul 23 2005, 00:37:37)
[GCC 3.3.4 20040623 (Gentoo Linux 3.3.4-r1, ssp-3.3.2-2, pie-8.7.6)] on
linux2
Type "help", "copyright", "credits" or "license" for more information.
 
Steven D'Aprano

On Tue, 20 Feb 2007 22:03:43 +0100, Bruno Desthuilliers wrote:

[snip]
Because it is ?

*shrug*

"Is not."
"Is too."
"Is not."

Why is converting a list of integers to a string all at once more crufty
than converting them one at a time? Because you have to remove the
delimiters? That's no bigger a deal than adding spaces or newlines, and
if removing the commas worries you, change the output format to separate
the numbers with commas instead of spaces.


Would it be less crufty if I wrote it as a cryptic one liner without
comments?

f.write(str(data)[1:-1].replace(',', '') + '\n')

Nope. It's still a WTF.

Okay, it depends on the string conversion of a list.

Nope. It depends on the *representation* of a list.

No, that would be repr(data) instead of str(data). An insignificant detail
for lists, but potentially very different for other data types.
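
The distinction matters for data read from a file. A list's str() and repr() give the same text, and both are built from the repr() of each element, so a list of strings (such as the result of readlines() or split()) keeps its quotes when converted all at once. A short sketch, assuming Python 3; the variable names are illustrative:

```python
data = [1, 2, 3]
# For a list, str() and repr() give the same text.
same_for_list = str(data) == repr(data)
print(same_for_list)

text = '7'
print(str(text))    # the bare characters
print(repr(text))   # the same characters wrapped in quotes

# A list of strings: converting the whole list keeps the quotes,
# while converting item by item does not.
strings = ['1', '2', '3']
via_str = str(strings)[1:-1].replace(',', '')
via_join = ' '.join(map(str, strings))
print(via_str)
print(via_join)
```

This is why the docstring's "data should be a list of integers" is a real precondition for the all-at-once version, while the join-based version accepts either integers or strings.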


[snip demonstration]

Well, I'll be hornswaggled. Who snuck generator expressions into 2.4? I
thought they were only in 2.5 and up.
 
