timlash
Still fairly new to Python. I wrote a program that used a class
called RectangularArray as described here:
class RectangularArray:
    def __init__(self, rows, cols, value=0):
        self.arr = [None]*rows
        self.row = [value]*cols
    def __getitem__(self, (i, j)):
        return (self.arr[i] or self.row)[j]
    def __setitem__(self, (i, j), value):
        if self.arr[i] is None: self.arr[i] = self.row[:]
        self.arr[i][j] = value
This class was found in a 14-year-old post:
http://www.python.org/search/hypermail/python-recent/0106.html
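In case anyone wants to try it on Python 3, where tuple parameters in a def are no longer allowed, a rough equivalent might look like the sketch below. This is my own adaptation rather than the code from the original post, and the docstring and comments are mine:

```python
class RectangularArray:
    """Lazy 2-D array: a row is only materialized on its first write."""

    def __init__(self, rows, cols, value=0):
        self.arr = [None] * rows     # one slot per row, filled on demand
        self.row = [value] * cols    # shared default row for untouched rows

    def __getitem__(self, index):
        i, j = index                 # a[i, j] arrives as the tuple (i, j)
        row = self.arr[i]
        return (self.row if row is None else row)[j]

    def __setitem__(self, index, value):
        i, j = index
        if self.arr[i] is None:
            self.arr[i] = self.row[:]   # copy the default row before writing
        self.arr[i][j] = value


a = RectangularArray(3, 4)
a[1, 2] = 7
print(a[1, 2], a[0, 0])   # the written cell, then an untouched default cell
```

Rows that are never written to keep sharing the single default row, which is why the class stays cheap for sparse updates.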
This worked great and let me process a few hundred thousand data
points with relative ease. However, I soon wanted to start sorting
arbitrary portions of my arrays and to transpose others. I turned to
Numpy rather than reinventing the wheel with custom methods within the
serviceable RectangularArray class. However, once I refactored with
Numpy I was surprised to find that the execution time for my program
doubled! I expected a purpose-built array module to be more efficient
rather than less.
I'm not doing any linear algebra with my data. I'm working with
rectangular datasets, evaluating individual rows, grouping, sorting
and summarizing various subsets of rows.
Is a Numpy implementation overkill for my data handling uses? Should
I evaluate prior array modules such as Numeric or Numarray? Are there
any other modules suited to handling tabular data? Would I be best
off expanding the RectangularArray class for the few data
transformation methods I need?
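To give a rough idea of the transformations I mean, here is a minimal sketch on plain nested lists, which is what the class stores internally once rows are filled in. The data values and the names rows, transposed and column_sums are made up for illustration:

```python
from operator import itemgetter

# Hypothetical sample data: each inner list is one row of the table.
rows = [[3, 10], [1, 40], [2, 20], [1, 30]]

# Sort only rows 1..3 by their first column, leaving row 0 where it is.
# sorted() is stable, so rows with equal keys keep their relative order.
rows[1:4] = sorted(rows[1:4], key=itemgetter(0))

# Transpose: zip(*rows) pairs up the i-th element of every row.
transposed = [list(col) for col in zip(*rows)]

# Summarize: one total per original column.
column_sums = [sum(col) for col in transposed]

print(rows)
print(transposed)
print(column_sums)
```

Each of these is only a line or two, so adding them to the class as methods looks feasible, but I don't know how that would compare with Numpy for speed.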
Any guidance or suggestions would be greatly appreciated!
Cheers,
Tim