extending file?

Chris Cioffi

Hello all,

Are there any docs or examples of extending the file type? I work
with EDI messages that are very much like text files, just with a few
quirks ;-) and I was wondering if I could stretch and twist the built-in
file type to make things a bit faster and more full-featured.

Specifically, I would need to alter the iterator and ideally the line terminator.

Chris
 
Diez B. Roggisch

Chris said:
Are there any docs or examples of extending the file type? I work
with EDI messages that are very much like text files, just with a few
quirks ;-) and I was wondering if I could stretch and twist the built-in
file type to make things a bit faster and more full-featured.

I'm not sure what you want to accomplish here, but one of the nicer aspects
of Python is that you don't have to inherit from file when you want to pass
a file-like object to some library. See the cStringIO module. So if you create
an EDI class that _behaves_ like a file, that's all you need: you can then
pass it to any file-accepting piece of code.

Is that what you've been asking for?
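
A minimal sketch of that point, written for current Python (where `io.StringIO` takes the place of cStringIO). The class name `EdiSegments`, the helper `count_records`, the "~" terminator and the sample data are all made up for illustration; nothing here comes from an existing EDI library.

```python
import io


class EdiSegments:
    """Hypothetical file-like object: iterating yields EDI segments, not lines."""

    def __init__(self, text, terminator="~"):
        # "~" is only an assumed segment terminator for this sketch.
        self._segments = [s for s in text.split(terminator) if s.strip()]
        self._pos = 0

    def readline(self):
        # Return the next segment, or "" at end of data, as a file's readline() does.
        if self._pos >= len(self._segments):
            return ""
        segment = self._segments[self._pos]
        self._pos += 1
        return segment

    def __iter__(self):
        return self

    def __next__(self):
        segment = self.readline()
        if not segment:
            raise StopIteration
        return segment


def count_records(f):
    """Stand-in for library code that only iterates over whatever it is given."""
    return sum(1 for _ in f)


# The same function accepts a real file-like object and the look-alike.
print(count_records(io.StringIO("a\nb\nc\n")))
print(count_records(EdiSegments("ISA*00~GS*PO~ST*850~")))
```

Neither object inherits from the built-in file type; `count_records` only relies on iteration, so both work.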
 
Peter Hansen

Chris said:
Are there any docs or examples of extending the file type? I work
with EDI messages that are very much like text files, just with a few
quirks ;-) and I was wondering if I could stretch and twist the built-in
file type to make things a bit faster and more full-featured.

Specifically, I would need to alter the iterator and ideally the line terminator.

It's unclear what you want to do. Can you provide an example?

Also consider whether you can achieve what you want merely
by creating your own "file-like" object that wraps the
standard file type. This is the usual way to proceed.
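
One way such a wrapper might look, as a sketch in current Python. `SegmentFile` and its `terminator` parameter are hypothetical names, and "~" is just an assumed terminator. The wrapper overrides `readline()` and iteration, and hands every other attribute to the wrapped file.

```python
import io


class SegmentFile:
    """Wraps an open text file and reads up to a custom terminator."""

    def __init__(self, fileobj, terminator="~"):
        self._file = fileobj
        self._terminator = terminator

    def readline(self):
        # Read one character at a time until the terminator or end of file.
        # Simple but slow; a real version would read in larger chunks.
        chars = []
        while True:
            ch = self._file.read(1)
            if not ch:
                break
            chars.append(ch)
            if ch == self._terminator:
                break
        return "".join(chars)

    def __iter__(self):
        return self

    def __next__(self):
        segment = self.readline()
        if not segment:
            raise StopIteration
        return segment

    def __getattr__(self, name):
        # Anything not overridden (close, seek, tell, ...) goes to the wrapped file.
        return getattr(self._file, name)


wrapped = SegmentFile(io.StringIO("ISA*00~GS*PO~SE*1"))
for segment in wrapped:
    print(segment)
wrapped.close()
```

Like the standard `readline()`, this version keeps the terminator on the returned string, so an empty string still means end of file.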

-Peter
 
Jeremy Jones

Peter said:
It's unclear what you want to do. Can you provide an example?

Also consider whether you can achieve what you want merely
by creating your own "file-like" object that wraps the
standard file type. This is the usual way to proceed.

-Peter

I should really wait for the OP, but I've had too much caffeine this
morning to just sit still. An EDI message is a (fairly well) structured
string of text (let's just say a file for now) that may consist of
multiple interchanges, each interchange consisting of multiple segments
(at least two, and each segment having a specific character denoting its
end - all segments in an interchange will have the same segment
terminator) and each segment consisting of multiple elements. I believe
the OP wants to be able to specify what character the file object will
recognize as a line terminator (rather than the standard \n or \r\n),
presumably so he can tell it that a segment terminator is the line
terminator, do a readline(), and get an EDI segment instead of a
"traditional" line of text. Having dealt with EDI for nearly 6 years, I
could see the benefit of this. While I'm currently headed down the FSM
route, it would be interesting to see the above mentioned alternative.
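
To make the structure described above concrete, here is a small sketch that splits one interchange into segments and each segment into elements. The function name `parse_segments` is hypothetical, the "~" and "*" delimiters are assumptions borrowed from common X12 practice, and the sample data is invented.

```python
def parse_segments(interchange, segment_terminator="~", element_separator="*"):
    """Split one interchange into segments, and each segment into elements."""
    segments = []
    for raw in interchange.split(segment_terminator):
        raw = raw.strip()  # tolerate a newline after each terminator
        if raw:
            segments.append(raw.split(element_separator))
    return segments


sample = "ST*850*0001~\nBEG*00*SA~SE*2*0001~"
for elements in parse_segments(sample):
    print(elements)
```

This works on a whole string already in memory; it does not address reading segment by segment from an open file.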

Jeremy Jones
 
Chris Cioffi

Thanks for both your comments.

My rationale for being able to change the line terminator was twofold,
as you guessed:

1. It's easy and I'm lazy ;)
2. I'm guessing that lots of that code is relatively optimized C, so
it should be fairly fast.

I had considered the FSM tactic, and in fact the rest of my EDI code
usually uses that technique. Why didn't I? Again, 2 reasons:

1. I'm lazy (Is there a pattern here? :)
2. I was thinking that OS buffering would help me avoid the worst of
the .read() overhead. Still, the function calling overhead in Python
probably hurts more.

I'll probably go back and add a buffer (a few K) and run some tests.
While any speed improvement would be welcome, I won't consider less
than 1.5x faster being a real success.
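
A rough sketch of what the buffered approach could look like in current Python. `iter_segments`, the "~" terminator and the 4096-character buffer are assumptions for illustration, not the code under discussion, and no timing claim is made for it.

```python
import io


def iter_segments(fileobj, terminator="~", bufsize=4096):
    """Yield terminator-delimited segments, reading bufsize characters at a time."""
    pending = ""
    while True:
        chunk = fileobj.read(bufsize)
        if not chunk:
            break
        pending += chunk
        # Everything before the last terminator is complete; the tail may not be.
        *complete, pending = pending.split(terminator)
        for segment in complete:
            yield segment
    if pending:
        yield pending  # trailing data without a final terminator


# A tiny buffer is used here only to show segments spanning chunk boundaries.
for segment in iter_segments(io.StringIO("A*1~B*2~C*3"), bufsize=3):
    print(segment)
```

Each `read()` call fetches a block rather than a single character, so the per-call overhead is spread over many segments. The terminator is stripped from the yielded strings.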

The psyco optimizations are very significant: as the recipe says, it
more or less doubles the speed. I'm wondering how that will compare
with a buffered FSM.

I'd love to see a different implementation as well, Jeremy. Some of
the issues I've had to deal with may have been atypical and caused me
to lose some efficiency for an odd case. You also have more
experience with EDI than I (~1.5 yrs) so I'd like to see what kind of
design decisions you'll make.

Chris
 
Jeremy Jones

<snip the whole thread>

So, with everything that's been said, here are two
questions for myself and for Chris Cioffi:

1. How difficult would it be to modify file.readline() so it would read
until a specific character rather than the standard end of line
character(s)?
2. What would be the best way to go about the above?

If anyone has any ideas, I think we would both listen with great interest.


Jeremy Jones
 
Bengt Richter

<snip the whole thread>

So, with everything that's been said, here are two
questions for myself and for Chris Cioffi:

1. How difficult would it be to modify file.readline() so it would read
until a specific character rather than the standard end of line
character(s)?
2. What would be the best way to go about the above?

If anyone has any ideas, I think we would both listen with great interest.

Maybe this general idea could give you a way to install your own custom solution
without an app-specific custom mod to file or open:

I'd like to see a file/open hook in sys (or maybe os), so that you could intercept the open
operation for specific file paths registered with the hook. E.g., if it were a
simple dictionary

sys.filehook['foo.txt'] = MyFileClass # or callable factory

would cause file('foo.txt', mode, kwarg=something) to look in sys.filehook before going to the normal
file system, and return MyFileClass('foo.txt', mode, kwarg=something) instead of the normal file object. (I.e.,
file & open would be extended to accept *args, **kw to pass through to hooks if present)

This would allow passing custom input sequences as file-like objects to programs whose interface
is a file path string, and obviously you could use the file path arg passed through to open/file
to open a real file that you wanted to filter in some way (assuming you temporarily at least
removed the hook, or had an alias to the unhooked open function).

I don't think this would be hard to implement, but I haven't thought it through.
Gotta go vote. And other stuff...

An extension of this idea would be to allow registering patterns like
sys.filehook['*.cfg'] = MyConfigFilter
where a matching open would pass through the actual matched path.
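
Python has no sys.filehook, so the following is only a user-level imitation of the idea, as a sketch in current Python. The module-level dict `filehook`, the function `hooked_open` and the factory `fake_config` are all hypothetical names; `fnmatch` handles both exact paths and patterns such as '*.cfg'.

```python
import fnmatch
import io

# Hypothetical registry: maps a path or glob pattern to a factory callable.
# Nothing like sys.filehook exists in Python; this dict only imitates the idea.
filehook = {}


def hooked_open(path, mode="r", **kwargs):
    """Return a hooked object if a registered pattern matches, else a normal file."""
    for pattern, factory in filehook.items():
        if fnmatch.fnmatch(path, pattern):
            # The factory receives the actual matched path, as proposed above.
            return factory(path, mode, **kwargs)
    return open(path, mode, **kwargs)


def fake_config(path, mode, **kwargs):
    # Stand-in factory for the sketch: serves canned text instead of touching disk.
    return io.StringIO("name=" + path + "\n")


filehook["*.cfg"] = fake_config

print(hooked_open("app.cfg").read())
```

Code that calls the built-in open directly is unaffected by this; it only helps where the caller can be pointed at `hooked_open`, which is the gap a real hook in file/open would close.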


Regards,
Bengt Richter
 
