Raw strings as input from File?

utabintarbo · Nov 24, 2009

I have a log file with full Windows paths on a line. eg:
K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
\somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416

As I try to pull in the line and process it, python changes the "\10"
to a "\x08". This is before I can do anything with it. Is there a way
to specify that incoming lines (say, when using .readlines() ) should
be treated as raw strings?

TIA

MRAB · Nov 24, 2009

utabintarbo said:
I have a log file with full Windows paths on a line. eg:
K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
\somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416

As I try to pull in the line and process it, python changes the "\10"
to a "\x08". This is before I can do anything with it. Is there a way
to specify that incoming lines (say, when using .readlines() ) should
be treated as raw strings?

.readlines() doesn't change the "\10" in a file to "\x08" in the string
it returns.

Could you provide some code which shows your problem?

Carsten Haese · Nov 24, 2009

utabintarbo said:
I have a log file with full Windows paths on a line. eg:
K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
\somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416

As I try to pull in the line and process it, python changes the "\10"
to a "\x08".

Python does no such thing. When Python reads bytes from a file, it
doesn't interpret or change those bytes in any way. Either there is
something else going on here that you're not telling us, or the file
doesn't contain what you think it contains. Please show us the exact
code you're using to process this file, and show us the exact contents
of the file you're processing.
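A quick way to check this claim is to write such a line to a file and read it back. The sketch below is written for Python 3 (the thread predates it), and the file name log.txt and the temporary directory are assumptions made only for the demonstration:

```python
import os
import tempfile

# One log line, written as a raw literal so the backslashes reach the file unchanged.
line = r"K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx" + "\n"

with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "log.txt")  # hypothetical file name
    with open(log_path, "w") as fh:
        fh.write(line)
    with open(log_path) as fh:
        lines = fh.readlines()

# The backslash, the '1' and the '0' come back as three separate characters.
print(repr(lines[0]))
```

The repr shows doubled backslashes because that is how Python displays a single backslash inside a string; no "\x08" appears.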

utabintarbo · Nov 24, 2009

.readlines() doesn't change the "\10" in a file to "\x08" in the string
it returns.

Could you provide some code which shows your problem?

Here is the code block I have so far:
    for l in open(CONTENTS, 'r').readlines():
        f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
        if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
            shutil.rmtree(os.path.join(DIR1,f))
        if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
            shutil.rmtree(os.path.join(DIR2,f))

I am trying to find dirs with the basename of the initial path less
the extension in both DIR1 and DIR2

A minimally obfuscated line from the log file:
K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
11/24/2009 08:16:42 ; 1259068602

What I get from the debugger/python shell:
'K:\\sm\\SMI\\des\\RS\\Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz->/arch_m1/
smi/des/RS/Pat/10DJ/121.D5-30/1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
11/24/2009 08:16:42 ; 1259068602'

TIA

Jon Clements · Nov 24, 2009

.readlines() doesn't change the "\10" in a file to "\x08" in the string
it returns.

Could you provide some code which shows your problem?

Here is the code block I have so far:
    for l in open(CONTENTS, 'r').readlines():
        f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
        if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
            shutil.rmtree(os.path.join(DIR1,f))
        if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
            shutil.rmtree(os.path.join(DIR2,f))

I am trying to find dirs with the basename of the initial path less
the extension in both DIR1 and DIR2

A minimally obfuscated line from the log file:
K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
11/24/2009 08:16:42 ; 1259068602

What I get from the debugger/python shell:
'K:\\sm\\SMI\\des\\RS\\Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz->/arch_m1/
smi/des/RS/Pat/10DJ/121.D5-30/1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
11/24/2009 08:16:42 ; 1259068602'

TIA

jon@jon-desktop:~/pytest$ cat log.txt
K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz->/arch_m1/
smi/des/RS/Pat/10DJ/121.D5-30\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;
11/24/2009 08:16:42 ; 1259068602

['K:\\sm\\SMI\\des\\RS\\Pat\\10DJ\\121.D5-30\\1215B-B-D5-BSHOE-MM.smz->/arch_m1/\n',
 'smi/des/RS/Pat/10DJ/121.D5-30\\1215B-B-D5-BSHOE-MM.smz ; t9480rc ;\n',
 '11/24/2009 08:16:42 ; 1259068602\n']

See -- it's not doing anything

Although, "Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" and "Pat
\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" seem to be fairly different -- are
you sure you're posting the correct output!?

Jon.

Jon Clements · Nov 24, 2009

Although, "Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" and "Pat
\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz" seem to be fairly different -- are
you sure you're posting the correct output!?

Ugh... let's try that...

Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz
Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz

Jon.

Terry Reedy · Nov 24, 2009

utabintarbo said:
I have a log file with full Windows paths on a line. eg:
K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
\somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ; 1259006416

As I try to pull in the line and process it, python changes the "\10"
to a "\x08".

This should only happen if you paste the test into your .py file as a
string literal.

This is before I can do anything with it. Is there a way
to specify that incoming lines (say, when using .readlines() ) should
be treated as raw strings?

Or if you use execfile or compile and ask Python to interpret the input
as code.

There are no raw strings, only raw string code literals marked with an
'r' prefix for raw processing of the quoted text.
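To illustrate the point about literals, here is a small sketch (Python 3 syntax; the variable names are made up for the example). The 'r' prefix only changes how the source text between the quotes is parsed; the resulting objects are ordinary strings either way:

```python
plain = "\10"      # octal escape: a single character with code 8
raw = r"\10"       # raw literal: backslash, '1', '0'
doubled = "\\10"   # escaped backslash: same text as the raw literal

print(len(plain), len(raw))        # lengths differ: 1 versus 3
print(type(plain) is type(raw))    # both are plain str objects
print(raw == doubled)              # two spellings of the same string
```

Text read from a file never passes through this parsing step, which is why there is nothing to switch to "raw" on input.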

Grant Edwards · Nov 25, 2009

.readlines() doesn't change the "\10" in a file to "\x08" in the string
it returns.

Could you provide some code which shows your problem?

Here is the code block I have so far:
    for l in open(CONTENTS, 'r').readlines():
        f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
        if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
            shutil.rmtree(os.path.join(DIR1,f))
        if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
            shutil.rmtree(os.path.join(DIR2,f))

Ahem. This doesn't run. os.path.split() returns a tuple, and calling
os.path.splitext() on a tuple doesn't work. Given that replacing the entire loop
contents with "print l" readily disproves your assertion, I suggest you
cut and paste actual code if you want an answer. Otherwise we're just
going to keep saying "No, it doesn't", because no, it doesn't.
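For reference, one way the name extraction could be written so that it does run is sketched below (Python 3). The helper name stem_of_source is invented for the example, and ntpath is used instead of os.path on the assumption that the log always holds Windows-style paths, so the split works the same on any platform:

```python
import ntpath

# Sample log line from the thread, joined onto one logical line.
line = (r"K:\sm\SMI\des\RS\Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz"
        "->/arch_m1/smi/des/RS/Pat/10DJ/121.D5-30/1215B-B-D5-BSHOE-MM.smz"
        " ; t9480rc ; 11/24/2009 08:16:42 ; 1259068602")

def stem_of_source(log_line):
    """Return the file name of the source path without its extension."""
    source = log_line.split("->")[0]
    filename = ntpath.basename(source)   # second element of ntpath.split(source)
    return ntpath.splitext(filename)[0]  # splitext needs a string, not a tuple

stem = stem_of_source(line)
print(stem)
```

Taking the basename before calling splitext also keeps the dot in the directory name "121.D5-30" from being mistaken for an extension separator.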

It's, um, rewarding to see my recent set of instructions being
followed.

When you do what, exactly?

Dennis Lee Bieber · Nov 25, 2009

Here is the code block I have so far:
    for l in open(CONTENTS, 'r').readlines():
        f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
        if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
            shutil.rmtree(os.path.join(DIR1,f))
        if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
            shutil.rmtree(os.path.join(DIR2,f))

I am trying to find dirs with the basename of the initial path less
the extension in both DIR1 and DIR2

And just what are DIR1 and DIR2?

So far as I can tell, the likely position of your problem is that
THEY are the source of the problem, and you are joining them to a
perfectly valid item.
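Whichever string is the culprit (DIR1, DIR2, or a line pasted into the shell; the thread never settles this), the reported output is exactly what an ordinary string literal would produce. A sketch, using a fragment of the posted path and made-up variable names:

```python
# Hypothetical: the path fragment pasted into source as an ordinary literal.
# "\10" and "\121" are parsed as octal escapes.
pasted = "Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz"

# The same fragment as a raw literal keeps every backslash.
intended = r"Pat\10DJ\121.D5-30\1215B-B-D5-BSHOE-MM.smz"

print(repr(pasted))    # 'Pat\x08DJQ.D5-30Q5B-B-D5-BSHOE-MM.smz'
print(repr(intended))
```

The "Q" characters in the reported string come from "\121", which is octal for 81, the code of "Q". Writing such paths as raw literals, or with forward slashes, avoids the problem.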

Jon Clements · Nov 25, 2009

.readlines() doesn't change the "\10" in a file to "\x08" in the string
it returns.
Could you provide some code which shows your problem?
Here is the code block I have so far:
    for l in open(CONTENTS, 'r').readlines():
        f = os.path.splitext(os.path.split(l.split('->')[0]))[0]
        if f in os.listdir(DIR1) and os.path.isdir(os.path.join(DIR1,f)):
            shutil.rmtree(os.path.join(DIR1,f))
        if f in os.listdir(DIR2) and os.path.isdir(os.path.join(DIR2,f)):
            shutil.rmtree(os.path.join(DIR2,f))

Ahem. This doesn't run. os.path.split() returns a tuple, and calling
os.path.splitext() on a tuple doesn't work. Given that replacing the entire loop
contents with "print l" readily disproves your assertion, I suggest you
cut and paste actual code if you want an answer. Otherwise we're just
going to keep saying "No, it doesn't", because no, it doesn't.

It's, um, rewarding to see my recent set of instructions being
followed.

When you do what, exactly?

Can't remember if this thread counts as "Edwards' Law 5[b|c]"

I'm sure I pinned it up on my wall somewhere, right next to
http://imgs.xkcd.com/comics/tech_support_cheat_sheet.png

Jon.

rzed · Dec 2, 2009

utabintarbo said:

I have a log file with full Windows paths on a line. eg:
K:\A\B\C\10xx\somerandomfilename.ext->/a1/b1/c1/10xx
\somerandomfilename.ext ; t9999xx; 11/23/2009 15:00:16 ;
1259006416

As I try to pull in the line and process it, python changes the
"\10" to a "\x08". This is before I can do anything with it. Is
there a way to specify that incoming lines (say, when using
.readlines() ) should be treated as raw strings?

TIA

Despite all the ragging you're getting, it is a pretty flakey thing
that Python does in this context:
(from a python shell)
>>> '\10'
'\x08'

If you are pasting your string as a literal, then maybe it does the
same. It still seems weird to me. I can accept that '\1' means x01,
but \10 seems to be expanded to \010 and then translated from octal
to get to x08. That's just strange. I'm sure it's documented
somewhere, but it's not easy to search for.

Oh, and this:
>>> '\70'
'8'
... is really odd.

Dave Angel · Dec 2, 2009

rzed said:
Despite all the ragging you're getting, it is a pretty flakey thing

When the OP specified readlines(), which does *not* behave this way, he
probably deserved what you call "ragging." The backslash escaping is
for string literals, which are in code, not in data files.

In any case, there's a big difference between surprising (to you), and
flakey.

that Python does in this context:
(from a python shell)

'\x08'

If you are pasting your string as a literal, then maybe it does the
same. It still seems weird to me. I can accept that '\1' means x01,
but \10 seems to be expanded to \010 and then translated from octal
to get to x08. That's just strange. I'm sure it's documented
somewhere, but it's not easy to search for.

Check in the help for "escape Strings". It's documented (in vers. 2.6,
anyway) in a nice chart that a backslash followed by up to three octal
digits is interpreted as an octal character code. I don't like it much
either, but it's inherited from C, which has worked that way for 30+ years.

Online, see
http://www.python.org/doc/2.6.4/reference/lexical_analysis.html, and
look in section 2.4.1 for the chart.

Oh, and this:

'8'
... is really odd.

Octal 70 is hex 38 (or decimal 56), which is the character '8'.
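The arithmetic can be checked directly. A short sketch (Python 3; nothing here is specific to the OP's code):

```python
# Octal digits after a backslash in a literal give the character code in base 8.
print(int("10", 8))        # value of octal 10
print(int("70", 8))        # value of octal 70
print(chr(int("70", 8)))   # the character with that code

# The literal escapes agree with the arithmetic.
print("\10" == chr(8))
print("\70" == "8")
print("\121" == "Q")
```

Each comparison prints True: octal 10 is 8, octal 70 is 56 (the code of "8"), and octal 121 is 81 (the code of "Q").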

DaveA
