Changing filenames from Greeklish => Greek (subprocess complain)

  • Thread starter Νικόλαος Κούρας

Νικόλαος Κούρας

On Sunday, 2 June 2013 8:05:32 p.m. UTC+3, user Chris Angelico wrote:
A programmer chooses his own clients, and you are the Atherton Wing to
my Inara Serra.

You might want to explain this mysterious call-name you improvised for me.
 
Michael Torrie

On Sunday, 2 June 2013 8:05:32 p.m. UTC+3, user Chris Angelico wrote:


You might want to explain this mysterious call-name you improvised for me?

Maybe you could research it a bit.
 
Chris Angelico

On Sunday, 2 June 2013 8:05:32 p.m. UTC+3, user Chris Angelico wrote:


You might want to explain this mysterious call-name you improvised for me.

Or you could do a quick web search, like we keep telling you to do for
other things. I'm pretty sure Google is familiar with both those
names, and will point you to a mighty fine shindig.

ChrisA
 
Νικόλαος Κούρας

On Sunday, 2 June 2013 8:21:46 p.m. UTC+3, user Chris Angelico wrote:
Or you could do a quick web search, like we keep telling you to do for
other things. I'm pretty sure Google is familiar with both those
names, and will point you to a mighty fine shindig.


I see, nice one! :)
I thought it was something you invented yourself :)

Now, as to my initial question, can you suggest anything please?
I don't know what else to try.

The whole subprocess fails when it has to deal with a Greek-lettered filename.
I just don't get it...
I didn't ask metrites.py to open the Greek filename via subprocess; I just told metrites.py to call files.py, which in turn tries to open that Greek filename.

That's an indirect call...
 
Carlos Nepomuceno

Hey guys! Come on!!!

Repeat with me: "Googsfraba!"


lol

Dennis Lee Bieber

On Sunday, 2 June 2013 6:15:16 p.m. UTC+3, user Chris Angelico wrote:



I'm an OP like everybody else here, and I don't demand anything.

For this thread you are -- since my understanding of OP is "original
poster"; i.e., the person that started the thread. But many of the
regulars in this group haven't originated a thread in years, we are just
responders.
But some people, instead of being helpful or even ignoring me, think it's better to belittle me. You are a special case, because you do both (you have actually given me hints many times), but that is not the case with some other regulars.

"Snippets" that rely upon CGI processing, et al; are not really
useful for debugging remotely. Yes, you were asked to provide the code
in-line (which I agree with), but the corollary is that said code should
be the minimal /runnable/ code that reproduces the problem.

The best I can interpret is that you spawned a process, and that
process is failing. So... Can you run the spawned program stand-alone
(at a command line) providing needed input and output streams, and
determine what, if any, error code it produces.

Then move up to some sample code that does the spawning /without/
using a web-server. See if that code runs or reports an error. Maybe
substitute the spawned task with one that does nothing more than print
out the environment contents (what was received as command line, what
was found in environment variables, etc.).

If it still works from command line, try the environment dump from
the web server...
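Dennis's two steps (spawn the program without the web server, and swap it for one that only reports what it received) can be sketched in Python. This is a minimal sketch, assuming Python 3.5+ for `subprocess.run`; the child script is written to a temporary file only so the example is self-contained, and its name `envdump.py`, like the helper `dump_child_environment`, is hypothetical.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for the spawned program: it only reports what it received.
CHILD_SOURCE = r'''
import locale, os, sys
print("argv:", ascii(sys.argv))
print("stdout encoding:", sys.stdout.encoding)
print("filesystem encoding:", sys.getfilesystemencoding())
print("preferred encoding:", locale.getpreferredencoding(False))
for key in sorted(os.environ):
    # ascii() keeps this dump from failing on non-ASCII values
    print("env", ascii(key), ascii(os.environ[key]))
'''


def dump_child_environment(extra_env=None):
    """Spawn the dump script directly (no web server) and return its results."""
    env = dict(os.environ)
    env.update(extra_env or {})
    with tempfile.TemporaryDirectory() as tmp:
        child = os.path.join(tmp, "envdump.py")
        with open(child, "w", encoding="utf-8") as f:
            f.write(CHILD_SOURCE)
        result = subprocess.run(
            [sys.executable, child, "first-arg"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=env,
        )
    # Return raw bytes: decoding here could hide the very problem being hunted.
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # LC_ALL=C sets the C locale, as a bare web-server environment often does.
    code, out, err = dump_child_environment({"LC_ALL": "C"})
    print("exit status:", code)
    print(out.decode("utf-8", "replace"))
    print(err.decode("utf-8", "replace"))
```

Running the same dump once from a shell and once from the web server, then comparing the two outputs, is one way to see which part of the environment differs.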
 
nagia.retsina

On Sunday, 2 June 2013 10:44:05 p.m. UTC+3, user Carlos Nepomuceno wrote:
Hey guys! Come on!!!
Repeat with me: "Googsfraba!"

You are not Jack Nicholson, and this is not an Anger Management lesson (which was a great movie, btw).

I'm the one who should chant that mantra, because instead of receiving helpful replies I get these kinds of responses.

Now, is anyone willing to help me on this please?
I also accept hints on how to solve this!
 
Chris Angelico

Now, is anyone willing to help me on this please?
I also accept hints on how to solve this!

Hints I can do. Here, I've set up a scavenger hunt for you. You'll go
to a number of web sites that I nominate, type in keywords, and hit
enter.

The first web site is Google: www.google.com
Type in: suexec

The second web site is DuckDuckGo: www.duckduckgo.com
Type in: paid apache technical support

The third web site is ESR Question (not its official name):
http://www.catb.org/esr/faqs/smart-questions.html
Unlike the other two, this one is so smart (it says it right in the
URL!) that it doesn't need you to type anything in!

Read through these pages. You will find clues scattered throughout;
some of them will be solved by typing them into one of the first two
web sites. When you complete the scavenger hunt, you will find your
prize.

ChrisA
 
Tim Delaney

A programmer chooses his own clients, and you are the Atherton Wing to
my Inara Serra.

I've just been watching this train wreck (so glad I didn't get involved at
the start) but I have to say - that's brilliant Chris. Thank you for
starting my week off so well.

Tim Delaney
 
Tim Delaney

A programmer chooses his own clients, and you are the Atherton Wing to
my Inara Serra.

I've just been watching this train wreck (so glad I didn't get involved at
the start) but I have to say - that's brilliant Chris. Thank you for
starting my week off so well.

And I just realised I missed a shiny opportunity to wrangle "train job" in
there ...

Tim Delaney
 
Νικόλαος Κούρας

I did all the Google searches I could, but with no luck; this is not a suexec issue.
Why do you say it is one?
The other problem I had was suexec-related, not this one; this is a subprocess issue.
 
Chris Angelico

I've just been watching this train wreck (so glad I didn't get involved at
the start) but I have to say - that's brilliant Chris. Thank you for
starting my week off so well.

That went well.

*sighs happily*

ChrisA
 
Νικόλαος Κούρας

Chris can you please help me solve this problem?

Why does subprocess fail when it has to deal with a Greek filename? And an indirect call, too....
 
Νικόλαος Κούρας

Ok, this email is something of a recital of how I approached this.

The apache error log:

I restarted Apache:
/etc/init.d/httpd restart

Then a:
ps axf
gave me the PID of a running httpd. Examining its open files:
lsof -p 9287

shows me:
httpd 9287 nobody 2w REG 0,192 12719609 56510510 /usr/local/apache/logs/error_log
httpd 9287 nobody 7w REG 0,192 7702310 56510512 /usr/local/apache/logs/access_log
among many others.

So, to monitor these logs:

tail -F /usr/local/apache/logs/error_log /usr/local/apache/logs/access_log &

placing the tail in the background so I can still use that shell.

Watching the log while fetching the page:

http://superhost.gr/

says:

==> /usr/local/apache/logs/error_log <==
[Tue Apr 23 12:11:40 2013] [error] [client 54.252.27.86] suexec policy violation: see suexec log for more details
[Tue Apr 23 12:11:40 2013] [error] [client 54.252.27.86] Premature end of script headers: metrites.py
[Tue Apr 23 12:11:40 2013] [error] [client 54.252.27.86] File does not exist: /home/nikos/public_html/500.shtml
[Tue Apr 23 12:11:43 2013] [error] [client 107.22.40.41] suexec policy violation: see suexec log for more details
[Tue Apr 23 12:11:43 2013] [error] [client 107.22.40.41] Premature end of script headers: metrites.py
[Tue Apr 23 12:11:43 2013] [error] [client 107.22.40.41] File does not exist: /home/nikos/public_html/500.shtml
[Tue Apr 23 12:11:45 2013] [error] [client 79.125.63.121] suexec policy violation: see suexec log for more details
[Tue Apr 23 12:11:45 2013] [error] [client 79.125.63.121] Premature end of script headers: metrites.py
[Tue Apr 23 12:11:45 2013] [error] [client 79.125.63.121] File does not exist: /home/nikos/public_html/500.shtml

So:

You're using suexec in your Apache. This greatly complicates your debugging.

Suexec seems to be a facility for arranging that CGI scripts run as the user
who owns them. Because that has a lot of potential for ghastly
security holes, suexec performs a large number of strict checks on
CGI script locations, ownership and permissions before running a
CGI script. At a guess, the first hurdle would be that metrites.py
is owned by root. Suexec is very picky about which users it is
prepared to become. "root" is not one of them, as you might imagine.

I've chowned metrites.py to nikos:nikos. Suexec now lets it run, producing this:

Traceback (most recent call last):
  File "metrites.py", line 9, in <module>
    sys.stderr = open('/home/nikos/public_html/cgi.err.out', 'a')
PermissionError: [Errno 13] Permission denied: '/home/nikos/public_html/cgi.err.out'

That file is owned by root. metrites.py is being run as nikos.

So:

chown nikos:nikos /home/nikos/public_html/cgi.err.out


A page reload now shows this:

Error in sys.excepthook:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2334-2342: ordinal not in range(128)

Original exception was:
Traceback (most recent call last):
  File "metrites.py", line 226, in <module>
    print( template )
UnicodeEncodeError: 'ascii' codec can't encode characters in position 30-38: ordinal not in range(128)

This shows you writing the string in template to stdout. Here the
encoding of stdout is 'ascii', accepting only characters of values
0..127. I expect template contains more than this, since the ASCII
range is very US Anglocentric; Greek characters, for example, won't
encode into ascii.

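As mentioned in the thread on python-list, Python takes the encoding
for stdout from the locale settings in its environment. In your
interactive terminal that is usually UTF-8, but a CGI script started
by Apache typically gets a bare environment, and then Python is
pretty conservative: ascii, as you see above.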

What you want is probably UTF-8 in the output byte stream. But
let's check what the HTTP headers are saying, because _they_ tell
the browser the byte stream encoding. The headers and your program's
encoding must match. So:

% wget -S -O - http://superhost.gr/
--2013-04-23 19:34:38-- http://superhost.gr/
Resolving superhost.gr (superhost.gr)... 82.211.30.133
Connecting to superhost.gr (superhost.gr)|82.211.30.133|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Date: Tue, 23 Apr 2013 09:34:46 GMT
Server: Apache/2.2.24 (Unix) mod_ssl/2.2.24 OpenSSL/1.0.0-fips mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
Length: unspecified [text/html]
Saving to: ‘STDOUT’

<!--: spam
Content-Type: text/html

<body bgcolor="#f0f0f8"><font color="#f0f0f8" size="-5"> -->
<body bgcolor="#f0f0f8"><font color="#f0f0f8" size="-5"> --> -->
</font> </font> </font> </script> </object> </blockquote> </pre>

So, the Content-Type: header says: "text/html; charset=utf-8". So that's good.

So I've made sure os is imported and added this line:

sys.stdout = os.fdopen(1, 'w', encoding='utf-8')

under the setting of sys.stderr. If the cgi libraries run under
Python 3, there is probably a cleaner way to do this, but I don't know how.

This just opens UNIX file descriptor 1 (standard output) from scratch
for write ('w') using the 'utf-8' encoding.

And now your CGI script runs, accepting strings sent to print().
sys.stdout now takes care of transcoding those strings (Unicode
character code points inside Python) into the utf-8 encoding required
in the output bytes.
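Cameron mentions that there is probably a cleaner way to get UTF-8 output under Python 3. One option, sketched here under the assumption of Python 3: wrap the existing binary layer `sys.stdout.buffer` in an `io.TextIOWrapper` with an explicit encoding instead of reopening file descriptor 1, or, on Python 3.7 and later, call `sys.stdout.reconfigure(encoding='utf-8')`. The helper names are hypothetical; `utf8_text_stream` takes the binary stream as a parameter so it can be tried on an in-memory buffer.

```python
import io
import sys


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that str written to it is encoded as UTF-8."""
    # newline="\n": no line-ending translation; line_buffering: flush at each newline
    return io.TextIOWrapper(binary_stream, encoding="utf-8",
                            newline="\n", line_buffering=True)


def use_utf8_stdout():
    """Make print() emit UTF-8 whatever locale the web server started us with."""
    if hasattr(sys.stdout, "reconfigure"):  # Python 3.7+
        sys.stdout.reconfigure(encoding="utf-8")
    elif hasattr(sys.stdout, "buffer"):  # older Python 3
        sys.stdout.flush()  # push out anything the old wrapper still holds
        sys.stdout = utf8_text_stream(sys.stdout.buffer)


if __name__ == "__main__":
    use_utf8_stdout()
    print("Content-type: text/html; charset=utf-8")
    print()
    print("<p>Καλημέρα</p>")
```

Whichever variant is used, the encoding chosen here has to agree with the `charset=utf-8` in the Content-Type header, as the wget check above shows.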
 
Νικόλαος Κούρας

But still, when subprocess calls files.py, which in turn tries to open a Greek filename, I get the same response I posted in my initial post.
 
Michael Torrie

The whole subprocess fails when it has to deal with a Greek-lettered
filename.

No it doesn't. subprocess is working correctly. It's the program
subprocess is running that is erring out. subprocess is merely
reporting to you that it erred out, just as it should.
I just don't get it... I didn't ask metrites.py to open the Greek
filename via subprocess; I just told metrites.py to call
files.py, which in turn tries to open that Greek filename.

Seems to me, then it's clear how you should go about debugging this.
You have to run each layer separately and see where the error occurs.
Start with files.py.
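Running each layer separately is easier when you can see the child's exit status and standard error together. `subprocess.check_output` captures only standard output; the child's stderr goes wherever the parent's stderr goes, which under Apache is the error log. A minimal sketch, assuming Python 3.5+; `run_layer` is a hypothetical helper, and you would pass it the real command (for example the path to files.py) yourself.

```python
import subprocess
import sys


def run_layer(cmd):
    """Run one program on its own and report exit status, stdout and stderr."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print("exit status:", result.returncode)
    # ascii() on the raw bytes keeps mis-encoded bytes visible instead of crashing
    print("stdout:", ascii(result.stdout[:2000]))
    print("stderr:", ascii(result.stderr[:2000]))
    return result


if __name__ == "__main__":
    # Pass the real command on the command line, e.g. the path to files.py;
    # the fallback is a throwaway command that fails on purpose.
    cmd = sys.argv[1:] or [sys.executable, "-c", "import sys; sys.exit('boom')"]
    run_layer(cmd)
```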

That's what any of us would do if the code were ours. But we can't do
the debugging for you. It's not possible! We don't have access to your
code nor to your server environment. Nor is it appropriate for any of
us who have no contractual relationship with you to have such access to
your server.

Good luck.
 
Michael Torrie

Ok, this email is something of a recital of how I approached this.

Excellent. You're making progress. Keep doing research, and learn how
to debug your python programs.

One thing I've done as a last resort when I just can't get good error
reporting because of subprocess or something else, is to have my script
open a file in /tmp and write messages to it as the program executes.
Sometimes I write out what the standard input was, so I can replay it in a
standalone run. There's your free hint for the day.
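Michael's file-in-/tmp trick can be done with the standard `logging` module. A minimal sketch: the log path and the helper names `setup_debug_log` and `log_startup` are hypothetical, and non-ASCII data is passed through `ascii()` so that logging itself cannot trip over an encoding problem.

```python
import logging
import os
import sys
import tempfile

# Hypothetical location; any file the CGI user can write to will do.
DEFAULT_LOG = os.path.join(tempfile.gettempdir(), "cgi-debug.log")


def setup_debug_log(path=DEFAULT_LOG):
    """Return a logger that appends to path, independent of stdout/stderr."""
    logger = logging.getLogger("cgi-debug")
    logger.setLevel(logging.DEBUG)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s pid=%(process)d %(message)s"))
        logger.addHandler(handler)
    return logger


def log_startup(logger, stdin_bytes):
    """Record what this run received so it can be replayed standalone later."""
    logger.debug("argv=%s", ascii(sys.argv))
    logger.debug("cwd=%s", ascii(os.getcwd()))
    logger.debug("QUERY_STRING=%s", ascii(os.environ.get("QUERY_STRING")))
    logger.debug("stdin=%s", ascii(stdin_bytes))
```

A script would call `log = setup_debug_log()`, then for example `body = sys.stdin.buffer.read()` and `log_startup(log, body)`. Reading stdin consumes the request body, so whatever parses the form afterwards would need to be handed those same bytes.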
 
Νικόλαος Κούρας

Thanks Michael,

do these two behave the same, in your opinion?

sys.stdout = os.fdopen(1, 'w', encoding='utf-8')

which is what I have now,
as opposed to this one:

import codecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())

Which one should I keep, and why?
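What the two lines produce can be checked directly. A minimal sketch in which `io.BytesIO` stands in for file descriptor 1: `os.fdopen(1, 'w', encoding='utf-8')` builds an `io.TextIOWrapper`, the normal Python 3 text stream, while `codecs.getwriter('utf-8')(...)` builds a `codecs.StreamWriter`, an older and thinner wrapper. For plain text both should write the same UTF-8 bytes.

```python
import codecs
import io

TEXT = "Ελληνικά\n"

# Variant 1: os.fdopen(1, 'w', encoding='utf-8') builds an io.TextIOWrapper;
# BytesIO stands in for fd 1, and newline="\n" matches the Unix behaviour.
buf1 = io.BytesIO()
wrapper = io.TextIOWrapper(buf1, encoding="utf-8", newline="\n")
wrapper.write(TEXT)
wrapper.flush()

# Variant 2: codecs.getwriter returns a StreamWriter class wrapping the binary stream.
buf2 = io.BytesIO()
writer = codecs.getwriter("utf-8")(buf2)
writer.write(TEXT)

print(type(wrapper).__name__, type(writer).__name__)
print(buf1.getvalue() == buf2.getvalue())  # True
```

One side effect worth knowing: `sys.stdout.detach()` leaves the previous `sys.stdout` object, which is also `sys.__stdout__`, unusable afterwards.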
 
Steven D'Aprano

Why does subprocess fail when it has to deal with a Greek filename? And
an indirect call, too....

It doesn't. The command you are calling fails, not subprocess.


The code you show is this:


/home/nikos/public_html/cgi-bin/metrites.py in ()
 217     template = htmldata + counter
 218 elif page.endswith('.py'):
=> 219     htmldata = subprocess.check_output( '/home/nikos/public_html/cgi-bin/' + page )
 220     template = htmldata.decode('utf-8').replace( 'Content-type: text/html; charset=utf-8', '' ) + counter



The first step is to inspect the value of the file name. Normally I would
just call print, but since this is live code, and a web server, you
probably don't want to use print directly. But you can print to a file,
and then inspect the file. Using logging is probably better, but here's a
real quick and dirty way to get the same result:

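elif page.endswith('.py'):
    name = '/home/nikos/public_html/cgi-bin/' + page
    # ascii() escapes every non-ASCII character, so writing this debug
    # line cannot itself fail with a UnicodeEncodeError
    with open('/home/nikos/out.txt', 'w') as f:
        print(ascii(name), file=f)
    htmldata = subprocess.check_output(name)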



Now inspect /home/nikos/out.txt using the text editor of your choice. What does
it contain? Is the file name of the executable what you expect? Does it
exist, and is it executable?


The next step, after checking that, is to check the executable .py file.
It may contain a bug which is causing this problem. However, I think I
can guess what the nature of the problem is.


The output you show includes:

cmd = '/home/nikos/public_html/cgi-bin/files.py'
output = b'Content-type: text/html; charset=utf-8\n\n<bod...n
position 74: surrogates not allowed\n\n-->\n\n'


My *guess* of your problem is this: your file names have invalid bytes in
them, when interpreted as UTF-8.

Remember, on a Linux system, file names are stored as bytes. So the
file-name-as-a-string needs to be *encoded* into bytes. My *guess* is that
somehow, when renaming your files, you gave them a name which may be
correctly encoded in some other encoding, but not in UTF-8. Then, when
you try to read the file names in UTF-8, you hit an illegal byte, half of
a surrogate pair perhaps, and everything blows up.

Something like this:

py> s = "Νικόλαος Κούρας"
py> b = s.encode('ISO-8859-7')  # Oh oh, wrong encoding!
py> print(b)
b'\xcd\xe9\xea\xfc\xeb\xe1\xef\xf2 \xca\xef\xfd\xf1\xe1\xf2'
py> b.decode('UTF-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcd in position 0: invalid continuation byte


Obviously the error is a little different, because the original string is
different.
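One plausible route to the "surrogates not allowed" message in the files.py output, consistent with this guess: on Linux, Python 3 decodes file names with the filesystem encoding and the `surrogateescape` error handler (PEP 383), so each byte that is not valid UTF-8 becomes a lone surrogate in the range U+DC80 to U+DCFF instead of raising. The string survives until something encodes it strictly, for example `print()` to a UTF-8 stream, and only then does it fail. A minimal sketch, assuming a UTF-8 filesystem encoding; for real paths `os.listdir()` and `os.fsdecode()` do the decoding step.

```python
# Bytes of a name that was saved in ISO-8859-7 (Greek) rather than UTF-8
raw_name = "Νικόλαος".encode("iso-8859-7")

# What os.listdir()/os.fsdecode() do on a UTF-8 Linux system: no error yet
name = raw_name.decode("utf-8", "surrogateescape")
print(ascii(name))  # every byte became a lone surrogate such as '\udccd'

try:
    name.encode("utf-8")  # what print() to a strict UTF-8 stream ends up doing
except UnicodeEncodeError as exc:
    print(exc.reason)  # surrogates not allowed

# The original bytes are still recoverable
assert name.encode("utf-8", "surrogateescape") == raw_name
```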

If I am right, the solution is to fix the file names to ensure that they
are all valid UTF-8 names. If you view the directory containing these
files in a file browser that supports UTF-8, do you see any file names
containing Mojibake?

http://en.wikipedia.org/wiki/Mojibake


Fix those file names, and hopefully the problem will go away.
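Renaming the files so that their names are valid UTF-8 can be scripted. A minimal sketch, assuming (as in the example above) that the broken names were written in ISO-8859-7; the function names `repair_name` and `repair_directory` are hypothetical, and `repair_directory` only reports what it would do unless `dry_run=False` is passed. Scanning with a bytes path means the names are never decoded while listing.

```python
import os


def repair_name(raw, legacy_encoding="iso-8859-7"):
    """Return raw re-encoded as UTF-8, or None if raw is already valid UTF-8.

    Raises UnicodeDecodeError if raw is not valid in legacy_encoding either.
    """
    try:
        raw.decode("utf-8")
        return None  # already valid UTF-8: leave it alone
    except UnicodeDecodeError:
        pass
    # Assumption: the broken names were written in legacy_encoding
    return raw.decode(legacy_encoding).encode("utf-8")


def repair_directory(path, dry_run=True, legacy_encoding="iso-8859-7"):
    """Rename each non-UTF-8 entry in path; with dry_run=True only report."""
    base = os.fsencode(path)  # bytes paths: no decoding of names while scanning
    renames = []
    for raw in os.listdir(base):
        fixed = repair_name(raw, legacy_encoding)
        if fixed is None:
            continue
        old, new = os.path.join(base, raw), os.path.join(base, fixed)
        if os.path.exists(new):
            print("skipping, target exists:", ascii(new))
            continue
        print(ascii(raw), "->", ascii(fixed.decode("utf-8")))
        renames.append((old, new))
        if not dry_run:
            os.rename(old, new)
    return renames
```

Running it once with the default `dry_run=True` and reading the report before renaming anything seems the safer order.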
 
Chris Angelico

Then, when
you try to read the file names in UTF-8, you hit an illegal byte, half of
a surrogate pair perhaps, and everything blows up.

Minor quibble: Surrogates are an artifact of UTF-16, so they're 16-bit
values like 0xD808 or 0xDF45. Possibly what you're talking about here
is a continuation byte, which in UTF-8 are used only after a lead
byte. For instance: 0xF0 0x92 0x8D 0x85 is valid UTF-8, but 0x41 0x92
is not.
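These numbers can be checked in Python. The four bytes 0xF0 0x92 0x8D 0x85 are the UTF-8 form of U+12345, and the same code point in UTF-16 is exactly the surrogate pair 0xD808 0xDF45; the pair 0x41 0x92 fails because 0x92 is a continuation byte with no lead byte before it. A small sketch:

```python
ch = "\U00012345"

utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-be")
print(utf8.hex())   # f0928d85
print(utf16.hex())  # d808df45

try:
    b"\x41\x92".decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.start, exc.reason)  # 1 invalid start byte
```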

There is one other really annoying thing to deal with, and that's the
theoretical UTF-8 encoding of a UTF-16 surrogate. (I say "theoretical"
because strictly, these are invalid; UTF-8 does not encode invalid
codepoints.) 0xED 0xA0 0x88 and 0xED 0xBD 0x85 encode the two I
mentioned above. Depending on what's reading the filename, you might
actually have these throw errors, or maybe not. Python's decoder is
correctly strict:
Traceback (most recent call last):
  File "<pyshell#9>", line 1, in <module>
    str(b'\xed\xa0\x88','utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-2: invalid continuation byte

Actually, I'm not sure here, but I think that error message may be
wrong, or at least unclear. It's perfectly possible to decode those
bytes using the UTF-8 algorithm; you end up with the value 0xD808,
which you then reject because it's a surrogate. But maybe the Python
UTF-8 decoder simplifies some of that.
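The point that the UTF-8 bit layout by itself turns 0xED 0xA0 0x88 into 0xD808, with the rejection coming from a separate rule, can be shown with a hand-written decoder for three-byte sequences. This is a sketch of the bit arithmetic only (no check beyond the lead/continuation patterns); `decode_three_byte` is a hypothetical helper. Python's UTF-8 codec also offers the `surrogatepass` error handler, which accepts exactly these otherwise forbidden sequences.

```python
def decode_three_byte(seq):
    """Apply the UTF-8 bit layout 1110xxxx 10xxxxxx 10xxxxxx, with no surrogate check."""
    b0, b1, b2 = seq
    if (b0 & 0xF0) != 0xE0 or (b1 & 0xC0) != 0x80 or (b2 & 0xC0) != 0x80:
        raise ValueError("not a three-byte UTF-8 pattern")
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


for seq in (b"\xed\xa0\x88", b"\xed\xbd\x85"):
    value = decode_three_byte(seq)
    print(hex(value))  # 0xd808, then 0xdf45: both surrogates
    try:
        seq.decode("utf-8")  # the strict codec rejects the sequence
    except UnicodeDecodeError as exc:
        print("strict decoder:", exc.reason)
    # surrogatepass admits exactly these encoded surrogates
    assert seq.decode("utf-8", "surrogatepass") == chr(value)
```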

ChrisA
 
