Changing filenames from Greeklish => Greek (subprocess complain)

  • Thread starter Νικόλαος Κούρας
  • Start date
Νικόλαος Κούρας

On Tuesday, 4 June 2013 4:01:48 PM UTC+3, Nobody wrote:
$ wget -S -O - http://superhost.gr/data/apps/
--2013-06-04 14:00:10-- http://superhost.gr/data/apps/
Resolving superhost.gr... 82.211.30.133
Connecting to superhost.gr|82.211.30.133|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: ApacheBooster/1.6
Date: Tue, 04 Jun 2013 13:00:19 GMT
Content-Type: text/html;charset=ISO-8859-1
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
X-Cacheable: YES
X-Varnish: 2000177813
Via: 1.1 varnish
age: 0
X-Cache: MISS

Ah, you were talking about the '/www/data/apps' subfolder. Yes indeed, I thought you were talking about superhost.gr.

That is how Apache chooses to show them, but this is not so important because visitors won't enter that page directly; they will have a chance to get those files from within http://superhost.gr/cgi-bin/files.py

which in turn gives me this:

[Tue Jun 04 16:36:09 2013] [error] [client 46.12.95.59] Error in sys.excepthook:
[Tue Jun 04 16:36:09 2013] [error] [client 46.12.95.59] ValueError: underlying buffer has been detached
[Tue Jun 04 16:36:09 2013] [error] [client 46.12.95.59]
[Tue Jun 04 16:36:09 2013] [error] [client 46.12.95.59] Original exception was:
[Tue Jun 04 16:36:09 2013] [error] [client 46.12.95.59] Traceback (most recent call last):
[Tue Jun 04 16:36:09 2013] [error] [client 46.12.95.59] File "files.py", line 67, in <module>
[Tue Jun 04 16:36:09 2013] [error] [client 46.12.95.59] cur.execute('''SELECT url FROM files WHERE url = %s''', (fullpath,) )
[Tue Jun 04 16:36:09 2013] [error] [client 46.12.95.59] File "/usr/local/lib/python3.3/site-packages/PyMySQL3-0.5-py3.3.egg/pymysql/cursors.py", line 108, in execute
[Tue Jun 04 16:36:09 2013] [error] [client 46.12.95.59] query = query.encode(charset)
[Tue Jun 04 16:36:09 2013] [error] [client 46.12.95.59] UnicodeEncodeError: 'utf-8' codec can't encode character '\\udcd3' in position 61: surrogates not allowed
[Tue Jun 04 16:36:09 2013] [error] [client 46.12.95.59] Premature end of script headers: files.py

:(
 
Νικόλαος Κούρας

Steven said:
It looks like your client is ignoring the charset header, and
interpreting the bytes as Latin-1 when they are actually ISO-8859-7.
py> s = 'Eυχή του Ιησού.mp3'
py> print(s.encode('ISO-8859-7').decode('latin-1'))
Eõ÷Þ ôïõ Éçóïý.mp3
which matches what you see. If you can manually tell your client to use
ISO-8859-7, you should see it correctly.

I think this is the case too, Steven, but it surprises me to see that Chrome ignores the charset header.

Actually, when I explicitly told Chrome to display everything as UTF-8, it presented the filename properly.

py> print(s.encode('ISO-8859-7').decode('latin-1'))

Why are you encoding the 's' string to Greek ISO?
Isn't it already in Greek ISO, since it uses Greek lettering?
I think you are very close to the solution, but I cannot clearly see it yet.
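On the question of why `s` has to be encoded at all: a Python 3 `str` holds Unicode code points and has no charset of its own, so an encoding only comes into play when the text is turned into bytes. A minimal sketch (the variable names are illustrative and not taken from the script discussed in this thread):

```python
# A str is a sequence of Unicode code points; it is not "in" any encoding.
word = "\u0395\u03c5\u03c7\u03ae"  # the Greek word "Ευχή"

# Encoding turns the same text into different bytes depending on the charset.
as_greek_iso = word.encode("iso-8859-7")
as_utf8 = word.encode("utf-8")

print(as_greek_iso)  # one byte per letter
print(as_utf8)       # two bytes per letter

# Decoding with the wrong charset does not fail here; it silently yields mojibake.
mojibake = as_greek_iso.decode("latin-1")
print(ascii(mojibake))
```

Decoding the ISO-8859-7 bytes with `iso-8859-7` gives back the original word, which is why the charset used to read the bytes has to match the one used to write them.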
 
Νικόλαος Κούρας

I just tried to implement your idea by correcting the file names as follows:

# Compute a set of current fullpaths
fullpaths = set()
path = "/home/nikos/www/data/apps/"

for root, dirs, files in os.walk(path):
    for fullpath in files:
        fullpaths.add( os.path.join(root, fullpath) )


# Load'em
for fullpath in fullpaths:
    try:
        # Check the presence of a file against the database and insert if it doesn't exist
        cur.execute('''SELECT url FROM files WHERE url = %s''', ( fullpath.encode('ISO-8859-7').decode('latin-1'), )
        data = cur.fetchone() #URL is unique, so should only be one

This gave me this error:

root@nikos [~]# [Tue Jun 04 16:55:51 2013] [error] [client 46.12.95.59] File "files.py", line 68
[Tue Jun 04 16:55:51 2013] [error] [client 46.12.95.59] data = cur.fetchone() #URL is unique, so should only be one
[Tue Jun 04 16:55:51 2013] [error] [client 46.12.95.59] ^
[Tue Jun 04 16:55:51 2013] [error] [client 46.12.95.59] SyntaxError: invalid syntax
[Tue Jun 04 16:55:51 2013] [error] [client 46.12.95.59] Premature end of script headers: files.py
[Tue Jun 04 16:55:51 2013] [error] [client 46.12.95.59] File does not exist: /home/nikos/public_html/500.shtml


It seems that this approach overcame the encoding error, won't you agree?
But I see no syntax error in the line that follows:

data = cur.fetchone() #URL is unique, so should only be one
 
Mark Lawrence

The syntax error is often in the preceding line, typically because
you've missed a closing bracket.
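
A small sketch of why the interpreter can blame the following line: the parser only notices the unclosed bracket once it reaches text that cannot belong inside it. The two-line `source` string below is copied from the failing script, and `compile` only parses it, so the database call never runs. Which line number and message are reported depends on the Python version, so no particular output is claimed here.

```python
source = (
    "cur.execute('''SELECT url FROM files WHERE url = %s''', ( fullpath, )\n"
    "data = cur.fetchone()\n"
)

try:
    compile(source, "files.py", "exec")
    error = None
except SyntaxError as exc:
    error = exc
    print("line", exc.lineno, "-", exc.msg)

# Adding the missing ')' at the end of the first line makes the snippet parse.
fixed = source.replace("( fullpath, )\n", "( fullpath, ))\n")
compile(fixed, "files.py", "exec")
```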

--
"Steve is going for the pink ball - and for those of you who are
watching in black and white, the pink is next to the green." Snooker
commentator 'Whispering' Ted Lowe.

Mark Lawrence
 
Νικόλαος Κούρας

No, brackets are all there. Just tried:

# Compute a set of current fullpaths
fullpaths = set()
path = "/home/nikos/www/data/apps/"

for root, dirs, files in os.walk(path):
    for fullpath in files:
        fullpaths.add( os.path.join(root, fullpath) )
        print (fullpath )
        print (fullpath.encode('iso-8859-7').decode('latin-1') )

sys.exit(0)

=========================

root@nikos [~]# [Tue Jun 04 17:16:18 2013] [error] [client 46.12.95.59] Error in sys.excepthook:
[Tue Jun 04 17:16:18 2013] [error] [client 46.12.95.59] ValueError: underlying buffer has been detached
[Tue Jun 04 17:16:18 2013] [error] [client 46.12.95.59]
[Tue Jun 04 17:16:18 2013] [error] [client 46.12.95.59] Original exception was:
[Tue Jun 04 17:16:18 2013] [error] [client 46.12.95.59] Traceback (most recent call last):
[Tue Jun 04 17:16:18 2013] [error] [client 46.12.95.59] File "files.py", line 61, in <module>
[Tue Jun 04 17:16:18 2013] [error] [client 46.12.95.59] print (fullpath )
[Tue Jun 04 17:16:18 2013] [error] [client 46.12.95.59] File "/usr/local/lib/python3.3/codecs.py", line 355, in write
[Tue Jun 04 17:16:18 2013] [error] [client 46.12.95.59] data, consumed = self.encode(object, self.errors)
[Tue Jun 04 17:16:18 2013] [error] [client 46.12.95.59] UnicodeEncodeError: 'utf-8' codec can't encode character '\\udcc5' in position 0: surrogates not allowed
[Tue Jun 04 17:16:18 2013] [error] [client 46.12.95.59] Premature end of script headers: files.py
[Tue Jun 04 17:16:18 2013] [error] [client 46.12.95.59] File does not exist: /home/nikos/public_html/500.shtml

=================

What are these 'surrogate' things?
 
Νικόλαος Κούρας

Now I tried the decode at the moment the strings are joined:

for root, dirs, files in os.walk(path):
    for fullpath in files:
        fullpaths.add( os.path.join(root, fullpath).decode('latin-1') )

But the /www/data/apps folder contains both English and Greek filenames.

It's clear to me now that this is an encoding/decoding issue.
But encode to what, and decode to what?

How can the script encode and decode properly when the folder contains a mix of English filenames and oddly encoded Greek filenames?
 
Michael Torrie

No, brackets are all there. Just tried:

# Compute a set of current fullpaths
fullpaths = set()
path = "/home/nikos/www/data/apps/"

for root, dirs, files in os.walk(path):
    for fullpath in files:
        fullpaths.add( os.path.join(root, fullpath) )
        print (fullpath )
        print (fullpath.encode('iso-8859-7').decode('latin-1') )

^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is wrong. You are converting unicode to iso-8859-7 bytes, then
trying to convert those bytes back to unicode by pretending they are
latin-1 bytes. Even if this worked it will generate garbage.
What are these 'surrogate' things?

It means that some bytes in the filename could not be decoded, so Python
kept them as lone surrogate code points (which is expected, since the
bytes are not utf-8, they are iso-8859-7!).

If you want the browser to use a particular encoding scheme (utf-8),
then you have to print out an HTTP header before you start printing your
other HTML data:

print("Content-Type: text/html;charset=UTF-8")
print()  # a blank line ends the headers

print("html data goes here")
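
To make the surrogate error concrete, here is a minimal sketch under one assumption not stated in the thread: that Python 3 decodes filenames as UTF-8 with the `surrogateescape` error handler, which is its usual behaviour on a UTF-8 locale. The bytes are the octal escapes (`\305\365\367\336 ...`) that `ls` reports for the .mp3 file later in this thread.

```python
# Raw filename bytes as stored on disk (ISO-8859-7), written here in hex.
raw = b"\xc5\xf5\xf7\xde \xf4\xef\xf5 \xc9\xe7\xf3\xef\xfd.mp3"

# Assumed default on a UTF-8 locale: undecodable bytes become lone surrogates
# in the range U+DC80..U+DCFF instead of raising an error.
name = raw.decode("utf-8", errors="surrogateescape")
print(ascii(name[0]))

# A strict UTF-8 encode, as done by print() or the database driver, then fails.
try:
    name.encode("utf-8")
    reason = None
except UnicodeEncodeError as exc:
    reason = exc.reason
print(reason)
```

The first character is the escaped byte 0xC5, which matches the `'\udcc5' in position 0` reported in the traceback for `print (fullpath )`.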
 
Νικόλαος Κούρας

Thanks for the clear explanation of encode and decode; I never understood it more clearly.

And yes, of course I know that a header must be printed before any other print statement. Here is how I have it:

-------------------------------------------------
print( '''Content-type: text/html; charset=utf-8\n''' )

# Compute a set of current fullpaths
fullpaths = set()
path = "/home/nikos/www/data/apps/"

for root, dirs, files in os.walk(path):
    for fullpath in files:
        fullpaths.add( os.path.join(root, fullpath) )


Your Unicode explanation is clear, but we do not have to deal with the files' contents but rather with the filenames themselves.

root@nikos [~]# ls -l /home/nikos/www/data/apps/
total 368548
drwxr-xr-x 2 nikos nikos 4096 Jun 4 14:49 ./
drwxr-xr-x 6 nikos nikos 4096 May 26 21:13 ../
-rwxr-xr-x 1 nikos nikos 13157283 Mar 17 12:57 100\ Mythoi\ tou\ Aiswpou.pdf*
-rwxr-xr-x 1 nikos nikos 29524686 Mar 11 18:17 Anekdotologio.exe*
-rw-r--r-- 1 nikos nikos 42413964 Jun 2 20:29 Battleship.exe
-rw-r--r-- 1 nikos nikos 236032 Jun 4 14:10 \323\352\335\370\357\365\ \335\355\341\355\ \341\361\351\350\354\374.exe
-rwxr-xr-x 1 nikos nikos 66896732 Mar 17 13:13 Kosmas\ o\ Aitwlos\ -\ Profiteies.pdf*
-rw-r--r-- 1 nikos nikos 51819750 Jun 2 20:04 Luxor\ Evolved.exe
-rw-r--r-- 1 nikos nikos 60571648 Jun 2 14:59 Monopoly.exe
-rw-r--r-- 1 nikos nikos 3511233 Jun 4 14:11 \305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3
-rwxr-xr-x 1 nikos nikos 1788164 Mar 14 11:31 Online\ Movie\ Player.zip*
-rw-r--r-- 1 nikos nikos 5277287 Jun 1 18:35 O\ Nomos\ tou\ Merfy\ v1-2-3..zip
-rwxr-xr-x 1 nikos nikos 16383001 Jun 22 2010 Orthodoxo\ Imerologio.exe*
-rw-r--r-- 1 nikos nikos 6084806 Jun 1 18:22 Pac-Man.exe
-rw-r--r-- 1 nikos nikos 25476584 Jun 2 19:50 Scrabble.exe
-rwxr-xr-x 1 nikos nikos 49141166 Mar 17 12:48 To\ 1o\ mou\ vivlio\ gia\ to\ skaki.pdf*
-rwxr-xr-x 1 nikos nikos 3298310 Mar 17 12:45 Vivlos\ gia\ Atheofovous.pdf*
-rw-r--r-- 1 nikos nikos 1764864 May 29 21:50 V-Radio\ v2.4.msi
root@nikos [~]#
-------------------------------------------------

As you see, the subdirectory 'apps' contains both English and Greek-lettered filenames.

Are both of them Unicode? Are the filenames of the actual files also encoded as byte streams, much like the contents inside them?

If they are Unicode, then I really see no trouble when trying to:

cur.execute('''SELECT url FROM files WHERE url = %s''', ( fullpath, )

but this is what I'm still getting:


-------------------------------------------------

root@nikos [~]# [Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59] File "files.py", line 72
[Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59] data = cur.fetchone() #URL is unique, so should only be one
[Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59] ^
[Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59] SyntaxError: invalid syntax
[Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59] Premature end of script headers: files.py
[Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59] File does not exist: /home/nikos/public_html/500.shtml
-------------------------------------------------

What is the problem in your opinion, Michael, since everything is encoded in UTF-8?

Why does the cur.execute fail?
cur.execute('''SELECT url FROM files WHERE url = %s''', ( fullpath, )
data = cur.fetchone() #URL is unique, so should only be one
 
Νικόλαος Κούρας

What on earth is this damn error? Michael tried to explain surrogates to me, but I don't think I understand them.

Encoding has been giving me trouble for years now.

[Tue Jun 04 20:19:53 2013] [error] [client 46.12.95.59] Original exception was:
[Tue Jun 04 20:19:53 2013] [error] [client 46.12.95.59] Traceback (most recent call last):
[Tue Jun 04 20:19:53 2013] [error] [client 46.12.95.59] File "files.py", line 72, in <module>
[Tue Jun 04 20:19:53 2013] [error] [client 46.12.95.59] cur.execute('''SELECT url FROM files WHERE url = %s''', (fullpath,) )
[Tue Jun 04 20:19:53 2013] [error] [client 46.12.95.59] File "/usr/local/lib/python3.3/site-packages/PyMySQL3-0.5-py3.3.egg/pymysql/cursors.py", line 108, in execute
[Tue Jun 04 20:19:53 2013] [error] [client 46.12.95.59] query = query.encode(charset)
[Tue Jun 04 20:19:53 2013] [error] [client 46.12.95.59] UnicodeEncodeError: 'utf-8' codec can't encode character '\\udcd3' in position 61: surrogates not allowed



PLEASE TELL ME WHAT TO TRY, PLEASE FOR THE LOVE OF GOD, I AM SO FRUSTRATED NOT BEING ABLE TO DEAL WITH THIS.
 
Chris "Kwpolska" Warrick

1. Try re-naming the files to real utf-8. Make sure your terminal
works on UTF-8 characters.
2. Get rid of your bullshit system and use Flask or Pyramid. It will
make your life much easier.
3. Put the files in a directory on your server and tell Apache to
create an index, making your life easier, but not as easy as it would
if you did (2) and anywhere near as easy as (2) and (3) combined.
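
Suggestion 1 could be sketched roughly as follows. This is an assumption-laden sketch, not a tested migration tool: it assumes every name that is not valid UTF-8 is ISO-8859-7 (as the `ls` listing earlier in the thread suggests), and `fix_name` and `rename_legacy_names` are hypothetical helpers, not part of any library. Passing a bytes path to `os.walk` makes it return names as raw bytes, so no surrogates are involved.

```python
import os


def fix_name(raw, legacy="iso-8859-7"):
    """Return the UTF-8 bytes for a filename that may be in a legacy charset."""
    try:
        raw.decode("utf-8")  # already valid UTF-8 (this includes plain ASCII)
        return raw
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in the legacy charset.
        return raw.decode(legacy).encode("utf-8")


def rename_legacy_names(top, legacy="iso-8859-7"):
    """Rename files under `top` (a bytes path) whose names are not UTF-8."""
    renamed = []
    for root, dirs, files in os.walk(top):
        for old in files:
            new = fix_name(old, legacy)
            if new != old:
                os.rename(os.path.join(root, old), os.path.join(root, new))
                renamed.append((old, new))
    return renamed


# The first Greek .exe name from the listing, converted to UTF-8 bytes.
print(fix_name(b"\xd3\xea\xdd\xf8\xef\xf5.exe"))
```

It would be called with a bytes path, for example `rename_legacy_names(b"/home/nikos/www/data/apps/")`, ideally on a copy of the directory first. Any rows already stored in the `files` table under the old names would presumably need updating as well.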
 
Lele Gaifax

Νικόλαος Κούρας said:
root@nikos [~]# [Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59] File "files.py", line 72
[Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59] data = cur.fetchone() #URL is unique, so should only be one
[Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59] ^
[Tue Jun 04 19:50:16 2013] [error] [client 46.12.95.59] SyntaxError: invalid syntax

Some kind soul already told you the reason. What follows is the longest
way I could think of to spot your error:
from collections import Counter
stmt = "cur.execute('''SELECT url FROM files WHERE url = %s''', ( fullpath, )"
chars_count = Counter(stmt)
print("Number of '(': %d" % chars_count['('])
print("Number of ')': %d" % chars_count[')'])
Number of '(': 2
Number of ')': 1

ciao, lele.
 
Νικόλαος Κούρας

1. The locale is set to UTF-8, and the renaming from English to Greek happened on the webhost.

2. No idea what Flask or Pyramid or WSGI is.

3. The files are located in '/home/nikos/www/data/apps' and they appear in the browser directory listing. Do you mean I should create an index.html?
 
Νικόλαος Κούρας

Hello Lele, you have proven helpful many times, let's hope once more:

# Compute a set of current fullpaths
fullpaths = set()
path = "/home/nikos/www/data/apps/"

for root, dirs, files in os.walk(path):
    for fullpath in files:
        fullpaths.add( os.path.join(root, fullpath) )


stmt = "cur.execute('''SELECT url FROM files WHERE url = %s''', ( fullpath, )"
chars_count = Counter(stmt)
print("Number of '(': %d" % chars_count['('])
print("Number of ')': %d" % chars_count[')'])

sys.exit(0)

outputs this:


http://superhost.gr/cgi-bin/files.py

I don't even understand what that means, though.
 
Νικόλαος Κούρας

On Tuesday, 4 June 2013 9:45:05 PM UTC+3, Chris "Kwpolska" Warrick wrote:
If they do, they are already indexed. Now link stuff to that
directory instead of your fancy files.py thing.

No. The file grabbing must happen from within 'files.py' so that the database inserts happen and the counter is incremented.

Also, it is good torture for me to learn about these damn encoding issues, which have been troubling all my scripts for years now.
 
Lele Gaifax

Νικόλαος Κούρας said:
Hello Lele, you have proven helpful many times, let's hope once more:

With due respect, you need to *improve* your ability to *understand*
what people answer to your questions, otherwise it is a double (at a
minimum) waste of time.

The code above was my (failed) attempt to focus your attention on why
one of your scripts raised a SyntaxError: translating that code in plain
english, that line (the "stmt" variable above) contains *two* open
brackets, and *one* close bracket.

ciao, lele.
 
Νικόλαος Κούρας

Lele the output of:

stmt = "cur.execute('''SELECT url FROM files WHERE url = %s''', ( fullpath, )"
chars_count = Counter(stmt)
print("Number of '(': %d" % chars_count['('])
print("Number of ')': %d" % chars_count[')'])

is:

Number of '(': 2
Number of ')': 1

What do you make of this, please?
 
Νικόλαος Κούρας

UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc5' in position >61: surrogates not allowed

Does this indicate that I am reading the filenames in a different encoding than the one they are actually in? What if I try to use bytes for path specifications, and have Python decode them as 'utf-8'?

fullpaths.add( os.path.join(root, fullpath).encode('utf-8') )

Will this work? As Michael said, encoding is the process in which you take Unicode characters and convert them to a byte stream using some charset (UTF-8 here).
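
As written, that call would most likely raise the same `UnicodeEncodeError`, because a strict UTF-8 encode rejects the lone surrogates. A hedged sketch of one alternative, assuming the names were decoded with `surrogateescape` and that the undecodable ones are ISO-8859-7; `clean_path` is a hypothetical helper, not part of any library:

```python
def clean_path(path, legacy="iso-8859-7"):
    """Turn a str that may contain escaped bytes into a normal Unicode str."""
    # Recover the exact bytes the filesystem reported.
    raw = path.encode("utf-8", errors="surrogateescape")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: names that are not valid UTF-8 are in the legacy charset.
        return raw.decode(legacy)


# Simulate what os.walk would hand back for one of the Greek-ISO names.
broken = b"/apps/\xd3\xea\xdd\xf8\xef\xf5.exe".decode("utf-8", errors="surrogateescape")
fixed = clean_path(broken)
fixed.encode("utf-8")  # no longer raises, so the value can be sent to the database
```

On POSIX systems `os.fsencode(path)` performs the same byte recovery using the filesystem encoding, which may be preferable to hard-coding UTF-8.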
 
Fábio Santos

He couldn't have been more obvious. You are missing a closing parenthesis.

http://xkcd.com/859/
 
Chris Angelico

Does this indicate that I am reading the filenames in a different encoding than the one they are actually in? What if I try to use bytes for path specifications, and have Python decode them as 'utf-8'?

fullpaths.add( os.path.join(root, fullpath).encode('utf-8') )

For some reason you have an invalid Unicode codepoint in your string. Fix that.

ChrisA
 
alex23

Just a reminder to everyone that the OP originally went by the name of
Ferrous Cranus:
http://redwing.hutman.net/~mreed/warriorshtm/ferouscranus.htm

He's told there's a missing parenthesis, he dismisses the claim. He's
given code that demonstrates the missing parenthesis, and he acts
confused. The list is rapidly becoming his support group for _his
business_, and the bulk of it has very little to do with Python
itself.

I've been struggling for a month to get an inheritance chain working
with fresnel lenses, should I be posting every single bug I hit here
every 10 minutes then bump them 10 minutes later when no one responds?
Is that what the list is for now? We don't do people's homework for
them, so why are we doing his _work_ for him?
 
