Ruby 1.9 still cannot list all files on Vista or XP?

  • Thread starter SpringFlowers AutumnMoon

SpringFlowers AutumnMoon

I just tried using Ruby 1.9 and it seemed that it still cannot list all
files in a folder on XP or Vista when the filenames contain Chinese
characters, Japanese characters, or any foreign characters other than
English.

These two methods are used: entries and glob

files = Dir.new(basedir).entries

Dir.chdir(basedir)
files = Dir.glob("*");

both methods show ????????.txt when the filename has foreign
characters. Can Ruby 1.9 readily handle this task rather than resorting
to Win32API? Thanks.
 
Bosko Ivanisevic

What is code page of your command prompt? When ??????.txt is shown in
the console on Windows it is usually problem of code page settings.
 
Ryan Davis

SpringFlowers AutumnMoon said:
both methods show ????????.txt when the filename has foreign
characters. Can Ruby 1.9 readily handle this task rather than resorting
to Win32API? Thanks.

Show where? On what? What text encodings does it handle? What text
encodings did you set Ruby up for?

On OSX:

% cd x
% touch ☃
% ls
☃
% ruby -e 'p Dir["*"]'
["\342\230\203"]
% ruby -KU -e 'p Dir["*"]'
["☃"]
% ~/.multiruby/install/1.9.1-p0/bin/ruby -e 'p Dir["*"]'
["☃"]

You've got two sides to this equation: Ruby's encodings and your
environment's encodings.
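
To make the "two sides" concrete, here is a minimal sketch that prints the encodings a Ruby 1.9+ process is working with. The helper name encoding_report is made up for illustration. The special name "filesystem" passed to Encoding.find is assumed to be available; it may be missing on early 1.9 builds.

```ruby
# encoding: utf-8
# Sketch: collect the encodings Ruby is using in this process.
# encoding_report is a hypothetical helper, not part of Ruby itself.

def encoding_report
  {
    "source"     => __ENCODING__,                # encoding of this source file
    "external"   => Encoding.default_external,   # default for data read from outside
    "internal"   => Encoding.default_internal,   # nil unless set (for example by -U)
    "locale"     => Encoding.find("locale"),     # derived from the environment
    "filesystem" => Encoding.find("filesystem")  # the encoding Ruby tags file names with
  }
end

encoding_report.each { |name, enc| puts "#{name}: #{enc.inspect}" }
```

Running this on both machines (the OSX box and the Windows box) should show where the two environments differ.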
 
Heesob Park

2009/4/8 Ryan Davis said:
both methods show ????????.txt when the filename has foreign
characters. Can Ruby 1.9 readily handle this task rather than resorting
to Win32API? Thanks.

show where? on what? what text encodings does it handle? what text encodings
did you set ruby up for?

On OSX:

% cd x
% touch ☃
% ls
☃
% ruby -e 'p Dir["*"]'
["\342\230\203"]
% ruby -KU -e 'p Dir["*"]'
["☃"]
% ~/.multiruby/install/1.9.1-p0/bin/ruby -e 'p Dir["*"]'
["☃"]

You've got 2 sides to this equation, ruby's encodings, and your
environment's encodings.

This is a Windows-specific issue.
Refer to the OP's original posting: http://www.ruby-forum.com/topic/163681

As far as I know, this issue is not fixed in Ruby 1.9.1.

Regards,

Park Heesob
 
SpringFlowers AutumnMoon

Ryan said:
then his email doesn't belong here, it should go to ruby-core@

Then can somebody file it on ruby-core, maybe as a bug or an improvement
request? For my love of Ruby, I'd like to see it work properly on Windows XP
and Vista. It is the year 2009, and we are a long way into Unicode and
i18n issues. If the latest version of Ruby cannot list files properly on
Windows, which is probably the most popular OS, then please, can it be
made to work well?
 
SpringFlowers AutumnMoon

Bill said:
Heesob Park said:
As far as I know, this issue is not fixed in ruby 1.9.1

Hmm. If I have correctly understood matz in [ruby-core:20110],
Unicode path support for windows was supposed to be fixed:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/20110

"In short, if you're using UTF-8 for your program encoding, you
should not see any problem (if you do, it's a bug)."

Is that done with
# coding: utf-8
or
# encoding: utf-8

? Are those for specifying that the current program file is in UTF-8?
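
Both spellings are accepted, since Ruby looks for "coding:" inside the first-line comment, and both declare only the encoding of the source file itself. A minimal sketch, assuming Ruby 1.9 or later; the variable names are illustrative:

```ruby
# encoding: utf-8
# The line above is the magic comment; "# coding: utf-8" is read the same way.
# It must be the first line of the file (or the second, after a shebang).

plain  = "notes.txt"              # an ordinary literal takes the source encoding
listed = Dir.entries(".").first   # a name handed back by the operating system

puts "source encoding:  #{__ENCODING__}"
puts "literal encoding: #{plain.encoding}"
puts "listing encoding: #{listed.encoding}"
```

The third line of output is the interesting one for this thread: the magic comment does not decide how names returned by Dir are encoded.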
 
SpringFlowers AutumnMoon

Does someone know how to solve this? Creating a file with international
characters in its name is really simple: go to Google News, look at news
from China, Taiwan, or Hong Kong, and then copy and paste some of the text
into a filename on Windows XP or Vista. Thanks.
 
Bill Kelly

SpringFlowers AutumnMoon said:
does someone know how to solve this? to make some file have
international characters, it is really simple: can go to Google News
and look at news from China or Taiwan or Hong Kong, and then copy and
paste the text into a filename on Windows XP or Vista. thanks.

Sorry, I haven't made the time to experiment with ruby1.9 much yet.
(Even though I am interested in this feature.)

Here are a couple threads from ruby-core that show examples of
using the # encoding: UTF-8 tag.

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/23119

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/22784


(Warning: It looks like the ruby-core mailing list archive software itself
doesn't handle the encoding, and so the messages are displayed with
bogus remnants of the quoted-printable syntax left over like =3D and
=20. )


But anyway... As I understand it, before you paste characters into
your editor, you'll need to make sure your editor is using UTF-8
encoding for the file you're editing. And put the #encoding: UTF-8
tag at the top of the file.


Hope this helps,

Bill
 
SpringFlowers AutumnMoon

Bill said:
But anyway... As I understand it, before you paste characters into
your editor, you'll need to make sure your editor is using UTF-8
encoding for the file you're editing. And put the #encoding: UTF-8
tag at the top of the file.

ah it is not really about using UTF-8 in my program file... it is about
getting UTF-8 file listing on Vista and XP.
 
Bill Kelly

SpringFlowers AutumnMoon said:
ah it is not really about using UTF-8 in my program file... it is about
getting UTF-8 file listing on Vista and XP.

Oh. When you said:
to make some file have
international characters, it is really simple: can go to Google News
and look at news from China or Taiwan or Hong Kong, and then copy and
paste the text into a filename on Windows XP or Vista.

...I misunderstood and thought you meant pasting the characters into
your ruby source file. (I see now you were talking about a filename.)


Well OK - so I built the latest from the ruby 1.9.1 branch in Subversion,
and attempted to have ruby read a directory containing a filename
with Chinese characters, and then open and read the contents of the
file...

My script was:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~ (win32_unicode.rb) ~~~
# encoding: UTF-8

files = Dir["T:/zz/*.txt"]

x = files.first
p x, x.encoding

dat = open(x, "r:UTF-8") {|f| f.read}

p dat, dat.encoding
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The result was:

ruby19 win32_unicode.rb
"T:/zz/???????.txt"
#<Encoding:UTF-8>
win32_unicode.rb:8:in `initialize': Invalid argument - T:/zz/???????.txt (Errno::EINVAL)
from win32_unicode.rb:8:in `open'
from win32_unicode.rb:8:in `<main>'

I also tried with the -U flag and -E UTF-8 flag:

ruby19 -E UTF-8 win32_unicode.rb
"T:/zz/???????.txt"
#<Encoding:UTF-8>
win32_unicode.rb:8:in `initialize': Invalid argument - T:/zz/???????.txt (Errno::EINVAL)
from win32_unicode.rb:8:in `open'
from win32_unicode.rb:8:in `<main>'

ruby19 -U win32_unicode.rb
"T:/zz/???????.txt"
#<Encoding:UTF-8>
win32_unicode.rb:8:in `initialize': Invalid argument - T:/zz/???????.txt (Errno::EINVAL)
from win32_unicode.rb:8:in `open'
from win32_unicode.rb:8:in `<main>'

ruby19 -v
ruby 1.9.1p0 (2009-03-04) [i386-mswin32_71]


Note, it doesn't bother me that the filename displays as ???????.txt in the
command window, but rather the issue that ruby seems unable to open
a filename it just obtained via Dir[].

So, unless I have bungled my test somehow, it seems likely there is a
problem.

If so, as Ryan pointed out, we should move this to the ruby-core list.
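
The check described above (can Ruby reopen every name that Dir[] just returned?) can be sketched as a small helper. The name unopenable_entries is made up for illustration, and the pattern is assumed to match regular files only. The demo creates its own temporary file so it runs on any platform; on the Windows build in question one would point it at T:/zz/*.txt instead.

```ruby
require "tmpdir"

# Hypothetical helper: returns the paths matching pattern that Ruby
# cannot open again after listing them.
def unopenable_entries(pattern)
  Dir[pattern].reject do |path|
    begin
      File.open(path, "rb") { |f| f.read(1) }
      true                    # opened fine, so drop it from the result
    rescue SystemCallError    # Errno::EINVAL, Errno::ENOENT and friends
      false                   # keep it: the listed name did not round-trip
    end
  end
end

Dir.mktmpdir do |dir|
  File.open(File.join(dir, "\u2603.txt"), "w") { |f| f.write("snowman") }
  bad = unopenable_entries(File.join(dir, "*.txt"))
  puts bad.empty? ? "all listed names round-trip" : "cannot reopen: #{bad.inspect}"
end
```

An empty result means listing and opening agree; any path in the result is a name that was altered somewhere between the file system and Ruby.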


Regards,

Bill
 
SpringFlowers AutumnMoon

Bill said:
Note, it doesn't bother me that the filename displays as ???????.txt in the
command window, but rather the issue that ruby seems unable to open
a filename it just obtained via Dir[].

So, unless I have bungled my test somehow, it seems likely there is a
problem.

If so, as Ryan pointed out, we should move this to the ruby-core list.


Yeah, it looks like the file name is actually stored as ???????.txt in the
string, not just displayed that way when printed.

Mr. Park Heesob had a solution, but it involved using Win32. If Ruby
1.9 can handle it without using Win32, that'd be great.
 
SpringFlowers AutumnMoon

Actually, the solution that Park posted involved

files = `cmd /u /c dir /b `.split("\r\000\n\000")

which means running a system executable just to get a listing... gee...

The complete solution is at:
http://www.ruby-forum.com/topic/163681

(It needs the line above plus some Win32API calls.)
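
For context, the /u switch makes cmd.exe write the output of built-in commands such as dir as UTF-16LE when it goes to a pipe, which is why the split string above is a CR and an LF each followed by a NUL byte. With Ruby 1.9's String encodings, the decoding step could be sketched as below. The helper name decode_cmd_u_listing is made up, and the input is simulated so the sketch runs without Windows. Whether the decoded names can then be opened on 1.9.1 is a separate question; per the linked post, that part still needs Win32API calls.

```ruby
# Hypothetical helper: turn the raw bytes printed by `cmd /u /c dir /b`
# (UTF-16LE text with CRLF line endings) into an array of UTF-8 names.
def decode_cmd_u_listing(raw)
  raw.dup
     .force_encoding(Encoding::UTF_16LE)  # re-tag the bytes, no conversion yet
     .encode(Encoding::UTF_8)             # transcode to UTF-8
     .split("\r\n")                       # one entry per line
end

# Simulated command output; on Windows this would come from the backticks.
sample = "plain.txt\r\n\u2603.txt\r\n".encode(Encoding::UTF_16LE)
                                      .force_encoding(Encoding::BINARY)
p decode_cmd_u_listing(sample)
```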
 
SpringFlowers AutumnMoon

I wonder, when people use Ruby in Japan or France, how are the characters
handled on Windows XP or Vista?

For example, take a script that looks at all files and, if a file name
contains a word in Japanese or French, backs the file up to another hard
disk. Is even this task not possible?
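
A minimal sketch of such a backup script follows, under the assumption that Ruby hands back intact file names (which is exactly what fails on Windows in this thread). The helper name backup_matching is made up, and the file names are treated as UTF-8 for the comparison, which is an assumption about the file system.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: copy every regular file in src whose name contains
# word into dest. Returns the names that were copied.
def backup_matching(src, dest, word)
  FileUtils.mkdir_p(dest)
  Dir.entries(src).select do |name|
    path = File.join(src, name)
    # Re-tag a copy of the name as UTF-8 for the comparison only (assumption);
    # names whose bytes are not valid UTF-8 are skipped.
    utf8 = name.dup.force_encoding(Encoding::UTF_8)
    next false unless File.file?(path) && utf8.valid_encoding? && utf8.include?(word)
    FileUtils.cp(path, File.join(dest, name))
    true
  end
end

Dir.mktmpdir do |root|
  src  = File.join(root, "src")
  dest = File.join(root, "backup")
  FileUtils.mkdir_p(src)
  ["\u65E5\u672C-notes.txt", "plain.txt"].each do |n|
    File.open(File.join(src, n), "w") { |f| f.write("data") }
  end
  p backup_matching(src, dest, "\u65E5\u672C")  # only the first file should be copied
end
```

If the listing comes back as question marks, File.file? returns false for the mangled name and nothing is copied, so the sketch fails quietly rather than copying the wrong file.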
 
SpringFlowers AutumnMoon

Bosko said:
What is code page of your command prompt? When ??????.txt is shown in
the console on Windows it is usually problem of code page settings.

I actually used each_byte to dump out the bytes in the file name. They
show the ASCII code of the question mark (0x3F), so the string that comes
back really contains question marks. It is not related to the command prompt.
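
The each_byte check can be sketched like this; hex_bytes and literal_question_marks? are made-up helper names. If the bytes are 3F, the replacement happened before the string reached Ruby code, and no code page setting will bring the original characters back.

```ruby
# Hypothetical helper: hex dump of the bytes in a file name.
def hex_bytes(name)
  name.each_byte.map { |b| format("%02X", b) }.join(" ")
end

# Hypothetical helper: true when the part before the extension consists
# of nothing but "?" (0x3F) bytes.
def literal_question_marks?(name)
  stem = File.basename(name, ".*")
  !stem.empty? && stem.each_byte.all? { |b| b == 0x3F }
end

puts hex_bytes("??.txt")                     # 3F 3F 2E 74 78 74
puts literal_question_marks?("???????.txt")  # true
puts literal_question_marks?("\u2603.txt")   # false
```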
 
SpringFlowers AutumnMoon

Mr. Park Heesob, I wonder if you are actually changing Ruby 1.9 so that
it will handle this? I see you filed a bug on ruby-core related to it.
Thanks.
 