Ruby 1.9.2 with Rails 3.0.1

Bharat Ruparel

Is there a problem with Ruby 1.9.2 p0 processing Unicode strings? We
have an application that uses Unicode characters. Ruby 1.8.7 has no
problems with them, whereas Ruby 1.9.2 refuses to process them, with
both SQLite3 and Postgres databases, in a Rails 3.0.1 application.

Thanks.

Bharat
 
Ammar Ali

Is there a problem with Ruby 1.9.2 p0 processing Unicode strings? We
have an application that uses Unicode characters. Ruby 1.8.7 has no
problems with them, whereas Ruby 1.9.2 refuses to process them, with
both SQLite3 and Postgres databases, in a Rails 3.0.1 application.

What do you mean by "refuses to process them"? Are you seeing
mojibake? Or nothing at all?

Some questions come to mind:
- Is the DB connection set to use utf-8, in the case of Postgres? Not
sure how this is set for sqlite, but presume there is a way.
- Is your environment somehow using an encoding besides utf8?

Regards,
Ammar
 
Bharat Ruparel

Sorry about that. I was tired towards the end of the day and wrote a
not-so-bright post. I owe it to the group to clear things up.

To make a long story short, it was a character encoding problem with
Ruby 1.9.2.

The following is a snippet of code from the seeds.rb file:

courses = [ {:title => 'Principles of Good Cooking 1', :course_code => 'PGC1',
  :lessons => [{:title => 'Getting Started'},
    {:title => 'Sautéing',
      :topics => [ {:tag => "Lecture", :title => "Introduction to Sautéing",
          :pages => [ {:title => "Video Lecture" }] },
        {:tag => "Quiz", :title => "Test Your Sautéing IQ",
          :pages => [ {:title => "Questions" }] },
        {:tag => "Taste Test", :title => "Cooking With Wine",
          :pages => [ {:title => "Introduction"},
            {:title => "Instructions"},
            {:title => "Taste Wine"},
            {:title => "Reduce Wine"},
            {:title => "Taste Reduced Wine"},
            {:title => "Your Results" },

See that 'Sautéing' string?

That is "Sauteing" with the French acute accent over the e. It was
causing the rake db:seed command to fail (throw an exception) as follows:

bruparel:~/school
→ rake db:seed
(in /Users/bruparel/school)
rake aborted!
/Users/bruparel/school/db/seeds.rb:3: invalid multibyte char (US-ASCII)
/Users/bruparel/school/db/seeds.rb:3: invalid multibyte char (US-ASCII)
/Users/bruparel/school/db/seeds.rb:3: syntax error, unexpected $end, expecting '}'
    {:title => 'Sautéing',
                      ^

The solution was to put the following line at the top of this file
(seeds.rb):

# encoding: utf-8
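
As a minimal sketch of what the magic comment does (the variable name `title` is only illustrative), the script below tags the file as UTF-8 and then inspects a literal containing "é". On Ruby 1.9 the comment is required for the file to parse at all; on Ruby 2.0 and later UTF-8 is already the default source encoding, so the comment is harmless there.

```ruby
# encoding: utf-8
# The magic comment must be the first line of the file
# (or the second line, if the first is a shebang).

title = 'Sautéing'

puts __ENCODING__     # source encoding of this file
puts title.encoding   # string literals inherit the source encoding
puts title.length     # counted in characters
puts title.bytesize   # counted in bytes; "é" takes two bytes in UTF-8
```

The character count and the byte count differ by one because of the two-byte "é", which is exactly the byte sequence the US-ASCII parser rejected in the error above.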

Now rake db:seed ran fine and did populate the tables. I could see the
correct character encoding in the databases (both SQLite3 and Postgres),
but the display was coming out as plain "Sauteing" instead of the French
accented "e". That was because of the following line in the database.yml
file:

development:
  adapter: sqlite3
  pool: 5
  timeout: 5000
  encoding: utf8 <--- because of this
  database: db/atk_school_development

Instead it should be as follows:

development:
  adapter: sqlite3
  pool: 5
  timeout: 5000
  encoding: unicode <--- this works
  database: db/atk_school_development

If someone can articulate some simple rules for character encoding in a
Ruby 1.9.2 p0 and Rails 3.0.1 environment, that would be quite useful.

Thanks.

Bharat

--
Posted via http://www.ruby-forum.com/.
 
Luis Lavena

Sorry about that.  Was tired towards the end of the day and therefore a
not so bright post.  I owe it to the group to clear things up.

To make a long story short, it was the character encoding problem with
ruby 1.9.2.

The following is a snippet of code from seeds.rb file

courses = [ {:title => 'Principles of Good Cooking 1', :course_code =>
'PGC1',
  :lessons => [{:title => 'Getting Started'},
    {:title => 'Sautéing',
      :topics => [ {:tag => "Lecture", :title => "Introduction to
Sautéing",
          :pages => [ {:title => "Video Lecture" }] },

See that 'Sautéing', string?

That is Sauteing with funny symbols over e for french.  That was causing
the

rake db:seed command to fail (throw exception) as follows:

bruparel:~/school
→ rake db:seed
(in /Users/bruparel/school)
rake aborted!
/Users/bruparel/school/db/seeds.rb:3: invalid multibyte char (US-ASCII)
/Users/bruparel/school/db/seeds.rb:3: invalid multibyte char (US-ASCII)
/Users/bruparel/school/db/seeds.rb:3: syntax error, unexpected $end,
expecting '}'
    {:title => 'Sautéing',
                      ^

The solution was to put the following line at the top of this file
(seeds.rb)

# encoding: utf-8

If someone can articulate some simple rules for character encoding in
Ruby 1.9.2 p0 and Rails 3.0.1 environment, that will be quite useful.

Ruby reads each file's 'encoding' magic comment to decide
which encoding to use for that particular file.

If the file lacks one, it assumes the encoding given by
Encoding.default_external, which in your case seems to be US-ASCII.

sqlite3-ruby, since 1.3.0 is quite aware of character encoding and
should work properly.

If Rails is not doing the right thing, that is another question.

You can double-check that by running:

ActiveRecord::Base.connection.execute 'PRAGMA encoding'

That tells you which encoding the SQLite3 database was opened with.

Beyond that, and for Rails-specific issues, ask on Rails-Talk:

http://groups.google.com/group/rubyonrails-talk
 
Brian Candler

Luis Lavena wrote in post #959207:
Ruby reads each file's 'encoding' magic comment to decide
which encoding to use for that particular file.

If the file lacks one, it assumes the encoding given by
Encoding.default_external, which in your case seems to be US-ASCII.

That answer is wrong - but I don't blame you for giving a wrong answer,
since the whole encoding nonsense in ruby 1.9 is ridiculously
complicated.

The correct answer is: the encoding of a ruby 1.9 source file (and hence
the String literals within that file) is *always* US-ASCII, unless you
tag it with a #encoding line which says otherwise.

I have so far collected about 200 rules for how encodings work in ruby
1.9: https://github.com/candlerb/string19/blob/master/string19.rb

Unfortunately, this list is just the tip of the iceberg. To be complete,
it would have to describe the encoding-related behaviour of every method
on String, every method which accepts a String, and every method which
returns a String.

Regards,

Brian.
 
Bharat Ruparel

Hello Brian,
You wrote:
"The correct answer is: the encoding of a ruby 1.9 source file (and
hence
the String literals within that file) is *always* US-ASCII, unless you
tag it with a #encoding line which says otherwise."

This works for me and is consistent with my observation. Rails does set
a default encoding in one of the files config/application.rb as shown
below:

# Configure the default encoding used in templates for Ruby 1.9.
config.encoding = "utf-8"

It seems like the seeds.rb file which is conventionally used to
initialize data is unaware of this setting. Further, it seems like that
is not what the Rails team intended.
Regards,
Bharat
 
Brian Candler

Bharat Ruparel wrote in post #959309:
This works for me and is consistent with my observation. Rails does set
a default encoding in one of the files config/application.rb as shown
below:

# Configure the default encoding used in templates for Ruby 1.9.
config.encoding = "utf-8"

It seems like the seeds.rb file which is conventionally used to
initialize data is unaware of this setting. Further, it seems like that
is not what the Rails team intended.

As the comment says, that setting is used for templates, but seeds.rb is
ruby source code.

When you read a ruby 1.9 source file using load() or require(), then the
encoding is always forced to US-ASCII unless you tag it with a
#encoding. That is actually a sane default - imagine what would happen
if the same source file were parsed differently depending on what system
it ran on (*).

It gets more complex if instead of using load() or require(), you read
the file into a String and then eval() that String. In that case, the
encoding of the String is used as the source encoding, unless overridden
by a #encoding line.
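
The eval() case can be checked with a small sketch. The helper name `source_encoding_under_eval` is hypothetical; it evaluates the keyword `__ENCODING__` inside a String tagged with a given encoding and returns whatever source encoding the parser reports.

```ruby
# Hypothetical helper: evaluate __ENCODING__ inside a String that has been
# tagged with the given encoding, and return the reported source encoding.
def source_encoding_under_eval(encoding_name)
  code = "__ENCODING__".dup.force_encoding(encoding_name)
  eval(code)
end

ascii_result = source_encoding_under_eval("US-ASCII")
utf8_result  = source_encoding_under_eval("UTF-8")

puts ascii_result   # follows the String's encoding, not the file's
puts utf8_result
```

The same code text reports a different source encoding depending only on how the String holding it was tagged, which is the behaviour described above.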

Regards,

Brian.

(*) However, the same program may still behave differently on different
systems, even if parsed identically. This is because the default is to
allow the environment to decide the encoding of data files. You need to
explicitly override this if you want your program to behave in a sane
fashion, and that's what Rails is doing: whenever it reads a template,
it applies its own config.encoding setting instead of letting Ruby pick
an (essentially arbitrary) encoding.
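
A rough sketch of that kind of explicit override for a data file, not Rails' actual implementation: the helper `read_with` and the temp-file name are made up for illustration. The same bytes are read twice, once tagged as UTF-8 and once as US-ASCII, without consulting the environment either time.

```ruby
require "tempfile"

# Hypothetical helper: read a file and tag its contents with an explicitly
# chosen encoding instead of relying on Encoding.default_external.
def read_with(path, encoding_name)
  File.read(path, encoding: encoding_name)
end

utf8_text, ascii_text = Tempfile.create("titles") do |file|
  file.binmode                  # write the raw bytes untouched
  file.write("Saut\u00E9ing")   # "\u00E9" is "é", two bytes in UTF-8
  file.flush
  [read_with(file.path, "UTF-8"), read_with(file.path, "US-ASCII")]
end

puts utf8_text.encoding
puts utf8_text.valid_encoding?
puts ascii_text.encoding
puts ascii_text.valid_encoding?   # the bytes for "é" are not valid US-ASCII
```

Passing the encoding at the call site means the result does not change when the program runs under a different locale.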
 
Luis Lavena

That answer is wrong - but I don't blame you for giving a wrong answer,
since the whole encoding nonsense in ruby 1.9 is ridiculously
complicated.

Thank you, Brian, for correcting me. Encoding has always been on my TODO
list.
 
