Problem with Base64 decoding

alexander

Hi there,
last time I accidentally posted this question as a reply to another thread.
I'm really sorry about that; I won't make that mistake again.

So here's my question again in a fresh thread :)

I'm having a small problem with Base64-decoding a string.
I'm porting a PHP script to Ruby, and the decoding gives me
different results in Ruby and in PHP. The problem is that the PHP
result works for the processing I do afterwards, while the Ruby
result doesn't.
Here are the scripts in question:

PHP:
<?

$bytes = file_get_contents("test.rgb");
$bitmap = base64_decode($bytes);

$header = "";
$header .= "\xFF\xFE";
$header .= pack("n2",120,97);
$header .= "\x01";
$header .= "\xFF\xFF\xFF\xFF";

$header .= $bitmap;

file_put_contents("test_php.gd",$header);
?>

Ruby:
require 'rubygems'
require 'fileutils'
require 'base64'

all_bytes = Base64.decode64(IO.read("test.rgb"))

bitmap = "\xFF\xFE"
bitmap << [120,97].pack("n2")
bitmap << "\x01"
bitmap << "\xFF\xFF\xFF\xFF"
bitmap << all_bytes

File.new("test_ruby.gd","w").puts(bitmap)

The Ruby version is one byte shorter.

I'm probably missing something rather obvious here, but any pointers on
how to make the Ruby output match the PHP output would be greatly
appreciated :)

I've uploaded the test.rgb file I'm using here:

http://rss.fork.de/test.rgb, if that's even needed :)

Thanks a lot,

alexander
 
Jan Svitok

Hi,

1. Have a look at the differences between the two files. That should
tell you where the problem is: either in the decoding part or in the
assembling.

2. You are using puts, which appends a newline, so it seems to me that
the Ruby version is one byte LONGER. If that's the problem, replace puts
with write.

3. File.open("test_ruby.gd","w") {|f| f.write(bitmap) } is safer,
as it doesn't rely on the garbage collector to close the file; the
file is closed immediately after the block finishes. This helps when
you work with a large number of files (otherwise you can run out of
free file descriptors).

4. I guess you need neither rubygems nor fileutils for this to work
(that's OK if you use them for some other code not posted).
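Points 2 and 3 can be checked directly: write the same bytes once with puts and once with write, using the block form so each file is closed right away, then compare the sizes. A minimal sketch; the payload is just the 11-byte header from the original script, and the temporary file names are made up:

```ruby
require 'tmpdir'

# Stand-in payload: the 11-byte header built by the original script.
bitmap = "\xFF\xFE".b + [120, 97].pack("n2") + "\x01\xFF\xFF\xFF\xFF".b

sizes = Dir.mktmpdir do |dir|
  via_puts  = File.join(dir, "via_puts.gd")
  via_write = File.join(dir, "via_write.gd")

  # Block form: each file is closed as soon as its block returns.
  File.open(via_puts, "wb")  { |f| f.puts(bitmap) }   # puts adds a trailing "\n"
  File.open(via_write, "wb") { |f| f.write(bitmap) }  # write emits the bytes as-is

  [File.size(via_puts), File.size(via_write)]
end

p sizes # => [12, 11]
```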
 
alexander

Hi there,
thank you for your tips!

And indeed, using write instead of puts, I at least got the file size right.
Sadly, everything else is still wrong.

I think the problem is definitely in the decoding, but I don't even know
where to start, since the resulting files differ so much.
A hexdump of the two decoded files shows that they are not even
remotely the same.

Right now I use a PHP script that I call with system. It's kind of an
ugly solution, but at least it works ;)

I will keep trying, though, to get a 100% Ruby solution to this problem.

Kind regards and thanks again,

alexander

Jan Svitok

If you post your code along with the expected and actual output (e.g.
those hexdumps), perhaps somebody will have a look... Just post as
short a data file as possible (one that still decodes wrongly).
That reminds me: did you try decoding an empty file?
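To narrow things down before posting, it helps to know the first offset at which the PHP and Ruby outputs diverge. A small sketch; `first_difference` is a hypothetical helper, not part of any library, and the file names in the usage comment are the ones from the original scripts:

```ruby
# Hypothetical helper: index of the first byte at which a and b differ,
# or nil if they are identical. If one string is a prefix of the other,
# the result is the length of the shorter one.
def first_difference(a, b)
  a = a.b
  b = b.b
  limit = [a.bytesize, b.bytesize].min
  (0...limit).each do |i|
    return i if a.getbyte(i) != b.getbyte(i)
  end
  a.bytesize == b.bytesize ? nil : limit
end

# Usage (assumes both output files exist):
#   php  = File.binread("test_php.gd")
#   ruby = File.binread("test_ruby.gd")
#   p first_difference(php, ruby)
```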
 
Brian Candler

First, run hexdump -C on both output files.

If they both start with FF FE 00 78 00 61 01 FF FF FF FF,
then you know that the headers are right and it's the base64-decoded part
that is wrong.
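The header check can also be done from Ruby rather than by eye. This sketch rebuilds the header the same way the original script does and prints it as spaced hex, comparable with the start of the hexdump -C output:

```ruby
# Rebuild the header exactly as the original Ruby script does.
header = "\xFF\xFE".b
header << [120, 97].pack("n2")   # two big-endian 16-bit values: 00 78 00 61
header << "\x01".b
header << "\xFF\xFF\xFF\xFF".b

hex = header.unpack("C*").map { |byte| format("%02X", byte) }.join(" ")
puts hex # => FF FE 00 78 00 61 01 FF FF FF FF

# To compare against the first bytes of an output file:
#   File.binread("test_ruby.gd", header.bytesize) == header
```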

BTW there's a built-in alternative:

all_bytes = IO.read("test.rgb").unpack("m")[0]

But on your test file they give the same results.
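In current Ruby versions, Base64.decode64(str) is defined as str.unpack1("m"), so the two forms are the same decoder, which fits with them agreeing on the test file. A quick check with a made-up sample string:

```ruby
require 'base64'

encoded = ["hello, gd"].pack("m")   # pack("m") encodes and appends a line break

# The library call and the raw pack directive decode identically.
via_library = Base64.decode64(encoded)
via_unpack  = encoded.unpack("m")[0]

p via_library == via_unpack  # => true
p via_unpack                 # => "hello, gd"
```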

If this is a Windows platform, use "wb" instead of "w". However, you say that
now that you're using write instead of puts the files are the same size, so
newline translation is probably not the problem.
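A way to sidestep text versus binary mode entirely is File.binwrite / File.binread (Ruby 1.9.3 and later), which always open the file in binary mode, so no newline translation happens on any platform. A sketch with a made-up payload that deliberately contains a CR LF pair:

```ruby
require 'tmpdir'

# Made-up payload containing "\r\n", which text mode on Windows may translate.
payload = "\xFF\xFE\r\n\x00\x01".b

round_trip = Dir.mktmpdir do |dir|
  path = File.join(dir, "out.gd")
  File.binwrite(path, payload)   # same as File.open(path, "wb") { |f| f.write(payload) }
  File.binread(path)
end

p round_trip == payload  # => true
p round_trip.bytesize    # => 6
```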

I can see two issues with that file:

(1) It has no line breaks, but I don't think that matters.

(2) It starts with the three-byte sequence ef bb bf, which is a Unicode
<FEFF> character (a byte order mark, or BOM) according to my editor.
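The editor's reading is correct: ef bb bf is the UTF-8 encoding of U+FEFF, the character used as a byte order mark. A one-line check:

```ruby
# U+FEFF (zero-width no-break space, used as a byte order mark) in UTF-8:
bom_bytes = "\uFEFF".bytes
p bom_bytes.map { |b| format("%02x", b) }  # => ["ef", "bb", "bf"]
```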

Stripping this off gives a completely different answer to the base64
decoding:

irb(main):027:0> a=IO.read("test.rgb"); nil
=> nil
irb(main):028:0> b=a.unpack("m")[0]; b.size
=> 46560
irb(main):029:0> c=a[3..-1].unpack("m")[0]; c.size
=> 46560
irb(main):030:0> b[0..5]
=> "\304\000\000={u"
irb(main):031:0> c[0..5]
=> "\000\365\355\326\000\342"

and perhaps this second one is the answer you're looking for.

If so, I would say that unpack("m") is badly broken. Either it should raise
an exception when presented with characters outside the base64 set, or it
should ignore them. According to RFC 2045 section 6.8:

    The encoded output stream must be represented in lines of no more
    than 76 characters each. All line breaks or other characters not
    found in Table 1 must be ignored by decoding software. In base64
    data, characters other than those in Table 1, line breaks, and other
    white space probably indicate a transmission error, about which a
    warning message or even a message rejection might be appropriate
    under some circumstances.

I would consider the Unicode BOM to be "white space", but in any case it must
either be ignored or cause a warning or error; it must not cause the data to
be decoded wrongly!
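The "must be ignored" reading of RFC 2045 can be approximated by filtering the input down to the Table 1 alphabet plus the "=" pad before decoding. A minimal sketch for a current Ruby (unpack1 needs 2.4 or later); `rfc2045_decode` is a hypothetical name, not a library method:

```ruby
# Hypothetical helper: drop every byte outside the base64 alphabet
# (A-Z, a-z, 0-9, "+", "/" and the "=" pad), then decode.
def rfc2045_decode(text)
  cleaned = text.b.gsub(/[^A-Za-z0-9+\/=]/, "")
  cleaned.unpack1("m")
end

bom = "\xEF\xBB\xBF".b
p rfc2045_decode(bom + "b2s=")   # => "ok"
p rfc2045_decode("b2\r\ns=\n")   # => "ok"
```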

BTW, I did the above test under ruby 1.8.4 (2005-12-24) [i486-linux] from
Ubuntu 6.06. It's possible that it has been fixed in a later version.

HTH,

Brian.
 
Brian Candler

Here's a more concise summary of the bug.

irb(main):001:0> RUBY_VERSION
=> "1.8.4"
irb(main):002:0> a = "b2s="
=> "b2s="
irb(main):003:0> b = "\xef\xbb\xbf" + a
=> "\357\273\277b2s="
irb(main):004:0> a.unpack("m")
=> ["ok"]
irb(main):005:0> b.unpack("m")
=> ["\304\000\e\332"]
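The other behaviour the RFC allows, rejecting the data, is available in current Ruby as strict decoding: unpack1("m0"), which is what Base64.strict_decode64 calls, raises ArgumentError on any character outside the alphabet instead of producing garbage. A sketch using the "b2s=" example above; `strict_ok?` is a made-up helper:

```ruby
# Made-up helper: true if +text+ is well-formed strict base64.
def strict_ok?(text)
  text.unpack1("m0")   # strict mode: no line breaks or stray bytes allowed
  true
rescue ArgumentError
  false
end

a = "b2s="
b = "\xEF\xBB\xBF".b + a

p a.unpack1("m0")  # => "ok"
p strict_ok?(a)    # => true
p strict_ok?(b)    # => false
```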
 
Jan Svitok

Sorry, I didn't read your first post properly... I guess I'm doing too
many things at once...
 
alexander

Whee!
Thank you!
The three-byte sequence you pointed out at the start of the file was the
culprit.

I just needed to slice it off with [3..-1], and everything works
perfectly now... (I'm crossing my fingers that the app producing
those files doesn't put illegal characters somewhere in the middle of
the files, but that hasn't happened yet.)
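A slightly safer variant of the [3..-1] fix removes the three bytes only when they really are a UTF-8 BOM, so a file without one is left intact. A sketch; `strip_utf8_bom` is a hypothetical helper:

```ruby
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: remove a leading UTF-8 BOM if, and only if, one is present.
def strip_utf8_bom(data)
  data = data.b
  data.start_with?(UTF8_BOM) ? data.byteslice(UTF8_BOM.bytesize..-1) : data
end

p strip_utf8_bom(UTF8_BOM + "b2s=")  # => "b2s="
p strip_utf8_bom("b2s=")             # => "b2s="

# Usage with the original file name:
#   all_bytes = strip_utf8_bom(File.binread("test.rgb")).unpack("m")[0]
```

On Ruby 1.9 and later the BOM can also be dropped while reading, with File.read("test.rgb", mode: "r:BOM|UTF-8").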

According to the RFC, this still seems like a bug to me.
Is there anywhere I should report it (if it is one)?

Thank you guys again for looking into this!
It really made my day that it's solved now.

Kind regards,
alexander

Brian Candler

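Maybe just gsub out every character that isn't part of the base64
alphabet before decoding. Untested:

all_bytes = IO.read("test.rgb").gsub(/[^A-Za-z0-9+\/=]/, '').unpack("m")[0]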
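Putting the thread's fixes together (ignore non-alphabet characters, read and write in binary mode, write instead of puts), a Ruby port of the PHP script might look roughly like this. It is a sketch that has not been run against the original test.rgb, and `convert` is a made-up method name:

```ruby
# Hypothetical port of the PHP script, combining the fixes from this thread.
def convert(in_path, out_path)
  encoded = File.binread(in_path)
  # Ignore everything outside the base64 alphabet (BOM, line breaks, ...).
  bitmap = encoded.gsub(/[^A-Za-z0-9+\/=]/, "").unpack("m")[0]

  header = "\xFF\xFE".b
  header << [120, 97].pack("n2")
  header << "\x01\xFF\xFF\xFF\xFF".b
  header << bitmap

  File.binwrite(out_path, header)   # binary mode, no trailing newline
end

convert("test.rgb", "test_ruby.gd") if File.exist?("test.rgb")
```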
 
