Problem with Base64 decoding

alexander · Jan 29, 2007

hi there,
last time i accidentally posted this question as a reply to another one.
i'm really sorry for that. i will not make that mistake again.

so here's my question again in a fresh new thread.

i'm having a small problem with base64 decoding a string.
i'm porting a php script over to ruby and the decoding gives me
different results in ruby and in php. the problem is that the php
result works for the processing i do afterwards while the ruby version
doesn't.
here are the scripts in question:

php:
<?

$bytes = file_get_contents("test.rgb");
$bitmap = base64_decode($bytes);

$header = "";
$header .= "\xFF\xFE";
$header .= pack("n2",120,97);
$header .= "\x01";
$header .= "\xFF\xFF\xFF\xFF";

$header .= $bitmap;

file_put_contents("test_php.gd",$header);
?>

ruby:
require 'rubygems'
require 'fileutils'
require 'base64'

all_bytes = Base64.decode64(IO.read("test.rgb"))

bitmap = "\xFF\xFE"
bitmap << [120,97].pack("n2")
bitmap << "\x01"
bitmap << "\xFF\xFF\xFF\xFF"
bitmap << all_bytes

File.new("test_ruby.gd","w").puts(bitmap)

the ruby version is one byte shorter.

i'm probably missing something rather obvious here, but any pointers to
how i can make the ruby output be like the php output would be greatly
appreciated.

i've uploaded the test.rgb file i'm using here, if that's even needed:

http://rss.fork.de/test.rgb

thanks a lot,

alexander

Jan Svitok · Jan 29, 2007

Hi,

1. have a look at the differences between those two files. That way you
should be able to tell where the problem is: either in the decoding
part or in the assembling.

2. you are using puts, which appends a newline, so it seems to me that
the ruby version should be one byte LONGER. if that's the problem, replace
puts with write.

3. File.open("test_ruby.gd","w") {|f| f.write(bitmap) } should be
safer, as it doesn't rely on the garbage collector to close the file;
the file is closed as soon as the block finishes. This will be
helpful when you work with a large number of files (where you could
otherwise run out of free descriptors).

4. I guess you need neither rubygems nor fileutils for this to work
(that's ok if you use them for some other code not posted).
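
Points 2 and 3 can be combined in a small sketch like the one below. The helper name written_sizes, the file names and the payload bytes are made up for illustration, and the "wb" mode is an extra precaution that is not part of the advice above (it only matters on platforms that translate line endings).

```ruby
require 'tmpdir'

# Write +data+ twice, once with puts and once with write, and
# return the two resulting file sizes in bytes.
def written_sizes(data)
  Dir.mktmpdir do |dir|
    puts_path  = File.join(dir, "puts.gd")
    write_path = File.join(dir, "write.gd")
    # The block form closes each file as soon as the block finishes.
    File.open(puts_path, "wb")  { |f| f.puts(data) }
    File.open(write_path, "wb") { |f| f.write(data) }
    [File.size(puts_path), File.size(write_path)]
  end
end

# Hypothetical payload standing in for the assembled bitmap.
payload = [0xFF, 0xFE, 0x00, 0x78].pack("C*")
puts_size, write_size = written_sizes(payload)
puts "puts: #{puts_size} bytes, write: #{write_size} bytes"
```

With this payload the file written by puts comes out one byte longer, because puts appends a newline whenever the data does not already end in one.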

alexander · Feb 9, 2007

hi there,
thank you for your tips!

and indeed, using write instead of puts i at least got the filesize right.
sadly everything else is still wrong.

i think the problem is definitely in the decoding, but i don't even know
where to start there since the resulting files vary to a great degree.
inspecting a hexdump of both decoded files shows that they are not even
remotely the same.

right now i use a php script that i call with system. it's kind of an
ugly solution, but at least it works.

i will keep trying though to get a 100% ruby solution to this problem.

kind regards and thanks again,

alexander

Jan Svitok · Feb 9, 2007

If you post your code along with expected and actual output (e.g.
those hexdumps), perhaps somebody will have a look... just post as
short a data file as possible (as long as it still decodes wrong).
That reminds me: did you try decoding an empty file?

Brian Candler · Feb 9, 2007

Firstly, use hexdump -C on both the output files.

If they both start with FF FE 00 78 00 61 01 FF FF FF FF
then you know that the headers are right and it's the base64-decoded bit
which is wrong.
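
If hexdump is not at hand, the same header check can be sketched in Ruby. The names EXPECTED_HEADER, hex_bytes and header_ok? are invented for this example; the byte values are the ones the scripts above write.

```ruby
# The 11 header bytes: FF FE, then 120 and 97 as big-endian
# 16-bit integers, then 01, then FF FF FF FF.
EXPECTED_HEADER =
  [0xFF, 0xFE, 120, 97, 0x01, 0xFF, 0xFF, 0xFF, 0xFF].pack("C2n2C5")

# Format a binary string as space-separated hex bytes.
def hex_bytes(str)
  str.unpack("C*").map { |b| format("%02X", b) }.join(" ")
end

# True when the file at +path+ starts with the expected header.
def header_ok?(path)
  File.binread(path, EXPECTED_HEADER.bytesize) == EXPECTED_HEADER
end

puts hex_bytes(EXPECTED_HEADER)
```

Calling header_ok? on both test_php.gd and test_ruby.gd would then tell whether the difference is in the header or in the decoded part.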

BTW there's a built-in alternative:

all_bytes = IO.read("test.rgb").unpack("m")[0]

But on your test file they give the same results.

If this is a Windows platform, use "wb" instead of "w". However, you say that
now that you're using write instead of puts the files are the same size anyway.

I can see two issues with that file:

(1) It has no line breaks, but I don't think that matters.

(2) It starts with the three-byte sequence ef bb bf, which my editor shows
as the Unicode character U+FEFF (the byte order mark, encoded as UTF-8).

Stripping this off gives a completely different answer to the base64
decoding:

irb(main):027:0> a=IO.read("test.rgb"); nil
=> nil
irb(main):028:0> b=a.unpack("m")[0]; b.size
=> 46560
irb(main):029:0> c=a[3..-1].unpack("m")[0]; c.size
=> 46560
irb(main):030:0> b[0..5]
=> "\304\000\000={u"
irb(main):031:0> c[0..5]
=> "\000\365\355\326\000\342"

and perhaps this second one is the answer you're looking for.

If so, I would say that unpack("m") is badly broken. Either it should give
an exception when presented with characters outside of the base64 set, or it
should ignore them. According to RFC 2045 section 6.8,

The encoded output stream must be represented in lines of no more
than 76 characters each. All line breaks or other characters not
found in Table 1 must be ignored by decoding software. In base64
data, characters other than those in Table 1, line breaks, and other
white space probably indicate a transmission error, about which a
warning message or even a message rejection might be appropriate
under some circumstances.

I would consider the unicode BOM as "white space", but in any case it must
either be ignored or cause a warning or error; it must not cause the data to
be decoded wrongly!
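
For the "give an exception" option, later Ruby versions (not the 1.8.4 tested here) accept an "m0" directive that decodes strictly and raises ArgumentError on anything outside the alphabet. A minimal sketch, assuming such a Ruby; the names BOM and strict_decode are made up for the example:

```ruby
BOM = "\xEF\xBB\xBF".b   # UTF-8 byte order mark, as raw bytes

# Decode strictly: return the decoded bytes, or nil when the input
# contains anything outside the Base64 alphabet.
def strict_decode(text)
  text.unpack1("m0")
rescue ArgumentError
  nil
end

p strict_decode("b2s=")        # clean input
p strict_decode(BOM + "b2s=")  # input with a leading BOM
```

Returning nil is only one possible policy; letting the ArgumentError propagate would match the "message rejection" wording of the RFC more closely.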

BTW, I did the above test under ruby 1.8.4 (2005-12-24) [i486-linux] from
Ubuntu 6.06. It's possible that it has been fixed in a later version.

HTH,

Brian.

Brian Candler · Feb 9, 2007

Here's a more concise summary of the bug.

irb(main):001:0> RUBY_VERSION
=> "1.8.4"
irb(main):002:0> a = "b2s="
=> "b2s="
irb(main):003:0> b = "\xef\xbb\xbf" + a
=> "\357\273\277b2s="
irb(main):004:0> a.unpack("m")
=> ["ok"]
irb(main):005:0> b.unpack("m")
=> ["\304\000\e\332"]

Jan Svitok · Feb 9, 2007

Sorry, I didn't read your first post properly... I guess I'm doing too
many things at once...

alexander · Feb 9, 2007

whee!
thank you!
the three-byte sequence you pointed out at the start of the file was the
culprit.

i just needed to [3..-1] that out of the way and everything works
perfectly now... (crossing my fingers now that the app that's producing
those files doesn't put illegal characters somewhere in the middle of
the files, but that hasn't happened yet.)
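
A slightly more defensive variant of the [3..-1] fix only drops the first three bytes when they really are the byte order mark. This is a sketch; UTF8_BOM and strip_bom are names invented for the example:

```ruby
UTF8_BOM = "\xEF\xBB\xBF".b   # the ef bb bf sequence, as raw bytes

# Return +data+ as a binary string without a leading UTF-8 BOM.
# Input that does not start with the BOM is returned unchanged.
def strip_bom(data)
  bytes = data.b
  bytes.start_with?(UTF8_BOM) ? bytes.byteslice(3..-1) : bytes
end

decoded = strip_bom(UTF8_BOM + "b2s=").unpack1("m")
p decoded
```

That way a file produced without the BOM would not lose its first three Base64 characters.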

according to the rfc this still seems like a bug to me.
is there anywhere i should report that bug (if it is one)?

thank you guys again for looking into this!
really made my day that it's solved now.

kind regards,
alexander

Brian Candler · Feb 9, 2007

If illegal characters might show up in the middle of the files, maybe just
gsub! everything outside the base64 alphabet out. Untested:

gsub!(/[^A-Za-z0-9+\/=]/, '')
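
Applied to a string and followed by the decode, that idea might look like the sketch below. The method name decode_lenient is made up, and the non-bang gsub is used deliberately because gsub! returns nil when nothing was replaced.

```ruby
# Drop every byte outside the Base64 alphabet (including a leading
# BOM and any line breaks), then decode what is left.
def decode_lenient(text)
  cleaned = text.b.gsub(/[^A-Za-z0-9+\/=]/, "")
  cleaned.unpack1("m")
end

# Example input: BOM, then "b2s=" split by a line break.
p decode_lenient("\xEF\xBB\xBF".b + "b2\r\ns=")
```

This silently ignores stray characters, which is the other behaviour RFC 2045 allows; it will not notice a file that has been corrupted in transit.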
