Big endian convention in Ruby

Z

Zangief Ief

Hello,

I would like to convert an input (like a file, or a password) into
binary format. After reading my Ruby in a Nutshell book, I believe I can
use this method:

unpack('C*')

According to the documentation, this is meant for unsigned chars. But
will the binary representation respect the big-endian convention?

Thank you.
 
R

Robert Klemme

I would like to convert an input (like a file, or a password) into
binary format. After reading my Ruby in a Nutshell book, I believe I can
use this method:

unpack('C*')

This will give you an array of integer byte values. I am not sure where
you see a binary format there. What exactly do you want to achieve?
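
(For reference, this is what unpack('C*') gives you in irb:)

"hi".unpack("C*")   # => [104, 105]
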
According to the documentation, this is meant for unsigned chars. But
will the binary representation respect the big-endian convention?

What exactly do you mean by this? Are you referring to bits inside a
byte or to the ordering of multiple bytes? If the latter, there is no
point in talking about big or little endian when encoding byte-wise,
because no multi-byte values are involved. If the former, I am not sure
whether there is any platform that reverses bit order, but it could be
possible. OTOH, how would you notice?

Kind regards

robert
 
Z

Zangief Ief

Thank you for your answer.
Actually I would like to rewrite the SHA1 algorithm
(http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-1_pseudocode) as a
pure Ruby implementation. For that, I need to accomplish the
"Pre-processing" step by converting the input into a 64-bit big-endian
integer. I believe this could be simpler to do in Ruby than in another
language such as C. But I am not really sure how to do it.
 
B

Brian Candler

Zangief said:
Thank you for your answer.
Actually I would like to rewrite the SHA1 algorithm
(http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-1_pseudocode) as a
pure Ruby implementation. For that, I need to accomplish the
"Pre-processing" step by converting the input into a 64-bit big-endian
integer. I believe this could be simpler to do in Ruby than in another
language such as C. But I am not really sure how to do it.

ri String#unpack

Unfortunately, the q/Q conversion character seems to use native ordering
and I don't think there's a network-order equivalent:

irb(main):002:0> "\000\000\000\000\000\000\000\001".unpack("Q")
=> [72057594037927936]

If all you're concerned about is this step:

"append length of message (before pre-processing), in bits, as 64-bit
big-endian integer"

then you could do it by converting to hex first:

buff << [("%016X" % len)].pack("H*")

BTW, I presume you're doing this as an academic exercise. After all,
there's already:

require 'digest/sha1'
puts Digest::SHA1.hexdigest("hello world")

HTH,

Brian.
 
R

Robert Klemme

Zangief said:
Thank you for your answer.
Actually I would like to rewrite the SHA1 algorithm
(http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-1_pseudocode) as a
pure Ruby implementation. For that, I need to accomplish the
"Pre-processing" step by converting the input into a 64-bit big-endian
integer. I believe this could be simpler to do in Ruby than in another
language such as C. But I am not really sure how to do it.

ri String#unpack

Unfortunately, the q/Q conversion character seems to use native ordering
and I don't think there's a network-order equivalent:

irb(main):002:0> "\000\000\000\000\000\000\000\001".unpack("Q")
=> [72057594037927936]

If all you're concerned about is this step:

"append length of message (before pre-processing), in bits, as 64-bit
big-endian integer"

then you could do it by converting to hex first:

buff << [("%016X" % len)].pack("H*")

Or use "N" and combine, e.g.

irb(main):007:0> s = "\000\000\000\000\000\000\000\001"
=> "\000\000\000\000\000\000\000\001"
irb(main):008:0> r=[];s.unpack("N*").each_slice(2) {|hi,lo| r << (hi << 32 | lo)}; r
=> [1]

Kind regards

robert
 
Z

Zangief Ief

Thank you all for your help.

So if I have understood correctly, is it correct if I use unpack('N*')
like this?
=> "1001011110100010100001010001010100101100001110000101010101100111"
 
B

Brian Candler

Zangief said:
So if I have understood correctly, is it correct if I use unpack('N*')
like this?

=> "1001011110100010100001010001010100101100001110000101010101100111"

No. The message itself isn't treated as a 64-bit integer, only the
*length* of the message is a 64-bit integer, which is *appended* to the
message. In this case the length is 9*8 = 72 bits, so you need
\x00\x00\x00\x00\x00\x00\x00\x48

Anyway, I don't know why you are going to binary. You just want a String
of bytes. Don't worry about the order of bits-within-bytes; it will be
correct, trust me :)

Of course, if you are trying to write an SHA1 implementation which
properly handles input streams that are not a multiple of 8 bits long
(as many implementations don't), then you have a little more work to do.
But not very much, since the padding operation makes it into whole bytes
anyway.

e.g. if your input is
10101010101

this becomes

10101010 10110000 00000000 00000000 ...
            ^^^^^ ^^^^^^^^ ^^^^^^^^
            padding

and hence your string just needs to be \xAA\xB0\x00\x00 ..... padded to
the correct length. And the length is \x00\x00\x00\x00\x00\x00\x00\x0b,
i.e. 11 bits.

However if your SHA1 input is just a stream of bytes, as is normally the
case, then the padding is simply \x80\x00\x00\x00\x00 ... etc

Anyway, this is no longer a Ruby question, this is about reading the
SHA1 pseudocode correctly. But you could always submit it as a Ruby Quiz
idea :)
 
B

Brian Candler

Just to make this clearer: the padding operation just pads the message
up to a multiple of 64 bytes (512 bits), where the last block consists
of 56 bytes (448 bits) followed by 8 bytes of message length.

So assuming your message consists only of whole bytes, as your example
implied, then I believe the padding operation is simply this:

message = "A message"
bits = message.size * 8
message << "\x80"
message << "\x00" while (message.size & 63) != 56
message << [("%016X" % bits)].pack("H*")

Now your message is exactly n * 64 bytes long, and you can proceed.
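
As a quick sanity check (assuming the snippet above was run on the
9-byte example message), the result should look like this:

message.size % 64             # => 0
message[-8..-1].unpack("H*")  # => ["0000000000000048"]  (72 bits, big-endian)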
 
Z

Zangief Ief

My apologies, I had confused the message itself with the length that
gets appended at its end... Now that's okay, many thanks :)

I just have one last question:
Because I would like to work with the input in binary format, I would
like to convert the message at the beginning, before appending the bit '1':
=>
"100000100000010010110110101001101100111011001110100001101110011010100110"

There is also .unpack('B*'), but with "B" I think the order is not
correct.
 
B

Brian Candler

Zangief said:
Because I would like to work with the input in binary format, I would
like to convert the message at the beginning, before appending the bit '1':

=>
"100000100000010010110110101001101100111011001110100001101110011010100110"

There is also .unpack('B*'), but with "B" I think the order is not
correct.

I believe you'll need B*. The letter "A" should unpack to 01000001 (MSB
first).
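
That is easy to confirm in irb; this is just standard String#unpack
behaviour:

"A".unpack("B*")   # => ["01000001"]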

However this is a really, really bad way to implement the SHA1
algorithm. If the input is already presented as a string of bytes, then
it is completely pointless to convert it into a string of bits, because
the SHA1 algorithm is *designed* to be run on bytes, as the pseudocode
demonstrates. That is one reason why the input has to be padded to a
multiple of 64 bytes; so that the core loop does *not* have to worry
about working at the bit level!

Of course, as an academic exercise, you're free to do whatever you like.
If you want to experiment with binary arithmetic where the operands are
strings of 0x30 and 0x31 (representing bit 0 and bit 1 respectively),
then fine. The resulting code will be tortuous, use tons of RAM and run
extremely slowly.

(Hopefully it should also be clear from the pseudocode that you don't
have to read in the entire message at the start at all. You can process
the message in 64-byte chunks, *as it arrives*)
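
To illustrate that last point, here is a minimal sketch (not from the
original post): it reads a file 64 bytes at a time and hands each block
to a per-block update step. process_block is a hypothetical method
standing in for the 80-round compression loop, and the final short block
plus the length still need the padding described above.

File.open("message.bin", "rb") do |f|
  while (chunk = f.read(64))
    process_block(chunk)   # hypothetical: update h0..h4 from one 64-byte chunk
  end
end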
 
Z

Zangief Ief

Many thanks for all your answers, Brian Candler. I am going to do it the
way you said, because I think that's really more efficient.

Regards
 
A

Ashrith Barthur

Brian said:
Just to make this clearer: the padding operation just pads the message
up to a multiple of 64 bytes (512 bits), where the last block consists
of 56 bytes (448 bits) followed by 8 bytes of message length.

So assuming your message consists only of whole bytes, as your example
implied, then I believe the padding operation is simply this:

message = "A message"
bits = message.size * 8
message << "\x80"
message << "\x00" while (message.size & 63) != 56
message << [("%016X" % bits)].pack("H*")

Now your message is exactly n * 64 bytes long, and you can proceed.

Firstly, forgive me for continuing the topic of SHA-1 here, but I found
that this is relevant to what I am doing right now and hence wanted to
post here.

Brian Candler, I would like to say that my implementation of SHA-1 is
almost along the same lines as you have explained. I have also coded it
so that the data is handled in hex.

Let me list the requirements of an SHA-1 implementation, followed by
where we might have an issue while using Ruby.

1. There is a bitwise operation required between two hex values, and
this will only work if both of them are Integer-Hex or Integer anything.

When you use unpack in Ruby to get a string into its hex values, Ruby
still thinks it is a string, just one holding hex digits. For this to
undergo a bitwise operation, you have to explicitly convert it to an
Integer.

For example, let's say the string is 'abc'. 'a' gives 61 when unpacked
in hex. Now, if the array holding this is messageHex and the position of
'a' is [0], then we have to explicitly say messageHex[0].hex.to_i; this
will ensure that it is an integer in hex.

Next thing: appending the strings '0x80' or '0x00' does not feel
appropriate to me, because if you were to use 0x80 or 0x00, then Ruby
thinks it is an integer already and you don't need to do any explicit
type casting.

Also, Ruby does not give you a value in hex if you do any mathematical
or bitwise operation in hex; it always defaults to decimal.

These were some of the issues I faced while implementing SHA-1.

Please do let me know if there are any workarounds or an easier way to
implement or typecast the hex values. Also, is there a default setting
for Ruby so that unpacking a string to hex gives integer values directly?

Thanks
Ashrith
 
B

Brian Candler

Ashrith said:
1. There is a bitwise operation required between two hex values, and
this will only work if both of them are Integer-Hex or Integer anything.

"Integer-Hex" doesn't really mean anything.

The specification says it works on 32-bit unsigned integer values, and
that each 64-byte block of source data is treated as 16 x 32-bit words.
You can get this via

# data is a 64-byte string

w = data.unpack("N16")
# now w is an array of 16 Integers
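
Continuing from there, a sketch (taken from the Wikipedia pseudocode
rather than anything posted above) of how those 16 words are extended to
the 80 words the main loop indexes, with the 1-bit left rotate written
out by hand:

(16..79).each do |i|
  x = w[i-3] ^ w[i-8] ^ w[i-14] ^ w[i-16]
  w[i] = ((x << 1) | (x >> 31)) & 0xffffffff   # leftrotate(x, 1), kept to 32 bits
end
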
For example, let's say the string is 'abc'. 'a' gives 61 when unpacked
in hex. Now, if the array holding this is messageHex and the position of
'a' is [0], then we have to explicitly say messageHex[0].hex.to_i; this
will ensure that it is an integer in hex.

String#hex will give you an Integer directly; the to_i is superfluous.

But in any case, the conversion into hex-ascii in the first place is
superfluous. Unpack directly to Integers, as shown above.
Next thing: appending the strings '0x80' or '0x00' does not feel
appropriate to me, because if you were to use 0x80 or 0x00, then Ruby
thinks it is an integer already and you don't need to do any explicit
type casting.

Sorry, but I am unable to make any sense of that sentence at all. The
input data to SHA1 is an arbitrary-sized string of bytes (*); the padding
algorithm requires you to add more bytes (*) to the end, to achieve
alignment into 64-byte blocks. So adding padding bytes is exactly what
is required.

(*) actually an arbitrary-sized string of *bits*, but most
implementations assume that it's a whole number of bytes, i.e. n*8 bits.
Also, Ruby does not give you a value in hex if you do any mathematical
or bitwise operation in hex; it always defaults to decimal.

I think you may have lost the distinction between a number, and its
external representation.

Doing a bitwise operation "in hex" or "in decimal" doesn't make any
sense. The number is stored internally in binary - this is a digital
computer, after all - and the bitwise operations are done on those bits.
It is only converted into a hex or decimal representation at the point
where you input or output the number.

a = 20
a.to_s # converts to string "20"
a.to_s(16) # converts to string "14"
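
And going the other way, just to underline the point (plain Ruby,
nothing SHA-1 specific):

"14".hex     # => 20    (parse the hex string back into a number)
0x14 == 20   # => true  (two spellings of the same Integer)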

Anyway, maybe you would like to submit this as a ruby quiz, as you'd
probably get some good implementations to look at.

Brian.
 
A

Ashrith Barthur

Brian said:
Ashrith said:
1. There is a bitwise operation required between two hex values, and
this will only work if both of them are Integer-Hex or Integer anything.

"Integer-Hex" doesn't really mean anything.

The specification says it works on 32-bit unsigned integer values, and
that each 64-byte block of source data is treated as 16 x 32-bit words.
You can get this via

# data is a 64-byte string

w = data.unpack("N16")
# now w is an array of 16 Integers
For example, let's say the string is 'abc'. 'a' gives 61 when unpacked
in hex. Now, if the array holding this is messageHex and the position of
'a' is [0], then we have to explicitly say messageHex[0].hex.to_i; this
will ensure that it is an integer in hex.

String#hex will give you an Integer directly; the to_i is superfluous.

But in any case, the conversion into hex-ascii in the first place is
superfluous. Unpack directly to Integers, as shown above.
Next thing: appending the strings '0x80' or '0x00' does not feel
appropriate to me, because if you were to use 0x80 or 0x00, then Ruby
thinks it is an integer already and you don't need to do any explicit
type casting.

Sorry, but I am unable to make any sense of that sentence at all. The
input data to SHA1 is an arbitrary-sized string of bytes (*); the padding
algorithm requires you to add more bytes (*) to the end, to achieve
alignment into 64-byte blocks. So adding padding bytes is exactly what
is required.

(*) actually an arbitrary-sized string of *bits*, but most
implementations assume that it's a whole number of bytes, i.e. n*8 bits.
Also, Ruby does not give you a value in hex if you do any mathematical
or bitwise operation in hex; it always defaults to decimal.

I think you may have lost the distinction between a number, and its
external representation.

Doing a bitwise operation "in hex" or "in decimal" doesn't make any
sense. The number is stored internally in binary - this is a digital
computer, after all - and the bitwise operations are done on those bits.
It is only converted into a hex or decimal representation at the point
where you input or output the number.

a = 20
a.to_s # converts to string "20"
a.to_s(16) # converts to string "14"

Anyway, maybe you would like to submit this as a ruby quiz, as you'd
probably get some good implementations to look at.

Brian.

Hi Brian,

I did pretty much the same thing, except that instead of N16 I unpacked
it as H8*16, just because the display has to be in hex mode.

Additionally, I am in a tight spot right now. I have coded the complete
algorithm, and I do get values out, i.e. the digest...

You know how for SHA-1 we need to loop 80 times so that the bits are
rotated, a, b, c, d are changed and the keys are used... well, this is
the funny thing that happens with my code.

It works perfectly fine if I give the loop 64 iterations, that is

for i in 0..63

but the moment I say

for i in 0..79

the code returns with an error saying
"`[]=': index 64 out of string (IndexError)"

I earlier thought that Ruby arrays are limited to only 64 elements,
which I feel is a stupid assumption on my part, but I just gave it a
shot and recoded to handle more than 64 iterations, and it still does
not work.

I searched far and wide on the internet and I don't see anyone posting
this kind of error... I really don't get why the code would work with
64 loops but not 79.

Does it have to do with the amount of data being inserted into each
value of the array? While unpacking I have defined it as H8*16, so does
that limit the size of each value in the array? Would it work if I say
N16? Or is it a completely different error?
 
B

Brian Candler

Ashrith said:
I searched far and wide on the internet and I don't see anyone posting
this kind of error... I really don't get why the code would work with
64 loops but not 79.

The only answer I can give is trite: "because there is a bug in your
program", or "because you are doing something wrong". If you don't post
the code, then we cannot guess what you are doing wrong.

At a guess: your input by this stage should consist of an array of 80
32-bit integers, and you should be indexing this array to obtain w.
Possibly you have set up this array wrongly.
Does it have to do with the amount of data being inserted into each
value of the array? While unpacking I have defined it as H8*16, so does
that limit the size of each value in the array? Would it work if I say
N16? Or is it a completely different error?

Arrays in Ruby are of unlimited size (subject only to available RAM).

FWIW, I have attached a direct translation of the pseudocode on
Wikipedia. It seems to work for the handful of test vectors I've tried.

$ echo -n "" | sha1sum
da39a3ee5e6b4b0d3255bfef95601890afd80709 -
$ echo -n "hello" | sha1sum
aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d -
$ echo -n "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
| sha1sum
06ced2e070e58c2c4ed9f2b8cb890f0c512ce60d -
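
For what it's worth, the first two of those vectors can also be
cross-checked from within Ruby against the built-in library mentioned
earlier in the thread:

require 'digest/sha1'
Digest::SHA1.hexdigest("")        # => "da39a3ee5e6b4b0d3255bfef95601890afd80709"
Digest::SHA1.hexdigest("hello")   # => "aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d"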

Attachments:
http://www.ruby-forum.com/attachment/2846/sha1.rb
 
A

Ashrith Barthur

Brian said:
The only answer I can give is trite: "because there is a bug in your
program", or "because you are doing something wrong". If you don't post
the code, then we cannot guess what you are doing wrong.

At a guess: your input by this stage should consist of an array of 80
32-bit integers, and you should be indexing this array to obtain w.
Possibly you have set up this array wrongly.

Arrays in Ruby are of unlimited size (subject only to available RAM).

FWIW, I have attached a direct translation of the pseudocode on
Wikipedia. It seems to work for the handful of test vectors I've tried.

$ echo -n "" | sha1sum
da39a3ee5e6b4b0d3255bfef95601890afd80709 -
$ echo -n "hello" | sha1sum
aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d -
$ echo -n "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
| sha1sum
06ced2e070e58c2c4ed9f2b8cb890f0c512ce60d -


The following is the code that I have written... I figured that the fix
for that array limit is padding more 00s and then splitting it into an
80-row array...
The thing is, the code works perfectly.. but the shasum is not the one I
am supposed to get.. well, that's the issue...

Here is the code:

#!/usr/bin/ruby -w

h0=0x67452301
h1=0xefcdab89
h2=0x98badcfe
h3=0x10325476
h4=0xc3d2e1f0

message='abcde'

bit=message.size
message << '80'.hex

if(bit>64) then
  newbitlength=bit%64
else
  newbitlength=bit
end

while (newbitlength<=61) do
  message << '00'.hex
  newbitlength=newbitlength+1
end

message << (('%016X' %(bit*8)).hex)

for i in (0..79)
  message << '00'.hex
end
message.unpack('H8'*80)

a=h0
b=h1
c=h2
d=h3
e=h4

for i in (0..79)
  if (i>=16 or i<=79) then
    message=(((message[i-3]) ^ (message[i-8]) ^ (message[i-14]) ^ (message[i-16]))<<1)
    tempmessage=(message)>>31
    message=(message<<1)+tempmessage
  end

  if (i>=0 or i<=19) then
    f=((b&c)|((~b)&d))
    k=0x5A827999
  elsif (i>=20 or i<=39) then
    f=b^c^d
    k=0x6ED9EBA1
  elsif (i>=40 or i<=59) then
    f=(b&c)|(b&d)|(c&d)
    k=0x8F1BBCDC
  else
    f=b^c^d
    k=0xCA62C1D6
  end

  tempvaluea=a>>27
  arot=(a<<5)+tempvaluea

  temp = (arot+f+e+k+(message))%(2**32)

  e=d
  d=c
  tempvalueb=b>>2
  brot=(b<<30)+tempvalueb
  c=brot
  b=a
  a=temp

  h0=(h0+a)%(2**32)
  h1=(h1+b)%(2**32)
  h2=(h2+c)%(2**32)
  h3=(h3+d)%(2**32)
  h4=(h4+e)%(2**32)

  puts i
  puts "The value of H0:"<<h0.to_s(base=16)
  puts "The value of H1:"<<h1.to_s(base=16)
  puts "The value of H2:"<<h2.to_s(base=16)
  puts "The value of H3:"<<h3.to_s(base=16)
  puts "The value of H4:"<<h4.to_s(base=16)
end

puts "The digest for the given input is :"<<h0.to_s(base=16)<<h1.to_s(base=16)<<h2.to_s(base=16)<<h3.to_s(base=16)<<h4.to_s(base=16)


Regards,
Ashrith
 
B

Brian Candler

Ashrith said:
The thing is, the code works perfectly.. but the shasum is not the one I
am supposed to get.. well, that's the issue...

Then it's not working perfectly, is it?

OK, there's tons wrong with this code. I'm not going to debug it fully
for you, as you need to do this yourself as a learning experience. I
suggest you debug it by putting lots of debugging statements in, e.g.

puts "h0 = #{h0.inspect}"

and comparing the values in your code at each point as it runs with
those in mine.

However, the following glaring errors stand out just from a visual
inspection:
while (newbitlength<=61) do

I think this should be <56 (448 bits)
message << (('%016X' %(bit*8)).hex)

Wrong expression: you have converted bit*8 to a hex ascii string, then
converted it straight back to decimal!!! So this is identical to

message << (bit * 8)

which will append one character to the message.

I suggest adding a check at this point to see that the padded message is
exactly a multiple of 64 bytes long, because with your code I don't
think it is, but this is a requirement for the rest of the algorithm to
proceed.
for i in (0..79)
message<<'00'.hex
end

Nowhere in the algorithm does it say add 80 zero bytes to the end of the
message.
message.unpack('H8'*80)

This is a bizarre unpack operation on the message. But not only that,
you have not assigned the result to anything - so this line doesn't do
anything at all!
a=h0
b=h1
c=h2
d=h3
e=h4

All the code from this point should be inside a loop, one iteration for
each 64-byte block of the message (as the pseudocode says: "break
message into 512-bit chunks // for each chunk")
for i in (0..79)
  if (i>=16 or i<=79) then
    message=(((message[i-3]) ^ (message[i-8]) ^ (message[i-14]) ^ (message[i-16]))<<1)
    tempmessage=(message)>>31
    message=(message<<1)+tempmessage
  end


The pseudocode says: "break chunk into sixteen 32-bit big-endian words
w[i], 0 ≤ i ≤ 15", but you have not done this.

So in your code, message[i] is a single byte (message[0] to
message[63]), but actually you should have converted this to w[0] to
w[15], each element of w comprising 4 bytes from the original message.
puts "The value of H0:"<<h0.to_s(base=16)

The assignment to 'base' is not needed. i.e. h0.to_s(16) is all you
need.

However this won't pad the string to 8 hex characters, so what you
really want is

("%08X" % h0)

That's plenty of help - especially since you also have a working version
to compare against - so I'm not going to help you further on this.

Brian.
 
B

Brian Candler

Brian said:
I suggest you debug it by putting lots of debugging statements in, e.g.

puts "h0 = #{h0.inspect}"

... and I also suggest you start with the message padding. For example,
the message "abcde" should pad to

"abcde\200\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000("

That is, "abcde" followed by 0x80 followed by 50 zeros (making 56 bytes
so far), followed by 00 00 00 00 00 00 00 28 which is the length of the
message in bits as a 64-bit big-endian value, to give a 64-byte block.

If the message doesn't look like this before you enter your block
processing loop, then there's no hope of getting the right answer (GIGO
principle)
 
