splitting binary data

hroyd hroyd · Apr 20, 2011

Hello

First post (I am new to Ruby). Can you help?

I am using EventMachine to read TCP segments off the network. I read
in a TCP segment that contains 4 messages. The TCP segment's binary data
is shown below, where
\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\ is the
marker for each message. I would like to split the data into the 4
messages, but am having trouble doing so. When I split the data, the
whole message gets inserted into the first array element. I understand I
may need to escape the \, but how would I do that for the following
message? I can split it by unpacking to hex and then splitting, but that
is inefficient for my needs, as I use BinData to inspect the packet. Any
help is appreciated.

Thanks

\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00R\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x18\n\x10\x8E\b\x04\x18\x02\x02\x02
\n\x13\x00\x01
\x01\x01\x01\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00A\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x10\x03\x03\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00d\x02\x00\x00\x00I@\x01\x01\x00@\x02\x1E\x02\x0E=\xD6R\x132H2H2H2H2H2H2H2H\x8A\xEA\x8A\xEA\x8A\xEA\x8A\xEA@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAE\xF0\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00N\x02\x00\x00\x003@\x01\x01\x00@\x02\b\x02\x03=\xD6R\xE3\xC0\x1F@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAD\xA8

7stud -- · Apr 20, 2011

hroyd hroyd wrote in post #993957:

Hello

First post (I am new to Ruby). Can you help?

I am using EventMachine to read TCP segments off the network. I read
in a TCP segment that contains 4 messages. The TCP segment's binary data
is shown below, where
\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\ is the
marker for each message. I would like to split the data into the 4
messages, but am having trouble doing so. When I split the data, the
whole message gets inserted into the first array element.

I'm not seeing that. Your message starts with the delimiter, so the
first element of the array will be a blank string:

str = "\xFF\xFF" +
"\x61" +
"\xFF\xFF" +
"\x62" +
"\xFF\xFF" +
"\x63" +
"\xFF\xFF" +
"\x64"

pattern = "\xFF\xFF"
p str.split(pattern)

--output:--
["", "a", "b", "c", "d"]
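If the leading empty string is unwanted, it can be dropped after the split. A minimal sketch, assuming real messages are never empty; `String#b` (Ruby 2.0 and later) is used here only so the snippet behaves the same whatever the source encoding is, and the variable names are illustrative:

```ruby
# Same toy data as above, tagged as binary so split never sees invalid UTF-8.
str = ("\xFF\xFF" + "a" + "\xFF\xFF" + "b" + "\xFF\xFF" + "c" + "\xFF\xFF" + "d").b
pattern = "\xFF\xFF".b

parts = str.split(pattern)          # first element is "" because str starts with the delimiter
messages = parts.reject(&:empty?)   # keep only the non-empty pieces
p messages                          #=> ["a", "b", "c", "d"]
```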

hroyd hroyd · Apr 21, 2011

Thanks for the reply, that works

I was trying to split on

"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\"

but dropping the last \ was what I was missing

"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF"

["",
"\x00R\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x18\n\x10\x8E\b\x04\x18\x02\x02\x02
\n\x13\x00\x01 \x01\x01\x01\x01",
"\x00A\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x10\x03\x03",
"\x00d\x02\x00\x00\x00I@\x01\x01\x00@\x02\x1E\x02\x0E=\xD6R\x132H2H2H2H2H2H2H2H\x8A\xEA\x8A\xEA\x8A\xEA\x8A\xEA@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAE\xF0",
"\x00N\x02\x00\x00\x003@\x01\x01\x00@\x02\b\x02\x03=\xD6R\xE3\xC0\x1F@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAD\xA8"]

Thanks for your help

Iñaki Baz Castillo · Apr 21, 2011

2011/4/20 7stud -- said:
str = "\xFF\xFF" +
      "\x61" +
      "\xFF\xFF" +
      "\x62" +
      "\xFF\xFF" +
      "\x63" +
      "\xFF\xFF" +
      "\x64"

pattern = "\xFF\xFF"
p str.split(pattern)

--output:--
["", "a", "b", "c", "d"]

Note that this fails under Ruby 1.9:

p str.split(pattern)
ArgumentError: invalid byte sequence in UTF-8
from (irb):10:in `split'

--
Iñaki Baz Castillo

7stud -- · Apr 21, 2011

Iñaki Baz Castillo said:
2011/4/20 7stud -- said:

p str.split(pattern)

--output:--
["", "a", "b", "c", "d"]


Note that this fails under Ruby 1.9:

p str.split(pattern)
ArgumentError: invalid byte sequence in UTF-8
from (irb):10:in `split'

I guess you missed this:

puts RUBY_VERSION

...
...
...

--output:--
1.9.2

--
Posted via http://www.ruby-forum.com/.

7stud -- · Apr 21, 2011

hroyd hroyd wrote in post #994257:

Thanks for your help

Sure. Also, note that ruby lets you do this:

pattern = "\xFF" * 16
p pattern

--output:--
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF"

...so that you don't have to write that out by hand.
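Another way to build the marker, sketched here as an alternative rather than as what the thread used, is `Array#pack`. The `C*` directive packs each integer as one unsigned byte, and the resulting string is tagged ASCII-8BIT, so it does not depend on the source file's encoding. The name `marker` is illustrative:

```ruby
# Sixteen 0xFF bytes, built from integers instead of a string literal.
marker = ([0xFF] * 16).pack("C*")

puts marker.bytesize          # 16
puts marker.valid_encoding?   # true
```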

Iñaki Baz Castillo · Apr 21, 2011

2011/4/21 7stud -- said:
I guess you missed this:

puts RUBY_VERSION

...

--output:--
1.9.2

Interesting. I also use 1.9.2, but I have realized that it fails under
irb, but not when I run the above code from a separate file.


7stud -- · Apr 21, 2011

Iñaki Baz Castillo said:
Interesting. I also use 1.9.2, but I have realized that it fails under
irb, but not when I run the above code from a separate file.

I never use irb-like interfaces in any language anymore--they are
unreliable.
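One way to investigate a difference like this is to print the source encoding the interpreter assumes, together with the encoding Ruby assigned to the literal. A small diagnostic sketch follows (the helper name `describe` is made up for illustration). The values printed depend on the Ruby version, the locale, and any magic comment, so no expected output is shown:

```ruby
# Report how Ruby has tagged a string and whether its bytes are valid for that tag.
def describe(str)
  { encoding: str.encoding, valid: str.valid_encoding? }
end

p __ENCODING__           # source encoding of this file or irb session
p describe("\xFF\xFF")   # how the literal from the thread was tagged
```

Running the same lines in irb and in a script shows whether the two environments treat the literal differently.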


Y. NOBUOKA · Apr 26, 2011

In Ruby 1.9, a String object knows its own encoding.
If a String contains byte sequences that are invalid for its encoding,
String#split raises an error.

Without a magic comment, it does not matter that a string literal contains
non-ASCII bytes: the literal is tagged ASCII-8BIT, in which every byte
sequence is valid.

## example: OK!!
#-------------------------------------------------
#! ruby-1.9.2

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"
p str.encoding #=> #<Encoding:ASCII-8BIT>
p str.valid_encoding? #=> true

pattern = "\xFF\xFF"
p str.split( pattern ) #=> ["", "a", "b", "c", "d"]
#-------------------------------------------------

However, when a magic comment declares the file encoding to be UTF-8,
it does matter that a string literal contains non-ASCII bytes: the literal
is tagged UTF-8, and \xFF is never valid in UTF-8.

## example: NG (fails)
#-------------------------------------------------
#! ruby-1.9.2
# coding: UTF-8

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"
p str.encoding #=> #<Encoding:UTF-8>
p str.valid_encoding? #=> false

pattern = "\xFF\xFF"
p pattern.valid_encoding? #=> false
p str.split( pattern ) # ERROR OCCURS!!!
#-------------------------------------------------

To avoid this problem, change the encoding of any string that contains
non-ASCII bytes to ASCII-8BIT.

## example: avoiding the problem
#-------------------------------------------------
#! ruby-1.9.2
# coding: UTF-8

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"
# change the encoding of the string
str.force_encoding Encoding::ASCII_8BIT
p str.encoding #=> #<Encoding:ASCII-8BIT>
p str.valid_encoding? #=> true

pattern = "\xFF\xFF".force_encoding Encoding::ASCII_8BIT
p pattern.valid_encoding? #=> true
p str.split( pattern ) #=> ["", "a", "b", "c", "d"]
#-------------------------------------------------

Kind regards,
