splitting binary data

hroyd hroyd

Hello

First post (I am new to Ruby :)). Can you help?

I am using EventMachine to read TCP segments off the network. I read
in a TCP segment that contains 4 messages. The TCP segment's binary data
is shown below, where
\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\ is the
marker for each message. I would like to split the data into the 4
messages, but am having trouble doing so. When I split the data, the
whole message gets inserted into the first array element. I understand I
may need to escape the \, but how would I do that for the following
message? I can split it by unpacking to hex and then splitting, but that
is inefficient for my needs, as I use BinData to inspect the packet. Any
help is appreciated.

Thanks


\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00R\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x18\n\x10\x8E\b\x04\x18\x02\x02\x02
\n\x13\x00\x01
\x01\x01\x01\x01\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00A\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x10\x03\x03\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00d\x02\x00\x00\x00I@\x01\x01\x00@\x02\x1E\x02\x0E=\xD6R\x132H2H2H2H2H2H2H2H\x8A\xEA\x8A\xEA\x8A\xEA\x8A\xEA@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAE\xF0\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\x00N\x02\x00\x00\x003@\x01\x01\x00@\x02\b\x02\x03=\xD6R\xE3\xC0\x1F@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAD\xA8
 
7stud --

hroyd hroyd wrote in post #993957:
Hello

First post (I am new to Ruby :)). Can you help?

I am using EventMachine to read TCP segments off the network. I read
in a TCP segment that contains 4 messages. The TCP segment's binary data
is shown below, where
\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\ is the
marker for each message. I would like to split the data into the 4
messages, but am having trouble doing so. When I split the data, the
whole message gets inserted into the first array element.

I'm not seeing that. Your message starts with the delimiter, so the
first element of the array will be a blank string:

str = "\xFF\xFF" +
"\x61" +
"\xFF\xFF" +
"\x62" +
"\xFF\xFF" +
"\x63" +
"\xFF\xFF" +
"\x64"

pattern = "\xFF\xFF"
p str.split(pattern)

--output:--
["", "a", "b", "c", "d"]
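
If the leading empty string is unwanted, one option is to filter it out after the split. The sketch below is only an assumption about what the caller wants, not something stated in the original post. It works on binary copies of the strings (String#b, available in Ruby 2.0 and later) so that the result does not depend on the source encoding.

```ruby
# Binary (ASCII-8BIT) copies, so the split does not depend on the source encoding.
str     = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64".b
pattern = "\xFF\xFF".b

parts    = str.split(pattern)      # leading "" because str starts with the delimiter
messages = parts.reject(&:empty?)  # drop every empty piece

p parts     #=> ["", "a", "b", "c", "d"]
p messages  #=> ["a", "b", "c", "d"]
```

Note that reject(&:empty?) also drops an empty piece between two adjacent delimiters, which may or may not be what you want.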
 
hroyd hroyd

Thanks for the reply, that works

I was trying to split on

"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\"

but dropping the last \ was what I was missing

"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF"

["",
"\x00R\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x18\n\x10\x8E\b\x04\x18\x02\x02\x02
\n\x13\x00\x01 \x01\x01\x01\x01",
"\x00A\x02\x00\x00\x00'@\x01\x01\x00@\x02\x00@\x03\x04\n\x10\x8E\xC8\x80\x04\x04\x00\x00\x002@\x05\x04\x00\x00\x00d\xC0\b\b\x00\x01\x00\x01\x00\x01\x00\x02\x10\x03\x03",
"\x00d\x02\x00\x00\x00I@\x01\x01\x00@\x02\x1E\x02\x0E=\xD6R\x132H2H2H2H2H2H2H2H\x8A\xEA\x8A\xEA\x8A\xEA\x8A\xEA@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAE\xF0",
"\x00N\x02\x00\x00\x003@\x01\x01\x00@\x02\b\x02\x03=\xD6R\xE3\xC0\x1F@\x03\x04\n\x10\x8E\n\x80\x04\x04\x00\x00\x00\x00@\x05\x04\x00\x00\x00d\xC0\b\f=\xD6\x01,=\xD6\x011=\xD6\v\xEB\x16.\xAD\xA8"]

Thanks for your help
 
Iñaki Baz Castillo

2011/4/20 7stud -- said:
str = "\xFF\xFF" +
      "\x61" +
      "\xFF\xFF" +
      "\x62" +
      "\xFF\xFF" +
      "\x63" +
      "\xFF\xFF" +
      "\x64"

pattern = "\xFF\xFF"
p str.split(pattern)

--output:--
["", "a", "b", "c", "d"]

Note that this fails under Ruby 1.9:

p str.split(pattern)
ArgumentError: invalid byte sequence in UTF-8
from (irb):10:in `split'

--
Iñaki Baz Castillo
 
7stud --

hroyd hroyd wrote in post #994257:
Thanks for your help

Sure. Also, note that ruby lets you do this:

pattern = "\xFF" * 16
p pattern

--output:--
"\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF"


...so that you don't have to write that out by hand.
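
As a quick check of the repetition trick, the sketch below builds the 16-byte marker and splits a small sample on it. The payloads "one" and "two" are made up for illustration, and String#b (Ruby 2.0 and later) is used to tag everything as binary.

```ruby
# Build the 16-byte marker by repetition; .b returns a binary (ASCII-8BIT) copy.
marker = ("\xFF" * 16).b
p marker.bytesize  #=> 16 (bytes, not characters)

# Made-up payloads, for illustration only.
data = marker + "one".b + marker + "two".b
p data.split(marker)  #=> ["", "one", "two"]
```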
 
Iñaki Baz Castillo

2011/4/21 7stud -- said:
I guess you missed this:

puts RUBY_VERSION

...

--output:--
1.9.2


Interesting. I also use 1.9.2, but I have noticed that it fails under
irb and not when I run the above code from a separate file.
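
One way to investigate a difference like this is to print the encoding each environment assigns to the literal. The following is a minimal diagnostic sketch; describe is a hypothetical helper name, and what the first two lines print will vary with the Ruby version and with where the code is run.

```ruby
# Hypothetical helper: report how Ruby sees a string.
def describe(str)
  { encoding: str.encoding, valid: str.valid_encoding?, bytes: str.bytesize }
end

p __ENCODING__          # source encoding of this file or irb session
p describe("\xFF\xFF")  # depends on the source encoding above

# The same two bytes, tagged explicitly in two different ways:
utf8   = "\xFF\xFF".dup.force_encoding(Encoding::UTF_8)
binary = "\xFF\xFF".dup.force_encoding(Encoding::ASCII_8BIT)
p describe(utf8)    # :valid is false, 0xFF never occurs in well-formed UTF-8
p describe(binary)  # :valid is true, any byte sequence is valid as ASCII-8BIT
```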

 
Y. NOBUOKA

In Ruby 1.9, a String object knows its own encoding.
If a String contains byte sequences that are invalid in that encoding,
String#split raises an error.

Without a magic comment, a string literal that contains non-ASCII bytes
is not a problem.

## example: OK!!
#-------------------------------------------------
#! ruby-1.9.2

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"
p str.encoding #=> #<Encoding:ASCII-8BIT>
p str.valid_encoding? #=> true

pattern = "\xFF\xFF"
p str.split( pattern ) #=> ["", "a", "b", "c", "d"]
#-------------------------------------------------

However, when a magic comment declares the file encoding to be UTF-8,
a string literal that contains non-ASCII bytes does become a problem.

## example: NG (no good)
#-------------------------------------------------
#! ruby-1.9.2
# coding: UTF-8

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"
p str.encoding #=> #<Encoding:UTF-8>
p str.valid_encoding? #=> false

pattern = "\xFF\xFF"
p pattern.valid_encoding? #=> false
p str.split( pattern ) # ERROR OCCURS!!!
#-------------------------------------------------

To avoid this problem, change the encoding of any string that contains
non-ASCII bytes to ASCII-8BIT.

## example: avoiding the problem
#-------------------------------------------------
#! ruby-1.9.2
# coding: UTF-8

str = "\xFF\xFF\x61\xFF\xFF\x62\xFF\xFF\x63\xFF\xFF\x64"
# change the encoding of the string
str.force_encoding Encoding::ASCII_8BIT
p str.encoding #=> #<Encoding:ASCII-8BIT>
p str.valid_encoding? #=> true

pattern = "\xFF\xFF".force_encoding Encoding::ASCII_8BIT
p pattern.valid_encoding? #=> true
p str.split( pattern ) #=> ["", "a", "b", "c", "d"]
#-------------------------------------------------
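
One detail worth noting is that force_encoding changes the receiver in place. If the caller's string should keep its original encoding tag, the same idea can be wrapped in a small helper that works on copies. This is only a sketch; split_on_marker is a hypothetical name, not part of Ruby or of any library mentioned in this thread.

```ruby
# Hypothetical helper: split raw bytes on a marker without touching the caller's strings.
def split_on_marker(data, marker)
  # dup first, because force_encoding modifies the string it is called on
  bytes = data.dup.force_encoding(Encoding::ASCII_8BIT)
  sep   = marker.dup.force_encoding(Encoding::ASCII_8BIT)
  bytes.split(sep)
end

raw = "\xFF\xFF\x61\xFF\xFF\x62"
encoding_before = raw.encoding

p split_on_marker(raw, "\xFF\xFF")  #=> ["", "a", "b"]
p raw.encoding == encoding_before   #=> true, the original string is untouched
```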

Kind regards,
 
