I'm trying to read a binary file and would like to find a particular
substring within it. I have done the following
unpack is useful when you need to turn binary data into a usable
structure. Just use a regular expression:
open 'testfile', 'rb' do |io|
  if io.read(512) =~ /\020/ then
    puts "The tag was found"
  else
    puts "The tag was not found"
  end
end
Hi,
thanks for your response, but it didn't solve my problem, so I guess I
didn't explain it correctly.
I'll give it another go.
I have a binary file.
If I look at this file using hexdump -Cn 512 testfile
it looks like this:
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000080 44 49 43 4d 02 00 00 00 55 4c 04 00 c2 00 00 00 |DICM....UL......|
00000090 02 00 01 00 4f 42 00 00 02 00 00 00 00 01 02 00 |....OB..........|
000000a0 02 00 55 49 1a 00 31 2e 32 2e 38 34 30 2e 31 30 |..UI..1.2.840.10|
000000b0 30 30 38 2e 35 2e 31 2e 34 2e 31 2e 31 2e 32 00 |008.5.1.4.1.1.2.|
000000c0 02 00 03 00 55 49 3c 00 32 2e 31 36 2e 38 34 30 |....UI<.2.16.840|
000000d0 2e 31 2e 31 31 33 36 36 32 2e 32 2e 31 2e 34 35 |.1.113662.2.1.45|
000000e0 31 39 2e 34 31 35 38 32 2e 34 31 30 35 31 35 32 |19.41582.4105152|
000000f0 2e 34 31 39 39 39 30 35 30 35 2e 34 31 30 35 32 |.419990505.41052|
00000100 33 32 35 31 02 00 10 00 55 49 14 00 31 2e 32 2e |3251....UI..1.2.|
00000110 38 34 30 2e 31 30 30 30 38 2e 31 2e 32 2e 31 00 |840.10008.1.2.1.|
00000120 02 00 12 00 55 49 18 00 32 2e 31 36 2e 38 34 30 |....UI..2.16.840|
00000130 2e 31 2e 31 31 33 36 36 32 2e 32 2e 31 2e 31 00 |.1.113662.2.1.1.|
00000140 02 00 16 00 41 45 0a 00 50 48 4f 45 4e 49 58 53 |....AE..PHOENIXS|
00000150 43 50 08 00 00 00 55 4c 04 00 54 02 00 00 08 00 |CP....UL..T.....|
00000160 05 00 43 53 0a 00 49 53 4f 5f 49 52 20 31 30 30 |..CS..ISO_IR 100|
00000170 08 00 08 00 43 53 16 00 4f 52 49 47 49 4e 41 4c |....CS..ORIGINAL|
00000180 5c 50 52 49 4d 41 52 59 5c 41 58 49 41 4c 08 00 |\PRIMARY\AXIAL..|
00000190 12 00 44 41 0a 00 31 39 39 39 2e 30 35 2e 30 35 |..DA..1999.05.05|
000001a0 08 00 13 00 54 4d 10 00 31 30 3a 35 32 3a 33 34 |....TM..10:52:34|
000001b0 2e 35 33 30 30 30 30 20 08 00 16 00 55 49 1a 00 |.530000 ....UI..|
000001c0 31 2e 32 2e 38 34 30 2e 31 30 30 30 38 2e 35 2e |1.2.840.10008.5.|
000001d0 31 2e 34 2e 31 2e 31 2e 32 00 08 00 18 00 55 49 |1.4.1.1.2.....UI|
000001e0 3c 00 32 2e 31 36 2e 38 34 30 2e 31 2e 31 31 33 |<.2.16.840.1.113|
000001f0 36 36 32 2e 32 2e 31 2e 34 35 31 39 2e 34 31 35 |662.2.1.4519.415|
00000200
What I want to do is to find the group 02001000 in the hexadecimal
representation. It's in the 11th line of the output:
00000100 33 32 35 31 02 00 10 00 55 49 14 00 31 2e 32 2e |3251....UI..1.2.|
So I thought,
I open the file and read a bit into str
file = File.open("testfile","rb")
str = String.new
file.read(512, str)
then I unpack the str into another string interpreting the bytes as
hexadecimal representations
strhex = str.unpack("H*")
and look for the desired group within this "transformed" string
if (strhex.include?("02001000")) == true then
puts "The tag was found"
else
puts "The tag was not found"
end
As I said, this doesn't work (it reports the tag was not found!) and I
wonder whether it has something to do with the fact that
strhex.length # -> 1
By the way, if after unpacking I do
puts strhex
then I get the correct string, that is exactly the same as hexdump
shows me on the left-hand side.
So strhex seems to contain what I need, but still there is some kind
of problem.
Any hints?
Thank you very much
Cheers
Rafael
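
A likely explanation, for what it's worth: String#unpack always returns an
Array, even for a single directive, which is why strhex.length is 1.
Array#include? then asks whether some element equals "02001000" as a whole,
and the only element is the complete hex string, so the answer is false.
puts prints each array element, which is why the output looks right. The
sketch below is only an illustration: it uses sixteen bytes copied from
offset 0x100 of the hexdump instead of reading the file, and the variable
names (data, hex, tag, offset) are made up for the example.

```ruby
# Sixteen bytes copied from offset 0x100 of the hexdump; a real script would
# read them with File.binread("testfile", 512) instead (assumed file name).
data = ["333235310200100055491400312e322e"].pack("H*")

unpacked = data.unpack("H*")   # an Array holding one hex String
hex = unpacked.first           # the hex String itself

array_match  = unpacked.include?("02001000")  # element comparison on the Array
string_match = hex.include?("02001000")       # substring search on the String

# Searching the raw bytes avoids matches that start in the middle of a byte,
# which a search on the hex text cannot rule out.
tag = ["02001000"].pack("H*")  # the four bytes 02 00 10 00
offset = data.index(tag)       # byte offset of the tag, or nil if absent

if offset
  puts "The tag was found at byte offset #{offset}"
else
  puts "The tag was not found"
end
```

With the real file, the same two lines (building tag and calling index on
the string returned by the read) should report the position within the
first 512 bytes, assuming the tag is present there as the hexdump suggests.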