search for hex characters in a binary file and remove them

venkateshwar D · Aug 18, 2009

Hi All,

I need to look for a sequence of hex bytes in a binary file and
remove them. The binary file has the sequence 00 00 02 02 01 00
somewhere in it.
The script should open the file, look for the sequence 00 00 02 02
01 00 <18 variable bytes>, and remove those 18 + 6 = 24 bytes from the
file. Can someone please help? I can open the binary file and buffer it
byte by byte, but since the pattern can be anywhere in the file I don't
know how to proceed.

regards
venkat

sln · Aug 18, 2009

Hex characters? Like [a-f0-9] ? Or integers?

$sequence = " 00 00 02 02 01 00";

open my $fin, '<:raw', 'filename.in' or die "can't open input file: $!";
open my $fout, '>:raw', 'filename.out' or die "can't open output file: $!";
{
    local $/;
    $buf = <$fin>;
    $buf =~ s/$sequence//;
    print $fout $buf;
}
close $fout;
close $fin;

-sln

sln · Aug 18, 2009

$buf =~ s/$sequence//;

^^
$buf =~ s/$sequence.{6}//gs;

The 6 bytes after the sequence as well?
-sln

venkateshwar D · Aug 18, 2009

Hi

Thanks a lot. This does not seem to be working; it just does a binary
file copy.

I want to search for that pattern in the binary file (000002020100)
(it is a hex character file) and remove this pattern plus the next 18 bytes
in the file.
thanks
venkat

sln · Aug 18, 2009

I don't understand what you mean. Opening the file in ':raw' mode
removes any CRLF translation and any encoding layer.
You're free to read it as bytes then.

Surely "000002020100" as text can be represented in a regular expression.
Regular expressions are all about 'text'.
Each character there has an ordinal value that you would consider binary.

If you are instead looking for binary values, a 0 byte would be the \x{0}
character and a 2 byte would be \x{2}.

If it's text, just look for the sequence plus the next 18 bytes:

=~ s/000002020100.{18}//s

After the buffer is modified, write it out to a different ':raw' file,
where no translations will take place.

You can get the same effect in translated mode; just make sure the buffer
isn't upgraded to utf8.
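
To make the text-versus-binary distinction concrete, here is a minimal sketch. The sample buffer is made up for illustration (4 filler bytes, the 6-byte header, 18 variable bytes, 4 more filler bytes); it is not taken from the real file. It applies both patterns to the same data:

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the original file.
my $header = "\x00\x00\x02\x02\x01\x00";
my $data   = "ABCD" . $header . ("\xFF" x 18) . "WXYZ";

# Text pattern: looks for the ASCII digits '0', '0', '0', ... in the data.
(my $as_text = $data) =~ s/000002020100.{18}//s;

# Byte pattern: \Q...\E quotes the raw bytes; /s lets . match "\n" as well.
(my $as_bytes = $data) =~ s/\Q$header\E.{18}//s;

printf "text pattern: %d bytes left\n", length $as_text;
printf "byte pattern: %d bytes left\n", length $as_bytes;
```

If the file really holds the raw bytes 00 00 02 02 01 00, only the second substitution changes anything, which would explain getting a plain binary copy with the text pattern.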

-sln

sln · Aug 18, 2009

For extra merit, make it work without reading
the whole file into ram at once ;-)

BugBear

Double buffer, something like this then:

open my $fin, '<:raw', 'filename.in' or die "can't open input file: $!";
open my $fout, '>:raw', 'filename.out' or die "can't open output file: $!";

$keep = 50;
{
    local $/ = \4092;
    ($buf,$block) = ('','');

    while (defined ($block = <$fin>))
    {
        $buf .= $block;
        $keep = 0 if ($keep and $buf =~ s/000002020100.{18}//s);
        print $fout substr( $buf, 0, length($buf)-$keep, "");
    }
}
close $fout;
close $fin;
======================

Or, a little more efficient, but this may actually be slower:

$keep = 50;
{
    local $/ = \4092;
    ($buf,$block) = ('','');
    $bref = \$block;

    while (defined ($$bref = <$fin>))
    {
        if ($keep)
        {
            $buf .= $block;
            if ($buf =~ s/000002020100.{18}//s) {
                $keep = 0;
                $bref = \$buf;
            }
            print $fout substr( $buf, 0, length($buf)-$keep, "");
            next;
        }
        print $fout $buf;
    }
}
close $fout;
close $fin;

======================
-sln

sln · Aug 18, 2009

Double buffer, something like this then:

$keep = 50;
{
}

Of course, you have to check $keep or $buf here in case nothing was
found: the last $keep bytes are still sitting in $buf and must be flushed.
If nothing was found, the output will match the input file, so the results
are invalid:

print $fout $buf if $keep;

close $fout;
close $fin;

-sln

sln · Aug 18, 2009

For extra merit, make it work without reading
the whole file into ram at once ;-)

BugBear

Haha, I can do that. Special buffering, and an algo.
-sln

sln · Aug 19, 2009

How did you make out, any luck?
Try this sample and see if it is similar to what you have.

-sln
------------------------
use strict;
use warnings;

open my $ftest, '>', 'dummy.txt' or die "can't create dummy.txt: $!";
for (1 .. 2_000)
{
    print $ftest "$_ 0000000000000000000 111111111111111111111\n";
}
print $ftest "sequence line: 0000000000000000000 <000002020100555555555555555555>111\n";
for (2_001 .. 4_000)
{
    print $ftest "$_ 0000000000000000000 111111111111111111111\n";
}
close $ftest;

open my $fin, '<:raw', 'dummy.txt' or die "can't open input file: $!";
open my $fout, '>:raw', 'dummy_o.txt' or die "can't open output file: $!";

my ($chunksize, $found) = (4096,0);
{
    local $/ = \$chunksize;

    my ($keep, $buf, $data) = (50,'','');

    while (defined ($data = <$fin>))
    {
        $buf .= $data;
        $found = 1 if (not $found and $buf =~ s/000002020100.{18}//s);
        print $fout substr( $buf, 0, -$keep, "");
    }
    print $fout $buf;
}
if (!$found) {
    print "Did not match sequence: '000002020100.{18}'\n";
}

close $fout;
close $fin;

__END__

Josef Moellers · Aug 20, 2009

I've done this a couple of times in order to find some embedded files in
some documents (most often to find images in xls, doc, ppt, ...),
although I usually discard whatever is not of interest to me.

You have to read the file byte-by-byte and check for the header:
(Untested Code follows!)

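my $special = pack('C*', 0x00, 0x00, 0x02, 0x02, 0x01, 0x00);
open(my $src, '<', $srcname) or die "$0: cannot open $srcname: $!\n";
open(my $dst, '>', $dstname) or die "$0: cannot create $dstname: $!\n";
binmode $src;
binmode $dst;
my $buf = '';
# Prime a sliding window as wide as the header.
read($src, $buf, length($special));
while (length($buf) == length($special)) {
    if ($buf eq $special) {
        # Skip the 18 bytes after the header (whence 1 = relative to here).
        seek($src, 18, 1);
        $buf = '';
        read($src, $buf, length($special));
        next;
    }
    # Emit the oldest byte, then slide the window forward by one byte.
    print $dst substr($buf, 0, 1, '');
    last if read($src, $buf, 1, length($buf)) != 1;
}
# Whatever is left in the window is shorter than a header; copy it out.
print $dst $buf;
close($src);
close($dst);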

HTH,

Josef

sln · Aug 20, 2009

Why would you have to read the file a byte at a time and check
for the header? You store binary (byte) data in a buffer, then use 'eq'
as if it were character data, but you won't trust a regular expression,
which would do the same thing.

The file could be slurped into a buffer and then checked with a regular
expression, or it could be read in a chunk at a time, checked, and then the
chunk rolled out of the buffer minus the width of the sequence plus 18 bytes.
The next chunk is appended, and the process repeats until it's found.
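
A minimal sketch of that rolling-buffer idea, run against in-memory filehandles so a match that straddles chunk boundaries can be checked easily. The helper name strip_record, its parameters, the tiny chunk size of 7, and the sample data are all assumptions made for illustration; they are not part of the scripts posted in this thread:

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out, dropping the first occurrence of
# $header plus the $tail bytes that follow it. Returns 1 if it was removed.
sub strip_record {
    my ($in, $out, $header, $tail, $chunk_size) = @_;
    my $keep  = length($header) + $tail - 1;   # longest possible partial match
    my $buf   = '';
    my $found = 0;
    while (read($in, my $chunk, $chunk_size)) {
        $buf .= $chunk;
        $found = 1 if !$found && $buf =~ s/\Q$header\E.{$tail}//s;
        # Until the match is found, hold back the bytes that could still
        # be the start of a match; afterwards flush everything.
        my $flush = $found ? length($buf) : length($buf) - $keep;
        print {$out} substr($buf, 0, $flush, '') if $flush > 0;
    }
    print {$out} $buf;                          # whatever is still held back
    return $found;
}

# Made-up test data: the header starts at offset 10, so with 7-byte chunks
# it is split across two reads.
my $header = "\x00\x00\x02\x02\x01\x00";
my $data   = ("A" x 10) . $header . ("B" x 18) . ("C" x 10);

open my $in,  '<', \$data      or die "can't open in-memory input: $!";
open my $out, '>', \my $result or die "can't open in-memory output: $!";
my $found = strip_record($in, $out, $header, 18, 7);
close $out;
close $in;

print $found ? "removed\n" : "not found\n";
print length($result), " bytes left\n";
```

Holding back length($header) + 18 - 1 bytes is the smallest amount that still guarantees no partial match gets written out early; a larger fixed value such as 50 works the same way.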

I put up an example of how to do this.
The proof that this works is that it uses the same method you use, but
instead of reading 1 byte, testing it with 'eq', etc., it applies a
regular expression to a chunk of bytes.

Perl defaults to bytes in a regex; it will upgrade the context to
utf8 only if something in the expression forces it to. In this case
nothing does, since the sequence is in byte context (i.e. every value is
less than 0x100). The file is opened in binary mode, so it's byte context.
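
One rough way to check the byte-context claim is to look at the internal utf8 flag before and after the substitution. The buffer below is a made-up example containing high bytes (0xC8 to 0xD9) after the header; utf8::is_utf8 is a core function that reports whether Perl has upgraded the string internally:

```perl
use strict;
use warnings;

my $special = pack('C*', 0x00, 0x00, 0x02, 0x02, 0x01, 0x00);

# Hypothetical buffer: 2 lead bytes, the header, 18 high bytes, 2 trailing bytes.
my $bytes = "\xDE\xAD" . $special . join('', map { chr($_) } 200 .. 217) . "\xBE\xEF";

print "utf8 flag before: ", (utf8::is_utf8($bytes) ? "on" : "off"), "\n";
$bytes =~ s/\Q$special\E.{18}//s;
print "utf8 flag after:  ", (utf8::is_utf8($bytes) ? "on" : "off"), "\n";
print "bytes left: ", join(' ', map { sprintf '%02x', ord($_) } split //, $bytes), "\n";
```

As long as the flag stays off, each . in the pattern consumes exactly one byte, so .{18} removes 18 bytes even when they are all above 0x7F.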

-sln

use strict;
use warnings;

my $special = pack('C*', 0x00, 0x00, 0x02, 0x02, 0x01, 0x00);
my $bytes = '';

for (1 .. 12_000) {
    if ($_ == 6000) {
        $bytes .= $special;
    } else {
        $bytes .= chr(int(rand(256)) & 0xff);
    }
}
print "buf len = ".length($bytes)."\n";
my $posn = 0;
if ($bytes =~ s/($special)(.{18})/$posn = pos($bytes); ''/es) {
    print "Found special at position ".$posn.": ".ordsplit($1)."\n";
    print "Next 18 bytes : ".ordsplit($2)."\n";
    print "Special + 18 bytes, removed!\n";
}
print "buf len = ".length($bytes)."\n";
sub ordsplit
{
    my $string = shift;
    my $buf = '';
    for (map {ord $_} split //, $string) {
        $buf.= sprintf ("%02x ",$_);
    }
    return $buf;
}
__END__

buf len = 12005
Found special at position 5999: 00 00 02 02 01 00
Next 18 bytes : de b9 70 b9 4b b9 4c 9f 1d f3 de 33 52 00 26 a7 50 41
Special + 18 bytes, removed!
buf len = 11981

sln · Aug 20, 2009

Here's the same example in binary mode (i.e. the dummy file
is random binary, with the binary sequence embedded).
If this doesn't work for you, something else is wrong.

-sln
-------------------------

use strict;
use warnings;

my $sequence = "\x{00}\x{00}\x{02}\x{02}\x{01}\x{00}";
# or = pack('C*', 0x00, 0x00, 0x02, 0x02, 0x01, 0x00);

# Create dummy random binary file with embedded sequence
# ##
open my $ftest, '>:raw', 'dummy.bin' or die "can't create dummy.bin: $!";
for (1 .. 12_000) {
    if ($_ == 2000) {
        print $ftest $sequence;
    } else {
        print $ftest chr(int(rand(256)) & 0xff);
    }
}
close $ftest;

# Read in binary, look for sequence, remove then write to file
# ##
open my $fin, '<:raw', 'dummy.bin' or die "can't open input file: $!";
open my $fout, '>:raw', 'dummy_o.bin' or die "can't open output file: $!";
my ($chunksize, $found) = (1024,0);
{
    local $/ = \$chunksize;
    my ($keep, $buf, $data) = (50,'','');
    while (defined ($data = <$fin>)) {
        $buf .= $data;
        if (!$found) {
            if ($buf =~ s/($sequence)(.{18})//s) {
                print "Found sequence: ".ordsplit($1)."\n";
                print "Next 18 bytes : ".ordsplit($2)."\n";
                print "Sequence + 18 bytes, removed!\n";
                $found = 1;
            }
        }
        print $fout substr( $buf, 0, -$keep, "");
    }
    print $fout $buf;
}
if (!$found) {
    print "Did not match sequence: '\$sequence.{18}'\n";
}
close $fout;
close $fin;

## End of program
exit 0;

sub ordsplit
{
    my $string = shift;
    my $buf = '';
    for (map {ord $_} split //, $string) {
        $buf.= sprintf ("%02x ",$_);
    }
    return $buf;
}

__END__

Found sequence: 00 00 02 02 01 00
Next 18 bytes : 25 6f e4 7e 6e fb fe 1e 47 af e6 2e 50 3f 31 54 dd 51
Sequence + 18 bytes, removed!
