search for hex characters in a binary file and remove them

venkateshwar D

Hi All,

I need to look for a sequence of hex characters in a binary file and
remove them. The binary file has the sequence 00 00 02 02 01 00 somewhere
in it.
The script should open the file, look for this sequence, 00 00 02 02
01 00 <18 variable bytes>, and remove those 6 + 18 = 24 bytes from the
file. Can someone please help? I can open the binary file and buffer it
byte by byte, but since the pattern can be anywhere in the file I don't
know how to proceed.

regards
venkat
 
sln

Hex characters? Like [a-f0-9] ? Or integers?

my $sequence = " 00 00 02 02 01 00";    # literal text, spaces included

open my $fin, '<:raw', 'filename.in' or die "can't open input file: $!";
open my $fout, '>:raw', 'filename.out' or die "can't open output file: $!";
{
    local $/;                  # slurp mode: read the whole file at once
    my $buf = <$fin>;
    $buf =~ s/$sequence//;     # remove the first occurrence
    print $fout $buf;
}
close $fout;
close $fin;

-sln
 
sln

$buf =~ s/$sequence//;

That line should have been:

$buf =~ s/$sequence.{6}//gs;

The 6 bytes after the sequence as well?
-sln
 
venkateshwar D

Hi

Thanks a lot. This does not seem to be working; it just makes a binary
copy of the file.

I want to search for that pattern in the binary file (000002020100;
it is a hex character file) and remove this pattern plus the next 18 bytes
in the file.
thanks
venkat
 
sln

I don't understand what you mean. Opening the file in ':raw' mode
turns off any CRLF translation and any encoding layer.
You're free to read it as bytes then.

Surely "000002020100" as text can be represented in a regular expression.
Regular expressions are all about 'text'.
Each character there has an ordinal value that you would consider binary.

If you are instead looking for binary values, a 0 value would be the \x{0}
character, and 2 is \x{2}.

If it's text, just look for the sequence + the next 18 bytes:

=~ s/000002020100.{18}//s

After the buffer is modified, write it out to a different ':raw' file,
where no translations will take place.

You can get the same effect in translated mode; just make sure the buffer
isn't upgraded to utf8.
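
To make the text-versus-binary distinction above concrete, here is a minimal sketch on in-memory strings. The sample data ("AAA", "BBB", and a run of 18 'x' characters standing in for the variable bytes) is made up for illustration and is not taken from the poster's file.

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the poster's file.
my $tail = 'x' x 18;    # stands in for the 18 variable bytes

# Case 1: the file holds the digits as text ("0", "0", "0", ...).
my $as_text = "AAA" . "000002020100" . $tail . "BBB";
(my $text_out = $as_text) =~ s/000002020100.{18}//s;

# Case 2: the file holds six raw bytes with the values 0, 0, 2, 2, 1, 0.
my $as_bytes = "AAA" . "\x00\x00\x02\x02\x01\x00" . $tail . "BBB";
(my $bytes_out = $as_bytes) =~ s/\x00\x00\x02\x02\x01\x00.{18}//s;

# The text pattern does not match the raw-byte data, so nothing is
# removed and writing the buffer out would just copy the file.
(my $wrong = $as_bytes) =~ s/000002020100.{18}//s;

print "text:  $text_out\n";
print "bytes: $bytes_out\n";
print "text pattern changed the byte data: ",
      ($wrong ne $as_bytes ? "yes" : "no"), "\n";
```

If the output file comes out identical to the input, as reported earlier in the thread, a mismatch of this kind between the pattern and the actual file contents is one likely explanation.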


-sln
 
sln

For extra merit, make it work without reading
the whole file into ram at once ;-)

BugBear

Double buffer, something like this then:

open my $fin, '<:raw', 'filename.in' or die "can't open input file: $!";
open my $fout, '>:raw', 'filename.out' or die "can't open output file: $!";

my $keep = 50;
{
    local $/ = \4092;                  # read fixed-size 4092-byte records
    my ($buf, $block) = ('', '');

    while (defined ($block = <$fin>))
    {
        $buf .= $block;
        $keep = 0 if ($keep and $buf =~ s/000002020100.{18}//s);
        # write out everything except the last $keep bytes
        print $fout substr( $buf, 0, length($buf)-$keep, "");
    }
}
close $fout;
close $fin;
======================

Or, a little more efficient, but this may actually be slower:

my $keep = 50;
{
    local $/ = \4092;
    my ($buf, $block) = ('', '');
    my $bref = \$block;                # read target; switched to $buf once found

    while (defined ($$bref = <$fin>))
    {
        if ($keep)
        {
            $buf .= $block;
            if ($buf =~ s/000002020100.{18}//s) {
                $keep = 0;
                $bref = \$buf;
            }
            print $fout substr( $buf, 0, length($buf)-$keep, "");
            next;
        }
        print $fout $buf;
    }
}
close $fout;
close $fin;

======================
-sln
 
sln

Double buffer, something like this then:


$keep = 50;
{
}
Of course, you have to check $keep or $buf here
in case nothing was found, and flush what is left in the buffer. If the
sequence wasn't found, the output will match the input file, so the
result is invalid:

print $fout $buf if $keep;
close $fout;
close $fin;
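
Putting the chunked read and the final flush together, a self-contained sketch might look like the following. It assumes the raw-byte form of the sequence; the helper name strip_first_record, the 23-byte hold-back and the tiny chunk size are choices made for this illustration (in-memory filehandles are used so it runs without any files).

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out in chunks, removing the first
# occurrence of the 6-byte marker plus the 18 bytes that follow it.
sub strip_first_record {
    my ($in, $out, $chunksize) = @_;
    my $marker = "\x00\x00\x02\x02\x01\x00";
    my $keep   = 23;                   # one byte less than the 24-byte match
    my ($buf, $found) = ('', 0);

    local $/ = \$chunksize;            # fixed-size reads
    while (defined(my $block = <$in>)) {
        $buf .= $block;
        $found = 1 if !$found and $buf =~ s/\Q$marker\E.{18}//s;
        # Hold back a tail while still searching, so a match that
        # straddles two chunks is not written out in pieces.
        my $hold = $found ? 0 : $keep;
        print {$out} substr($buf, 0, length($buf) - $hold, '')
            if length($buf) > $hold;
    }
    print {$out} $buf;                 # flush whatever is left
    return $found;
}

# Made-up test data: 10 x "A", the marker, 18 x "v", 10 x "B".
my $data = ("A" x 10) . "\x00\x00\x02\x02\x01\x00" . ("v" x 18) . ("B" x 10);
my $result = '';
open my $in,  '<', \$data   or die "can't open in-memory input: $!";
open my $out, '>', \$result or die "can't open in-memory output: $!";
my $found = strip_first_record($in, $out, 8);   # small chunks force a straddling match
close $out;
close $in;

print "found=$found, length=", length($result), "\n";   # found=1, length=20
```

Holding back 23 bytes is enough here because an incomplete 24-byte match can have at most 23 of its bytes at the end of the buffer; the value 50 used in the posts above works just as well.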

-sln
 
sln

For extra merit, make it work without reading
the whole file into ram at once ;-)

BugBear

Haha, I can do that: special buffering, and an algo.
-sln
 
sln

How did you make out, any luck?
Try this sample and see if it is similar to what you have.

-sln
------------------------
use strict;
use warnings;


open my $ftest, '>', 'dummy.txt' or die "can't create dummy.txt: $!";
for (1 .. 2_000)
{
    print $ftest "$_ 0000000000000000000 111111111111111111111\n";
}
print $ftest "sequence line: 0000000000000000000 <000002020100555555555555555555>111\n";
for (2_001 .. 4_000)
{
    print $ftest "$_ 0000000000000000000 111111111111111111111\n";
}
close $ftest;


open my $fin, '<:raw', 'dummy.txt' or die "can't open input file: $!";
open my $fout, '>:raw', 'dummy_o.txt' or die "can't open output file: $!";

my ($chunksize, $found) = (4096,0);
{
    local $/ = \$chunksize;

    my ($keep, $buf, $data) = (50,'','');

    while (defined ($data = <$fin>))
    {
        $buf .= $data;
        $found = 1 if (not $found and $buf =~ s/000002020100.{18}//s);
        print $fout substr( $buf, 0, -$keep, "");
    }
    print $fout $buf;
}
if (!$found) {
    print "Did not match sequence: '000002020100.{18}'\n";
}

close $fout;
close $fin;

__END__
 
Josef Moellers

I've done this a couple of times in order to find some embedded files in
some documents (most often to find images in xls, doc, ppt, ...),
although I usually discard whatever is not of interest to me.

You have to read the file byte-by-byte and check for the header:
(Untested Code follows!)

my $special = pack('C*', 0x00, 0x00, 0x02, 0x02, 0x01, 0x00);
open(my $src, '<', $srcname) or die "$0: cannot open $srcname: $!\n";
open(my $dst, '>', $dstname) or die "$0: Cannot create $dstname: $!\n";
binmode $src;
binmode $dst;
my $buf = '';
read($src, $buf, length($special));
while (1) {
    if ($buf eq $special) {
        # Header found: skip the 18 bytes that follow it (relative seek),
        # then refill the window.
        seek($src, 18, 1);
        last if read($src, $buf, length($special)) != length($special);
        next;
    }
    # No match: emit the first byte of the window and slide along by one.
    print $dst substr($buf, 0, 1, '');
    last if read($src, $buf, 1, length($buf)) != 1;
}
print $dst $buf;    # whatever is left in the window
close($src);
close($dst);

HTH,

Josef
 
sln

Why would you have to read the file a byte at a time and check
for the header? You store binary (byte) data in a buffer and then use 'eq'
as if it were character data, but you won't trust a regular expression, which
would do the same thing.

The file could be slurped into a buffer and then checked with a regular expression,
or it could be read in a chunk at a time and checked, with the chunk then rolled out
of the buffer minus the width of the sequence plus 18 bytes. The next chunk
is appended, and the process repeats until it's found.

I put up an example of how to do this.
The proof that this works is below: it uses the same packed sequence you use,
but instead of 'read 1 byte, is it eq', etc., it uses a regular expression
on a chunk of bytes.

Perl defaults to bytes in a regex; it will upgrade the context to
utf8 only if something in the expression forces it to. In this case nothing
does: the sequence is in byte context (i.e. every value is less than 0x100).
The file is opened in binary mode, so it's byte context.
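
As a small check of the byte-context point, the sketch below runs the pattern against a byte string whose 18 trailing bytes include 0x0A and 0xFF. The payload values and the "head"/"tail" padding are made up for illustration. It also shows why the /s modifier matters: without it, "." will not match a 0x0A byte.

```perl
use strict;
use warnings;

my $marker = pack('C*', 0x00, 0x00, 0x02, 0x02, 0x01, 0x00);

# 18 made-up payload bytes that include 0x0A (newline) and a high byte 0xFF.
my $payload = pack('C*', 0x0A, 0xFF, (0x41) x 16);
my $data    = "head" . $marker . $payload . "tail";

# Without /s, "." refuses to match the 0x0A byte, so the match fails.
my $without_s = ($data =~ /\Q$marker\E.{18}/)  ? 1 : 0;
# With /s, "." matches any single byte.
my $with_s    = ($data =~ /\Q$marker\E.{18}/s) ? 1 : 0;

# Remove the marker plus the 18 bytes after it from a copy of the data.
(my $clean = $data) =~ s/\Q$marker\E.{18}//s;

print "match without /s: $without_s\n";
print "match with /s:    $with_s\n";
print "before: ", unpack('H*', $data),  "\n";
print "after:  ", unpack('H*', $clean), "\n";
```

The \Q...\E quoting is not strictly needed for these six byte values, but it keeps the pattern safe if a marker ever contains a byte that is a regex metacharacter.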

-sln

use strict;
use warnings;

my $special = pack('C*', 0x00, 0x00, 0x02, 0x02, 0x01, 0x00);
my $bytes = '';

for (1 .. 12_000) {
    if ($_ == 6000) {
        $bytes .= $special;
    } else {
        $bytes .= chr(int(rand(256)) & 0xff);
    }
}
print "buf len = ".length($bytes)."\n";
my $posn = 0;
if ($bytes =~ s/($special)(.{18})/$posn = pos($bytes); ''/es) {
    print "Found special at position ".$posn.": ".ordsplit($1)."\n";
    print "Next 18 bytes : ".ordsplit($2)."\n";
    print "Special + 18 bytes, removed!\n";
}
print "buf len = ".length($bytes)."\n";
sub ordsplit
{
    my $string = shift;
    my $buf = '';
    for (map {ord $_} split //, $string) {
        $buf.= sprintf ("%02x ",$_);
    }
    return $buf;
}
__END__

buf len = 12005
Found special at position 5999: 00 00 02 02 01 00
Next 18 bytes : de b9 70 b9 4b b9 4c 9f 1d f3 de 33 52 00 26 a7 50 41
Special + 18 bytes, removed!
buf len = 11981
 
sln

Here's the same example in binary mode (i.e. the dummy file
is random binary, with the binary sequence embedded).
If this doesn't work for you, something else is wrong.

-sln
-------------------------

use strict;
use warnings;

my $sequence = "\x{00}\x{00}\x{02}\x{02}\x{01}\x{00}";
# or = pack('C*', 0x00, 0x00, 0x02, 0x02, 0x01, 0x00);

# Create dummy random binary file with embedded sequence
# ##
open my $ftest, '>:raw', 'dummy.bin' or die "can't create dummy.bin: $!";
for (1 .. 12_000) {
    if ($_ == 2000) {
        print $ftest $sequence;
    } else {
        print $ftest chr(int(rand(256)) & 0xff);
    }
}
close $ftest;

# Read in binary, look for sequence, remove then write to file
# ##
open my $fin, '<:raw', 'dummy.bin' or die "can't open input file: $!";
open my $fout, '>:raw', 'dummy_o.bin' or die "can't open output file: $!";
my ($chunksize, $found) = (1024,0);
{
    local $/ = \$chunksize;
    my ($keep, $buf, $data) = (50,'','');
    while (defined ($data = <$fin>)) {
        $buf .= $data;
        if (!$found) {
            if ($buf =~ s/($sequence)(.{18})//s) {
                print "Found sequence: ".ordsplit($1)."\n";
                print "Next 18 bytes : ".ordsplit($2)."\n";
                print "Sequence + 18 bytes, removed!\n";
                $found = 1;
            }
        }
        print $fout substr( $buf, 0, -$keep, "");
    }
    print $fout $buf;
}
if (!$found) {
    print "Did not match sequence: '\$sequence.{18}'\n";
}
close $fout;
close $fin;

## End of program
exit 0;

sub ordsplit
{
    my $string = shift;
    my $buf = '';
    for (map {ord $_} split //, $string) {
        $buf.= sprintf ("%02x ",$_);
    }
    return $buf;
}

__END__

Found sequence: 00 00 02 02 01 00
Next 18 bytes : 25 6f e4 7e 6e fb fe 1e 47 af e6 2e 50 3f 31 54 dd 51
Sequence + 18 bytes, removed!
 
