mike blamires
I am having great difficulty using Unicode characters in a regular
expression; I am trying to match extended Unicode characters.
I want to split a large dump file (containing only JPEGs). I have used
a hex editor to manually extract one file to show it can be done, so I
know the input is intact.
Each JPEG starts with the Unicode characters \u00FF \u00D8 \u00FF \u00E1,
and there are plenty of these to be found within the file.
open(DUMPFILE, "/pathtodumpfile");
my $line;
while (<DUMPFILE>) {
    $line = $line.$_;
}
@files = split(/\x{00FF}\x{00D8}\x{00FF}\x{00E1}/, $line);
(As you may see from the above style, I am relatively inexperienced with the
Perl side of programming.)
I have tried inserting the Unicode characters in various ways (\xFF, \x{FF},
etc.), but it just doesn't seem to find the pattern. I am at a bit of a loss as
to whether it is my regexp that is wrong, my use of Unicode characters,
or my use of extended Unicode characters.
Many thanks for your help.
Cheers
Mike
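
For reference, here is a minimal sketch of one way the split described above could be written. It rests on assumptions that are not confirmed by the post: that the values FF D8 FF E1 are best treated as raw bytes rather than Unicode characters, and that the file should be read through the :raw layer so no text-mode translation touches the data. The helper names split_jpegs and slurp_binary and the output names image0001.jpg, image0002.jpg, ... are made up for illustration. A zero-width lookahead is used so that split keeps the marker at the start of each chunk instead of discarding it.

```perl
use strict;
use warnings;

# Start-of-file marker from the post, written as raw bytes.
my $MARKER_RE = qr/\xFF\xD8\xFF\xE1/;

# Split a byte string into chunks that each begin with the marker.
sub split_jpegs {
    my ($data) = @_;
    # The lookahead matches the position just before each marker,
    # so the marker bytes stay at the front of the following chunk.
    my @chunks = split /(?=$MARKER_RE)/, $data;
    # Drop any leading bytes that come before the first marker.
    return grep { /\A$MARKER_RE/ } @chunks;
}

# Read a whole file as raw bytes (hypothetical helper).
sub slurp_binary {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    local $/;    # slurp mode: read the entire file in one read
    my $data = <$fh>;
    close($fh);
    return $data;
}

# Only runs when a dump file name is given on the command line.
if (@ARGV) {
    my @files = split_jpegs(slurp_binary($ARGV[0]));
    my $n = 0;
    for my $jpeg (@files) {
        my $out = sprintf("image%04d.jpg", ++$n);    # assumed naming scheme
        open(my $fh, '>:raw', $out) or die "Cannot write $out: $!";
        print {$fh} $jpeg;
        close($fh);
    }
    print scalar(@files), " file(s) written\n";
}
```

One caveat with this sketch: if the same four bytes happen to occur inside an image (for example in an embedded thumbnail), that image will be cut in two, so the results are worth checking against a file extracted by hand.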