regex dingbat dodge - single char as string to repeatable single char.

John · Jan 25, 2008

I have text that might have had a star character in the proprietary
originating system. The character is used in ratings boxes: a three-star
movie, a four-star restaurant, etc.

By the time it's exported and available to me, it's represented by a
string: "<star>".

I want to surround consecutive stars with font coding and replace each
instance of the string with a single character that, in conjunction
with the font change, will eventually print as a star.

To set up this substitution, I change the strings back to a unique
character, one that I reckon would never occur in nature.

When I try to surround any repetitions of this invented character, I
instead match everything.

===

#!/usr/bin/perl -w
use strict;

my $text = "Cuisine: Urban deli<ep>";
$text .= "Overall: <star><star><star><star><1/2> (very good to excellent)<ep>";
$text .= "Food: <star><star><star><star><1/2><ep>";

$text =~ s/\<star\>/_STAR_/ig; # uscores easier in regex than angle brackets.
$text =~ s/_STAR_/\xbc/g; # change pseudocharacter to single character
$text =~ s/(\xbc*)/_STARFONT_$1_ENDSTAR/g; #bracket groups in more pseudocode

print $text;

====

If I limit the search to five consecutive stars, the match works as I
intended:

===

#!/usr/bin/perl -w
use strict;

my $text = "Cuisine: Urban deli<ep>";
$text .= "Overall: <star><star><star><star><1/2> (very good to excellent)<ep>";
$text .= "Food: <star><star><star><star><1/2><ep>";

$text =~ s/\<star\>/_STAR_/ig; # uscores easier in regex than angle brackets.
$text =~ s/_STAR_/\xbc/g; # change pseudocharacter to single character
$text =~ s/(\xbc{1,5})/_STARFONT_$1_ENDSTAR/g; #bracket groups of stars in more pseudocode

print $text;

===

So what am I missing when it comes to the first search?

Certainly, I am missing some superior technique for matching repeated
instances of such a string, so I am open to suggestions there.

John Campbell
Haddonfield, NJ 08033
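On the "never occur in nature" point: the placeholder \xbc is code point U+00BC, the Latin-1 one-quarter fraction sign, and this data already carries fractions such as <1/2>. A minimal sketch, assuming a Private Use Area code point such as U+E000 never appears in the export, shows how a genuine one-quarter sign would inflate a star count when \xbc is the sentinel. The review string and the count_stars helper are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical review text that contains a real one-quarter sign
# (U+00BC, the same code point as "\xbc") next to a three-star rating.
my $review = "Half-sour pickles, \xbc lb: <star><star><star><ep>";

# Hypothetical helper: replace each <star> with the given sentinel,
# then count how many sentinel characters the text now contains.
sub count_stars {
    my ($text, $sentinel) = @_;
    $text =~ s/<star>/$sentinel/g;
    my $count = () = $text =~ /\Q$sentinel\E/g;
    return $count;
}

# \xbc collides with the genuine fraction already in the text.
my $with_latin1 = count_stars($review, "\xbc");

# Assumption: U+E000 (Private Use Area) never occurs in the exported data.
my $with_pua = count_stars($review, "\x{E000}");

print "sentinel \\xbc counts $with_latin1 stars\n";
print "sentinel U+E000 counts $with_pua stars\n";
```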

John W. Krahn · Jan 25, 2008

In the first regular expression you are matching '\xbc*' and in the
second you are matching '\xbc{1,5}'. The '*' quantifier matches *zero* or
more times, and zero '\xbc' characters can be found at every position in
the string, so the first pattern matches (an empty string) everywhere.
The second one has to match at least *one* character. Change '\xbc*' to
'\xbc+'.

John
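The zero-versus-one distinction can be checked directly. This sketch uses a made-up sample string shaped like the thread's data (after <star> has become \xbc), lists what each quantifier matches in list context, and applies the corrected substitution.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample shaped like the thread's data, after <star> has become \xbc.
my $sample = "Food: \xbc\xbc\xbc\xbc<1/2>";

# '*' allows zero repetitions, so /(\xbc*)/g also succeeds with an empty
# match at positions where there is no \xbc at all.
my @star_matches = $sample =~ /(\xbc*)/g;
my $empty_count  = grep { $_ eq '' } @star_matches;

# '+' requires at least one \xbc, so only the real run of stars matches.
my @plus_matches = $sample =~ /(\xbc+)/g;

# The corrected substitution; ${1} just makes the variable boundary explicit.
(my $fixed = $sample) =~ s/(\xbc+)/_STARFONT_${1}_ENDSTAR/g;

printf "with *: %d matches, %d of them empty\n",
    scalar(@star_matches), $empty_count;
printf "with +: %d match(es)\n", scalar(@plus_matches);
```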

John · Jan 25, 2008

John wrote:
The '*' modifier matches *zero* or
more times and there are *zero* '\xbc' characters everywhere in the
string. The second one has to match at least *one* character. Change
'\xbc*' to '\xbc+'.

That does the trick. It ought to come in handy.

Just realized that this snippet prints what, on some systems, is an
unprintable character.
I just see question marks. Hope I didn't cause any problems with that.
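The question marks are most likely an encoding mismatch: "\xbc" is code point U+00BC, and with no output layer Perl writes it as the single byte 0xBC, which is not valid UTF-8 on its own. A minimal sketch, assuming the eventual output can be UTF-8, that emits a real star character (U+2605 BLACK STAR) instead of a placeholder byte:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: whatever reads this output expects UTF-8.
binmode STDOUT, ':encoding(UTF-8)';

my $text = "Food: <star><star><star><star><1/2><ep>";

# Replace each <star> with U+2605 BLACK STAR rather than the Latin-1 \xbc.
$text =~ s/<star>/\x{2605}/g;

print "$text\n";
```

Whether a real star glyph is appropriate depends on the downstream font coding, which the thread does not describe.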

Ben Morrow · Jan 25, 2008

Quoth John:

#!/usr/bin/perl -w

You want

use warnings;

rather than -w, nowadays.

use strict;

my $text = "Cuisine: Urban deli<ep>";
$text .= "Overall: <star><star><star><star><1/2> (very good to excellent)<ep>";
$text .= "Food: <star><star><star><star><1/2><ep>";

$text =~ s/\<star\>/_STAR_/ig; # uscores easier in regex than angle brackets.

No they're not. Angles don't need escaping inside regexen.

$text =~ s/_STAR_/\xbc/g; # change pseudocharacter to single character
$text =~ s/(\xbc*)/_STARFONT_$1_ENDSTAR/g; #bracket groups in more pseudocode

I don't know what the point of that is, unless you have some intervening
code that processes char-by-char.

s/( (?: <star> )+ )/_STARFONT_$1_ENDSTAR/gx;

will work perfectly well. Notice the difference between () and (?: )
(capturing vs. non-capturing grouping) and my use of /x to make the regex
more comprehensible. Needing + instead of * has already been covered.

Ben
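The one-liner s/( (?: <star> )+ )/_STARFONT_$1_ENDSTAR/gx keeps the <star> strings themselves inside the markers. If one placeholder character per star is still wanted, one option is to wrap and convert in a single pass with the /e modifier. This is a sketch: wrap_stars is a hypothetical helper, and _STARFONT_/_ENDSTAR are the thread's own pseudocode markers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "Cuisine: Urban deli<ep>";
$text .= "Overall: <star><star><star><star><1/2> (very good to excellent)<ep>";
$text .= "Food: <star><star><star><star><1/2><ep>";

# (?: ) groups without capturing; the outer ( ) captures the whole run.
# /x lets the pattern contain spaces; /e evaluates the replacement as code.
$text =~ s{ ( (?: <star> )+ ) }{ wrap_stars($1) }gex;

# Hypothetical helper: turn a run like "<star><star>" into the font
# markers around the same number of \xbc placeholder characters.
sub wrap_stars {
    my ($run) = @_;
    my $count = () = $run =~ /<star>/g;
    return '_STARFONT_' . ("\xbc" x $count) . '_ENDSTAR';
}

print $text, "\n";
```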

How to read a file as binary or hex "string" so that I can do regex search?	3	Dec 19, 2024
Regex: match double OR single quote	4	Jul 12, 2012
How to substitute (regex) single newline (0A) character on Win32	3	Jul 17, 2009
Big problem I need to solve with some unix utils	1	Jun 19, 2022
FAQ 6.9 How can I quote a variable to use in a regex?	10	Apr 12, 2011
regex problem	2	Jan 17, 2007
passing single backslash character as commang line argument	5	Sep 27, 2010
Regex query	10	Jul 24, 2006
