s///gsi; with a wildcard

Jason Carlton

Every once in a while, someone will copy and paste into my message
board from Word. After it submits through my Perl script, I'll have
something like this plugged in:

Normal 0 false false false EN-US X-NONE X-NONE
MicrosoftInternetExplorer4 /* Style Definitions */
table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-
rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-
style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-
padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-
margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:
0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt;
font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-
ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New
Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-
family:Calibri; mso-hansi-theme-font:minor-latin;}

The fonts and all that are different for each post; the only
consistency seems to be that it starts with "Normal 0 false false
false", and it ends with a "}".

Would something as simple as this be enough to consistently remove it?

$comment =~ s/Normal 0 false false false.*?}//gsi;

Or is there more to it than I'm thinking?
 
Jason Carlton

Sorry if I made that too much to read.

Basically, I want to remove "Normal 0 false false false" followed by
random stuff, but always ending with }.

Will this do it correctly, or will it remove other things that I'm not
recognizing?

$comment =~ s/Normal 0 false false false.*?}//gsi;

TIA,

Jason
 
Jason Carlton

You've shown in the past that anything you write is too much to read.

:-(

--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.liamg\100cm.j.dat/"
The above message is a Usenet post.
I don't recall having given anyone permission to use it on a Web site.

So, you're saying that you don't know the answer? If so, then why
bother replying? Or spending time in a Perl NG, for that matter.
 
sln

$comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;
 
Jason Carlton

Thanks, s.
 
Jason Carlton

Unfortunately, neither of these is working the way I expected:

$comment =~ s/Normal 0 false false false.*?}//gsi;
$comment =~ s/Normal 0 false false false[^{]+\{[^}]+\}//;

It's catching the "Normal 0 false false false", but not everything
else that comes after, and before the "}".

How do I make it remove everything from "Normal 0 false false false"
until it finds the first "}"?

TIA,

Jason
 
sln

You can generalize it more:

$comment =~ s/Normal \s* \d+ \s* false \s* false \s* false [^}]* \} //xig;

But it's probably not matching because the format is different; maybe there
is no terminating '}' in the real text. You don't need /s if you don't have
a '.' in the pattern; that's why I used [^}]* \}
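
A minimal sketch of the /s point above, using a made-up two-line sample rather than the real message board text. The variable names are only for illustration. It compares '.' without /s, '.' with /s, and a negated character class without /s:

```perl
use strict;
use warnings;

# Made-up sample: the junk spans two lines, as pasted Word text often does.
my $sample = "Hello Normal 0 false false false EN-US\n"
           . "table.MsoNormalTable {mso-style-noshow:yes;} world";

# '.' does not match a newline unless /s is given, so nothing is removed here.
(my $dot_no_s = $sample) =~ s/Normal 0 false false false.*?\}//gi;

# With /s, '.' also matches newlines and the whole block goes away.
(my $dot_s = $sample) =~ s/Normal 0 false false false.*?\}//gsi;

# A negated class such as [^}] matches newlines by itself; /s is not needed.
(my $negated = $sample) =~ s/Normal 0 false false false[^}]*\}//gi;

print "without /s: ", ($dot_no_s eq $sample ? "unchanged" : "changed"), "\n";
print "with /s:    $dot_s\n";
print "negated:    $negated\n";
```

If the real text behaves differently from this sample, that would suggest the stored format is not what it appears to be in the post.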

It's not a good idea to grab everything between "Normal" and "}",
as that's not really enough info to make a pattern.

It looks like this:
Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4
is a space-delimited set of variable settings, followed by
a '{' block '}' delimited set of style definitions.

You could use alternation to flag the start of the definition if you
know the possible values (the slots look constant), so:

$comment =~ s/ (?:Normal|<something else>) \s* \d+ \s* (?:false|true) \s* (?:false|true) \s* (?:false|true) [^}]* \} //xig;

But I don't know this format, and it possibly can't be relied upon.
Also, the regex requires a style block (or at least something
with a '}' as the terminator).
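
A rough sketch of the alternation idea and of the '}' requirement, on made-up inputs. It assumes each of the three flags is either "false" or "true", and it uses \s+ rather than \s* between the fields, which is a slightly stricter choice than the pattern above. The names $flag and $word_junk are only for illustration:

```perl
use strict;
use warnings;

# Assumption: each flag slot holds either "false" or "true".
my $flag = qr/(?:false|true)/i;

# /x allows the pattern to be spaced out; whitespace in the input is matched by \s+.
my $word_junk = qr/ Normal \s+ \d+ \s+ $flag \s+ $flag \s+ $flag [^}]* \} /xi;

my @inputs = (
    "a Normal 0 false true false EN-US {x:1;} b",   # mixed flags, has a style block
    "a Normal 0 false false false EN-US b",         # no '}' anywhere
);

# Work on copies so the original inputs stay intact.
my @cleaned = map { (my $copy = $_) =~ s/$word_junk//g; $copy } @inputs;

print "$_\n" for @cleaned;
```

The first input loses everything from "Normal" through the "}". The second is left untouched, because the pattern cannot match without a closing "}".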

-sln
 
J. Gleixner

$comment =~ s/Normal 0 false false false[^}]*}//gsi;

my $str = 'Start Normal 0 false false false blah blah { more blah }
Starting second match Normal 0 false false false blah blah { more blah }
The End';
$str =~ s/Normal 0 false false false[^}]*}//gsi;
print $str;

Start Starting second match The End
 
Jason Carlton

J, should that first "}" be a "{"? Like:

$str =~ s/Normal 0 false false false[^{]*}//gsi;
 
J. Gleixner

Before asking if it's not correct, why not try it?

[^}]* - matches everything up to, but not including, the first '}'
}     - matches the '}' itself; without it, the '}' would be
        left behind in your results.

I gave example text, and the output it generates, if that
doesn't match what you want, then please be a little
more verbose. Provide a -short- example of the text before,
and what you want the text to be after doing something to it.
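
For anyone comparing the variants discussed in this thread, here is a small sketch that runs three of them against the same example string used earlier. The variable names are just for illustration:

```perl
use strict;
use warnings;

my $str = 'Start Normal 0 false false false blah blah { more blah }
Starting second match Normal 0 false false false blah blah { more blah }
The End';

# [^}]*\} : run up to the first '}' and consume it as well.
(my $closing = $str) =~ s/Normal 0 false false false[^}]*\}//gsi;

# [^{]*\} : run up to the first '{', then demand a '}' at that point.
# In this text a '{' always comes first, so the pattern never matches.
(my $opening = $str) =~ s/Normal 0 false false false[^{]*\}//gsi;

# [^}]* with no trailing '}' : the closing brace is left in the text.
(my $no_brace = $str) =~ s/Normal 0 false false false[^}]*//gsi;

print "closing:  [$closing]\n";
print "opening:  [$opening]\n";
print "no brace: [$no_brace]\n";
```

Only the first variant removes both blocks cleanly. The "{" variant leaves this string unchanged, and dropping the trailing "}" leaves a stray "}" after each match.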
 
