Xah Lee
20050207 text pattern matching
# -*- coding: utf-8 -*-
# Python
# Suppose you want to replace all strings of the form
# <img src="some.gif" width="30" height="20">
# with
# <img src="some.png" width="30" height="20">
# in your HTML files.
# You can use the "re" module.
import re
text = r'''<html>
blab blab
<P> look at this <img src="./some.gif" width="30" height="20"> pict
and this one <img class="float-right" src="../that.gif">, both are
beautiful, but also look: <img src ="my.gif">, and sequel
<img src=
"girl.gif"> yeah! </p>
'''
new = re.sub(r'''src\s*=\s*"([^"]+)\.gif"''', r'''src="\1.png"''', text)
print(new)
# The first argument to re.sub is a regex pattern.
# The second argument is the replacement string,
# which can refer to captured groups (the \1).
# The third argument is the text to be searched.
# An optional fourth argument is the maximum number of replacements
# to make. If omitted, all matches are replaced.
# See
# http://python.org/doc/lib/module-re.html
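
The optional count argument described above can be sketched as follows. This is a minimal illustration, not part of the original script: the shortened sample string and the variable names are made up for the example, and count is passed by keyword.

```python
import re

# Shortened sample in the spirit of the HTML text above (hypothetical data).
text = '<img src="a.gif"> <img src ="b.gif"> <img src="c.gif">'

pattern = r'src\s*=\s*"([^"]+)\.gif"'
replacement = r'src="\1.png"'

# Default: count is 0, which means "replace every match".
all_replaced = re.sub(pattern, replacement, text)

# count=1 stops after the first match and leaves later ones untouched.
first_only = re.sub(pattern, replacement, text, count=1)

print(all_replaced)
print(first_only)
```

Note that the replacement string always writes src="..." with no spaces around the equals sign, so a matched src ="b.gif" comes out as src="b.png".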
--------------------
# The similar construct in Perl is s///. Without the /g modifier
# it replaces only the first match. For example,
$text = "123";
$text =~ s/2/9/;
print $text;
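
For comparison, here is a rough Python counterpart of the Perl snippet. It is only a sketch: the input "1232" is a hypothetical string chosen because it contains two matches, which makes the difference between replacing the first match and replacing every match visible.

```python
import re

text = "1232"  # hypothetical input with two matches

# Like Perl's s/2/9/ (no /g): replace the first match only.
first_only = re.sub(r"2", "9", text, count=1)

# Like Perl's s/2/9/g: replace every match, which is re.sub's default.
every_match = re.sub(r"2", "9", text)

print(first_only)   # 1932
print(every_match)  # 1939
```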
----------------------
In languages, human or computer, there's a notion of expressiveness.
English, for example, is very expressive in manifestation: witness all
the poetry, implications, allusions, connotations, and dictions. There
are myriad ways to say one thing, fuzzy and warm and all. But when we
look at what it can say, that is, its power of expression with respect
to meaning, or its efficiency or precision, we find natural languages
incapable.
This incapacity can be seen in several ways. A sure way is through
logic, linguistics, and what's called the philosophy of language. One
can also glean the inadequacy of natural languages directly by studying
the artificial language Lojban, where one realizes that natural
languages are not only imprecise and inefficient, but that a huge
number of things are nearly impossible to express in them.
One thing commonly misunderstood in the computing industry is the
notion of expressiveness. A language with a vocabulary of (smile,
laugh, grin, giggle, chuckle, guffaw, cackle) is not as expressive as a
language with just (severe, slight, laugh, cry). The former is
"expressive" in terms of fluff, whereas the latter is expressive with
respect to meaning.
Similarly, in computer languages, expressiveness is significant with
respect to semantics, not syntactical variation.
These two contrasting ideas can be easily seen in Perl versus Python,
with their text pattern matching abilities as one specific example.
Perl is a language of syntactical variegations. Python, on the other
hand, does not even allow changes in code's indentation, but its
efficiency and power of expression with respect to semantics (i.e.
algorithms) showcase Perl's poverty in specification.
Xah
http://xahlee.org/PageTwo_dir/more.html