Q
qwweeeit
Hi all,
I am writing a script to visualize (and print)
the web references hidden in the html files as:
'<a href="web reference"> underlined reference</a>'
Optimizing my code, I found that an essential step is:
splitting on a word (in this case 'href').
I am asking if there is some alternative (more pythonic...):
# SplitMultichar.py
import re
# string s simulating an html file
s='ffy: ytrty <a href="www.python.org">python</a> fyt <A
HREF="wwwx">wx</A> dtrtf'
p=re.compile(r'\bhref\b',re.I)
lHref=p.findall(s) # lHref=['href','HREF']
# for normal html files the lHref list has more elements
# (more web references)
c='~' # char to be used as delimiter
# c=chr(127) # char to be used as delimiter
for i in lHref:
s=s.replace(i,c)
# s ='ffy: ytrty <a ~="www.python.org">python</a> fyt <A
~="wwwx">wx</A> dtrtf'
list=s.split(c)
# list=['ffy: ytrty <a ', '="www.python.org">python</a> fyt <A ',
'="wwwx">wx</A> dtrtf']
#=-----------------------------------------------------
If you save the original s string to xxx.html, any browser
can visualize it.
To be sure as delimiter I choose chr(127)
which surely is not present in the html file.
Bye.
I am writing a script to visualize (and print)
the web references hidden in the html files as:
'<a href="web reference"> underlined reference</a>'
Optimizing my code, I found that an essential step is:
splitting on a word (in this case 'href').
I am asking if there is some alternative (more pythonic...):
# SplitMultichar.py
import re
# string s simulating an html file
s='ffy: ytrty <a href="www.python.org">python</a> fyt <A
HREF="wwwx">wx</A> dtrtf'
p=re.compile(r'\bhref\b',re.I)
lHref=p.findall(s) # lHref=['href','HREF']
# for normal html files the lHref list has more elements
# (more web references)
c='~' # char to be used as delimiter
# c=chr(127) # char to be used as delimiter
for i in lHref:
s=s.replace(i,c)
# s ='ffy: ytrty <a ~="www.python.org">python</a> fyt <A
~="wwwx">wx</A> dtrtf'
list=s.split(c)
# list=['ffy: ytrty <a ', '="www.python.org">python</a> fyt <A ',
'="wwwx">wx</A> dtrtf']
#=-----------------------------------------------------
If you save the original s string to xxx.html, any browser
can visualize it.
To be sure as delimiter I choose chr(127)
which surely is not present in the html file.
Bye.