Help with regular expression patterns

Michel Perez

Hi,
I'm so new to Python that I haven't got the right idea about regular
expressions yet. This is what I want to do:
extract some information from a text with Python and then replace the
matched expressions with others. I'm using wikitext as a base, and this
is what I have so far:

<code file="parse.py">
paragraphs = """
= Test '''wikitest'''=
[[Image:image_link.jpg|rigth|thumbnail|200px|"PREMIER"]]

[http://www.google.com.cu]
::''Note: This is just an example to test some regular expressions stuffs.''

The ''wikitext'' is a text format that helps a lot. In concept is a
simple [[markup]] [[programming_language|language]]. That helps to make
simple create documentations texts.

==Wikitext==

Created by Warn as a ...

<nowiki>[</nowiki> this is a normal <nowiki>sign]</nowiki>
""".split('\n\n')

import re
wikipatterns = {
    'a_nowiki': re.compile(r"<nowiki>(.\S+)</nowiki>"),  # nowiki
    'section': re.compile(r"\=(.*)\="),  # section one tags
    'sectiontwo': re.compile(r"\=\=(.*?)\=\="),  # section two tags
    'wikilink': re.compile(r"\[\[(.*?)\]\]"),  # links tags
    'link': re.compile(r"\[(.*?)\]"),  # external links tags
    'italic': re.compile(r"\'\'(.*?)\'\'"),  # italic text tags
    'bold': re.compile(r"\'\'\'(.*?)\'\'\'"),  # bold text tags
}

for pattern in wikipatterns:
    print("===> processing pattern :", pattern, "<==============")
    for paragraph in paragraphs:
        print(wikipatterns[pattern].findall(paragraph))

</code>

But when I run it, the result is not what I want. It's something like:

<code>
michel@cerebellum:/local/python$ python parse.py
===> processing pattern : bold <==============
['braille']
[]
[]
[]
[]
[]
===> processing pattern : section <==============
[" Test '''wikitest'''"]
[]
[]
['=Wikitext=']
[]
[]
===> processing pattern : sectiontwo <==============
[]
[]
[]
['Wikitext']
[]
[]
===> processing pattern : link <==============
['[Image:image_link.jpg|rigth|thumbnail|200px|"PREMIER"']
['http://www.google.com.cu']
['[markup', '[programming_language|language']
[]
[]
['</nowiki> this is a normal <nowiki>sign']
===> processing pattern : italic <==============
["'wikitest"]
['Note: This is just an example to test some regular expressions stuffs.']
['wikitext']
[]
[]
[]
===> processing pattern : wikilink <==============
['Image:image_link.jpg|rigth|thumbnail|200px|"PREMIER"']
[]
['markup', 'programming_language|language']
[]
[]
[]
===> processing pattern : a_nowiki <==============
[]
[]
[]
[]
[]
['sign]']
</code>

In the first case (bold) the result is OK.
In the second (section) the first result is OK, but the second is not,
because it is a level-two section, not a level-one section.
In the third (sectiontwo) the results are OK.
In the fourth (link) the first and third results are wrong because they
are level-two links (double-bracket wiki links), but the second is OK.
The fifth (italic) is OK.
The last one (a_nowiki) shows only one result and it should show two.

Please help.

PS: I am really sorry about my technical English.
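
For what it's worth, one way the overlapping patterns might be told apart is with lookahead/lookbehind assertions, so that a single `=`, `[` or `''` is only accepted when it is not part of the longer marker. The sketch below is only an assumption about how this could look, not a tested wikitext parser: the rewritten patterns, the `find_all` helper and the shortened `sample` text are all made up for illustration, and `<nowiki>` spans are simply cut out before the other patterns run.

```python
import re

# Hypothetical reworked patterns; the keys mirror the original dict.
WIKI_PATTERNS = {
    # Lazy match so "[" and "sign]" are both captured.
    "a_nowiki": re.compile(r"<nowiki>(.*?)</nowiki>"),
    # Exactly one "=" at the start and end of a line.
    "section": re.compile(r"^=(?!=)(.*?)(?<!=)=[ \t]*$", re.MULTILINE),
    # Exactly two "=" at the start and end of a line.
    "sectiontwo": re.compile(r"^==(?!=)(.*?)(?<!=)==[ \t]*$", re.MULTILINE),
    "wikilink": re.compile(r"\[\[(.*?)\]\]"),
    # A single "[" that is neither preceded nor followed by another "[".
    "link": re.compile(r"(?<!\[)\[(?!\[)(.*?)\]"),
    # Two apostrophes that are not part of a run of three.
    "italic": re.compile(r"(?<!')''(?!')(.*?)(?<!')''(?!')"),
    "bold": re.compile(r"'''(.*?)'''"),
}


def find_all(text):
    """Return {pattern name: captured groups} for one piece of wikitext."""
    nowiki = WIKI_PATTERNS["a_nowiki"]
    results = {"a_nowiki": nowiki.findall(text)}
    # Cut out <nowiki> spans so a literal "[" is not taken for a link.
    stripped = nowiki.sub("", text)
    for name, pattern in WIKI_PATTERNS.items():
        if name != "a_nowiki":
            results[name] = pattern.findall(stripped)
    return results


# Shortened version of the test text from the post (assumption: one item per line).
sample = "\n".join([
    "= Test '''wikitest'''=",
    '[[Image:image_link.jpg|rigth|thumbnail|200px|"PREMIER"]]',
    "[http://www.google.com.cu]",
    "The ''wikitext'' is a simple [[markup]] [[programming_language|language]].",
    "==Wikitext==",
    "<nowiki>[</nowiki> this is a normal <nowiki>sign]</nowiki>",
])

for name, matches in find_all(sample).items():
    print(name, matches)
```

With this sample the `link` entry should contain only the Google URL, `section` should no longer pick up `==Wikitext==`, and `a_nowiki` should return both `[` and `sign]`. The same compiled patterns could then be used with their `sub` method for the replacement step.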
 
