Help with regular expression patterns

Michel Perez · Nov 28, 2008

Hi,
I'm such a newbie in Python that I don't really understand regular
expressions yet. This is what I want to do: use Python to extract some
information and then replace those expressions with others. I'm using
wikitext as the base, and this is what I do:
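The extract-then-replace step described above can be done in a single pass with `re.sub` and a replacement function. This is a minimal sketch. It assumes the goal is to convert wikitext emphasis into HTML `<b>`/`<i>` tags. The HTML target and the `to_html` and `replace_and_collect` helpers are illustrative assumptions, not part of the original post.

```python
import re

# Bold must be handled before italic: ''(.+?)'' would otherwise
# consume two of the three quotes in '''bold'''.
BOLD = re.compile(r"'''(.+?)'''")
ITALIC = re.compile(r"''(.+?)''")


def replace_and_collect(pattern, template, text, found):
    """Replace every match of pattern using template; record each group(1)."""
    def repl(match):
        found.append(match.group(1))       # extract the marked-up text
        return template % match.group(1)   # replace the markup
    return pattern.sub(repl, text)


def to_html(text):
    """Hypothetical helper: wikitext bold/italic to HTML, plus the extracted text."""
    found = []
    text = replace_and_collect(BOLD, "<b>%s</b>", text, found)
    text = replace_and_collect(ITALIC, "<i>%s</i>", text, found)
    return text, found


if __name__ == "__main__":
    print(to_html("= Test '''wikitest'''="))
    # -> ('= Test <b>wikitest</b>=', ['wikitest'])
```

Passing a function instead of a string to `sub` lets each match be recorded at the moment it is replaced, so extraction and replacement cannot drift apart.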

<code file="parse.py">
from __future__ import print_function  # same print() output on Python 2.6+ and 3
import re

# Sample wikitext, split into paragraphs on blank lines.
paragraphs = """
= Test '''wikitest'''=
[[Image:image_link.jpg|rigth|thumbnail|200px|"PREMIER"]]

[http://www.google.com.cu]
::''Note: This is just an example to test some regular expressions stuffs.''

The ''wikitext'' is a text format that helps a lot. In concept is a
simple [[markup]] [[programming_language|language]]. That helps to make
simple create documentations texts.

==Wikitext==

Created by Warn as a ...

<nowiki>[</nowiki> this is a normal <nowiki>sign]</nowiki>
""".split('\n\n')

wikipatterns = {
    'a_nowiki': re.compile(r"<nowiki>(.\S+)</nowiki>"),  # <nowiki> text
    'section': re.compile(r"\=(.*)\="),                  # level-one section heading
    'sectiontwo': re.compile(r"\=\=(.*?)\=\="),          # level-two section heading
    'wikilink': re.compile(r"\[\[(.*?)\]\]"),            # internal [[wiki]] links
    'link': re.compile(r"\[(.*?)\]"),                    # external [links]
    'italic': re.compile(r"\'\'(.*?)\'\'"),              # ''italic'' text
    'bold': re.compile(r"\'\'\'(.*?)\'\'\'"),            # '''bold''' text
}

# Print every match of every pattern, paragraph by paragraph.
for pattern in wikipatterns:
    print("===> processing pattern :", pattern, "<==============")
    for paragraph in paragraphs:
        print(wikipatterns[pattern].findall(paragraph))
</code>
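In the output that follows, the patterns are not processed in the order they were written. A Python 2 `dict` has no guaranteed iteration order, and dicts only preserve insertion order from Python 3.7 on. If the order matters, for example so that bold is always handled before italic, a list of `(name, regex)` pairs makes it explicit. This is a sketch with a shortened, illustrative pattern list; `WIKIPATTERNS` and `run` are hypothetical names.

```python
import re

# An ordered list instead of a dict: the loop visits patterns exactly in this order.
WIKIPATTERNS = [
    ("bold", re.compile(r"'''(.*?)'''")),
    ("italic", re.compile(r"''(.*?)''")),
    ("wikilink", re.compile(r"\[\[(.*?)\]\]")),
]


def run(paragraphs):
    """Return (pattern name, matches per paragraph) in a fixed order."""
    results = []
    for name, regex in WIKIPATTERNS:
        results.append((name, [regex.findall(p) for p in paragraphs]))
    return results


if __name__ == "__main__":
    for name, matches in run(["simple [[markup]] text", "'''bold'''"]):
        print("===> processing pattern :", name, "<==============")
        for found in matches:
            print(found)
```

Running it also shows why order alone is not enough. On `'''bold'''`, the italic pattern still returns `"'bold"`, so bold markup has to be removed or replaced before the italic pattern runs.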

But when I run it, the result is not what I want. It's something like this:

<code>
michel@cerebellum:/local/python$python parser.py
===> processing pattern : bold <==============
['braille']
[]
[]
[]
[]
[]
===> processing pattern : section <==============
[" ï»¿Test '''wikitest'''"]
[]
[]
['=Wikitext=']
[]
[]
===> processing pattern : sectiontwo <==============
[]
[]
[]
['Wikitext']
[]
[]
===> processing pattern : link <==============
['[Image:image_link.jpg|rigth|thumbnail|200px|"PREMIER"']
['http://www.google.com.cu']
['[markup', '[programming_language|language']
[]
[]
['</nowiki> this is a normal <nowiki>sign']
===> processing pattern : italic <==============
["'wikitest"]
['Note: This is just an example to test some regular expressions stuffs.']
['wikitext']
[]
[]
[]
===> processing pattern : wikilink <==============
['Image:image_link.jpg|rigth|thumbnail|200px|"PREMIER"']
[]
['markup', 'programming_language|language']
[]
[]
[]
===> processing pattern : a_nowiki <==============
[]
[]
[]
[]
[]
['sign]']
</code>
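The second `section` match, `'=Wikitext='`, happens because `\=(.*)\=` only requires one `=` on each side. Its greedy `.*` then runs from the first `=` of `==Wikitext==` to the last one. One way to tell the levels apart is to anchor each heading to a whole line with `re.MULTILINE`. Lookarounds then stop a level-one match from starting or ending with a second `=`. This is a suggested sketch rather than the only fix; `SECTION_ONE` and `SECTION_TWO` are illustrative names.

```python
import re

# ^ and $ match at line boundaries because of re.MULTILINE.
# (?!=) after the opening "=" and (?<!=) before the closing "=" stop the
# level-one pattern from matching "==Wikitext==".
SECTION_ONE = re.compile(r"^=(?!=)(.*?)(?<!=)=\s*$", re.MULTILINE)
SECTION_TWO = re.compile(r"^==(?!=)(.*?)(?<!=)==\s*$", re.MULTILINE)

text = """
= Test '''wikitest'''=
==Wikitext==
"""

print(SECTION_ONE.findall(text))  # [" Test '''wikitest'''"]
print(SECTION_TWO.findall(text))  # ['Wikitext']
```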

In the first case (bold) the result is OK.
In the second (section), the first result is OK, but the second is not,
because it is a level-two section, not a level-one section.
In the third (sectiontwo) things are OK.
In the fourth (link), the first and third results are wrong because they
are level-two links (wikilinks), but the second is OK.
The fifth and sixth (italic and wikilink) are OK.
The last one (a_nowiki) shows only one result, and it should show two.
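The `link` and `a_nowiki` problems can be sketched the same way.

- **`link`**: `\[(.*?)\]` matches any single bracket, so it also fires on the inner half of `[[...]]`. Adding the lookarounds `(?<!\[)` and `(?!\])` and using a `[^\[\]]+` body restricts it to single-bracket external links.
- **`a_nowiki`**: `(.\S+)` needs at least two non-space characters, so `<nowiki>[</nowiki>` (one character) is skipped. A non-greedy `(.*?)` finds both spans.

Removing `<nowiki>` spans before searching for links keeps the bracket inside them from being read as a link. `NOWIKI`, `EXTERNAL_LINK` and `external_links` are hypothetical names for this sketch.

```python
import re

NOWIKI = re.compile(r"<nowiki>(.*?)</nowiki>")
# Single-bracket external link: not preceded by "[" and not followed by "]",
# with no brackets inside, so [[wikilinks]] are skipped.
EXTERNAL_LINK = re.compile(r"(?<!\[)\[([^\[\]]+)\](?!\])")


def external_links(paragraph):
    """Find external links, ignoring anything inside <nowiki>...</nowiki>."""
    return EXTERNAL_LINK.findall(NOWIKI.sub("", paragraph))


nowiki_text = "<nowiki>[</nowiki> this is a normal <nowiki>sign]</nowiki>"
print(NOWIKI.findall(nowiki_text))                   # ['[', 'sign]']
print(external_links(nowiki_text))                   # []
print(external_links("[http://www.google.com.cu]"))  # ['http://www.google.com.cu']
print(external_links("simple [[markup]] [[programming_language|language]]"))  # []
```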

Please help.

PS: I'm really sorry about my technical English.
