Regular Expression for Finding and Deleting comments

Jeremy · Jan 4, 2011

I am trying to write a regular expression that finds and deletes (replaces with nothing) comments in a string/file. A line is a full-line comment if its first non-whitespace character is a 'c'; a dollar sign anywhere in a line starts an inline comment that runs to the end of the line. Replacing these comments with nothing isn't too hard. The trouble is that each comment is effectively replaced with a new-line, because the new-line isn't captured by the regular expression.

Below, I have copied a minimal example. Can someone help?

Thanks,
Jeremy

import re

text = """ c
C - Second full line comment (first comment had no text)
c Third full line comment
F44:N 2 $ Inline comments start with dollar sign and go to end of line"""

commentPattern = re.compile("""
(^\s*?c\s*?.*?| # Comment start with c or C
\$.*?)$\n # Comment starting with $
""", re.VERBOSE|re.MULTILINE|re.IGNORECASE)

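# Collect the matches in a list so the emptiness test below is meaningful
# (the iterator returned by finditer() is always truthy).
found = list(commentPattern.finditer(text))

print("\n\nCard:\n--------------\n%s\n------------------" %text)

if found:
    print("\nI found the following:")
    for f in found:
        print(f.groups())
else:
    print("\nNot Found")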

print("\n\nComments replaced with ''")
replaced = commentPattern.sub('', text)
print("--------------\n%s\n------------------" %replaced)

MRAB · Jan 4, 2011

Part of the problem is that you're not using raw string literals or
doubling the backslashes.
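
To illustrate the raw-string point: in a normal (non-raw) triple-quoted string, Python converts \n into a real newline character before the re module ever sees it, and re.VERBOSE then throws that newline away as insignificant whitespace. As far as I can tell, that is why the new-line was never captured. A minimal sketch (the pattern and sample string here are simplified stand-ins, not the original pattern):

```python
import re

flags = re.VERBOSE | re.MULTILINE

# Non-raw literal: "\n" is already a real newline when re compiles the
# pattern, and VERBOSE mode discards it along with the other whitespace.
non_raw = re.compile("""
    ^c.*$\n
""", flags)

# Raw literal: backslash + n reach re unchanged and are read as a
# newline escape, so the line ending becomes part of the match.
raw = re.compile(r"""
    ^c.*$\n
""", flags)

sample = "c comment\ndata\n"
print(repr(non_raw.sub("", sample)))  # the newline after the comment is left behind
print(repr(raw.sub("", sample)))      # the comment line is removed entirely
```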

Try something like this:

commentPattern = re.compile(r"""
(^[ \t]*c.*\n| # Comment start with c or C
[ \t]*\$.*) # Comment starting with $
""", re.VERBOSE|re.MULTILINE|re.IGNORECASE)
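
For reference, here is a small runnable sketch that applies the pattern above to the sample text from the question. The function name strip_comments and the second sample string are only illustrative, not part of the original code.

```python
import re

text = """ c
C - Second full line comment (first comment had no text)
c Third full line comment
F44:N 2 $ Inline comments start with dollar sign and go to end of line"""

commentPattern = re.compile(r"""
(^[ \t]*c.*\n|   # full-line comment: remove the line together with its newline
[ \t]*\$.*)      # inline comment: remove it, but keep the newline that follows
""", re.VERBOSE|re.MULTILINE|re.IGNORECASE)

def strip_comments(source):
    """Return source with full-line and inline comments removed."""
    return commentPattern.sub('', source)

print(repr(strip_comments(text)))

# A second, made-up input with comments mixed between data lines.
mixed = "c header\nF4:N 1 $ tally\n  c note\nF14:N 2\n"
print(repr(strip_comments(mixed)))
```

Two things to keep in mind: with re.IGNORECASE any line whose first non-blank character is c or C is treated as a comment, which follows the definition given in the question, and a full-line comment on the very last line is only removed if it ends with a new-line.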
