Regular Expression for Finding and Deleting Comments

Jeremy

I am trying to write a regular expression that finds and deletes (replaces with nothing) comments in a string/file. A line is a full-line comment if its first non-whitespace character is a 'c'; an inline comment starts at a dollar sign anywhere in the line and runs to the end of the line. Replacing these comments with nothing isn't too hard. The trouble is that each comment ends up replaced with a newline; in other words, the newline isn't captured by the regular expression.

Below, I have copied a minimal example. Can someone help?

Thanks,
Jeremy


import re

text = """ c
C - Second full line comment (first comment had no text)
c Third full line comment
F44:N 2 $ Inline comments start with dollar sign and go to end of line"""

commentPattern = re.compile("""
(^\s*?c\s*?.*?| # Comment start with c or C
\$.*?)$\n # Comment starting with $
""", re.VERBOSE|re.MULTILINE|re.IGNORECASE)

found = commentPattern.finditer(text)

print("\n\nCard:\n--------------\n%s\n------------------" %text)

if found:
    print("\nI found the following:")
    for f in found: print(f.groups())

else:
    print("\nNot Found")

print("\n\nComments replaced with ''")
replaced = commentPattern.sub('', text)
print("--------------\n%s\n------------------" %replaced)
 
MRAB

Part of the problem is that you're not using raw string literals or
doubling the backslashes.
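
To illustrate why this matters: in a non-raw string, Python converts "\n" into an actual newline character before the re module ever sees the pattern, and under re.VERBOSE that whitespace is simply ignored. Here is a minimal sketch of the difference (the names non_raw, raw and sample are made up for illustration and are not from the original post):

```python
import re

# Non-raw string: Python turns "\n" into a real newline character, which
# re.VERBOSE then discards as insignificant whitespace in the pattern.
non_raw = re.compile("c.*$\n", re.VERBOSE | re.MULTILINE)

# Raw string: the regex engine receives a backslash followed by "n" and
# treats it as "match a newline".
raw = re.compile(r"c.*$\n", re.VERBOSE | re.MULTILINE)

sample = "c comment\nF44:N 2\n"

non_raw_result = non_raw.sub("", sample)
raw_result = raw.sub("", sample)

print(repr(non_raw_result))  # the newline after the comment is left behind
print(repr(raw_result))      # the newline is removed with the comment
```

With the non-raw pattern the comment text disappears but its line break stays, which appears to be the symptom described in the question.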

Try something like this:

commentPattern = re.compile(r"""
(^[ \t]*c.*\n| # Comment start with c or C
[ \t]*\$.*) # Comment starting with $
""", re.VERBOSE|re.MULTILINE|re.IGNORECASE)
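
For completeness, here is a sketch of that pattern applied to the sample text from the question. The snake_case names and the reworded comments are my own; the regular expression itself is unchanged.

```python
import re

text = """ c
C - Second full line comment (first comment had no text)
c Third full line comment
F44:N 2 $ Inline comments start with dollar sign and go to end of line"""

comment_pattern = re.compile(r"""
    (^[ \t]*c.*\n   |   # full-line comment (c or C), including its newline
    [ \t]*\$.*)         # inline comment from $ to end of line, plus leading blanks
""", re.VERBOSE | re.MULTILINE | re.IGNORECASE)

replaced = comment_pattern.sub("", text)
print(repr(replaced))  # 'F44:N 2'
```

Note that the first alternative requires a trailing newline, so a full-line comment on the very last line of a file with no final newline would not be removed by this pattern.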
 
