re.search - just skip it


rasdj

Input is this:

SET1_S_W CHAR(1) NOT NULL,
SET2_S_W CHAR(1) NOT NULL,
SET3_S_W CHAR(1) NOT NULL,
SET4_S_W CHAR(1) NOT NULL,
;

.py says:

import re, string, sys
s_ora = re.compile('.*S_W.*')
lines = open("y.sql").readlines()
for i in range(len(lines)):
    try:
        if s_ora.search(lines[i]): del lines[i]
    except IndexError:
        open("z.sql","w").writelines(lines)

but output is:

SET2_S_W CHAR(1) NOT NULL,
SET4_S_W CHAR(1) NOT NULL,
;

It should delete every, not every other!

thx,

RasDJ
 

Duncan Booth

rasdj wrote:

It should delete every, not every other!


No, it should delete every other line since that is what happens if you use
an index to iterate over a list while deleting items from the same list.
Whenever you delete an item the following items shuffle down and then you
increment the loop counter which skips over the next item.

The fact that you got an IndexError should have been some sort of clue that
your code was going to go wrong.
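To make the skipping visible, here is a minimal sketch that reproduces the loop on an in-memory list instead of y.sql. It uses a plain substring test in place of the regex, and the `visited` list is a hypothetical addition that records which lines the loop actually looked at.

```python
lines = [
    "SET1_S_W CHAR(1) NOT NULL,\n",
    "SET2_S_W CHAR(1) NOT NULL,\n",
    "SET3_S_W CHAR(1) NOT NULL,\n",
    "SET4_S_W CHAR(1) NOT NULL,\n",
    ";\n",
]

visited = []  # hypothetical helper: every line the loop really inspects
for i in range(len(lines)):      # range(5) is fixed before the loop starts
    try:
        visited.append(lines[i])
        if "S_W" in lines[i]:
            del lines[i]         # the items after index i move down by one
    except IndexError:
        break                    # the list is now shorter than the range

print(visited)  # the lines that were checked
print(lines)    # the lines that survived
```

SET2 and SET4 never appear in `visited`: each one slides into the slot that was just checked, and the counter has already moved past it.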

Try one of these:
iterate backwards
iterate over a copy of the list but delete from the original
build a new list containing only those lines you want to keep
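The first two options could look roughly like this. The function names and the `marker` parameter are made up for illustration, and a substring test stands in for the regex.

```python
def delete_backwards(lines, marker="S_W"):
    # Walk from the last index down to 0, so a deletion only shifts
    # items that have already been checked.
    for i in range(len(lines) - 1, -1, -1):
        if marker in lines[i]:
            del lines[i]
    return lines


def delete_via_copy(lines, marker="S_W"):
    # Iterate over a shallow copy (lines[:]) while removing from the original.
    for line in lines[:]:
        if marker in line:
            lines.remove(line)  # removes the first item equal to line
    return lines


sample = [
    "SET1_S_W CHAR(1) NOT NULL,\n",
    "SET2_S_W CHAR(1) NOT NULL,\n",
    "SET3_S_W CHAR(1) NOT NULL,\n",
    "SET4_S_W CHAR(1) NOT NULL,\n",
    ";\n",
]

print(delete_backwards(list(sample)))
print(delete_via_copy(list(sample)))
```

Both modify the list in place. The copy variant calls remove() once per match, so it gets slow on long lists with many matching lines.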

also, the regex isn't needed here, and you should always close files when
finished with them.

Something like this should work (untested):

s_ora = 'S_W'
infile = open("y.sql")
try:
    # keep only the lines that do NOT contain the marker
    lines = [line for line in infile if s_ora not in line]
finally:
    infile.close()

outfile = open("z.sql", "w")
try:
    outfile.write(''.join(lines))
finally:
    outfile.close()
 

Fredrik Lundh

but output is:

SET2_S_W CHAR(1) NOT NULL,
SET4_S_W CHAR(1) NOT NULL,

It should delete every, not every other!

for i in range(len(lines)):
    try:
        if s_ora.search(lines[i]): del lines[i]
    except IndexError:
        ...

when you loop over a range, the loop counter is incremented also if you delete
items. but when you delete items, the item numbering changes, so you end up
skipping over an item every time the RE matches.

to get rid of all lines for which s_ora.search matches, try this

lines = [line for line in lines if not s_ora.search(line)]

for better performance, get rid of the leading and trailing ".*" parts of your
pattern, btw. not that it matters much in this case (unless the SQL statement
is really huge).
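as a quick check that the wildcards add nothing when you use search(), this sketch compares both patterns on a few sample lines. the names with_wildcards, plain and sample are only illustrative.

```python
import re

with_wildcards = re.compile('.*S_W.*')  # the pattern from the question
plain = re.compile('S_W')               # same pattern without the ".*" parts

sample = [
    "SET1_S_W CHAR(1) NOT NULL,\n",
    "SET2_S_W CHAR(1) NOT NULL,\n",
    ";\n",
]

# search() scans the whole string for a match, so both patterns
# select exactly the same lines
matched_a = [bool(with_wildcards.search(line)) for line in sample]
matched_b = [bool(plain.search(line)) for line in sample]
print(matched_a == matched_b)
```

the ".*" parts would only make a difference with match(), which anchors the pattern at the start of the string.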

</F>
 

Kent Johnson

for i in range(len(lines)):
    try:
        if s_ora.search(lines[i]): del lines[i]


When you delete for example lines[0], the indices of the following lines change. So the former
lines[1] is now lines[0] and will not be checked.

The simplest way to do this is with a list comprehension:
lines = [ line for line in lines if not s_ora.search(line) ]

Even better, there is no need to make the intermediate list of all lines, you can say
lines = [ line for line in open("y.sql") if not s_ora.search(line) ]

In Python 2.4 you don't have to make a list at all, you can just say
open("z.sql","w").writelines(line for line in open("y.sql") if not s_ora.search(line))

;)

Kent
 
