Consolidate several lines of a CSV file with firewall rules

S

Starriol

Hi guys.
I have a CSV file, which I created using an HTML export from a Check Point firewall policy.
Each rule is represented as several lines, in some cases. That occurs when a rule has several address sources, destinations or services.
I need the output to have each rule described in only one line.
It's easy to distinguish when each rule begins. In the first column, there's the rule ID, which is a number.

Let me show you an example:

NO.;NAME;SOURCE;DESTINATION;VPN  ;SERVICE;ACTION;TRACK;INSTALL ON;TIME;COMMENT
1;;fwxcluster;mcast_vrrp;;vrrp;accept;Log;fwxcluster;Any;"VRRP;;*Comment suppressed*
;;;;;igmp**;;;;;
2;;fwxcluster;fwxcluster;;FireWall;accept;Log;fwxcluster;Any;"Management FWg;*Comment suppressed*
;;fwmgmpe**;fwmgmpe**;;ssh**;;;;;
;;fwmgm**;fwmgm**;;;;;;;
3;NTP;G_NTP_Clients;cmm_ntpserver_pe01;;ntp;accept;None;fwxcluster;Any;*Comment suppressed*
;;;cmm_ntpserver_pe02**;;;;;;;

What I need ,explained in pseudo code, is this:

Read the first column of the next line. If there's a number:
Evaluate the first column of the next line. If there's no number there, concatenate (separating with a comma) \
the strings in the columns of this line with the last one and eliminate the text in the current one

The output should be something like this:

NO.;NAME;SOURCE;DESTINATION;VPN  ;SERVICE;ACTION;TRACK;INSTALL ON;TIME;COMMENT
1;;fwxcluster,fwmgmpe**,fwmgm**;mcast_vrrp,fwmgmpe**,fwmgm**;;vrrp,ssh**;accept;Log;fwxcluster;Any;*Comment suppressed*
;;;;;;;;;;
;;;;;;;;;;
3;NTP;G_NTP_Clients;cmm_ntpserver_pe01,cmm_ntpserver_pe02**;;ntp;accept;None;fwxcluster;Any;*Comment suppressed*
;;;;;;;;;;

The empty lines are there only to be more clear, I don't actually need them.

Thanks!
 
J

Joel Goldstick

Hi guys.
I have a CSV file, which I created using an HTML export from a Check Point firewall policy.
Each rule is represented as several lines, in some cases. That occurs when a rule has several address sources, destinations or services.
I need the output to have each rule described in only one line.
It's easy to distinguish when each rule begins. In the first column, there's the rule ID, which is a number.

Let me show you an example:

NO.;NAME;SOURCE;DESTINATION;VPN  ;SERVICE;ACTION;TRACK;INSTALL ON;TIME;COMMENT
1;;fwxcluster;mcast_vrrp;;vrrp;accept;Log;fwxcluster;Any;"VRRP;;*Comment suppressed*
;;;;;igmp**;;;;;
2;;fwxcluster;fwxcluster;;FireWall;accept;Log;fwxcluster;Any;"Management FWg;*Comment suppressed*
;;fwmgmpe**;fwmgmpe**;;ssh**;;;;;
;;fwmgm**;fwmgm**;;;;;;;
3;NTP;G_NTP_Clients;cmm_ntpserver_pe01;;ntp;accept;None;fwxcluster;Any;*Comment suppressed*
;;;cmm_ntpserver_pe02**;;;;;;;

What I need ,explained in pseudo code, is this:

Read the first column of the next line. If there's a number:
Evaluate the first column of the next line. If there's no number there, concatenate (separating with a comma) \
the strings in the columns of this line with the last one and eliminate the text in the current one

The output should be something like this:

NO.;NAME;SOURCE;DESTINATION;VPN  ;SERVICE;ACTION;TRACK;INSTALL ON;TIME;COMMENT
1;;fwxcluster,fwmgmpe**,fwmgm**;mcast_vrrp,fwmgmpe**,fwmgm**;;vrrp,ssh**;accept;Log;fwxcluster;Any;*Comment suppressed*
;;;;;;;;;;
;;;;;;;;;;
3;NTP;G_NTP_Clients;cmm_ntpserver_pe01,cmm_ntpserver_pe02**;;ntp;accept;None;fwxcluster;Any;*Comment suppressed*
;;;;;;;;;;

The empty lines are there only to be more clear, I don't actually need them.

Thanks!

I think you posted twice, and perhaps in html? Its hard to read.

At any rate, there is a csv module in python that will let you gather
your data in a list of lists. With that you can iterate through the
csv rows, saving rows with a number in the first position. Iterate
and append the rows below that until you run into another row with a
number in the first position.

Why don't you write some code, see how it goes, copy and paste the
code back here with full traceback if you get an error or with your
results if you have some. Do it for a subset of a couple of rows of
input data.
 
T

Tim Chase

NO.;NAME;SOURCE;DESTINATION;VPN  ;SERVICE;ACTION;TRACK;INSTALL
ON;TIME;COMMENT
1;;fwxcluster;mcast_vrrp;;vrrp;accept;Log;fwxcluster;Any;"VRRP;;*Comment
suppressed* ;;;;;igmp**;;;;;
2;;fwxcluster;fwxcluster;;FireWall;accept;Log;fwxcluster;Any;"Management
FWg;*Comment
suppressed* ;;fwmgmpe**;fwmgmpe**;;ssh**;;;;; ;;fwmgm**;fwmgm**;;;;;;;
3;NTP;G_NTP_Clients;cmm_ntpserver_pe01;;ntp;accept;None;fwxcluster;Any;*Comment
suppressed* ;;;cmm_ntpserver_pe02**;;;;;;;
What I need ,explained in pseudo code, is this:

Read the first column of the next line. If there's a number:
Evaluate the first column of the next line. If there's no
number there, concatenate (separating with a comma) \ the strings
in the columns of this line with the last one and eliminate the
text in the current one

The output should be something like this:

NO.;NAME;SOURCE;DESTINATION;VPN  ;SERVICE;ACTION;TRACK;INSTALL
ON;TIME;COMMENT
1;;fwxcluster,fwmgmpe**,fwmgm**;mcast_vrrp,fwmgmpe**,fwmgm**;;vrrp,ssh**;accept;Log;fwxcluster;Any;*Comment
suppressed* ;;;;;;;;;; ;;;;;;;;;;
3;NTP;G_NTP_Clients;cmm_ntpserver_pe01,cmm_ntpserver_pe02**;;ntp;accept;None;fwxcluster;Any;*Comment
suppressed* ;;;;;;;;;;

The empty lines are there only to be more clear, I don't actually
need them.

Though there are a couple oddities in your source data, I'll take a
crack at it. First, there are the dangling open-quotes on #1 that
cause most CSV parsers (I tested both Gnumeric and Python's csv
module) to read until the subsequent line is read. If this is
intentional, then it's all good. If not, you get weird behaviors.
That said, you can try the code below (adjusting the 3 lines for
your desired filenames and whether you *want* to write emptied rows)
to see if it gets you what you want:

##########################################################

import csv
# adjust these 3 lines
WRITE_EMPTIES = True
INFILE = "input.txt"
OUTFILE = "output.txt"
with open(INFILE, "r") as in_file:
r = csv.reader(in_file, delimiter=";")
with open(OUTFILE, "wb") as out_file:
previous = None
empties_to_write = 0
out_writer = csv.writer(out_file, delimiter=";")
for i, row in enumerate(r):
first_val = row[0].strip()
if first_val:
if previous:
out_writer.writerow(previous)
if WRITE_EMPTIES and empties_to_write:
out_writer.writerows(
[["" for _ in previous]] * empties_to_write
)
empties_to_write = 0
previous = row
else: # append sub-portions to each other
previous = [
",".join(
subitem
for subitem in existing.split(",") + [new]
if subitem
)
for existing, new in zip(previous, row)
]
empties_to_write += 1
if previous: # take care of the last row
out_writer.writerow(previous)
if WRITE_EMPTIES and empties_to_write:
out_writer.writerows(
[["" for _ in previous]] * empties_to_write
)
####################################################

Hope this helps,

-tkc
 
T

Tim Chase

the dangling open-quotes on #1 that cause most CSV parsers to read
until the subsequent line is read.

And by "subsequent line", I mean "subsequent closing-quote" of
course. :)

-tkc
 
S

Starriol

And by "subsequent line", I mean "subsequent closing-quote" of

course. :)



-tkc

Ha, thanks a million, Tim!
That worked great!
Hopefully soon enough I'm going to be able to write it on my own without having to bother you guys!
 
M

Mark Lawrence

Ha, thanks a million, Tim!
That worked great!
Hopefully soon enough I'm going to be able to write it on my own without having to bother you guys!

If in future you do have to bother us we'll not object, but you'll
certainly be less bother if you read and digest this first
https://wiki.python.org/moin/GoogleGroupsPython :)

--
Roses are red,
Violets are blue,
Most poems rhyme,
But this one doesn't.

Mark Lawrence
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,981
Messages
2,570,188
Members
46,732
Latest member
ArronPalin

Latest Threads

Top