hofer
Hi,
Something I have to do very often is filter / transform line-based
file contents and store the result in an array or a dictionary.
Very often the functionality already exists in the form of a shell
script using sed / awk / grep, ...
and I would like to have the same implementation in my script.
What's a compact, efficient way (no intermediate arrays generated,
regexps compiled only once) to write this kind of 'pipeline' in Python?
Example 1 (in bash), annotated with comments (so it will not work if
copied / pasted as is):
#-------------------------------------------------------------------------------------------
cat file \                           ### read from file
| sed 's/\/\/.*//' \                 ### remove '//' comments
| sed 's/#.*//' \                    ### remove '#' comments
| grep -v '^\s*$' \                  ### get rid of empty lines
| awk '{ print $1 + $2 " " $2 }' \   ### all remaining lines are known to contain at least two integers:
                                     ### calculate the sum and 'keep' the second number
| grep '^42 ' \                      ### keep lines for which the sum is 42
| awk '{ print $2 }'                 ### print number
Same example in perl:
# I guess (but didn't try) that the perl example will create more
# intermediate data structures than necessary.
# Ideally the python implementation shouldn't do this, but just
# 'chain' iterators.
#-------------------------------------------------------------------------------------------
my $filename= "file";
open(my $fh,$filename) or die "failed opening file $filename";
# order of 'pipeline' is syntactically reversed (compared to the shell script)
my @numbers =
    map { $_->[1] }                       # extract num 2
    grep { $_->[0] == 42 }                # keep lines with result 42
    map { [ $_->[0]+$_->[1],$_->[1] ] }   # calculate sum of first two nums and keep second num
    map { [ split(' ',$_,3) ] }           # split by white space
    grep { ! ($_ =~ /^\s*$/) }            # remove empty lines
    map { $_ =~ s/#.*// ; $_}             # strip '#' comments
    map { $_ =~ s/\/\/.*// ; $_}          # strip '//' comments
    <$fh>;
print "Numbers are:\n",join("\n",@numbers),"\n";
Thanks in advance for any suggestions on how to code this (keeping the
comments).
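For illustration, here is one way such a pipeline might be sketched in Python with chained generator expressions. This is only a sketch under assumptions: the function name numbers_with_sum and the target parameter are made up for this example, and it assumes (as the bash comments do) that every remaining line starts with at least two integers. Each stage pulls one line at a time from the previous stage, so no intermediate list is built, and the two regexps are compiled once at module level.

```python
import re

# Compiled once when the module is loaded, then reused for every line.
SLASH_COMMENT = re.compile(r'//.*')
HASH_COMMENT = re.compile(r'#.*')


def numbers_with_sum(lines, target=42):
    """Lazily yield the second integer of every line whose first two
    integers add up to `target` (hypothetical helper for this example)."""
    lines = (SLASH_COMMENT.sub('', line) for line in lines)  # strip '//' comments
    lines = (HASH_COMMENT.sub('', line) for line in lines)   # strip '#' comments
    lines = (line for line in lines if line.strip())         # get rid of empty lines
    fields = (line.split(None, 2) for line in lines)         # split by white space
    pairs = ((int(f[0]), int(f[1])) for f in fields)         # first two integers
    sums = ((a + b, b) for a, b in pairs)                     # sum and 'keep' second number
    return (b for total, b in sums if total == target)       # keep second number if sum matches


# Possible usage; a file object is itself a lazy iterator over its lines:
#
#     with open("file") as fh:
#         numbers = list(numbers_with_sum(fh))
#     print("Numbers are:")
#     print("\n".join(str(n) for n in numbers))
```

Unlike the perl version, the stages read top to bottom in the same order as the shell pipeline. Nothing is read from the input until the returned generator is consumed, for example by list() or a for loop, so the file has to stay open until then.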