One-liners: single quotes; altering first line only; printing the changes?


Adam Funk

I had some very large RDF-XML files that had been incorrectly
generated with the prolog

<?xml version='1.0' encoding='UTF8'?>

which I wanted to change to

<?xml version='1.0' encoding='UTF-8'?>

so I used the following command.

perl -pi.bak -e 's!^(<\?xml version=.1\.0. encoding=.)UTF8(.\?\>)!\1UTF-8\2!' *.rdf


It worked, but I have three questions about doing it better.

1. Is there any way to specify single quotes (') in the pattern? (I
realize this is at least as much of a shell problem as a Perl
problem; this is in bash on GNU/Linux.)

2. Is it possible to tell the command to look at the first line of
each file only? (These were very large files.)

3. Is it possible to make a perl -i command print to STDOUT the
changes it makes (and only the changed lines)?

Thanks,
Adam
 

John W. Krahn

Adam said:
I had some very large RDF-XML files that had been incorrectly
generated with the prolog

<?xml version='1.0' encoding='UTF8'?>

which I wanted to change to

<?xml version='1.0' encoding='UTF-8'?>

so I used the following command.

perl -pi.bak -e 's!^(<\?xml version=.1\.0. encoding=.)UTF8(.\?\>)!\1UTF-8\2!' *.rdf

You should use $1 and $2 in the replacement string instead of \1 and \2.

It worked, but I have three questions about doing it better.

1. Is there any way to specify single quotes (') in the pattern? (I
realize this is at least as much of a shell problem as a Perl
problem; this is in bash on GNU/Linux.)

$ perl -le'print "\047" x 10'
''''''''''

2. Is it possible to tell the command to look at the first line of
each file only? (These were very large files.)

No. Unless you only want one line left in the new files.

3. Is it possible to make a perl -i command print to STDOUT the
changes it makes (and only the changed lines)?

Yes, but only if you explicitly use the STDOUT filehandle because with
the -i switch the default output filehandle is ARGVOUT.



John
 

smallpond

I had some very large RDF-XML files that had been incorrectly
generated with the prolog

<?xml version='1.0' encoding='UTF8'?>

which I wanted to change to

<?xml version='1.0' encoding='UTF-8'?>

so I used the following command.

perl -pi.bak -e 's!^(<\?xml version=.1\.0. encoding=.)UTF8(.\?\>)!\1UTF-8\2!' *.rdf

It worked, but I have three questions about doing it better.

2. Is it possible to tell the command to look at the first line of
each file only? (These were very large files.)

s/UTF8/UTF-8/ if 1 .. 1; # use the line range op
 

Peter Makholm

John W. Krahn said:
No. Unless you only want one line left in the new files.

You could do something like:

perl -pi -e 'close ARGV if eof; next if ($. > 1)..0; s/.../.../;'

it would still read every line, but only the first would be
processed. Tested, but I'm not sure I would recommend it.

//Makholm
 

Peter Makholm

smallpond said:
s/UTF8/UTF-8/ if 1 .. 1; # use the line range op

Only changes the first line in the first file. You have to reset $. at
some point. But this works:

perl -pi -e 's/.../.../ if 1..1; close ARGV if eof;' *

I don't know why I insisted on using next which forced me to close
before the substitution.

//Makholm
 

Tad J McClellan

Adam Funk said:
I had some very large RDF-XML files that had been incorrectly
generated with the prolog

<?xml version='1.0' encoding='UTF8'?>

which I wanted to change to

<?xml version='1.0' encoding='UTF-8'?>

so I used the following command.

perl -pi.bak -e 's!^(<\?xml version=.1\.0. encoding=.)UTF8(.\?\>)!\1UTF-8\2!' *.rdf


It worked, but I have three questions about doing it better.

1. Is there any way to specify single quotes (') in the pattern? (I
realize this is at least as much of a shell problem as a Perl
problem; this is in bash on GNU/Linux.)


Single quotes won't bother the shell if you use double quotes on
the argument instead of single quotes.

Most people stick with slash for the delimiter unless slash is
part of their pattern.

perl -pi.bak -e "s/\Q<?xml version='1.0' encoding='UTF8'?>/<?xml version='1.0' encoding='UTF-8'?>/"


You should use $1 instead of \1 in replacement strings.

You could use an escape sequence for the single quote character
(see "Quote and Quote-like Operators" in perlop):

perl -pi.bak -e 's!^(<\?xml version=\x271\.0\x27 encoding=\x27)UTF8(\x27\?\>)!$1UTF-8$2!'

2. Is it possible to tell the command to look at the first line of
each file only? (These were very large files.)


That depends on what you mean by "look at".

If you mean: only attempt the s/// on the 1st line,
then yes, that is easy to do:

s/// if $. == 1;

If you mean: process only the 1st line, then no, you can't do that
in conjunction with in-place editing or you'd end up with a 1-line file.

3. Is it possible to make a perl -i command print to STDOUT the
changes it makes (and only the changed lines)?


Not in the general case, but you can for this particular case:

perl -pi.bak -e "print STDOUT if s/\Q<?xml version='1.0' encoding='UTF8'?>/<?xml version='1.0' encoding='UTF-8'?>/"
 

Peter J. Holzer

perl -pi.bak -e 's!^(<\?xml version=.1\.0. encoding=.)UTF8(.\?\>)!\1UTF-8\2!' *.rdf


It worked, but I have three questions about doing it better.

1. Is there any way to specify single quotes (') in the pattern? (I
realize this is at least as much of a shell problem as a Perl
problem; this is in bash on GNU/Linux.)

\x{27} (or \047 if you prefer to think in octal) should work.

2. Is it possible to tell the command to look at the first line of
each file only? (These were very large files.)

You have to read and copy the whole file to change a line in it - that's
unavoidable. You can only do the substitution on line 1 with something
like

$. == 1 and s!...!...!

I'm not sure if this is much of a speedup.

3. Is it possible to make a perl -i command print to STDOUT the
changes it makes (and only the changed lines)?

I haven't tested it, but the docs say that the output file is
"selected". I take that to mean that STDOUT is unchanged and

print STDOUT "whatever"

should do what you want.

hp
 

Adam Funk

Single quotes won't bother the shell if you use double quotes on
the argument instead of single quotes.

They sure wouldn't work for me.

Most people stick with slash for the delimiter unless slash is
part of their pattern.

perl -pi.bak -e "s/\Q<?xml version='1.0' encoding='UTF8'?>/<?xml version='1.0' encoding='UTF-8'?>/"

I know. I tend to use "!" with XML files because I often need the "/"
in the pattern.

You should use $1 instead of \1 in replacement strings.

[reminds himself again]

You could use an escape sequence for the single quote character
(see "Quote and Quote-like Operators" in perlop):

perl -pi.bak -e 's!^(<\?xml version=\x271\.0\x27 encoding=\x27)UTF8(\x27\?\>)!$1UTF-8$2!'

Thanks.



That depends on what you mean by "look at".

If you mean: only attempt the s/// on the 1st line,
then yes, that is easy to do:

s/// if $. == 1;

That's what I mean. If I'd thought of that, I wouldn't have
conglomerated such a long substitution pattern (and felt the need to
check that the file sizes all differed by 1 afterwards).

Not in the general case, but you can for this particular case:

perl -pi.bak -e "print STDOUT if s/\Q<?xml version='1.0' encoding='UTF8'?>/<?xml version='1.0' encoding='UTF-8'?>/"

I'll take a look at that. Thanks.
 

Adam Funk

\x{27} (or \047 if you prefer to think in octal) should work.

Right, thanks.

You have to read and copy the whole file to change a line in it - that's
unavoidable. You can only do the substitution on line 1 with something
like

$. == 1 and s!...!...!

I'm not sure if this is much of a speedup.

I'd expect it to speed things up a bit, since the first test should be
faster than the regex-matching. It would also eliminate the risk of
changing anything past the XML prolog.
 
