Help with end of line characters

Andy

Hi,
I'm new to perl but need to write a script that takes a file and
formats lines.

The file has 2 fields that are tab separated and each field is made
up of items separated by some type of linefeed character. The end of
the second field is identified by another type of linefeed character.

When I view the file in VIM the second linefeed shows as a ^M so there
must be a way of identifying these separately.

I have tried searching for \r \n %CR %LF $VT $FF but nothing seems to
give the required effect.

I need to run the script on a PC.

Please can you offer some advice or possible places to look.

Thanks

Andy
 
Tad McClellan

Andy said:
The file has 2 fields that are tab separated and each field is made
up of items separated by some type of linefeed character.


There is only ONE type of linefeed character, ASCII defines
the LF character as character #10 (decimal).

The end of
the second field is identified by another type of linefeed character.


There is no "other type" of linefeed character.

When I view the file in VIM the second linefeed shows as a ^M


That is not a "linefeed" (LF) character, that is a "carriage return" (CR)
character (#13 in ASCII).

I have tried searching for \r \n %CR %LF $VT $FF but nothing seems to
give the required effect.


I cannot tell what effect it is that you require...

Please can you offer some advice or possible places to look.


Show us your data.
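
The character numbers above can be checked from Perl itself. This is only a minimal sketch, assuming an ASCII-based platform such as a PC; the %code hash and the $ctrl_m and $ctrl_j variables are names made up for the illustration. It also shows why an editor displays a carriage return as ^M: in caret notation, ^X stands for the character 64 below the letter X.

```perl
use strict;
use warnings;

# Character numbers as Perl sees them (ASCII-based platforms assumed).
my %code = (
    TAB => ord("\t"),
    LF  => ord("\n"),
    CR  => ord("\r"),
);

# Caret notation: ^X is the character 64 below the letter X.
my $ctrl_m = ord('M') - 64;    # the ^M that Vim displays
my $ctrl_j = ord('J') - 64;    # ^J, the linefeed

printf "%-3s = %2d (0x%02x)\n", $_, $code{$_}, $code{$_} for sort keys %code;
print "^M is CR\n" if $ctrl_m == $code{CR};
print "^J is LF\n" if $ctrl_j == $code{LF};
```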
 
Bob Walton

Andy wrote:

....
The file has 2 fields that are tab separated and each field is made
up of items separated by some type of linefeed character. The end of
-------------------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
the second field is identified by another type of linefeed character.
----------------------------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I was only aware of one "type of linefeed character", the newline, or
0x0a. You'll have to fill us in on this new information.

When I view the file in VIM the second linefeed shows as a ^M so there
must be a way of identifying these separately.


^M would be a carriage return (0x0d). In VIM, the file was probably
interpreted as a "UNIX" file (it will say so on the status line if it
was, assuming you have the status line turned on) but actually has
Windoze line endings. Maybe the very first line has just a newline, and
the rest have Windoze line endings?

I have tried searching for \r \n %CR %LF $VT $FF but nothing seems to
give the required effect.

I need to run the script on a PC.

Please can you offer some advice or possible places to look. ....


Andy

You need to identify exactly what characters are present in your data
file. Since it looks like you're running Windoze (you say PC, so it
could be Linux, BSD, DOS, OS/2, etc, I suppose), you don't have a dump
command -- so you can use the following to give you a nice dump of your
file contents:

# Hex dump: perl dump.pl INFILE [OUTFILE]
die "One argument required: name of file to dump" unless scalar @ARGV;
open IN, $ARGV[0] or die "Oops, couldn't open $ARGV[0], $!";
binmode IN;    # read raw bytes, so CR LF is not translated on Win32
if (scalar(@ARGV) > 1) {
    open OUT, ">$ARGV[1]" or die "Oops, couldn't open $ARGV[1] for write, $!";
}
else {
    *OUT = *STDOUT;
}
$add = 0;    # offset of the current 16-byte row
while ($l = read(IN, $in, 16)) {
    @in = split '', $in;
    for (@in) { $_ = ord $_ }
    printf OUT "%08x ", $add;
    $i = 0;
    for (0 .. 15) {
        if ($_ < @in) { printf OUT "%02x ", $in[$_] }
        else          { print OUT "   " }    # pad a short final row
        $i++;
        print OUT " " if $i % 4 == 0;
        print OUT " " if $i == 8;
    }
    # show non-printable bytes as dots in the text column
    $in =~ y [\000-\037] [.];
    $in =~ y [\177-\377] [.];
    print OUT "$in\n";
    $add += 16;
}

Then you can use ord() to look at what it is Perl is actually reading in
your program. Note that Perl will make an attempt to "fix" Windoze line
endings (that's a feature) upon reading (if it is Win32 Perl), so what
shows up internally in Perl may be different than what is in the file.
If all is well, a file with Windoze line endings will actually contain
newline (or UNIX) line endings internally in Perl -- and the Windoze
line endings will be regenerated at output time. That way, the Perl
internals can deal with a constant, rather than have to deal with OS
differences.
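
The translation described above can be seen with ord() on any platform. The following is a rough sketch, not a definitive test: the sample record in $raw and the codes_of helper are invented for the illustration, and the :crlf I/O layer is used here to imitate what Win32 Perl does by default in text mode, while :raw behaves like binmode.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical record: two tab-separated fields, a bare LF inside the
# second field, and CR LF at the end of the record.
my $raw = "a\tb\nc\r\n";

# Write the bytes untranslated to a temporary file.
my ($tmp, $name) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} $raw;
close $tmp or die "Cannot close $name: $!";

# Read the whole file through the given I/O layer and return the
# character codes that Perl actually sees.
sub codes_of {
    my ($layer) = @_;
    open my $fh, "<$layer", $name or die "Cannot open $name: $!";
    local $/;    # slurp mode
    my $data = <$fh>;
    close $fh;
    return join ' ', map { ord } split //, $data;
}

my $text_codes = codes_of(':crlf');    # imitates Win32 text mode
my $raw_codes  = codes_of(':raw');     # same effect as binmode

print "text mode: $text_codes\n";
print "binmode:   $raw_codes\n";
```

In the text-mode line the 13 (CR) is gone, so the end of the record looks exactly like the bare LF inside the field. With binmode the 13 10 pair is still there to be matched.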
 
Tad McClellan

Bob Walton said:
Since it looks like you're running Windoze (you say PC, so it
could be Linux, BSD, DOS, OS/2, etc, I suppose), you don't have a dump
command


I have a Perl program named "xdump". Looks like I copied it
from one of the Camel books...


--------------------------------------------------------------
#!/usr/bin/perl
# copied from the Camel book, page 272

open(STDIN, $ARGV[0]) || die "Can't open '$ARGV[0]' ($!)\n";
binmode(STDIN);    # raw bytes; otherwise Win32 Perl turns CR LF into LF

# Full 16-byte rows: four big-endian 32-bit words in hex, then the text.
while (($len = read(STDIN, $data, 16)) == 16) {
    @array = unpack('N4', $data);
    $data =~ tr/\0-\37\177-\377/./;
    printf "%8.8ld %8.8lx %8.8lx %8.8lx %8.8lx %s\n",
        $offset, @array, $data;
    $offset += 16;
}

# Short final row: one byte at a time, padded out to 16 columns.
if ($len) {
    @array = unpack('C*', $data);
    $data =~ y/\0-\37\177-\377/./;
    for (@array) {
        $_ = sprintf('%2.2x', $_);
    }
    push(@array, '  ') while $len++ < 16;
    $data =~ s/[^ -~]/./g;
    printf "%8.8ld ", $offset;
    printf "%s%s%s%s %s%s%s%s %s%s%s%s %s%s%s%s %s\n",
        @array, $data;
}
 
Andy

Thanks everyone, your help has solved the problem.

The issue was that I was opening the file in the wrong mode. Once I
used binmode I could tell the difference between the CR and the LF.
As you all pointed out, what I thought was a different type of
linefeed was just the standard CR LF.

Thanks for your help

Regards

Andy Westcott
 
Joe Smith

Andy said:
The file has 2 fields that are tab separated and each field is made
up of items separated by some type of linefeed character. The end of
the second field is identified by another type of linefeed character.

One way of dealing with Excel-style tab-separated-values is:
binmode IN;
while (<IN>) {
    s/\n/ /;      # Change soft returns to a blank
    s/\r /\n/;    # Change what used to be CR+LF to just LF
    ...
}
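
A self-contained variation on the same idea, as a sketch only: the sample data in $raw is invented, and an in-memory filehandle stands in for the real file. Unlike the loop above, it collects the pieces of a record until the CR LF arrives and then splits the finished record on the tab.

```perl
use strict;
use warnings;

# Hypothetical data: a bare LF is a soft return inside a field,
# CR LF ends a record, and a tab separates the two fields.
my $raw = "red\ngreen\tsmall\nlarge\r\nblue\tmedium\r\n";

open my $in, '<', \$raw or die "Cannot open in-memory file: $!";
binmode $in;    # keep CR and LF exactly as stored

my @records;
my $record = '';
while (my $line = <$in>) {
    if ($line =~ s/\r\n\z//) {
        # CR LF: the record is complete, split it into its fields
        $record .= $line;
        push @records, [ split /\t/, $record, -1 ];
        $record = '';
    }
    else {
        # bare LF: soft return inside a field, replace it with a blank
        $line =~ s/\n\z/ /;
        $record .= $line;
    }
}
close $in;

print join(' | ', @$_), "\n" for @records;
```

The -1 limit on split keeps a trailing empty field if the second field happens to be empty.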
 
