Removing Windows CR-LF from the middle of text

v796

Hi,
I have text on multiple lines in text files on Windows. Example:
" Your voting instructions have been
received.
Thank you! "

I need a regex to remove CR-LF and convert them to spaces in the middle
of the text. I tried this regex, which works at the end:
$content =~ s/^([\s\S]+\S)\s*$/$1/;

This works to remove all the Windows CR-LFs at the end and puts the rest into $1.
I must remove the CR-LF in the middle too. How can this be done?

My script is running on Linux.
Finally, I have to import $content into a single column in an Excel file.
Unfortunately, the Excel file requirement only came up yesterday; otherwise
I would have used the CPAN WriteExcel module. Oh well!

Currently the text is written to multiple lines in the Excel file,
when it should be in only one cell.

Any comments are appreciated.

Thanks,
vk
 
Mark Clements

v796 said:
> Hi,
> I have text on multiple lines in text files on Windows. Example:
> I need a regex to remove CR-LF and convert them to spaces in the middle
> of the text. I tried this regex, which works at the end:
> $content =~ s/^([\s\S]+\S)\s*$/$1/;
Your regex doesn't do this at all: it merely strips trailing whitespace from the end of the string
(without the /m modifier, ^ and $ anchor to the start and end of the whole string, not of each
line). [\s\S]+ (match whitespace or non-whitespace, one or more times) will match anything, line
breaks included. You could try dropping the anchors and replacing globally.

e.g.

$content =~ s/\r\n/ /g;
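A minimal self-contained check of the point above, run on made-up sample text: the original substitution only trims the end of the string, while a global s/\r\n/ /g joins the lines. The \R variant at the end is not from the thread; it is an addition that assumes Perl 5.10 or later and also catches bare-LF or bare-CR breaks.

```perl
use strict;
use warnings;

# Made-up sample text with Windows CR-LF line breaks.
my $text = "Your voting instructions have been\r\nreceived.\r\nThank you!\r\n";

# The original substitution: without /m, ^ and $ anchor to the whole
# string, so only the trailing whitespace (the final CR-LF) is removed.
(my $trimmed = $text) =~ s/^([\s\S]+\S)\s*$/$1/;

# Global replacement: every remaining CR-LF becomes a single space.
(my $joined = $trimmed) =~ s/\r\n/ /g;
print "$joined\n";    # prints: Your voting instructions have been received. Thank you!

# Variant (assumes Perl 5.10+): \R matches CR-LF, bare LF or bare CR,
# so mixed line endings are flattened as well.
(my $mixed = "one\r\ntwo\nthree\rfour") =~ s/\R/ /g;
print "$mixed\n";     # prints: one two three four
```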
> This works to remove all the Windows CR-LFs at the end and puts the rest into $1.
> I must remove the CR-LF in the middle too. How can this be done?
> Finally, I have to import $content into a single column in an Excel file.
> Unfortunately, the Excel file requirement only came up yesterday; otherwise
> I would have used the CPAN WriteExcel module. Oh well!
How does that prevent you from using it? It isn't going to be quicker to implement it from scratch, is it?
> Currently the text is written to multiple lines in the Excel file,
> when it should be in only one cell.
What text in Excel? What are you trying at the moment?

I suggest you write the output to a CSV file and get Excel to import it, or use DBD::Excel. Failing
all that, you could look at using Win32::OLE under Windows. You need to learn how to use CPAN...
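A minimal sketch of the CSV route, written in plain Perl so it needs nothing from CPAN: each record is flattened to one line and wrapped in double quotes, so Excel should place it in a single cell even if it contains commas. The helper name csv_field and the sample records are hypothetical; for real data, a CSV module from CPAN would cover more edge cases.

```perl
use strict;
use warnings;

# Quote one CSV field: double any embedded double quotes and wrap the
# whole value in double quotes, so commas stay inside the cell.
# csv_field is a hypothetical helper, not part of any module.
sub csv_field {
    my ($value) = @_;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Made-up sample records; each becomes one row holding one cell.
my @records = (
    "Your voting instructions have been\r\nreceived.\r\nThank you!",
    "Second answer, with a comma",
);

my $csv = '';
for my $content (@records) {
    (my $flat = $content) =~ s/\s*\R\s*/ /g;   # line breaks -> single spaces (Perl 5.10+)
    $csv .= csv_field($flat) . "\r\n";        # CRLF row ending, as in RFC 4180
}

binmode STDOUT;    # write the CRLF row endings unchanged
print $csv;
```

Redirect the output to a file (for example `perl flatten.pl > output.csv`, where the script name is illustrative) and open that file in Excel.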

regards,

Mark
 
Joe Smith

v796 said:
> I have text on multiple lines in text files on Windows. Example:
> " Your voting instructions have been
> received.
> Thank you! "

From what I've seen, text like that is really
" Your voting instructions have been\n received.\nThank you!"\r\n
in that the soft line breaks are bare LF and hard returns are CRLF.

This becomes visible when using
od -c whatever.csv
at the Unix/Linux command line.
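If od is not at hand, a rough Perl stand-in is to print the text with CR and LF made visible. The helper name show_breaks and the sample string are hypothetical; given a file name argument, the script dumps that file instead, read in raw mode so no line-ending translation happens.

```perl
use strict;
use warnings;

# Replace CR and LF with visible \r and \n escapes, roughly what
# `od -c` shows for those bytes. show_breaks is a hypothetical helper.
sub show_breaks {
    my ($s) = @_;
    $s =~ s/\r/\\r/g;
    $s =~ s/\n/\\n/g;
    return $s;
}

my $sample;
if (@ARGV) {
    # Read the whole file in raw mode so CRs are not translated away.
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    local $/;                  # slurp mode
    $sample = <$fh> // '';     # empty file -> empty string
    close $fh;
} else {
    # Made-up sample: soft breaks are bare LF, the record ends in CR-LF.
    $sample = qq{" Your voting instructions have been\nreceived.\nThank you! "\r\n};
}
print show_breaks($sample), "\n";
```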
> I need a regex to remove CR-LF and convert them to spaces in the middle
> of the text.

I've had some success with this:

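binmode IN;              # Preserve CR in input stream
local $/ = "\r\n";       # Input record separator: read DOS-style records ($\ is the OUTPUT separator)
while (<IN>) {
    s/\n/ /g;            # Convert soft breaks (and the record's final LF) into spaces
    s/\r $/\n/;          # Convert what used to be CRLF to plain LF
    # ... process the record ...
}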
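A self-contained run of the approach above on made-up data, reading from an in-memory filehandle instead of IN so it needs no input file. Note that the input record separator is $/; $\ is the output record separator and does not affect reading. The sample records are hypothetical.

```perl
use strict;
use warnings;

# Made-up sample data: soft line breaks are bare LF, each record
# (CSV row) ends with CR-LF.
my $data = qq{"Your voting instructions have been\nreceived.\nThank you!"\r\n}
         . qq{"Second record"\r\n};

# In-memory filehandle so the example is self-contained; with a real
# file, open it and call binmode on that handle instead.
open my $in, '<', \$data or die "Cannot open in-memory data: $!";
binmode $in;                  # keep CR characters intact

my @rows;
{
    local $/ = "\r\n";        # one DOS-style record per read
    while (my $line = <$in>) {
        $line =~ s/\n/ /g;    # soft breaks (and the record's final LF) become spaces
        $line =~ s/\r $/\n/;  # the former CR-LF terminator becomes a plain LF
        push @rows, $line;
    }
}
close $in;

print @rows;                  # one line per record
```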
> Currently the text is written to multiple lines in the Excel file,
> when it should be in only one cell.

The above code recognizes that as a single cell on a line.

Of course, using a CPAN module to parse CSV would be better.
-Joe
 
