CR-LF translation

Raider

What layer performs translation from "\r\n" to single '\n' when reading
text files by C++ file streams?
Is translation performed by locale's codecvt facet or by OS? Maybe
even by C library that is used by file streams to handle files?

I'm trying to get such functionality when using utf8 facet. It requires
file to be opened in binary mode and this lead to leak of CR-LF
transformations.

Raider
 
Victor Bazarov

Raider said:
> What layer performs translation from "\r\n" to single '\n' when
> reading text files by C++ file streams?

The OS, usually.

> Is translation performed by locale's codecvt facet or by OS? Maybe
> even by C library that is used by file streams to handle files?

What would it matter? It is performed by a layer *out of our control*.
The Standard does not specify the difference between "binary" and "text"
modes. It just acknowledges that there is one.

> I'm trying to get such functionality when using utf8 facet. It
> requires file to be opened in binary mode and this lead to leak of
> CR-LF transformations.

I don't understand the end of that statement. "this lead to leak of"?
I am guessing you're saying you need to transform \r\n pairs into single
\n chars yourself? Yes, I suppose. Do the \r hurt? Couldn't you just
ignore them?

V
 
P.J. Plauger

> What layer performs translation from "\r\n" to single '\n' when reading
> text files by C++ file streams?
> Is translation performed by locale's codecvt facet or by OS? Maybe
> even by C library that is used by file streams to handle files?

It can be any of the above, but usually if you open a file in text
mode the mapping is done for you.

> I'm trying to get such functionality when using utf8 facet. It requires
> file to be opened in binary mode and this lead to leak of CR-LF
> transformations.

Then it's not a very good codecvt facet.

P.J. Plauger
Dinkumware, Ltd.
http://www.dinkumware.com
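To make the text-mode versus binary-mode difference concrete, here is a minimal sketch (assuming C++11). The helper names `write_raw` and `first_line` and the file name in the usage comment are made up for illustration. Only the binary-mode result is the same everywhere; what text mode does is up to the implementation.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: writes the bytes exactly as given
// (binary mode, so no newline translation on output).
inline void write_raw(const std::string& path, const std::string& bytes) {
    std::ofstream out(path, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// Hypothetical helper: reads the first line with std::getline,
// opening the file with the requested mode.
inline std::string first_line(const std::string& path, std::ios::openmode mode) {
    std::ifstream in(path, mode);
    std::string line;
    std::getline(in, line);
    return line;
}

// Usage sketch:
//   write_raw("crlf_demo.txt", "one\r\ntwo\r\n");
//   first_line("crlf_demo.txt", std::ios::in | std::ios::binary);
//       // binary mode: the '\r' stays, the line is "one\r"
//   first_line("crlf_demo.txt", std::ios::in);
//       // text mode: "one" where the implementation maps "\r\n" to '\n'
//       // (e.g. Windows); typically still "one\r" on POSIX systems
```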
 
Ivan Vecerina

: What layer performs translation from "\r\n" to single '\n' when reading
: text files by C++ file streams?
: Is translation performed by locale's codecvt facet or by OS? Maybe
: even by C library that is used by file streams to handle files?

The translation is handled within the stream buffer (basic_filebuf).
I think that the exact approach is an implementation detail, but
it will most likely be handled at the file-reading level.

: I'm trying to get such functionality when using utf8 facet. It requires
: file to be opened in binary mode and this lead to leak of CR-LF
: transformations.

I'm no expert in codecvt, but you might be able to "emulate"
the CRLF->LF conversion at that level...

hth -Ivan
 
Raider

> I am guessing you're saying you need to transform \r\n pairs into single
> \n chars yourself? Yes, I suppose. Do the \r hurt? Couldn't you just
> ignore them?

First, I need to cut the \r at the end of the line after std::getline().
Second, I need to use my own manipulator instead of std::endl to write
\r\n. I need to do all of this under MS-DOS and Windows. On the Mac I think
I need to cut the \r at the beginning of the line and write \n\r instead of
the usual std::endl manipulator. It's tedious. I'm trying to find a better way...

Raider
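For reference, the manual workaround described in this post could look roughly like the sketch below (assuming C++11). The names `getline_crlf` and `crlf` are hypothetical, not standard library facilities, and only the "\r\n" convention is handled.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: like std::getline, but drops one trailing '\r'
// so that "\r\n"-terminated lines read in binary mode come back clean.
inline std::istream& getline_crlf(std::istream& in, std::string& line) {
    if (std::getline(in, line) && !line.empty() && line.back() == '\r') {
        line.pop_back();
    }
    return in;
}

// Hypothetical manipulator: writes "\r\n" and flushes, as a stand-in
// for std::endl on a stream that was opened in binary mode.
inline std::ostream& crlf(std::ostream& out) {
    return out.write("\r\n", 2).flush();
}

// Usage sketch:
//   std::istringstream in("one\r\ntwo\r\n");
//   std::string s;
//   while (getline_crlf(in, s)) { /* s holds "one", then "two" */ }
//
//   std::ostringstream out;
//   out << "one" << crlf;   // the buffer now holds "one\r\n"
```

Every read and write site has to remember to use these helpers, which is the tedium the post complains about.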
 
Pete Becker

Raider said:
> First, I need to cut the \r at the end of the line after std::getline().
> Second, I need to use my own manipulator instead of std::endl to write
> \r\n. I need to do all of this under MS-DOS and Windows. On the Mac I think
> I need to cut the \r at the beginning of the line and write \n\r instead of
> the usual std::endl manipulator. It's tedious. I'm trying to find a better way...

That's what text mode is for.
 
Tilman Kuepper

Hello Raider,
> I'm trying to get such functionality when using utf8 facet. It requires
> file to be opened in binary mode and this lead to leak of CR-LF
> transformations.

Instead of using codecvt facets, you might try to do the conversion
to/from UTF-8 using a "filtering stream-buffer":

ftp://ftp.cuj.com/pub/2005/cujmar2005.zip

Within "cujmar2005.zip" there is another archive "Kuepper.zip".
Open the file "utf8.h" and see the example at the end of this file.

Using this converter, you can open the UTF-8 file in text mode
instead of binary mode. So the line-endings should be okay.

Good luck,
Tilman
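The filtering stream-buffer technique can also be applied to the line endings themselves if the file has to stay in binary mode. The sketch below is not the code from "utf8.h"; it is a minimal, read-only illustration with a made-up class name, `crlf_filter_buf`, that maps each "\r\n" pair to a single '\n' and passes a lone '\r' through unchanged.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical read-only filter: pulls characters from another streambuf
// and turns every "\r\n" pair into a single '\n'.
class crlf_filter_buf : public std::streambuf {
public:
    explicit crlf_filter_buf(std::streambuf* source) : source_(source) {}

protected:
    int_type underflow() override {
        // If the one-character get area still holds an unread char, reuse it.
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());

        int_type c = source_->sbumpc();
        if (traits_type::eq_int_type(c, traits_type::eof())) return c;

        // On '\r', peek at the next character; if it is '\n',
        // swallow the '\r' and deliver the '\n' instead.
        if (traits_type::to_char_type(c) == '\r' &&
            traits_type::eq_int_type(source_->sgetc(),
                                     traits_type::to_int_type('\n'))) {
            c = source_->sbumpc();
        }

        ch_ = traits_type::to_char_type(c);
        setg(&ch_, &ch_, &ch_ + 1);  // expose exactly one character
        return c;
    }

private:
    std::streambuf* source_;  // not owned; must outlive the filter
    char ch_ = 0;
};

// Usage sketch (file name is illustrative):
//   std::ifstream file("data.txt", std::ios::binary);
//   crlf_filter_buf filter(file.rdbuf());
//   std::istream in(&filter);   // std::getline(in, line) now sees '\n' only
```

A one-character get area keeps the sketch short at the cost of a virtual call per character; a real filter would probably buffer a block at a time.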
 
