parse comma delimited text string

electrixnow

In MS VC++ Express, I need to know how to get from one comma-delimited
text string to many strings.

from this:

main_string = "onE,Two,Three , fouR,five, six "

to these:

string1 = "one"
string2 = "two"
string3 = "three"
string4 = "four"
string5 = "five"
string6 = "six"

The white space needs to be removed and the case needs to be all upper
or lower.
The result needs to be strings and not pointers or addresses in memory
so I can test:

string test = "three";

if (test == "three")



Some lines read with getline will have only two strings; some may have up to 10.


Please help with examples; your input with exact syntax is greatly
appreciated.

I have been struggling with not using pointers but every time I end up
using them.

Thanks in advance
 
Ian Collins

electrixnow said:
in MS VC++ Express I need to know how to get from one comma delimited
text string to many strings.

from this:

main_string = "onE,Two,Three , fouR,five, six "

to these:

string1 = "one"
string2 = "two"
string3 = "three"
string4 = "four"
string5 = "five"
string6 = "six"

For this, a couple of options are the good old C strtok or the
std::string find members.

the white space needs to be removed and the case needs to be all upper
or lower.

See toxxxx() and isxxx() functions.
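
A minimal sketch of the std::string find route combined with the isxxx()/toxxx() functions, in case it helps. The names normalize and split_fields are made up for this illustration, and the sketch assumes plain single-byte text:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: trims whitespace at both ends and lowercases the rest.
std::string normalize(const std::string& field)
{
    std::string::size_type first = 0;
    std::string::size_type last = field.size();
    while (first < last && std::isspace(static_cast<unsigned char>(field[first])))
        ++first;
    while (last > first && std::isspace(static_cast<unsigned char>(field[last - 1])))
        --last;

    std::string out = field.substr(first, last - first);
    for (std::string::size_type i = 0; i < out.size(); ++i)
        out[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(out[i])));
    return out;
}

// Hypothetical helper: splits on commas using std::string::find.
std::vector<std::string> split_fields(const std::string& line)
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos) {
            // no more commas: the rest of the line is the last field
            fields.push_back(normalize(line.substr(start)));
            break;
        }
        fields.push_back(normalize(line.substr(start, comma - start)));
        start = comma + 1;
    }
    return fields;
}
```

Each element of the returned vector is a real std::string, so a test such as fields[2] == "three" compares contents rather than addresses.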

Looks too much like homework to me...
 
electrixnow

Homework, not even close. I am a 42-year-old EE and not really a
programmer. I have an application at work to apply this to. I am trying
to compare two data files and create a third to let my department know
when new releases for drawing are ready. I have tried strtok and
strtok_s, but those functions always return pointers. That is fine if I
knew how to easily change them back to STRINGS for string comparison. By
the way we have the same last name, that's weird.

been out of school for too many years!

The only coding I have done was Bourne, Korn and C shell in the 80's. C++
is so much more robust, I am trying to learn more. I just purchased a
book, C++ Without Fear. If you have any suggestions for books for VC++
let me know.

what is the best way to compare text strings like this:

if ( "test string" == pointer_returned_from_strtok )

That will get me going in the right direction.

Thanks!
 
Ben Pope

electrixnow said:
Homework, not even close. I am a 42-year-old EE and not really a
programmer. I have an application at work to apply this to. I am trying
to compare two data files and create a third to let my department know
when new releases for drawing are ready. I have tried strtok and
strtok_s, but those functions always return pointers. That is fine if I
knew how to easily change them back to STRINGS for string comparison.

#include <string>
#include <iostream>

int main() {
// C style string
char mystring[] = "hello world";

// pointer to null terminated token
char* token = mystring + 6;

// create string from token
std::string newString(token);

if (newString == "world") {
std::cout << "Comparison True" << std::endl;
} else {
std::cout << "Comparison False" << std::endl;
}
}

The only coding I have done was Bourne, Korn and C shell in the 80's. C++
is so much more robust, I am trying to learn more. I just purchased a
book, C++ Without Fear. If you have any suggestions for books for VC++
let me know.

Accelerated C++, by Koenig and Moo.

what is the best way to compare text strings like this:

if ( "test string" == pointer_returned_from_strtok )

// create string on the fly
if ("test string" == std::string(pointer_returned_from_strtok))

that will get me going in the right direction.

Off you go then! :)
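
If you do stay with strtok, here is a rough sketch of the same conversion applied to every token. The function name tokens_from is invented for this example. The sketch assumes the input is an ordinary std::string and copies it first, because strtok modifies the buffer it scans:

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copies each strtok token into its own std::string.
std::vector<std::string> tokens_from(const std::string& line)
{
    // strtok writes '\0' into its input, so work on a private,
    // null-terminated copy instead of the caller's string.
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(&buffer[0], ","); tok != nullptr;
         tok = std::strtok(nullptr, ","))
    {
        tokens.push_back(std::string(tok));  // the string owns a copy of the text
    }
    return tokens;
}
```

Note that strtok treats a run of commas as a single separator, so empty fields are silently skipped, and it does no trimming or case conversion.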

Ben Pope
 
Alf P. Steinbach

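#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

std::string uppercase( std::string const& s )
{
    std::string result = s;
    for( std::size_t i = 0; i < s.length(); ++i )
    {
        // std::toupper needs a value representable as unsigned char
        result[i] = static_cast<char>(
            std::toupper( static_cast<unsigned char>( s[i] ) ) );
    }
    return result;
}

typedef std::vector<std::string> StringVector;

StringVector stringsFrom( std::string s )
{
    // turn commas into spaces so that operator>> splits on them
    std::replace( s.begin(), s.end(), ',', ' ' );
    std::istringstream stream( s );
    StringVector result;
    for( ;; )
    {
        std::string word;
        if( !( stream >> word ) ) { break; }
        result.push_back( uppercase( word ) );
    }
    return result;
}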
 
Sumit Rajan

electrixnow said:
in MS VC++ Express I need to know how to get from one comma delimited
text string to many strings.

from this:

main_string = "onE,Two,Three , fouR,five, six "

You could try something like this:


#include <string>
#include <iostream>
#include <sstream>
#include <algorithm>
#include <cctype>


void trim(std::string& str)
{
std::string::size_type idx = str.find_first_not_of(' ');
if (idx != std::string::npos) {
str.erase(0, idx);
}
idx = str.find_last_not_of(' ');
if (idx != std::string::npos) {
str.erase(idx+1, str.size()-1);
}
}



int main()
{
std::string main_string = "onE,Two,Three , fouR,five, six ,, d, , ";

std::istringstream iss(main_string);
std::string tok;

while (getline(iss, tok, ',')) {
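// cast through unsigned char so std::tolower never sees a negative value
std::transform(tok.begin(), tok.end(), tok.begin(),
[](unsigned char c) { return static_cast<char>(std::tolower(c)); });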
trim(tok);
std::cout << '!' << tok << "!\n";
}
}

Regards,
Sumit.
 
Sumit Rajan

Sumit Rajan said:
void trim(std::string& str)
{
std::string::size_type idx = str.find_first_not_of(' ');
if (idx != std::string::npos) {
str.erase(0, idx);
}
idx = str.find_last_not_of(' ');
if (idx != std::string::npos) {
str.erase(idx+1, str.size()-1);
}
}

Sorry, this function has a bug. May be simplest to use an std::istringstream
as you will see in Alf's post.

Sumit.
 
Sumit Rajan

Sumit Rajan said:
void trim(std::string& str)
{
std::string::size_type idx = str.find_first_not_of(' ');
if (idx != std::string::npos) {
str.erase(0, idx);
}
idx = str.find_last_not_of(' ');
if (idx != std::string::npos) {
str.erase(idx+1, str.size()-1);
}
}

void trim(std::string& str)
{
while((!str.empty()) && std::isspace(static_cast<unsigned char>(str[0]))) {
str.erase(0, 1);
}
std::string::size_type idx = str.find_last_not_of(' ');
// nothing to erase if the string is empty or already ends in a non-space
if (!((idx == std::string::npos)||(idx == str.size()-1))) {
str.erase(idx+1);
}
}

Regards,
Sumit.
 
Daniel T.

Assuming there are no spaces other than the ones just before or just
after the commas, the following will work:

template < typename Out >
void fn( string str, Out it )
{
int (*lower)(int) = &tolower;
transform( str.begin(), str.end(), str.begin(), lower );
replace( str.begin(), str.end(), ',', ' ' );
stringstream ss( str );
copy( istream_iterator<string>( ss ), istream_iterator<string>(),
it );
}

Call the above with an output iterator into a container that already has
enough room, or with a back_inserter. For example:

vector<string> vec;
fn( main_string, back_inserter( vec ) );

The only magic is replacing the commas with spaces so that the
stringstream will ignore them.


If there are spaces that must be saved then the above won't work. For
example:

main_string = "John Smith, Mark Allen ,Joe";

If the above needs to be parsed to:
vec[0] = "john smith"
vec[1] = "mark allen"
vec[2] = "joe"

Here we can't just dump the commas...
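
To see why, here is a small sketch of the comma-to-space trick on its own, without the case conversion. The name split_by_replacing_commas is made up for this illustration:

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper reproducing the comma-to-space trick from fn above.
std::vector<std::string> split_by_replacing_commas(std::string str)
{
    std::replace(str.begin(), str.end(), ',', ' ');
    std::istringstream ss(str);
    // operator>> splits on every run of whitespace, original or substituted
    return std::vector<std::string>(std::istream_iterator<std::string>(ss),
                                    std::istream_iterator<std::string>());
}
```

Given "John Smith, Mark Allen ,Joe" this returns five words rather than three names, because the space inside "John Smith" is indistinguishable from the spaces that replaced the commas.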

// returns a string that has had the whitespace at the front
// and back removed without disturbing the whitespace in the middle
string strip( string str )
{
const char* const whitespace = " \t";
str.erase( 0, str.find_first_not_of( whitespace ) );
string::size_type pos = str.find_last_not_of( whitespace );
if ( pos != string::npos )
str.erase( pos + 1 );
return str;
}

template < typename Out >
void foo( string str, Out it )
{
int (*lower)(int) = &tolower;
transform( str.begin(), str.end(), str.begin(), lower );
string::size_type prev = 0;
for ( string::size_type pos = str.find( ',' ); pos != string::npos;
pos = str.find( ',', prev ) )
{
string s = strip( str.substr( prev, pos - prev ) );
*it++ = s;   // write through the output iterator, then advance it
prev = pos + 1;
}
string s = strip( str.substr( prev ) );
*it = s;
}
 
Daniel T.

I liked Sumit Rajan's use of getline... Modified my code to use it.

string& strip( string& str )
{
const char* const whitespace = " \t";
str.erase( 0, str.find_first_not_of( whitespace ) );
string::size_type pos = str.find_last_not_of( whitespace );
if ( pos != string::npos )
str.erase( pos + 1 );
return str;
}

template < typename Out >
void fn( string str, Out it )
{
int (*lower)(int) = &tolower;
transform( str.begin(), str.end(), str.begin(), lower );
stringstream ss( str );
string s;
while ( getline( ss, s, ',' ) )
*it++ = strip( s );
}
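
For anyone who wants to try this without the template, here is a self-contained sketch along the same lines. The wrapper name split_line is hypothetical, and the sketch swaps the tolower function pointer for a lambda that casts through unsigned char:

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Trims spaces and tabs at both ends, leaving inner whitespace alone.
std::string& strip(std::string& str)
{
    const char* const whitespace = " \t";
    str.erase(0, str.find_first_not_of(whitespace));
    const std::string::size_type pos = str.find_last_not_of(whitespace);
    if (pos != std::string::npos)
        str.erase(pos + 1);
    return str;
}

// Hypothetical wrapper: returns the lowercased, trimmed fields of one line.
std::vector<std::string> split_line(std::string line)
{
    std::transform(line.begin(), line.end(), line.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ','))   // one field per comma
        fields.push_back(strip(field));
    return fields;
}
```

Unlike the comma-to-space version, this keeps empty fields between two commas and keeps spaces inside a field, so "John Smith, Mark Allen ,Joe" comes back as three names.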
 
Daniel T.

Sumit Rajan said:
Sorry, this function has a bug. May be simplest to use an std::istringstream
as you will see in Alf's post.

What bug do you see? The only bug I can find is that if 'str' is nothing
but spaces then the function doesn't do anything.
 
Shaun

Since you're using VC++ Express, are you using standard C++ or C++/CLI?
I ask because if you're using the managed String class, you can very
easily do what you want as follows:

System::String^ s = "1,2,3,4,5";
cli::array<System::String^>^ s2 = s->Split(',');
// now use s2[0] ... s2[s2->Length - 1] as needed.

If using standard C++, then the suggestions already provided by the
other posters are worth considering.
 
Dietmar Kuehl

electrixnow said:
the white space needs to be removed and the case needs to be all upper
or lower.

My favorite approach would be to use a filtering stream buffer
to do the clean-up and use normal string input functions otherwise:

#include <iostream>
#include <streambuf>
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>
#include <cctype>

char clean(char i)
{
    return i == ',' ? ' '
        : static_cast<char>(std::tolower(static_cast<unsigned char>(i)));
}

struct cleanbuf:
    std::streambuf
{
    enum { s_size = 1024 };
    cleanbuf(std::streambuf* sbuf): m_sbuf(sbuf) {}
    int_type underflow()
    {
        // refill from the wrapped stream buffer, then clean the new data
        std::streamsize size = m_sbuf->sgetn(m_buf, s_size);
        setg(m_buf, m_buf, m_buf + size);
        std::transform(m_buf + 0, m_buf + size, m_buf, clean);
        return gptr() == egptr()
            ? traits_type::eof(): traits_type::to_int_type(*gptr());
    }
private:
    std::streambuf* m_sbuf;
    char m_buf[s_size];
};

int main()
{
    cleanbuf sbuf(std::cin.rdbuf());
    std::istream in(&sbuf);
    std::vector<std::string> strs(
        (std::istream_iterator<std::string>(in)),
        (std::istream_iterator<std::string>()));
    std::copy(strs.begin(), strs.end(),
        std::ostream_iterator<std::string>(std::cout, "\n"));
}
 
Sumit Rajan

Daniel T. said:
What bug do you see? The only bug I can find is that if 'str' is nothing
but spaces then the function doesn't do anything.

Precisely. My other post on this thread takes care of it.

Regards,
Sumit.
 
Daniel T.

Sumit Rajan said:
Precisely. My other post on this thread takes care of it.

I think your other post was more complicated than necessary...

str.erase( 0, str.find_first_not_of( whitespace ) );

Takes care of it without the extra conditionals/loops.
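
A short sketch of why the single call is enough, using a made-up helper name trim_left and assuming only spaces and tabs count as whitespace:

```cpp
#include <string>

// Hypothetical helper: removes leading spaces and tabs in one call.
std::string trim_left(std::string str)
{
    // For an all-whitespace (or empty) string, find_first_not_of returns
    // npos, and erase(0, npos) removes everything that is there, so the
    // "nothing but spaces" case needs no special handling.
    str.erase(0, str.find_first_not_of(" \t"));
    return str;
}
```

Only the trailing side still needs the npos check, because erase(npos + 1) would wrap around to erase(0) and wipe a string that has no trailing whitespace problem at all.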
 
Daniel T.

Dietmar Kuehl said:
My favorite approach would be to use a filtering stream buffer
to do the clean-up and use normal string input functions otherwise:

Unfortunately, your code doesn't compile on my system. I had to change
the vector<string> c_tor call to:

vector<string> strs;
copy ( istream_iterator<string>(in), istream_iterator<string>(),
back_inserter(strs) );

Also, your code only works for the simple case, where there is no
whitespace *within* a string delimited by commas. How would you change
it to account for the more complicated case?

The nice thing about your code is that it does everything in a single
pass whereas the code I presented to date requires two passes (one to
call tolower, and a second to remove the commas.) It would be a simple
change to make mine one pass as well...

template < typename Out >
void fn( string str, Out it )
{
transform( str.begin(), str.end(), str.begin(), clean );
// the above uses your 'clean' function.
stringstream ss( str );
copy( istream_iterator<string>( ss ), istream_iterator<string>(),
it );
}

This seems much more straightforward and needs fewer lines of code.

I must say that every contribution so far has shed new light on the
problem for me and caused me to improve my code in some way. IMHO, this
is usenet at its best.
 
Sumit Rajan

Daniel T. said:
I think your other post was more complicated than necessary...

str.erase( 0, str.find_first_not_of( whitespace ) );

Takes care of it without the extra conditionals/loops.

True. This one is far more reader-friendly.

Regards,
Sumit.
 
Dietmar Kuehl

Daniel said:
Unfortunately, your code doesn't compile on my system. I had to change
the vector<string> c_tor call to:

vector<string> strs;
copy ( istream_iterator<string>(in), istream_iterator<string>(),
back_inserter(strs) );

In this case your compiler or the standard library implementation is
broken: the standard container classes are defined to accept pairs of
input iterators as constructor arguments.

Also, your code only works for the simple case, where there is no
whitespace *within* a string delimited by commas. How would you change
it to account for the more complicated case?

This is a rather different specification which, in particular, does
not yet account for whitespace at the beginning or the end of the
string: is this whitespace to be removed or not? Since commas work
as true delimiters here, I assume it is to be included. In this case
I would [temporarily] 'imbue()' a 'std::ctype<char>' facet which
only considers comma and possibly newline (if this is also considered
a separator). The extractor functions use whitespaces as field
separators for strings. If the case adjustment is still necessary,
I would use the modified facet in combination with the filtering
stream buffer.
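
A rough sketch of the facet idea, without the filtering stream buffer or the case adjustment. The names comma_is_space and split_on_commas are invented for this example, and the sketch assumes comma and newline are the only separators:

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: only ',' and '\n' are classified as whitespace.
struct comma_is_space : std::ctype<char>
{
    explicit comma_is_space(std::size_t refs = 0)
        : std::ctype<char>(make_table(), false, refs) {}

    static const mask* make_table()
    {
        // Start from an all-zero table so ' ' and '\t' are ordinary characters.
        static std::vector<mask> table(table_size, mask());
        table[','] = space;
        table['\n'] = space;
        return &table[0];
    }
};

// Hypothetical helper: reads comma-separated fields with operator>>.
std::vector<std::string> split_on_commas(const std::string& line)
{
    std::istringstream in(line);
    // The locale takes ownership of the facet (refs == 0).
    in.imbue(std::locale(in.getloc(), new comma_is_space));

    std::vector<std::string> fields;
    for (std::string field; in >> field; )
        fields.push_back(field);
    return fields;
}
```

With this facet "John Smith, Mark Allen ,Joe" is read as "John Smith", " Mark Allen " and "Joe": the blanks around each field are kept, as assumed above. Consecutive commas are skipped like any other run of whitespace, so empty fields do not show up.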
This seems much more straightforward and needs fewer lines of code.

Injecting existing filtering stream buffers into the processing is
pretty straightforward. Of course, writing them is not necessarily
so, but it is not that hard either. The real advantage I see in the
filtering stream buffer over your code is that it encapsulates the
complete solution, especially if it is also equipped with a simple
input stream which automatically maintains the stream buffer.

I must say that every contribution so far has shed new light on the
problem for me and caused me to improve my code in some way.

This was my intention of posting the code...
 
Dietmar Kuehl

Note that passing a plain 'char' to 'std::tolower()' does *NOT* work
portably! The argument has to be representable as 'unsigned char' (or
be 'EOF'). However, on platforms where 'char' is signed, a 'char' can
convert to a negative 'int'! In portable code, the only valid call to
'std::tolower(int)' with a 'char' looks like this:

std::tolower(static_cast<unsigned char>(c));

The only possible variation is how the 'char' is first cast to
an 'unsigned char'.
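
One way to package that cast so it can be handed to std::transform, as a small sketch. The helper names lower_char and lower_copy are made up for this illustration:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: lowercases one char without ever passing a
// negative value to std::tolower.
char lower_char(char c)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Hypothetical helper: lowercases a whole string via std::transform.
std::string lower_copy(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(), lower_char);
    return s;
}
```

Because lower_char is an ordinary non-overloaded function, it also sidesteps the overload ambiguity that can arise when std::tolower itself is passed to an algorithm.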
 
