How to split this string correctly?


Digital Puer

I have the following std::string contents:

"I want to go"--and with emphasis--"right now," he said.


I would like to split this string into six substrings:

"I want to go"
--
and with emphasis
--
"right now,"
he said.


Is there a way to do this splitting? I am on Linux and can use
Boost libraries (I know there is some regex library, but I have never
used it).

I can potentially use another language. Does Python or Java have
support for this?
 

Victor Bazarov

Digital said:
I have the following std::string contents:

"I want to go"--and with emphasis--"right now," he said.


I would like to split this string into six substrings:

"I want to go"

Why? What makes those substrings especially interesting?
Is there a way to do this splitting?

Yes. The field is called "programming", the area is "string
manipulation". You figure out the algorithm and then you write that
algorithm in terms of the language of your choice.

For example, you want to concatenate "Do" and "my homework" with a space
between them. Suppose you have two strings:

std::string do_("Do");
std::string my_homework("my homework");

the concatenation can be done using operator + :

std::string result = do_ + ' ' + my_homework;

I am on Linux and can use
Boost libraries (I know there is some regex library, but I have never
used it).

So don't. You don't need to. The Standard library has enough mechanism
for you to do what you need. Start with figuring out the formal
algorithm of splitting your string. Then become familiar with the
standard string class (std::string), most likely with its member 'find'
or 'find_first_of', and its member 'substr'.
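A minimal sketch of the find/substr route, assuming the separator is any run of characters drawn from a caller-supplied set (here just '-'). The function name split_keep_runs is hypothetical, and it also uses find_first_not_of, a sibling of find_first_of, to locate the end of each separator run so the run can be kept as its own token.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split s at runs of characters taken from sep_chars,
// keeping each run as its own token. Empty fields are not emitted.
std::vector<std::string> split_keep_runs(const std::string& s,
                                         const std::string& sep_chars) {
    std::vector<std::string> tokens;
    std::string::size_type pos = 0;
    while (pos < s.size()) {
        // First separator character at or after pos (npos if none).
        std::string::size_type sep = s.find_first_of(sep_chars, pos);
        if (sep != pos) {
            // Field text up to the separator, or to the end of the string.
            std::string::size_type len =
                (sep == std::string::npos) ? std::string::npos : sep - pos;
            tokens.push_back(s.substr(pos, len));
        }
        if (sep == std::string::npos) break;
        // The separator run ends at the first character not in sep_chars.
        std::string::size_type end = s.find_first_not_of(sep_chars, sep);
        std::string::size_type len =
            (end == std::string::npos) ? std::string::npos : end - sep;
        tokens.push_back(s.substr(sep, len));
        pos = end;  // npos fails the loop test, ending the scan
    }
    return tokens;
}
```

Called with "-" on the string from the question, this returns five pieces: "I want to go" (with its quotes), --, and with emphasis, --, and "right now," he said. as one piece, because nothing in this rule separates the closing quote from the text after it.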
I can potentially use another language. Does Python or Java have
support for this?

Yes, they do.

V
 

Digital Puer

Why?  What makes those substrings especially interesting?


They are not interesting because it is a contrived
example representing a larger class of problem.


Yes.  The field is called "programming", the area is "string
manipulation".  You figure out the algorithm and then you write that
algorithm in terms of the language of your choice.

For example, you want to concatenate "Do" and "my homework" with a space
between them.  Suppose you have two strings:

     std::string do_("Do");
     std::string my_homework("my homework");

the concatenation can be done using operator + :

     std::string result = do_ + ' ' + my_homework;


Good job. That is string concatenation, not splitting.
Did you learn software engineering through a
Cracker Jack box? Congratulations.


 > I am on Linux and can use


So don't.  You don't need to.  The Standard library has enough mechanism
for you to do what you need.  Start with figuring out the formal
algorithm of splitting your string.  Then become familiar with the
standard string class (std::string), most likely with its member 'find'
or 'find_first_of', and its member 'substr'.


Yes, they do.


No. That is not what I am looking for. Let me be
more specific. Is there a way to do the split
on a pattern? In my example, I would like to split:

"I want to go"--and with emphasis--"right now," he said.

I don't really care what the "--" is, only that it is not
an alphanumeric character or a quotation mark. I would like
to split between go" and -- and retain both as tokens.

I know that std::string::find() does not allow you
to use a pattern.

Any non-pissy answers would be appreciated.
 

Jonathan Lee

That is not what I am looking for. Let me be
more specific. Is there a way to do the split
on a pattern?

Not in the standard libs. You could code your own easily enough
using isalpha(), isdigit(), and ispunct(), I expect. Just walk
along the string and use substr() whenever the grouping changes.
Use a std::vector to build up your list.
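One way to follow this suggestion, sketched under an assumed reading of the poster's rules (the thread never pins them down): a quoted span is one token; a run of two or more characters that are neither alphanumeric, whitespace, nor a double quote (such as "--") is a separator token; everything else is plain text, and whitespace between tokens is dropped. The names is_sep_char, sep_run_at and tokenize are hypothetical. Requiring at least two separator characters is what lets a lone period, as in "he said.", stay attached to its text.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Assumed rule: a separator character is anything that is not alphanumeric,
// not whitespace and not a double quote.
bool is_sep_char(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    return !std::isalnum(u) && !std::isspace(u) && c != '"';
}

// True if a separator run of at least two characters starts at position i.
bool sep_run_at(const std::string& s, std::string::size_type i) {
    return i + 1 < s.size() && is_sep_char(s[i]) && is_sep_char(s[i + 1]);
}

std::vector<std::string> tokenize(const std::string& s) {
    std::vector<std::string> tokens;
    std::string::size_type i = 0;
    while (i < s.size()) {
        if (std::isspace(static_cast<unsigned char>(s[i]))) {
            ++i;  // whitespace between tokens is dropped
            continue;
        }
        std::string::size_type start = i;
        if (s[i] == '"') {
            // Quoted span: up to and including the closing quote.
            std::string::size_type close = s.find('"', i + 1);
            i = (close == std::string::npos) ? s.size() : close + 1;
        } else if (sep_run_at(s, i)) {
            // Separator run such as "--": take every separator character.
            while (i < s.size() && is_sep_char(s[i])) ++i;
        } else {
            // Plain text: stop at a quote or at the start of a separator run.
            while (i < s.size() && s[i] != '"' && !sep_run_at(s, i)) ++i;
        }
        // substr() cuts the token out; trailing whitespace is trimmed.
        std::string token = s.substr(start, i - start);
        std::string::size_type last = token.find_last_not_of(" \t\r\n");
        tokens.push_back(token.substr(0, last + 1));
    }
    return tokens;
}
```

On the string from the question, tokenize() returns exactly the six substrings the original post lists. Other inputs may need different rules, which is the point several replies make about specifying the split points.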

Googling "C++ tokenizer" returns a few examples. With some minor
modifications you would have what you want.

If you need more advanced pattern matching I would use an
already made regex library. Boost is always a good place to
start for such things.

--Jonathan
 

red floyd

Digital said:
[contrived example redacted]
Why? What makes those substrings especially interesting?
They are not interesting because it is a contrived
example representing a larger class of problem.

And such a contrived example reeks of "Do My Homework For Me".
Good job. That is string concatenation, not splitting.
Did you learn software engineering through a
Cracker Jack box? Congratulations.

Victor is well aware that it was concatenation.
Did you even read his reply? In particular the strings
he used?
 

Pascal J. Bourguignon

Digital Puer said:
I have the following std::string contents:

"I want to go"--and with emphasis--"right now," he said.


I would like to split this string into six substrings:

"I want to go"
--
and with emphasis
--
"right now,"
he said.


Is there a way to do this splitting?
Yes.

I am on Linux and can use
Boost libraries (I know there is some regex library, but I have never
used it).

I can potentially use another language. Does Python or Java have
support for this?

No, no language includes in its libraries programs implementing all the
random specifications and requirements of the world.

(I'm working on one that would, but it's not ready yet.)

In the meantime, you will have to work and think about how you could
implement a function to do that.

How would you characterise the split points? Why do you want to get:

{"\"I want to go\"",
"--",
"and with emphasis",
"--",
"\"right now,\"",
"he said."}

and not for example:

{"\"I want to",
" go\"--and with emphasis",
"--\"right ",
"now,\" he said."}

?
 

James Kanze

Why? What makes those substrings especially interesting?
They are not interesting because it is a contrived example
representing a larger class of problem.

But it's not at all evident what. One example is not a
specification.

Only if you specify exactly what you want to do.

[...]
No. That is not what I am looking for. Let me be more
specific. Is there a way to do the split on a pattern? In my
example, I would like to split:
"I want to go"--and with emphasis--"right now," he said.
I don't really care what the "--" is, only that it is not
an alphanumeric character or a quotation mark. I would like to
split between go" and -- and retain both as tokens.
I know that std::string::find() does not allow you to use a
pattern.

I've got a class (and some subclasses) at my site that can be
used to split up strings fairly effectively; one of the
sub-classes uses regular expressions to define the delimiter
patterns. The classes (my FieldArray) drop the separators, but
it would be fairly easy to modify them so that they kept them,
resulting in something like:
fieldArray = someString ;
// fieldArray[ 0 ] == someString
// odd numbered indexes contain fields
// even numbered indexes contain the separator text
// between the fields.
Alternatively, if you more or less know the separators beforehand, and
how many fields you will have, you can effectively use
boost::regex directly. I don't know what you're trying to do,
really, so I can't say which, if either, corresponds to your
needs.
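The keep-the-separators idea can be sketched with C++11 std::regex, which postdates this thread; Boost.Regex provides boost::sregex_token_iterator built on the same idea. This is not the FieldArray class, and the function name split_keep is hypothetical. Asking a token iterator for submatch -1 (the text before each match) and then 0 (the match itself) yields fields and separators alternately. Unlike the layout described above, index 0 here is the first field rather than the whole string, and an empty trailing field is not produced.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split s on every match of sep, keeping the separators.
// Result: field, separator, field, separator, ... (the first field may be
// empty if s starts with a separator; an empty final field is omitted).
std::vector<std::string> split_keep(const std::string& s, const std::regex& sep) {
    // -1 selects the text before each match, 0 selects the match itself.
    std::sregex_token_iterator it(s.begin(), s.end(), sep, {-1, 0});
    std::sregex_token_iterator end;
    std::vector<std::string> parts;
    for (; it != end; ++it) parts.push_back(it->str());
    return parts;
}
```

With the assumed separator pattern [^[:alnum:][:space:]"]{2,} (two or more characters that are neither alphanumeric, whitespace, nor a quote), the example string splits into five pieces, the last being "right now," he said. as one field.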
 

Vladimir Jovic

Digital said:
I have the following std::string contents:

"I want to go"--and with emphasis--"right now," he said.


I would like to split this string into six substrings:

"I want to go"
--
and with emphasis
--
"right now,"
he said.


Is there a way to do this splitting? I am on Linux and can use
Boost libraries (I know there is some regex library, but I have never
used it).

Either that, or you can use the std::string class and its find() method.
 

migroslinx

I have the following std::string contents:

"I want to go"--and with emphasis--"right now," he said.

I would like to split this string into six substrings:

"I want to go"


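You could use boost::regex. For example (sorry, I haven't tested it):

boost::match_results<std::string::const_iterator> what;
boost::regex regx( " (([[:alpha:]])|(--)|(\\.))*" );
// match_extra only records every repeat of a group if Boost.Regex
// was built with BOOST_REGEX_MATCH_EXTRA defined.
if ( boost::regex_match( sourceStr, what, regx, boost::match_extra ) )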
{
// Iterate over what[*] to see patterns matched....
// see boost::regex_match..
}
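When the shape of the input is known in advance (the "Alternatively" case James Kanze describes), a single regex_match with one capture group per substring can be enough, and since each group captures only once, no match_extra flag is needed. This sketch uses C++11 std::regex instead of boost::regex; the pattern, the fixed six-part shape and the function name split_fixed are assumptions made for this one example sentence.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper. Assumed shape: quoted text, separator, plain text,
// separator, quoted text, trailing text. Returns the six pieces, or an
// empty vector if the whole string does not have that shape.
std::vector<std::string> split_fixed(const std::string& s) {
    static const std::regex shape(
        R"re(("[^"]*")([^[:alnum:][:space:]"]+)([^"]*?)([^[:alnum:][:space:]"]+)("[^"]*")\s*(.*))re");
    std::smatch m;
    std::vector<std::string> parts;
    if (std::regex_match(s, m, shape)) {
        for (std::size_t i = 1; i < m.size(); ++i)  // m[0] is the whole match
            parts.push_back(m[i].str());
    }
    return parts;
}
```

The trade-off is rigidity: a sentence with a different number of quoted or separated parts simply fails to match, so this only fits inputs whose layout is fixed.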
 

barcaroller

Digital Puer said:
Is there a way to do this splitting? I am on Linux and can use
Boost libraries (I know there is some regex library, but I have never
used it).

You have a few options to split a string based on specific tokens and/or
separators. In plain C you can use strcmp() and/or strtok() to write your
own routine.

In C++ you can use:
- strcmp() and/or strtok() again
- std::string; look at find() and related members
- boost::algorithm::string
- boost::tokenizer
- boost::regex
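A sketch of the strtok() route, mainly to show its limits for this question: strtok() writes '\0' into the buffer it scans, keeps hidden static state (so calls must not be interleaved with other strtok() users), and discards the delimiter characters, so the "--" tokens the poster wants to keep are lost. The helper name split_strtok is hypothetical; it copies the input so the caller's std::string is left untouched.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: split s on any character in delims using std::strtok.
// Delimiters are dropped and empty fields are skipped.
std::vector<std::string> split_strtok(const std::string& s, const char* delims) {
    // strtok needs a writable, NUL-terminated buffer.
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0');
    std::vector<std::string> fields;
    for (char* tok = std::strtok(buf.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        fields.push_back(tok);
    }
    return fields;
}
```

With "-" as the delimiter set, the example string yields only three fields ("I want to go" with its quotes, and with emphasis, and "right now," he said.), which is why strtok() alone does not meet the keep-the-separators requirement.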
 
