Regex doesn't recognize single quote

Jerric

Hi, I need to remove special characters, except \w and single quotes,
from a string, can someone please help me on the regex?

for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
the following code, but it removed single quote. seems to me java
cannot handle the pattern like [^'].

String val = "ab'de+fg";
val = val.replaceAll("[^\\w']+", "");

Thanks a lot,
 
Martin Gregorie

Jerric said:
Hi, I need to remove special characters, except \w and single quotes,
from a string, can someone please help me on the regex?

for example, I have "ab'de+fg", I want to get "ab'defg", and I tried the
following code, but it removed single quote. seems to me java cannot
handle the pattern like [^'].

String val = "ab'de+fg";
val = val.replaceAll("[^\\w']+", "");

Did you try escaping the single quote?
 
Daniel Pitts

Jerric said:
Hi, I need to remove special characters, except \w and single quotes,
from a string, can someone please help me on the regex?

for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
the following code, but it removed single quote. seems to me java
cannot handle the pattern like [^'].

String val = "ab'de+fg";
val = val.replaceAll("[^\\w']+", "");

Thanks a lot,

It works for me, which indicates the problem is somewhere in the code
you didn't post. Here is an SSCCE:

public class Works {
public static void main(String[] args) {
String val = "ab'de+fg";
System.out.println(val.replaceAll("[^\\w']+", ""));

}
}

Try posting exactly the code which causes the problem.
 
Roedy Green

Jerric said:
Hi, I need to remove special characters, except \w and single quotes,
from a string, can someone please help me on the regex?

That is not what a regex is for. Just use a StringBuilder the length
of your String. Then loop through the chars with charAt. If the
character is a ' or \w, ignore it, else append. If it gets complex,
use a switch or if it gets really complicated use a BitSet.
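
For reference, a loop of this kind might look like the sketch below. Note that it keeps word characters and apostrophes and drops everything else, which is what the original question asked for. The class and method names are made up for illustration, and "word character" is assumed to mean the default ASCII meaning of \w in java.util.regex, that is [a-zA-Z_0-9].

```java
final class KeepWordChars {
    /** Hypothetical helper: keeps ASCII word characters and apostrophes, drops the rest. */
    static String keepWordCharsAndQuotes(final String s) {
        final StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            final char c = s.charAt(i);
            // Assumption: "word character" means [a-zA-Z_0-9], the default for \w.
            final boolean isWord = (c >= 'a' && c <= 'z')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '_';
            if (isWord || c == '\'') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepWordCharsAndQuotes("ab'de+fg")); // prints ab'defg
    }
}
```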
 
Stefan Ram

Roedy Green said:
That is not what a regex is for.

How do you know what it is for?
Just use a StringBuilder the length of your String. Then
loop through the chars with charAt. If the character is a
' or \w, ignore it, else append. If it gets complex, use a
switch or if it gets really complicated use a BitSet.

This might be needless (as far as we know right now)
optimization bloating the code reducing its readability and
low-level thinking, which might be required sometimes, but
does not serve as a general rule. Still it is nice to know
how it could be done if required.
 
Roedy Green

How do you know what it is for?

Regexes are for searching for patterns. Transforming or deleting
characters is much simpler done with a for loop.

How do I know what a regex is for? I am familiar with the API. I have
attempted to use them for various purposes and discovered they were
suitable for some and not for others.
This might be needless (as far as we know right now)
optimization bloating the code reducing its readability and
low-level thinking, which might be required sometimes, but
does not serve as a general rule. Still it is nice to know
how it could be done if required.

What is your simpler implementation?

/** remove ' and \w from string
* @param s string to process
* @return string without ' or \w
*/
private static String scrunch( final String s )
{
final Stringbuilder sb = new StringBuilder( s.length() );
for (int i=0; i<s.length(); i++ )
{
char c = s.charAt(i);
if ( !( c = '\'' || c = '\w' ) )
{
sb.append ( c );
}
}
return sb.toString();
}
 
Roedy Green

How do you know what it is for?

I see what you mean. I saw the problem as the pattern translation of
various characters to various other characters. The problem is
actually simpler than that. It translates various different
characters all to the same empty "character".

I find the replace methods dangerous. They are improperly named and
thus it is easy to accidentally use a regex or non-regex. They also
have to compile the pattern every time. I tend to avoid them.
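
If the repeated compilation is the concern, one common way around it is to compile the pattern once and reuse it. The following is only a sketch; the class name Precompiled and the method strip are invented for the example, and the pattern is the one from the original question.

```java
import java.util.regex.Pattern;

final class Precompiled {
    // Compiled once; Pattern instances are immutable and safe to share.
    private static final Pattern UNWANTED = Pattern.compile("[^\\w']+");

    /** Hypothetical helper: removes everything except word characters and apostrophes. */
    static String strip(final String s) {
        return UNWANTED.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("ab'de+fg")); // prints ab'defg
    }
}
```

Whether this is measurably faster than String.replaceAll depends on how often the method is called; for a one-off call the difference is unlikely to matter.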
 
Rafael Villar

Regexes are for searching for patterns. Transforming or deleting
characters is much simpler done with a for loop.

How do I know what a regex is for? I am familiar with the API. I have
attempted to use them for various purposes and discovered they were
suitable for some and not for others.

What is your simpler implementation?

/** remove ' and \w from string
* @param s string to process
* @return string without ' or \w
*/
private static String scrunch( final String s )
{
final Stringbuilder sb = new StringBuilder( s.length() );
for (int i=0; i<s.length(); i++ )
{
char c = s.charAt(i);
if ( !( c = '\'' || c = '\w' ) )
{
sb.append ( c );
}
}
return sb.toString();
}

In most cases it is better to use a StringBuilder to perform replacements,
but in this particular case String.replaceAll() is better. By the way,
\w is not a Java string or character escape sequence; it belongs
to the regex pattern syntax (although you should already know that, as you
say you are familiar with the API).

Anyway a simpler implementation (and one which works, because yours
doesn't):

/** remove ' and \w from string
* @param s string to process
* @return string without ' or \w
*/
private static String scrunch( final String s ) {
return s.replaceAll("[^'\\w]+", "");
}
 
Rafael Villar

Mea Culpa, Sorry, it seems Roedy didn't understand the original problem,
and also I didn't understand what Roedy was understanding (sorry Roedy)

Anyway, a simpler method that does what Roedy intends to do:

/** remove ' and \w from string
* @param s string to process
* @return string without ' or \w
*/
private static String scrunch( final String s ) {
return s.replaceAll("['\\w]+", "");
}

However the original problem is unknown as the original code is actually
working.
 
Stefan Ram

Roedy Green said:
What is your simpler implementation?
/** remove ' and \w from string
* @param s string to process
* @return string without ' or \w
*/
private static String scrunch( final String s )
{
final Stringbuilder sb = new StringBuilder( s.length() );
for (int i=0; i<s.length(); i++ )
{
char c = s.charAt(i);
if ( !( c = '\'' || c = '\w' ) )

Even with »==« instead of »=« (a »final « in front of the
»char c« should help to detect such errors) and »\\w«
instead of »\w« (»\w« is an illegal escape character in
Java), the comparison with '\\w' is not what the OP actually
wanted.

Maybe you just want other people not to use regular
expressions because you personally can't read them, but why
should your personal knowledge (which is a by-product of your
personal history) be a limitation on anyone else's work?
{
sb.append ( c );
}
}
return sb.toString();
}

static String scrunch( final String s )
{ final java.lang.String string = s.toString();
final java.lang.String result = s.replaceAll( "('|\\\\w)", "" );
return new String( result ); }

(Assuming the class »String« has an appropriate constructor.)

(This implements your documentation, not what the OP wanted.)
 
Arved Sandstrom

I see what you mean. I saw the problem as the pattern translation of
various characters to various other characters. The problem is
actually simpler than that. It translates various different
characters all to the same empty "character".

I find the replace methods dangerous. They are improperly named and
thus it is easy to accidentally use a regex or non-regex. They also
have to compile the pattern every time. I tend to avoid them.

The methods that accept 'char' or 'CharSequence' are named 'replace'.
The two methods that use regexes are called 'replaceAll' and
'replaceFirst'. I don't see a possibility of accidents here.

The methods are not remotely improperly named: they replace text. That
some of them use literals, and others use regular expressions, to
specify what text is to be replaced, does not alter that central fact.
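
A small sketch of the difference, for anyone unsure which method takes which kind of argument. The class and method names are hypothetical; the String methods themselves are the standard ones.

```java
final class ReplaceNaming {
    /** replace(CharSequence, CharSequence): the target is taken literally. */
    static String literal(final String s) {
        return s.replace(".", "");
    }

    /** replaceAll(String, String): the first argument is a regex, so "." matches any character. */
    static String regex(final String s) {
        return s.replaceAll(".", "");
    }

    /** The regex version behaves like the literal one only when the dot is escaped. */
    static String escapedRegex(final String s) {
        return s.replaceAll("\\.", "");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b.c"));      // prints abc
        System.out.println(regex("a.b.c"));        // prints an empty line
        System.out.println(escapedRegex("a.b.c")); // prints abc
    }
}
```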

AHS
 
Lew

That will not perform the specified action, which is to remove non-word
characters and to _keep_ apostrophes. '\w' is not legitimate Java syntax,
thus will cause a compilation error.
"It is a compile-time error if the character following a backslash in an
escape is not an ASCII b, t, n, f, r, ", ', \, 0, 1, 2, 3, 4, 5, 6, or 7."
<http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.10.6>

The simpler approach was already posted by Daniel Pitts, and has the added
virtues of both meeting the requirement and compiling:

public class Works {
public static void main(String[] args) {
String val = "ab'de+fg";
System.out.println(val.replaceAll("[^\\w']+", ""));
}
}
 
Jim Janney

Daniel Pitts said:
Hi, I need to remove special characters, except \w and single quotes,
from a string, can someone please help me on the regex?

for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
the following code, but it removed single quote. seems to me java
cannot handle the pattern like [^'].

String val = "ab'de+fg";
val = val.replaceAll("[^\\w']+", "");

Thanks a lot,
It works for me, which indicates the problem is somewhere in the code
you didn't post. Here is an SSCCE:

public class Works {
public static void main(String[] args) {
String val = "ab'de+fg";
System.out.println(val.replaceAll("[^\\w']+", ""));

}
}

Try posting exactly the code which causes the problem.

Since replaceAll is being used, the closure is unnecessary, so this can
be shortened by one character :)
 
Daniel Pitts

Daniel Pitts said:
Hi, I need to remove special characters, except \w and single quotes,
from a string, can someone please help me on the regex?

for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
the following code, but it removed single quote. seems to me java
cannot handle the pattern like [^'].

String val = "ab'de+fg";
val = val.replaceAll("[^\\w']+", "");

Thanks a lot,
It works for me, which indicates the problem is somewhere in the code
you didn't post. Here is an SSCCE:

public class Works {
public static void main(String[] args) {
String val = "ab'de+fg";
System.out.println(val.replaceAll("[^\\w']+", ""));

}
}

Try posting exactly the code which causes the problem.

Since replaceAll is being used, the closure is unnecessary, so this can
be shortened by one character :)
Perhaps, but I wouldn't be surprised if there was a performance
difference in the two. I'm not saying there definitely is, but there
very well could be.

Also, they are only equivalent because the replacement string is zero
length.
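
To make that last point concrete, here is a sketch comparing the two patterns with an empty and a non-empty replacement. The class and method names are made up for the example, and the input with two consecutive '+' characters is chosen only to make the difference visible.

```java
final class ClosureDemo {
    /** Pattern from the original post: a run of unwanted characters is one match. */
    static String withPlus(final String s, final String replacement) {
        return s.replaceAll("[^\\w']+", replacement);
    }

    /** Same character class without the '+': every unwanted character is its own match. */
    static String withoutPlus(final String s, final String replacement) {
        return s.replaceAll("[^\\w']", replacement);
    }

    public static void main(String[] args) {
        final String val = "ab'de++fg";
        System.out.println(withPlus(val, ""));     // prints ab'defg
        System.out.println(withoutPlus(val, ""));  // prints ab'defg
        System.out.println(withPlus(val, "-"));    // prints ab'de-fg
        System.out.println(withoutPlus(val, "-")); // prints ab'de--fg
    }
}
```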
 
Daniel Pitts

Wow, that is some of the worst String manipulation code I've seen.

static String scrunch( final String s )
{ final java.lang.String string = s.toString();
s.toString() == s for all non-null instances of String. Unneeded.
final java.lang.String result = s.replaceAll( "('|\\\\w)", "" );
You don't need an intermediate here.
return new String( result ); }
Strings are (mostly) immutable. There are extremely few good reasons to
invoke the String(String) constructor manually. Not to mention
s.replaceAll() will already potentially return a new String.
(Assuming the class »String« has an appropriate constructor.)
It does, but why use it unless you want to guarantee that they are
.equals, but !=.
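
A quick sketch of what the copy constructor actually buys you. The class and method names here are hypothetical.

```java
final class CopyDemo {
    /** True when new String(s) is equal in content to s but is a different object. */
    static boolean isDistinctButEqual(final String s) {
        final String copy = new String(s);
        return copy.equals(s) && copy != s;
    }

    /** String.toString() returns the receiver itself, so no copy is made. */
    static boolean toStringIsSameObject(final String s) {
        return s.toString() == s;
    }

    public static void main(String[] args) {
        System.out.println(isDistinctButEqual("abc"));   // prints true
        System.out.println(toStringIsSameObject("abc")); // prints true
    }
}
```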



I'm not even going to comment on your insane style, as I think you've
rebuffed all comments in the past. What I will comment on is the lack
of consistency in this snippet. Some places use "String" and others
"java.lang.String".

(This implements your documentation, not what the OP wanted.)
So does this, but with less waste and confusion.
static String scrunch( final String source ) {
return source.replaceAll( "('|\\\\w)", "" );
}
 
Stefan Ram

Daniel Pitts said:
I'm not even going to comment on your insane style, as I think you've
rebuffed all comments in the past. What I will comment on is the lack
of consistency in this snippet. Some places use "String" and others
"java.lang.String".

»String« is a class name used by Roedy.

The actual class bound to the name of »String« depends on
the context the snippet given by Roedy will be placed in.

Since I have no information on that class »String«,
I started by converting the String instance into a
java.lang.String instance. Then, I was able to apply the
operations of java.lang.String, which /are/ known to me.
In the final end, I had to convert the java.lang.String
instance back to an instance of the class »String«,
because this was required by the interface of that method
as given by Roedy.
 
Daniel Pitts

»String« is a class name used by Roedy.

The actual class bound to the name of »String« depends on
the context the snippet given by Roedy will be placed in.

Since I have no information on that class »String«,
I started by converting the String instance into a
java.lang.String instance. Then, I was able to apply the
operations of java.lang.String, which /are/ known to me.
In the final end, I had to convert the java.lang.String
instance back to an instance of the class »String«,
because this was required by the interface of that method
as given by Roedy.
Since String is in the java.lang package, it is safe to assume that
"String" refers to the java.lang.String class, unless you are given
context otherwise.
 
