basic question on regexp

John

Hi all,


I have to split a text but I don't know how to do it.


If I have:

#!/usr/local/bin/perl -w
use strict;

my $arr = "nothing to compa-re";
my $block = [split /\W+/, $arr];

foreach (@$block) {
    print "$_\n";
}


will produce:

nothing
to
compa
re

but I have to consider - as part of the word.

the result I would like to obtain is:

nothing
to
compa-re


any idea?


Thanks for your help.
J
 
Gunnar Hjalmarsson

John said:
If I have:

#!/usr/local/bin/perl -w
use strict;
my $arr = "nothing to compa-re";
my $block = [split /\W+/, $arr];
foreach (@$block){
print "$_\n";
}

will produce:

nothing
to
compa
re

but I have to consider - as part of the word.

the result I would like to obtain is:

nothing
to
compa-re

any idea?

If it's not sufficient to just split on white space:

split ' ', $arr

you can use a character class:

split /[^\w-]+/, $arr
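To compare the two suggestions side by side, here is a small sketch. The second sample string (with punctuation) and the variable names are my own additions for illustration, not part of John's script.

```perl
use strict;
use warnings;

my $text = "nothing to compa-re";

# Special case: a pattern of a single space string splits on runs of
# whitespace and ignores leading whitespace.
my @by_space = split ' ', $text;

# Split on runs of anything that is neither a word character nor a hyphen.
my @by_class = split /[^\w-]+/, $text;

print "$_\n" for @by_class;

# The two approaches differ once punctuation shows up (hypothetical input).
my $punct  = "nothing, to compa-re!";
my @space2 = split ' ', $punct;        # keeps "nothing," and "compa-re!"
my @class2 = split /[^\w-]+/, $punct;  # drops the comma and the "!"

print join('|', @space2), "\n";
print join('|', @class2), "\n";
```

So whitespace splitting is enough only if the text never has punctuation glued to the words; otherwise the character class is the safer choice.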
 
Richard Snow

Well, I'm sure someone will post something more elegant, but
my $block = [ split /[^a-z,A-Z,0-9,_,-]/,$arr ];
will do what you want I believe.

Richard Snow
 
Glenn Jackman

Richard Snow said:
my $block = [ split /[^a-z,A-Z,0-9,_,-]/,$arr ];

Don't use commas inside your character class: it's not a list in there.

[^a-z,A-Z,0-9,_,-] is the same as [^,a-zA-Z0-9_-] is the same as [^,\w-]
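A quick way to see the effect of the stray commas is to split a string that actually contains one. This is only a sketch; the sample string and variable names are made up for the demonstration.

```perl
use strict;
use warnings;

my $text = "one,two three-four";

# The comma is a member of the negated class here, so it is NOT
# treated as a separator.
my @with_commas = split /[^a-z,A-Z,0-9,_,-]+/, $text;

# Without commas in the class, the comma separates like any other
# punctuation character.
my @without_commas = split /[^a-zA-Z0-9_-]+/, $text;

print join('|', @with_commas), "\n";     # one,two|three-four
print join('|', @without_commas), "\n";  # one|two|three-four
```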
 
Gunnar Hjalmarsson

Richard said:
Well, I'm sure someone will post something more elegant, but
my $block = [ split /[^a-z,A-Z,0-9,_,-]/,$arr ];
will do what you want I believe.

Richard, please note that the ranges and characters in a character
class should not be comma-separated. This is probably what you meant:

my $block = [ split /[^a-zA-Z0-9_-]/, $arr ];

Also, in order to prevent the creation of empty array elements, you'd
better split on one or more characters in the class:

my $block = [ split /[^a-zA-Z0-9_-]+/, $arr ];
-----------------------------------^
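To illustrate the point about empty elements, here is a minimal sketch using a made-up string in which two separator characters sit next to each other. The string and variable names are assumptions for the example only.

```perl
use strict;
use warnings;

# Hypothetical input: a comma followed by two spaces.
my $text = "nothing,  to compa-re";

# Without the quantifier, every single separator character splits,
# so adjacent separators leave empty strings between them.
my @single = split /[^\w-]/, $text;

# With +, a whole run of separators counts as one split point.
my @multi = split /[^\w-]+/, $text;

print scalar(@single), "\n";  # 5: "nothing", "", "", "to", "compa-re"
print scalar(@multi), "\n";   # 3: "nothing", "to", "compa-re"
```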

Finally, since there is already a predefined character class \w, which
for plain ASCII text is the same as [a-zA-Z0-9_], and provided that you
want to avoid unnecessary typing, you can do:

my $block = [ split /[^\w-]+/, $arr ];
 
