split : string containing brackets

Prototype

I've got a string something like this:
$text='val1 , val2 , max(val3,val4)'

which I want to parse into its individual components:
@parts=split(/,/,$text)

but this simplistic approach doesn't cope with the last part of $text
which contains a bracketed expression containing the delimiting ","
character.

I've read perldoc -q delimited, and this gives useful guidance for the
similar case of:
$text='val1 , val2 , max "val3,val4"'
but brackets are more difficult to cope with, and the methods shown
there can't be directly applied (think of nested brackets, for example).

I'm aware of Text::Balanced, which I think may be part of the
solution, but I can't see how!

The only solution I can see at the moment is the fairly brute-force
approach of going through the string a character at a time, counting
open & closing brackets and only splitting into a new array element
every time a comma is found AND the bracket count is zero.
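
For reference, the character-at-a-time approach described above could look roughly like this. It is only a sketch: the name split_top_level is made up for illustration, it assumes round brackets are the only grouping characters, and it silently ignores a stray closing bracket.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): split $text on commas that
# are not enclosed in ( ... ), by tracking the bracket depth.
sub split_top_level {
    my ($text) = @_;
    my @parts = ('');
    my $depth = 0;
    for my $ch (split //, $text) {
        if ($ch eq ',' && $depth == 0) {
            push @parts, '';                    # top-level comma: new element
            next;
        }
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')' && $depth > 0;   # ignore stray closers
        $parts[-1] .= $ch;
    }
    s/^\s+|\s+$//g for @parts;                  # trim surrounding whitespace
    return @parts;
}

my @parts = split_top_level('val1 , val2 , max( val3, min( val6, val7) )');
print "$_\n" for @parts;
```

Run on that sample line, this should print val1, val2 and max( val3, min( val6, val7) ) on separate lines.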

Something tells me there is probably a more elegant approach than
this! Anyone care to suggest it to me please ;-)

Paul.
 
Mirco Wahab

Prototype said:
I've got a string something like this:
$text='val1 , val2 , max(val3,val4)'

which I want to parse into its individual components:
@parts=split(/,/,$text)

but this simplistic approach doesn't cope with the last part of $text
which contains a bracketed expression containing the delimiting ","
...
Something tells me there is probably a more elegant approach than
this! Anyone care to suggest it to me please ;-)

So we have something like:

val1 , val2 , max( val3, val4 )
val1 , val2 , max( val3, min( val6, val7) )
val1 , val2 , max( val3, max( min( val8, val9 ), val7) )

This is clearly a recursive formulation
and should of course be 'elegantly'
parsed by an appropriate recursive
regex.
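
A recursive pattern of that kind might look like the sketch below. It assumes Perl 5.10 or later (for named recursion and possessive quantifiers) and round brackets only; the names $field_re, group and split_fields are invented for this example. If the brackets are unbalanced, the pattern simply stops matching at that point.

```perl
use strict;
use warnings;

# One field is a run of plain characters and/or balanced bracket groups.
# The named group recurses into itself to allow arbitrary nesting.
my $field_re = qr/
    \G \s*
    (
        (?: [^,()]++ | (?&group) )+
    )
    (?: , | \z )
    (?(DEFINE)
        (?<group> \( (?: [^()]++ | (?&group) )* \) )
    )
/x;

# Hypothetical helper: collect the fields of $text in order.
sub split_fields {
    my ($text) = @_;
    my @fields;
    while ($text =~ /$field_re/g) {
        my $field = $1;
        $field =~ s/\s+\z//;       # drop whitespace before the separator
        push @fields, $field;
    }
    return @fields;
}

print "$_\n" for split_fields('val1 , val2 , max( val3, max( min( val8, val9 ), val7) )');
```

The first capture is read inside a while loop rather than in list context because the named group also counts as a capture and would otherwise show up in the result list.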

*If* that really is what you want ...

But before starting this, I'd better
ask you what you *really* want to do.

Can you give a worked-out pseudocode example of
what you want to get out of your lines and
what that should look like?

Regards

M.
 
Prototype

Well, in the examples you have given, I would want the results:
'val1 , val2 , max( val3, val4 )'
$part[0]='val1'
$part[1]='val2'
$part[2]='max(val3,val4)'

'val1 , val2 , max( val3, min( val6, val7) )'
$part[0]='val1'
$part[1]='val2'
$part[2]='max( val3, min( val6, val7) )'

'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) '
$part[0]='val1'
$part[1]='val2'
$part[2]='max( val3, max( min( val8, val9 ), val7) )'

All I'm looking for is to do a "simple" split parse of a string on
commas, but avoiding splitting where the comma is enclosed in some
brackets.

Paul.
 
J. Gleixner

Prototype said:
Well, in the examples you have given, I would want the results:

Please don't top post....
'val1 , val2 , max( val3, val4 )'
$part[0]='val1'
$part[1]='val2'
$part[2]='max(val3,val4)'

'val1 , val2 , max( val3, min( val6, val7) )'
$part[0]='val1'
$part[1]='val2'
$part[2]='max( val3, min( val6, val7) )'

'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) '
$part[0]='val1'
$part[1]='val2'
$part[2]='max( val3, max( min( val8, val9 ), val7) )'

All I'm looking for is to do a "simple" split parse of a string on
commas, but avoiding splitting where the comma is enclosed in some
brackets.

If you're 100% sure that each "part" is separated by ' , ', then:

my $str = 'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) ';
my @part = split( / , /, , 3 );
print join("\n", @part);
val1
val2
max( val3, max( min( val8, val9 ), val7) )
 
J. Gleixner

J. Gleixner said:
Prototype said:
Well, in the examples you have given, I would want the results:

Please don't top post....
'val1 , val2 , max( val3, val4 )'
$part[0]='val1'
$part[1]='val2'
$part[2]='max(val3,val4)'

'val1 , val2 , max( val3, min( val6, val7) )'
$part[0]='val1'
$part[1]='val2'
$part[2]='max( val3, min( val6, val7) )'

'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) '
$part[0]='val1'
$part[1]='val2'
$part[2]='max( val3, max( min( val8, val9 ), val7) )'

All I'm looking for is to do a "simple" split parse of a string on
commas, but avoiding splitting where the comma is enclosed in some
brackets.

If you're 100% sure that each "part" is separated by ' , ', then:

my $str = 'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) ';
my @part = split( / , /, , 3 );
correction..
my @part = split( / , /, $str, 3 );
 
attn.steven.kuo

I've got a string something like this:
$text='val1 , val2 , max(val3,val4)'

(snipped)

I'm aware of Text::Balanced, which I think may be part of the
solution, but I can't see how!


It took me a while to become familiar with Text::Balanced. YMMV.

If your inputs are sufficiently simple, then try:

use strict;
use warnings;
use Text::Balanced qw/extract_bracketed/;
use Text::CSV;
use Data::Dumper;
use constant {
    EXTRACTED => 0,
    REMAINDER => 1,
    SKIPPED   => 2,
};

my $csv = Text::CSV->new;

while (<DATA>)
{
    (my $text = $_);
    my @atoms;
    my @found;
    while ( @found = extract_bracketed( $text, '()', qr/[^(]*/ ) and
            defined $found[EXTRACTED] )
    {

        # Use one of the techniques discussed in 'perldoc -q delimited' for
        # the SKIPPED portion, or use a simple split() if you can get away
        # with it. Here I'll demonstrate with Text::CSV

        $csv->parse($found[SKIPPED]);
        my @fields = $csv->fields();
        $fields[-1] .= $found[EXTRACTED];

        push @atoms, @fields;

        # Check for remaining text to be parsed

        $text = $found[REMAINDER];

    }

    # Stuff at the end that's not in brackets:

    if (length $text)
    {
        $csv->parse($text);
        my @fields = $csv->fields();
        shift @fields; # extra comma
        push @atoms, @fields;
    }


    print $_;
    print Dumper \@atoms;
}

__DATA__
val1 , val2 , max(val3,val4)
max(val1, min(val2, val3)), val4, val5
val1, min(val2, pow(val3, 42)), 10E12
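
If Text::CSV is not available, a smaller variant using only Text::Balanced (a core module) might look like this. It is a sketch under the assumption that round brackets are the only grouping characters and that there is no quoting to worry about; split_with_balanced is a made-up name.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: walk the string, letting extract_bracketed()
# swallow each balanced ( ... ) group whole so that commas inside it
# never act as separators.
sub split_with_balanced {
    my ($text) = @_;
    my @parts = ('');
    while (length $text) {
        if ($text =~ s/\A([^,(]+)//) {
            $parts[-1] .= $1;               # plain text up to ',' or '('
        }
        elsif ($text =~ s/\A,//) {
            push @parts, '';                # top-level comma: new element
        }
        else {
            # $text now starts with '(': take the whole balanced group.
            my ($group, $rest) = extract_bracketed($text, '()');
            die "Unbalanced brackets near: $text\n" unless defined $group;
            $parts[-1] .= $group;
            $text = $rest;
        }
    }
    s/^\s+|\s+$//g for @parts;              # trim surrounding whitespace
    return @parts;
}

print "$_\n" for split_with_balanced('val1, min(val2, pow(val3, 42)), 10E12');
```

Unlike the loop above, this version dies on unbalanced input instead of passing the leftover text on.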

The only solution I can see at the moment is the fairly brute-force
approach of going through the string a character at a time, counting
open & closing brackets and only splitting into a new array element
every time a comma is found AND the bracket count is zero.

Something tells me there is probably a more elegant approach than
this! Anyone care to suggest it to me please ;-)



If your "grammar" is really complicated, then you may
want to look at parser generators on CPAN:


Parse::Yapp
or
Parse::RecDescent
 
J. Gleixner

J. Gleixner said:
J. Gleixner said:
Prototype said:
Well, in the examples you have given, I would want the results:

Please don't top post....
'val1 , val2 , max( val3, val4 )'
$part[0]='val1'
$part[1]='val2'
$part[2]='max(val3,val4)'

'val1 , val2 , max( val3, min( val6, val7) )'
$part[0]='val1'
$part[1]='val2'
$part[2]='max( val3, min( val6, val7) )'

'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) '
$part[0]='val1'
$part[1]='val2'
$part[2]='max( val3, max( min( val8, val9 ), val7) )'

All I'm looking for is to do a "simple" split parse of a string on
commas, but avoiding splitting where the comma is enclosed in some
brackets.

If you're 100% sure that each "part" is separated by ' , ', then:

my $str = 'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) ';
my @part = split( / , /, , 3 );
correction..
my @part = split( / , /, $str, 3 );

Or, to cope with optional whitespace around the separator:

my @part = split( /\s*,\s*/, $str, 3 );

You could also split on /,/ and trim the leading/trailing
whitespace. Probably the helpful part is the third argument
to split.
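
A quick sketch of what the third argument does, and where it stops helping. The wrapper name split_limit is made up for this example. LIMIT only caps the number of fields, so it protects the bracketed expression only when that expression comes last.

```perl
use strict;
use warnings;

# Hypothetical wrapper: split on commas with optional surrounding
# whitespace, into at most $limit fields, then trim each field.
sub split_limit {
    my ($str, $limit) = @_;
    my @part = split /\s*,\s*/, $str, $limit;
    s/^\s+|\s+$//g for @part;
    return @part;
}

# Bracketed expression last: the limit keeps it in one piece.
my @ok  = split_limit('val1 , val2 , max( val3, min( val6, val7) )', 3);

# Bracketed expression first: the commas inside it are used up as separators.
my @bad = split_limit('max(val1, min(val2, val3)), val4, val5', 3);

print join(' | ', @ok),  "\n";   # val1 | val2 | max( val3, min( val6, val7) )
print join(' | ', @bad), "\n";   # max(val1 | min(val2 | val3)), val4, val5
```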
 
Ayaz Ahmed Khan

"Prototype" typed:
Well, in the examples you have given, I would want the results:
'val1 , val2 , max( val3, val4 )'
$part[0]='val1'
$part[1]='val2'
$part[2]='max(val3,val4)'

'val1 , val2 , max( val3, min( val6, val7) )'
$part[0]='val1'
$part[1]='val2'
$part[2]='max( val3, min( val6, val7) )'

'val1 , val2 , max( val3, max( min( val8, val9 ), val7) ) '
$part[0]='val1'
$part[1]='val2'
$part[2]='max( val3, max( min( val8, val9 ), val7) )'

All I'm looking for is to do a "simple" split parse of a string on
commas, but avoiding splitting where the comma is enclosed in some
brackets.

For the specific case where the expression to be parsed is built strictly
on the pattern "somevalue, somevalue, max(somevalue, somevalue, ...)",
you could provide split() with a LIMIT argument to split only into a max
of LIMIT fields. Something like this, perhaps:

@part = split(/,/, $text, 3);
 
anno4000

[...]

If your "grammar" is really complicated, then you may
want to look at parser generators on CPAN:


Parse::Yapp
or
Parse::RecDescent

The latter is not a core module, though; it has to be installed from CPAN.

Anno
 
Mirco Wahab

Prototype said:
Well, in the examples you have given, I would want the results:
'val1 , val2 , max( val3, val4 )'
$part[0]='val1'
$part[1]='val2'
$part[2]='max(val3,val4)'

All I'm looking for is to do a "simple" split parse of a string on
commas, but avoiding splitting where the comma is enclosed in some
brackets.

OK, then you can count the brackets
within the regex. I stupidly picked some
not-so-easily-parseable examples, so it
took longer. The regex is relatively
simple (just counting); you'll get the idea.

This is my first shot:



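use strict;
use warnings;

# $o counts the currently open brackets. It is reset at the start of each
# match, incremented on "(" and decremented on ")". The conditional
# (?(?{ $o }),|\s) accepts a comma only while $o is non-zero, i.e. inside
# brackets; outside brackets it accepts whitespace instead.
my $o;
my $reg = qr/
    (?{$o=0}) \s*
    (
        (?: [^,()\s]+
          | \((?{ $o++ })
          | \)(?{ $o-- })
          | (?(?{ $o }),|\s)
        )+
    )
    [,\s]*/xs;

while( <DATA> ) {
    chomp;
    my @fields = /$reg/g;
    print "$_\n", scalar @fields, "\n", (map " $_\n", @fields), "\n";
}


__DATA__
val1(), val2 , max(val3,val4)
(val0,val1) , val2 , max(val3,val4)
max(val3,val4), val1 , val2 , max(val3(val6,val7),val4)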


Regards

M.
 
Prototype

Prototype said:
Well, in the examples you have given, I would want the results:
'val1 , val2 , max( val3, val4 )'
$part[0]='val1'
$part[1]='val2'
$part[2]='max(val3,val4)'
All I'm looking for is to do a "simple" split parse of a string on
commas, but avoiding splitting where the comma is enclosed in some
brackets.

OK, then you can count the brackets
within the regex. I stupidly picked some
not-so-easily-parseable examples, so it
took longer. The regex is relatively
simple (just counting); you'll get the idea.

This is my first shot:

use strict;
use warnings;

my $o;
my $reg = qr/
(?{$o=0}) \s*
(
(?: [^,()\s]+
| \((?{ $o++ })
| \)(?{ $o-- })
| (?(?{ $o }),|\s)
)+
)
[,\s]*/xs;

while( <DATA> ) {
chomp;
my @fields = /$reg/g;
print "$_\n", scalar @fields, "\n", (map " $_\n", @fields), "\n"
}

__DATA__
val1(), val2 , max(val3,val4)
(val0,val1) , val2 , max(val3,val4)
max(val3,val4), val1 , val2 , max(val3(val6,val7),val4)

Regards

M.


Thanks to all for your various comments and solutions. I'm going with
this one, as it does exactly what I need, and has taught me some neat
new regexp stuff.

P.
 
Martijn Lievaart

[...]

If your "grammar" is really complicated, then you may want to look at
parser generators on CPAN:


Parse::Yapp
or
Parse::RecDescent

The latter is not a core module, though; it has to be installed from CPAN.

I just discovered this independently and it's awfully cool. It's not just
another recursive descent parser generator, but actually has all the
features I fantasized about over the years that a recursive descent parser
generator should have. And then some.

M4
 
