Regroup several lines into one by counting parentheses

Headphones

Hello,

I am trying to regroup several lines into one by counting parentheses. The
goal is to read a tnsnames file (Oracle Database) into an array
where one line = one instance.

tnsnames example :
Instance_name = (DESCRIPTION=

(ADDRESS=(PROTOCOL=tcp)(HOST=This_is_host_name)(PORT=1234))
(CONNECT_DATA=(SID=xxxxxx))
)


I know I can loop over the lines, count the parentheses on each one and
join the lines once the count balances, but I would prefer a regular
expression for this. Is there a regexp master who can help me with this?

Thanks a lot
 
Dr.Ruud

Headphones:
I am trying to regroup several lines into one by counting parentheses. The
goal is to read a tnsnames file (Oracle Database) into an array
where one line = one instance.

tnsnames example :
Instance_name = (DESCRIPTION=

(ADDRESS=(PROTOCOL=tcp)(HOST=This_is_host_name)(PORT=1234))
(CONNECT_DATA=(SID=xxxxxx))
)

I know I can loop over the lines, count the parentheses on each one and
join the lines once the count balances, but I would prefer a regular
expression for this. Is there a regexp master who can help me with this?

First put them all on 1 long line and normalize the spaces, then replace
the (...) inside-out by {...} except for (DESCRIPTION=...), then insert
a linebreak after each remaining ), and finally replace the {} with ()
again.

Untested:

s/\s+/ /g;

# repeat until nothing changes, so that nested groups are converted inside-out
1 while s/[(](?!DESCRIPTION=)([^()]*)[)]/{$1}/g;

s/([)])/$1\n/g;

s/\{/(/g;
s/\}/)/g;

This assumes that {} are not in the original, pick any other pair if
necessary. And that () are not inside the values.
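
For illustration, the steps described above can be put together into a small script. This is only a sketch under the same assumptions (no {} in the original, no () inside the values, and every entry written exactly as (DESCRIPTION= with no spaces); the sample entries and the helper name split_entries are made up:

```perl
use strict;
use warnings;

# Two made-up entries in the layout shown in the question.
my $text = <<'END';
Inst_a = (DESCRIPTION=
  (ADDRESS=(PROTOCOL=tcp)(HOST=host_a)(PORT=1234))
  (CONNECT_DATA=(SID=aaa))
)
Inst_b = (DESCRIPTION=
  (ADDRESS=(PROTOCOL=tcp)(HOST=host_b)(PORT=1521))
  (CONNECT_DATA=(SID=bbb))
)
END

# Hypothetical helper: returns one string per entry.
sub split_entries {
    my ($s) = @_;
    $s =~ s/\s+/ /g;    # one long line, single spaces
    # Convert inner (...) to {...}; loop so nested groups go inside-out.
    1 while $s =~ s/\((?!DESCRIPTION=)([^()]*)\)/{$1}/g;
    $s =~ s/\)/)\n/g;   # only the closing ")" of each DESCRIPTION is left
    $s =~ tr/{}/()/;    # put the parentheses back
    my @out;
    for my $line (split /\n/, $s) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        push @out, $line if length $line;
    }
    return @out;
}

my @entries = split_entries($text);
print "$_\n" for @entries;
```

Each element of @entries then holds one complete entry, name included.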

Now show us your Perl code.
 
Matt Garrish

Dr.Ruud said:
First put them all on 1 long line and normalize the spaces, then replace
the (...) inside-out by {...} except for (DESCRIPTION=...), then insert
a linebreak after each remaining ), and finally replace the {} with ()
again.

Untested:

s/\s+/ /g;

# repeat until nothing changes, so that nested groups are converted inside-out
1 while s/[(](?!DESCRIPTION=)([^()]*)[)]/{$1}/g;

s/([)])/$1\n/g;

s/\{/(/g;
s/\}/)/g;

This assumes that {} are not in the original, pick any other pair if
necessary. And that () are not inside the values.

Or just use Text::Balanced...

Matt
 
Headphones

You told me:
First put them all on 1 long line ... then replace ... except for (DESCRIPTION=...),

In fact, the (DESCRIPTION=...) part is exactly what I am trying to
retrieve. If, on my one long line, I could select everything like this:
INSTANCE_NAME=(DESCRIPT*) with the ')' being the matching closing
parenthesis, I would reach my goal.

I thought there were regular expressions that could find the matching closing
parenthesis. To solve my problem, I did it "à la C".

my $parenthesis = 0;
my $uniline = "";

while (<TNSNAMES_FILE>) {
    chomp;
    next if /^\s*$/; # remove empty lines
    next if /^\s*#/; # remove comments
    s/^\s*//; # remove leading spaces
    s/\s*$//; # remove trailing spaces

    # count opening parentheses
    my $txtline = $_;
    while ($txtline =~ /\(/g) { $parenthesis++; }

    # count closing parentheses (reuse $txtline, do not declare it twice)
    while ($txtline =~ /\)/g) { $parenthesis--; }

    # append line to uniline
    $uniline .= $_;

    # if the parenthesis count is back to 0, we have our uniline
    if ($parenthesis == 0) {
        print "UNILINE = $uniline\n";

        # then reset uniline for the next instance
        $uniline = "";
    }
}
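
On the question of whether a regular expression can find the matching closing parenthesis: Perl 5.10 and later support recursive patterns, which can do this. What follows is only a sketch, assuming Perl 5.10 or newer, entry names made of word characters and dots, and no parentheses inside the values. The sample data and the %tns hash are made up for illustration:

```perl
use strict;
use warnings;
use 5.010;    # recursive patterns and possessive quantifiers need 5.10+

my $text = <<'END';
Inst_a = (DESCRIPTION=
  (ADDRESS=(PROTOCOL=tcp)(HOST=host_a)(PORT=1234))
  (CONNECT_DATA=(SID=aaa))
)
Inst_b = (DESCRIPTION=
  (ADDRESS=(PROTOCOL=tcp)(HOST=host_b)(PORT=1521))
  (CONNECT_DATA=(SID=bbb))
)
END

# name = balanced group; (?2) recurses into capture group 2
my $entry = qr{
    (\w[\w.]*) \s* = \s*          # capture 1: entry name
    (                             # capture 2: balanced group
        \(
            (?: [^()]++ | (?2) )* # plain text or a nested group
        \)
    )
}x;

my %tns;
while ($text =~ /$entry/g) {
    my ($name, $body) = ($1, $2);
    $body =~ s/\s+//g;            # drop line breaks and indentation
    $tns{$name} = $body;
}

print "$_ = $tns{$_}\n" for sort keys %tns;
```

Because the match for an entry consumes its whole balanced group, the inner NAME=(...) pairs are never picked up as entries of their own.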
 
Headphones

Text::Balanced apparently doesn't work as I expected.

The prefix 'INSTANCE_NAME = ' causes problems, so the function
returns only one line with all the configs.

my $oneLine = ""; # put the file on one single line
while (<$fh>) {
    chomp;
    next if /^\s*$/;
    next if /^\s*#/;
    s/^\s*//; # remove leading spaces
    s/\s*$//; # remove trailing spaces
    $oneLine .= $_;
}

my @result = extract_bracketed( $oneLine, '()' ); # doesn't return what I expected :-(

I think I will keep my code 'à la C' unless someone finds the
'golden' code that solves this without 'à la C' code.

Thanks to all.
 
Dr.Ruud

Headphones:
the (DESCRIPTION=...) part is exactly what I am trying to
retrieve. If, on my one long line, I could select everything like this:
INSTANCE_NAME=(DESCRIPT*) with the ')' being the matching closing
parenthesis, I would reach my goal.

That is basically what I wrote out, by first replacing the inner (...)
by curlies.

But go and look into Text::Balanced first, as Matt Garrish suggested.
http://search.cpan.org/~dconway/Text-Balanced/lib/Text/Balanced.pm
 
Dr.Ruud

Headphones:


Always put

#!/usr/bin/perl
use strict;
use warnings;

at the start of your code.

The chomp is redundant, see below.

next if /^\s*$/; # remove empty lines
next if /^\s*#/; # remove comments
s/^\s*//; #remove leading spaces
s/\s*$//; #remove trailing spaces

Your last two '\s*' are meant as '\s+'.

s/^\s+//; # remove leading whitespace
s/\s+$//; # remove trailing whitespace, \n too.
next if /^#/; # skip comment lines
next if /^$/; # skip empty lines
my $txtline = $_;

No need for $txtline.
while ($txtline =~ /\(/g) {$parenthesis++;}

Alternatives:

$parenthesis++ while /\(/g;

and

$parenthesis += () = /\(/g;
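
One more way to count, offered only as a sketch: tr/// with an empty replacement list and no /d modifier leaves the string alone and just returns the number of matching characters. The sample line below is taken from the tnsnames example; $depth and $opens are made-up names:

```perl
use strict;
use warnings;

my $line  = '(ADDRESS=(PROTOCOL=tcp)(HOST=This_is_host_name)(PORT=1234))';
my $depth = 0;

# tr/// only counts here; $line is not modified
my $opens = ($line =~ tr/(//);
$depth += $opens;
$depth -= ($line =~ tr/)//);

print "opening: $opens, depth after this line: $depth\n";
```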
 
Dr.Ruud

Headphones:
Text::Balanced apparently doesn't work as I expected.

The prefix 'INSTANCE_NAME = ' causes problems, so the function
returns only one line with all the configs.

my $oneLine = ""; # put the file on one single line
while (<$fh>) {
    chomp;
    next if /^\s*$/;
    next if /^\s*#/;
    s/^\s*//; # remove leading spaces
    s/\s*$//; # remove trailing spaces
    $oneLine .= $_;
}

Variant:

{
    local $/ = undef;
    $_ = <$fh>; # slurp
}
s/\s*\n\s*//g; # remove newlines and surrounding whitespace
s/^\s+//; # remove any leading whitespace
s/\s+$//; # remove any trailing whitespace

my $oneLine = $_;

my @result = extract_bracketed( $oneLine, '()' ); # doesn't return what I expected :-(

extract_bracketed() can be given a 3rd parameter, check the
documentation.
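
As a rough sketch of what that could look like: as far as the Text::Balanced documentation describes it, the third argument is a pattern for a prefix to skip, and in list context the function returns the extracted text, the remainder and the skipped prefix. The prefix pattern '[^(]*' and the sample string are assumptions for illustration, and the sketch assumes no parentheses inside values:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# The whole file already joined into one string (made-up sample values).
my $one_line =
      'Inst_a = (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=host_a)(PORT=1234))(CONNECT_DATA=(SID=aaa)))'
    . 'Inst_b = (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=host_b)(PORT=1521))(CONNECT_DATA=(SID=bbb)))';

my @entries;
my $rest = $one_line;
while (length $rest) {
    # 3rd argument: prefix pattern, here everything up to the first "("
    my ($body, $remainder, $prefix) = extract_bracketed($rest, '()', '[^(]*');
    last unless defined $body and length $body;   # nothing balanced left
    push @entries, $prefix . $body;               # name plus bracketed part
    $rest = $remainder;
}

print "$_\n" for @entries;
```

Gluing the returned prefix back onto the extracted text keeps the instance name together with its bracketed description.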

I think I will keep my code 'à la C' unless someone finds the
'golden' code that solves this without 'à la C' code.

That code only works if you can assume that a new Instance_name always
starts on a new line, which you didn't explicitly state.
 
Headphones

Your last two messages are interesting. Because I'm busy this week, I'll
investigate this and respond later (probably next week).

And thanks for this code (I didn't know it was possible):
{
local $/ = undef;
$_ = <$fh>; # slurp
}
 
A. Sinan Unur

Your last two messages are interesting.

Whose last two messages? Please quote some context when you reply.

And thanks for this code (I didn't know it was possible):
{
local $/ = undef;
$_ = <$fh>; # slurp
}

Strictly speaking, the explicit assignment of undef is not needed. Just

local $/;

is enough.
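
A self-contained sketch of the slurp idiom, for anyone who wants to try it out. The in-memory filehandle and the sample data are stand-ins for the real tnsnames file, and $content is a made-up name:

```perl
use strict;
use warnings;

# An in-memory filehandle stands in for the real file here.
my $data = "Inst_a = (DESCRIPTION=\n  (CONNECT_DATA=(SID=aaa))\n)\n";
open my $fh, '<', \$data or die "cannot open: $!";

my $content = do {
    local $/;    # undef only inside this block, so <$fh> reads everything
    <$fh>;
};
close $fh;

$content =~ s/\s*\n\s*//g;    # join the lines
print "$content\n";
```

Once the block ends, $/ has its previous value again, so later line-by-line reads behave normally.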

Sinan
 
Headphones

Your last two messages are interesting.
Whose last two messages? Please quote some context when you reply.

Oops, I forgot that Google Groups is also Usenet :-/

Dr.Ruud, about Text::Balanced::extract_bracketed:
extract_bracketed() can be given a 3rd parameter, check the
documentation.

I checked the documentation, but the third parameter, if given, will be
matched but skipped in the return value. As I want to get back the
instance_name AND what is described in parentheses, extract_bracketed
will not help me.

That code only works if you can assume that a new Instance_name will be
on the start of a new line, which you didn't explicitly state.

Correct! I'm still looking to resolve that problem.


Another nice shortcut for coding that I didn't know.


Here is the final code (with 2 minor improvements still to do):
if ($fh->open("< $file")) {
    while (<$fh>) {
        chomp;
        next if /^\s*$/;
        next if /^\s*#/; # remove comments
        # TODO: comments at end of line are not removed!

        # remove leading and trailing spaces
        s/^\s+//; # remove leading spaces
        s/\s*$//; # remove trailing spaces

        # count opening parentheses
        $parenthesis++ while /\(/g;

        # count closing parentheses
        $parenthesis-- while /\)/g;

        # append line to uniline
        $uniline .= $_;

        # if the parenthesis count is 0, we have our uniline
        if ($parenthesis == 0) {

            # Do some processing
            # .......

            # reset uniline
            $uniline = "";
        }
    }
}
 
Dr.Ruud

Headphones:
chomp;
next if /^\s*$/;
next if /^\s*#/; # remove comments
# TODO : comments at end of line are not removed !!

# remove leading and trailing spaces
s/^\s+//; #remove leading spaces
s/\s*$//; #remove trailing spaces

<record state=broken>
The chomp removes an optional \n, which is whitespace.
Your s/\s*$// should be s/\s+$//, because you need at least 1 whitespace
character before you can remove any.
</record>

s/^\s+//; # remove leading whitespace
s/\s+$//; # remove trailing whitespace
next if /^$/; # skip empty lines
next if /^#/; # skip comment lines

Removing end-of-line comments is a bit harder. This will remove the
ones that contain no punctuation:
s/#(?:\s*\w*)*$//;
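
If it is safe to assume that '#' never appears inside a value (an assumption, not something stated in this thread), a blunter substitution also covers comments that do contain punctuation. A sketch with made-up sample lines:

```perl
use strict;
use warnings;

my @lines = (
    '(ADDRESS=(PROTOCOL=tcp)(HOST=host_a)(PORT=1234)) # primary, see ticket!',
    '(CONNECT_DATA=(SID=aaa))',
    '# full-line comment',
);

my @clean;
for my $line (@lines) {
    (my $copy = $line) =~ s/\s*#.*$//;     # drop "#" and everything after it
    push @clean, $copy if length $copy;    # skip lines that were only a comment
}

print "$_\n" for @clean;
```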
 
