Perl complex data structure ... how to get more flexible access? how to avoid eval?

valued customer

### QUESTION
You have a complex Perl data structure that you want to access using a
flexible 'xpath-like' syntax supplied in a simple string. How can this
be done elegantly?

### EXAMPLE
Suppose you have a Perl data structure that looks something like this
...

$news = {};
$news->{top_stories} = ##* ... code-omitted ...*;
$news->{sports} = ##* ... code-omitted ...*;
$news->{weather} = ##* ... code-omitted ...*

You can infer what the omitted parts look like from the following
sample data nodes ...

### get stuff about today's top story
print $news->{top_stories}[0]{headline};
print $news->{top_stories}[0]{bodytext};

### get stuff about the weather
print $news->{weather}{forecast}[3]{temp}; ## forecast day 3
print $news->{weather}{current}{temp};

### PROBLEM DETAILS
How do you allow a user to arbitrarily access any single data node of
any arbitrary depth in the data structure, simply by specifying a
single query string?

eg; 'news/top_stories/[0]/headline'
eg; 'news/weather/current/temp'
eg; 'news/weather/forecast/[2]/temp'
eg; 'news/classifieds/for_sale/vehicles/trucks/[12]/asking_price'

Is there a more elegant solution than splitting the string,
concatenating
and then doing 'eval'? Dereferencing with '${}' does not seem flexible
enough because
the application does not know in advance how many 'path steps' the
user will
supply in the query string. Is there an XPath-like syntax module
available, or an alternative approach?
 
Tassilo v. Parseval

Also sprach valued customer:

You will certainly have to split them. There is no way around that.
However, you can do without string-eval. Here's what a find() function
would look like:

my $news;
$news->{top_stories}[0]{headline} = "bla";

print find($news, "top_stories/[0]/headline");

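sub find {
    my ($ref, $path) = @_;

    # path used up: whatever has been reached is the result,
    # whether it is a plain value or a reference
    return $ref if not defined $path or $path eq '';
    # path left over, but nothing to descend into
    return if not ref $ref;

    my ($branch, $remain) = split /\//, $path, 2;
    # "[n]" is an array index, anything else is a hash key
    return find($branch =~ /^\[(\d+)\]$/ ? $ref->[$1]
                                         : $ref->{$branch},
                $remain);
}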

Depending on where your path comes from, I'd still use string-eval
though because it will give better performance than calling a function
and recursing into it.

Tassilo
 
Anno Siegel

valued customer said:

It would have been nice if you *hadn't* omitted the code to build a
working sample of test data. This way everyone who tackles your problem
has to come up with their own. This is repeated work only you know how
to do right. It also helps a possible discussion if everyone works with
the same data.
### get stuff about todays top story
print $news->{top_stories}[0]{headline};
print $news->{top_stories}[0]{bodytext};

### get stuff about the weather
print $news->{weather}{forecast}[3]{temp}; ## forecast day 3
print $news->{weather}{current}{temp};

### PROBLEM DETAILS
How do you allow a user to arbitrarily access any single data node of
any arbitrary depth in the data structure, simply by specifying a
single query string?

eg; 'news/top_stories/[0]/headline'
eg; 'news/weather/current/temp'
eg; 'news/weather/forecast/[2]/temp'
eg; 'news/classifieds/for_sale/vehicles/trucks/[12]/asking_price'

Is there a more elegant solution than splitting the string,
concatenating
and then doing 'eval'? Dereferencing with '${}' does not seem flexible
enough because
the application does not know in advance how many 'path steps' the
user will
supply in the query string. Is there an XPath-like syntax module
available, or an alternative approach?

Ugh. Keep your line length below 72 characters.

Yes, there is an alternative to eval. Use recursion.

First some test data. It's come out a little different from your
planned structure. Too bad.

my $struct = {
    news => {
        top_stories => [ qw(
            first_story
            second-story
        )],
        weather => {
            temp     => '60 F',
            forecast => 'cloudy',
        },
    },
};

Now for the retrieval:

sub retrieve {
    my $struct = shift;
    return $struct unless @_;
    my $key = shift;
    retrieve(
        ref $struct eq 'ARRAY' ? $struct->[$key] : $struct->{$key},
        @_,
    );
}

The first argument is the data structure to retrieve from, which must be
composed of hashrefs and arrayrefs. The following arguments are keys,
string or numeric as applicable. Each recursive step consumes one key
and descends one level deeper into the structure. When the keys are
used up, the final result is returned.

Most of the code in the final procedure will be error checking, which
I have entirely left out.
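
A minimal sketch of what that error checking might look like, applied to
the same test data. The name retrieve_checked and the error messages are
assumptions made for illustration, not part of the code posted above; the
query string is split into keys before the call.

```perl
use strict;
use warnings;

# Hypothetical variant of retrieve() with the error checks filled in.
sub retrieve_checked {
    my ($node, @keys) = @_;
    return $node unless @keys;

    my $key  = shift @keys;
    my $type = ref $node;

    if ($type eq 'ARRAY') {
        die "array index '$key' is not a non-negative integer\n"
            unless $key =~ /^\d+$/;
        die "array index $key is out of range\n"
            unless $key < @$node;
        return retrieve_checked($node->[$key], @keys);
    }
    if ($type eq 'HASH') {
        die "no such key '$key'\n" unless exists $node->{$key};
        return retrieve_checked($node->{$key}, @keys);
    }
    die "cannot apply key '$key' to a " . ($type || 'plain scalar') . "\n";
}

my $struct = {
    news => {
        top_stories => [ qw(first_story second-story) ],
        weather     => { temp => '60 F', forecast => 'cloudy' },
    },
};

my $story = retrieve_checked($struct, split m{/}, 'news/top_stories/1');
my $temp  = retrieve_checked($struct, split m{/}, 'news/weather/temp');
print "$story\n$temp\n";

# A bad path dies; block eval (not string eval) traps the error.
my $ok = eval { retrieve_checked($struct, split m{/}, 'news/sports/0'); 1 };
print "error: $@" unless $ok;
```

Dying on a bad path rather than returning undef is a design choice: it
lets the caller tell a missing node apart from a node whose value really
is undef.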

Anno
 
Ben Morrow

You may want to take a look at how the Template module does this: it has
an extremely flexible syntax

news.top_stories.0.headline

where each element is a hash key, an array index or a method name, and
is used appropriately.

FWIW, although others have suggested recursive solutions, it is
perfectly possible to do this iteratively.

Ben
 
Tassilo v. Parseval

Also sprach Ben Morrow:
[...]

FWIW, although others have suggested recursive solutions, it is
perfectly possible to do this iteratively.

This statement is always correct...not just for the given problem. ;-)

Recursion, on the other hand, looks rather natural here, although it is
a slight waste of resources compared with an iterative approach.

Tassilo
 
Anno Siegel

Tassilo v. Parseval said:
Also sprach Ben Morrow:
(e-mail address removed) (valued customer) wrote:
[...]
Is there a more elegant solution than splitting the string,
concatenating
and then doing 'eval'? Dereferencing with '${}' does not seem flexible
enough because
the application does not know in advance how many 'path steps' the
user will
supply in the query string. Is there an XPath-like syntax module
available, or an alternative approach?
[...]

FWIW, although others have suggested recursive solutions, it is
perfectly possible to do this iteratively.

This statement is always correct...not just for the given problem. ;-)

Recursion, on the other hand, looks rather natural here, although it is
a slight waste of resources compared with an iterative approach.

The original contestant was something like

eval '$hash' . $some_accessor;

That would presumably be in the recursive camp.

Anno
 
valued customer

You may want to take a look at how the Template module does this: it has
an extremely flexible syntax

news.top_stories.0.headline
This is (at least partially) what inspired the question to begin with.
In fact, I could have gone without posting a sample data structure at
all, since that is not really what is driving this particular inquiry.

Assume you have a data structure of unknown composition, and all you
know is that it is restricted to a combination of arrayrefs and
hashrefs. If the previous assumption sounds far-fetched, just consider
a filesystem where people can add and remove files and directories
any time they want. Also consider that, in a previous post in this
very thread, someone adopted a different structure just to make an
example! This happens all the time, and the syntax should be flexible
enough to adapt.

The point is, any syntax like ...

news.top_stories.0.headline
news/top_stories/0/headline
news\\weather\\forecast
television::abctv::
Are all brain-dead easy to work with, because in order to translate
between them, all you have to do is a simple string search and
replace for whatever you happen to use as your delimiter. Moreover,
if someone decides to change the structure (which will happen) no
code has to be modified.
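
As a rough illustration of that point, one split pattern can accept all
four spellings listed above. The helper name path_steps is hypothetical,
and the sketch assumes that no key itself contains '.', '/', '\' or '::'.

```perl
use strict;
use warnings;

# Hypothetical helper: break a path into steps, whichever of the
# delimiters shown above it happens to use.
sub path_steps {
    my ($path) = @_;
    # grep drops the empty leading field an initial delimiter would leave
    return grep { length } split m{::|[./\\]+}, $path;
}

for my $path ('news.top_stories.0.headline',
              'news/top_stories/0/headline',
              'news\\\\weather\\\\forecast',
              'television::abctv::') {
    print join(' | ', path_steps($path)), "\n";
}
```

Once the string is a plain list of steps, the choice of delimiter no
longer matters to whatever code walks the structure.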

On the other hand the perl 'native' syntax options ...

$news->{top_stories}->[0]->{headline}
$news->{top_stories}[0]{bodytext}
$$news{commentary}[0]{bodytext}

Are all a bit more cumbersome. It takes a bit more work to convert
a path string...

### single sample data branch
$news->{top_stories} = [{qw(headline foo bodytext blah blah)}];

### convert 'xml-style' path to 'perl-style' path
$sPath = 'news/top_stories/[0]/headline';
$sPath = join '->',
    map {
        $iCtxw++;
        if (substr($_,0,1) eq '[' && substr($_,-1,1) eq ']') { $_ }
        elsif ($iCtxw == 1) { $_ }
        else { '{'.$_.'}' }
    } split m!/!, $sPath;
print eval('$'."$sPath");

If you go the recursion route, you still have to have a way of
converting the original query string into the individual 'branches'
of the native data structure.

This is a disappointment, since it seems like a lot of
additional work that would be obviated if there were another
alternative perl syntax that allowed uniform path step
delimiters for native data structures. It would be nice (IMHO) if this
were built into the language instead of part of an external module.
 
Ben Morrow

This is (at least partially) what inspired the question to begin with.
In fact, I could have gone without posting a sample data structure at
all, since that is not really what is driving this particular inquiry.
### single sample data branch
$news->{top_stories} = [{qw(headline foo bodytext blah blah)}];

Don't use the symbol table when a hash will do. If 'news' is part of
your path, then a hash will do:

my %data;
$data{news}{top_stories} = [
    { headline => 'foo', bodytext => 'blahblah' },
];
### convert 'xml-style' path to 'perl-style' path
$sPath = 'news/top_stories/[0]/headline';
$sPath = join '->',
map{
$iCtxw++;
if(substr($_,0,1)eq'[' && substr($_,-1,1)eq']'){$_}
elsif($iCtxw == 1){$_}
else{'{'.$_.'}'}
}split m!/!,$sPath;
print eval('$'."$sPath");

YUUK! Firstly, by using [0] instead of simply 0 you are unduly
restricting the implementation: by the code above, anything accessed as
[\d+] is an array deref. Secondly, you do *not* need to use string eval.

# (untested)

use Scalar::Util qw/blessed/;

my $path = 'news/top_stories/0/headline';
my @path = split m!/!, $path;
my $item = \%top_of_data_structure;

for (@path) {
    $item =
        blessed $item        ? $item->$_()  :
        ref $item eq 'HASH'  ? $item->{$_}  :
        ref $item eq 'ARRAY' ? $item->[$_]  :
        ref $item eq 'CODE'  ? $item->($_)  :
        die qq/invalid path "$path"/;
}

(P.S. I love ?: :)
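
A self-contained sketch of the same loop wrapped in a sub, so that the
object and code-ref branches can be exercised as well. The sub name walk,
the Story class and the sample data are all made up for the illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw/blessed/;

# Hypothetical class, here only to exercise the method-call branch.
{
    package Story;
    sub new      { my ($class, %args) = @_; return bless {%args}, $class }
    sub headline { return $_[0]{headline} }
}

# Walk the structure one path step at a time, without recursion.
sub walk {
    my ($item, $path) = @_;
    for my $step (split m!/!, $path) {
        $item =
            blessed($item)        ? $item->$step()  :
            ref($item) eq 'HASH'  ? $item->{$step}  :
            ref($item) eq 'ARRAY' ? $item->[$step]  :
            ref($item) eq 'CODE'  ? $item->($step)  :
            die qq/invalid path "$path" at step "$step"\n/;
    }
    return $item;
}

my %data = (
    news => {
        top_stories => [ Story->new(headline => 'foo') ],
        # a code ref is called with the next path step as its argument
        weather     => sub { my ($day) = @_; return { temp => "temp for $day" } },
    },
);

my $headline = walk(\%data, 'news/top_stories/0/headline');
my $temp     = walk(\%data, 'news/weather/today/temp');
print "$headline\n$temp\n";
```

Note that a path step applied to an object is called as a method name, so
paths taken from untrusted input would need to be checked against a list
of allowed methods first.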
If you go the recursion route, you still have to have a way of
converting the original query string into the individual 'branches'
of the native data structure.

Eh? Yes... a recursive approach would be very similar to the above, but
slower and (IMHO) slightly harder to follow. If the actions to be taken
for each different type were more complicated it might make more sense.
This is a disappointment, since it seems like a lot of
additional work that would be obviated if there were another
alternative perl syntax that allowed uniform path step
delimiters for native data structures. It would be nice (IMHO) if this
were built into the language instead of part of an external module.

Don't be silly. All the most useful bits of Perl are 'part of an
external module'.

Ben
 
valued customer

print eval('$'."$sPath");
YUUK! Firstly, by using [0] instead of simply 0 you are unduly
restricting the implementation: by the code above, anything accessed as
[\d+] is an array deref. Secondly, you do *not* need to use string eval.
Yuuk indeed. It's even worse than that, since the code above doesn't even
test for \d+; it's just a substring test for the square brackets: a perfect
example of the kind of code I want to *avoid* (along with the string eval).
use Scalar::Util qw/blessed/;

Ahh. Now *this* is interesting, had not considered Scalar::Util.
(P.S. I love ?: :)

Also, clever compound use of the '?:' operator, another interesting
bit to investigate.
Don't be silly. All the most useful bits of Perl are 'part of an
external module'.

Sometimes you have to download the useful bits from the cranium of
someone on Usenet.
 
Malcolm Dew-Jones

valued customer (e-mail address removed) wrote:
: ### QUESTION
: You have a complex perl data structure that you want to access using a
: flexible 'xpath-like' syntax supplied in a simple string. How can this
: be done elegantly??

: ### EXAMPLE
: Suppose you have a perl data structure that looks something like this
: ...

: $news = {};
: $news->{top_stories} = ##* ... code-omitted ...*;
: $news->{sports} = ##* ... code-omitted ...*;
: $news->{weather} = ##* ... code-omitted ...*

: You can infer what the omitted parts look like from the following
: sample data nodes ...

: ### get stuff about todays top story
: print $news->{top_stories}[0]{headline};
: print $news->{top_stories}[0]{bodytext};

: ### get stuff about the weather
: print $news->{weather}{forecast}[3]{temp}; ## forecast day 3
: print $news->{weather}{current}{temp};

: ### PROBLEM DETAILS
: How do you allow a user to arbitrarily access any single data node of
: any arbitrary depth in the data structure, simply by specifying a
: single query string?

: eg; 'news/top_stories/[0]/headline'
: eg; 'news/weather/current/temp'
: eg; 'news/weather/forecast/[2]/temp'
: eg; 'news/classifieds/for_sale/vehicles/trucks/[12]/asking_price'

$0.02

Is there a reason to use the data structures at all?

How about simply using the syntax itself

$news{'news/weather/current/temp'} = $the_item;
$news{'news/classifieds/for_sale/vehicles/trucks/[12]/asking_price'}
= $the_price;
 
valued customer

Is there a reason to use the data structures at all?
How about simply using the syntax itself

$news{'news/weather/current/temp'} = $the_item;
$news{'news/classifieds/for_sale/vehicles/trucks/[12]/asking_price'}
= $the_price;

The problem with that is, when the project requirements call for a
change in the structure (and they will) or the client requests a
change in functionality (and they will) you end up with a lot of
hassle. Think of the analogy of a filesystem with 'folders' and
'files'.

For example, what happens if someone decides to change the name
of the 'trucks' branch (aka folder, aka directory) to
'off_road_vehicles'? If you simply use a flat hash structure, you
have to change the string for all 13 (or however many) keys ... and
you'd better hope you don't accidentally rename a key that you weren't
supposed to:

$news{'news/classifieds/for_sale/toys/trucks/'}

Also, using a data structure makes it easy to 'segment' and
manipulate the data:

my $trucks = $news->{classifieds}{for_sale}{vehicles}{trucks};
for my $truck (@{$trucks})
{
    print $truck->{asking_price} ."\n";
}
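
To make the renaming example concrete, here is a small sketch with
made-up prices and only two trucks. With the nested structure the rename
is one delete-and-assign; with the flat hash every affected key has to be
rewritten, taking care not to touch toys/trucks.

```perl
use strict;
use warnings;

# Made-up sample data in the shape described above.
my $news = {
    classifieds => {
        for_sale => {
            vehicles => { trucks => [ { asking_price => 4500 },
                                      { asking_price => 12000 } ] },
            toys     => { trucks => [ { asking_price => 5 } ] },
        },
    },
};

# Nested: the whole branch moves in a single step.
my $vehicles = $news->{classifieds}{for_sale}{vehicles};
$vehicles->{off_road_vehicles} = delete $vehicles->{trucks};

# Flat: every key under the branch is rewritten one by one, matching
# on the full prefix so that toys/trucks is left alone.
my %flat = (
    'news/classifieds/for_sale/vehicles/trucks/[0]/asking_price' => 4500,
    'news/classifieds/for_sale/vehicles/trucks/[1]/asking_price' => 12000,
    'news/classifieds/for_sale/toys/trucks/[0]/asking_price'     => 5,
);
my $prefix = 'news/classifieds/for_sale/vehicles/trucks/';
for my $old (grep { index($_, $prefix) == 0 } keys %flat) {
    my $new = 'news/classifieds/for_sale/vehicles/off_road_vehicles/'
            . substr($old, length $prefix);
    $flat{$new} = delete $flat{$old};
}

print "$_\n" for sort keys %flat;
```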

Using a flat hash structure would be like putting all the files on your
computer into one 'folder' and organizing them according to the
filenames only, or like putting the functionality of 1000 CPAN modules
into one 'MyHugeLibrary.pm' file, or like printing a newspaper on one
single sheet of paper rolled into a scroll.

The purpose is to make accessing nested data structures intuitive,
flexible and very easy. It's a paradigm shift. It's like working with
XML or an OS filesystem, only in Perl. Of course, one could use
XML::Twig, or Template::Toolkit or (fill_in_the_blank), but why
go through a middleman when all you really need is Perl?
 
