create a hierarchical list from a text file

Michael Friendly

I need to create hierarchical menu files from text input in a simple form
of comma-delimited lines with fields (topic, url, description) like

Top1,,Desc1
	Item1
		Item11,url11,Description11
		Item12,url12,Description12
		Item13,url12,Description13
	Item2,url2,Description2
		Item21,url21,Description21
	Item3,url3,Description3
		Item31,url31,Description31
		Item32,url32,Description32
Top2,,Desc2
	Item1,url1,Description1
		Item11,url11,Description11
		Item12,url12,Description12
	Item2,url2,Description2
		Item21,url21,Description21

where the number of leading tabs in each item implicitly indicates level
in the tree. The output must be in the form of an explicitly structured,
comma-separated list, [ item, item, ... item ] where each item is either
a simple item or another list, e.g.,
[ item, [ item, item, ... item ], ... item ].
Missing fields in an item are represented by 'null', so the example
would appear as:

[
  ['Top1', null, null
    , ['Item1', null, null
        , ['Item11', 'url11', 'Description11']
        , ['Item12', 'url12', 'Description12']
        , ['Item13', 'url12', 'Description13']
      ]
    , ['Item2', 'url2', 'Description2'
        , ['Item21', 'url21', 'Description21']
      ]
    , ['Item3', 'url3', 'Description3'
        , ['Item31', 'url31', 'Description31']
        , ['Item32', 'url32', 'Description32']
      ]
  ]
, ['Top2', null, null
    , ['Item1', 'url1', 'Description1'
        , ['Item11', 'url11', 'Description11']
        , ['Item12', 'url12', 'Description12']
      ]
    , ['Item2', 'url2', 'Description2'
        , ['Item21', 'url21', 'Description21']
      ]
  ]
]

I'm reading the items and creating a list of hashes with the subroutine
below, but I can't figure out how to get from there to my desired
output. There must be some Perl modules I can use. Can someone help?

sub read_items {
    $file = shift;
    open(IN, $file) || die("cant open $file\n");
    $nitems = 0;
    undef @items;
    while( $line = <IN>) {
        chomp($line);
        next if $line =~ /^#/;
        $line =~ s/(\t*)//;
        $level = length($1); # level in tree
        next unless $line;
        $nitems++;
        ($title, $url, $desc) = split(/,\s*/, $line);
        $desc =~ s/\'/\\'/g; # escape quotes in e.g., Murphy's Law
        $item = {
            level => $level,
            title => $title,
            url => $url,
            desc => $desc,
        };
        push @items, $item;
    }
    close IN;
    print STDERR "Read $nitems items\n";
    return @items;
}


--
Michael Friendly Email: (e-mail address removed)
Professor, Psychology Dept.
York University Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT M3J 1P3 CANADA
 
attn.steven.kuo

Michael said:
I need to create hierarchical menu files from text input in a simple form
of comma delimited lines with fields (topic, url, description) like

(snipped)

One way:

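use strict;
use warnings;
use Text::CSV;
use Data::Dumper;

my $csv = Text::CSV->new();

# Split one line into (title, url, desc); if any of the three fields
# is missing, keep the title and fill the other two with 'null'.
sub parse
{
    my $status = $csv->parse(shift);
    if ($status)
    {
        my @fields = $csv->fields;
        return (3 == grep length $_, @fields)
            ? @fields
            : ($fields[0], 'null', 'null');
    }
    else
    {
        return;    # empty list if the line could not be parsed
    }
}


# $stack[$level] is the array that receives items at that level;
# each new item becomes the receiver for the level below it.
my @stack;
while (<DATA>)
{
    chomp;
    my $level = 0;
    $level = length $1 if s/^(\t+)//;
    my $data = [ parse($_) ];
    push @{$stack[$level]}, $data;
    $stack[$level+1] = $data;
}

print Dumper $stack[0];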

__DATA__
Top1,,Desc1
	Item1
		Item11,url11,Description11
		Item12,url12,Description12
		Item13,url12,Description13
	Item2,url2,Description2
		Item21,url21,Description21
	Item3,url3,Description3
		Item31,url31,Description31
		Item32,url32,Description32
Top2,,Desc2
	Item1,url1,Description1
		Item11,url11,Description11
		Item12,url12,Description12
	Item2,url2,Description2
		Item21,url21,Description21
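
Data::Dumper prints Perl syntax rather than the bracketed form asked for, so a small recursive printer may be closer to the desired output. What follows is only a sketch. It assumes the nested array refs built by the loop above (plain fields first, then child array refs) and that missing fields are stored as the string 'null' or left empty. The names format_tree, format_node and quote_field are made up for illustration and do not come from any module.

```perl
use strict;
use warnings;

# Hypothetical helper: quote one field, or emit a bare null if it is missing.
sub quote_field {
    my ($field) = @_;
    return 'null' if !defined $field || $field eq '' || $field eq 'null';
    (my $escaped = $field) =~ s/(['\\])/\\$1/g;   # escape quotes and backslashes
    return "'$escaped'";
}

# Hypothetical helper: render one node. Plain scalars are its fields,
# array refs are its children, each printed on a line of its own.
sub format_node {
    my ($node, $depth) = @_;
    my $pad      = "\t" x $depth;
    my @fields   = grep { !ref($_) } @$node;
    my @children = grep { ref($_) eq 'ARRAY' } @$node;

    my $out = '[' . join(', ', map { quote_field($_) } @fields);
    return "$out]" unless @children;
    $out .= "\n$pad\t, " . format_node($_, $depth + 1) for @children;
    return "$out\n$pad]";
}

# Render the list of top-level nodes inside one outer pair of brackets.
sub format_tree {
    my ($tops) = @_;
    return "[\n" . join("\n, ", map { format_node($_, 0) } @$tops) . "\n]\n";
}

# Small demo tree in the same shape as $stack[0].
my $tree = [
    [ 'Top1', 'null', 'null',
        [ 'Item1', 'url1', "Murphy's Law" ],
    ],
];
print format_tree($tree);
```

With the program above, print format_tree($stack[0]) would take the place of the Dumper call.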
 
Tad McClellan

sub read_items {
$file = shift;


You should not use global variables without a good reason.

my $file = shift;

open(IN, $file) || die("cant open $file\n");


You should include the *reason* why the open() failed in your message:

open(IN, $file) || die("cant open '$file' $!");

undef @items;


If you weren't using a global variable there, then you wouldn't
need to clear it out each time.

You should have

use strict;

in all of your Perl programs, then declare each variable at its first use:

my @items;


$line =~ s/(\t*)//;
$level = length($1); # level in tree


You should never use the dollar-digit variables unless you
have ensured that the match *succeeded*, otherwise there
will be a stale, left-over value from some previous match.


my $level = length($1) if $line =~ s/(\t*)//;


But you don't even need regular expressions at all:

my $level = $line =~ tr/\t//d;


$item = {
level => $level,
title => $title,
url => $url,
desc => $desc,
};
push @items, $item;


You don't need the $item temporary variable if you just use
an anonymous hash constructor in the first place:

push @items, {
level => $level,
title => $title,
url => $url,
desc => $desc,
};
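
Putting those points together, the subroutine might end up looking something like the sketch below. The lexical filehandle and three-argument open are additions not mentioned above, as are the guard on $desc and the leading-tabs-only match, so treat them as assumptions. The demo reads from an in-memory string instead of a real file, purely so it can run on its own.

```perl
use strict;
use warnings;

sub read_items {
    my ($file) = @_;
    open(my $in, '<', $file) or die "can't open '$file': $!";

    my @items;
    while (my $line = <$in>) {
        chomp $line;
        next if $line =~ /^#/;

        # Use $1 only when the substitution actually matched.
        my $level = 0;
        $level = length $1 if $line =~ s/^(\t+)//;
        next unless length $line;

        my ($title, $url, $desc) = split /,\s*/, $line;
        $desc =~ s/'/\\'/g if defined $desc;   # escape quotes, e.g. Murphy's Law

        push @items, {
            level => $level,
            title => $title,
            url   => $url,
            desc  => $desc,
        };
    }
    close $in;

    print STDERR "Read ", scalar(@items), " items\n";
    return @items;
}

# Demo: a reference to a string opens as an in-memory file.
my $text  = "Top1,,Desc1\n\tItem1\n\t\tItem11,url11,Murphy's Law\n";
my @items = read_items(\$text);
printf "%d %s\n", $_->{level}, $_->{title} for @items;
```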
 
Eric Bohlman

my $level = length($1) if $line =~ s/(\t*)//;


But you don't even need regular expressions at all:

my $level = $line =~ tr/\t//d;

In this case it would be better to use a regex since the OP should have
anchored it to the beginning of the line so that only leading tabs signify
level. Without the anchor, the data can't contain embedded tabs without
throwing the level calculations off, and IMHO it's best not to create such
"unwritten law" constraints.

my $level = length($1) if $line =~ s/^(\t*)//;
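
To make the difference concrete, here is a rough comparison on a line that has a tab inside its description. The sketch declares $level first and assigns it conditionally, because perlsyn documents the behaviour of a my declaration with a statement modifier as undefined. The name split_level is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: count and strip leading tabs only;
# tabs embedded in the data are left alone.
sub split_level {
    my ($line) = @_;
    my $level = 0;
    $level = length $1 if $line =~ s/^(\t+)//;
    return ($level, $line);
}

my $line = "\t\tItem11,url11,Two\tWords";

# tr///d counts and deletes every tab, including the embedded one.
my $copy     = $line;
my $tr_count = ($copy =~ tr/\t//d);

my ($level, $rest) = split_level($line);

print "tr///d:   level $tr_count, rest '$copy'\n";
print "anchored: level $level, rest '$rest'\n";
```

The tr version reports level 3 and loses the embedded tab, while the anchored version reports level 2 and leaves the description intact.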
 
Anno Siegel

Michael said:
Top1,,Desc1
['Top1', null, null

Did you mean

['Top1', null, Desc1

I noticed that, too, while I was still deciding whether to deal with
the posting or not. It tipped the scale.

This kind of inconsistency makes dealing with a post potentially twice
as time-consuming, because every aspect may have to be considered under
two (or four, or eight...) variant readings.

The effort in answering a post grows exponentially with the poster's
sloppiness.

Anno
 
