A
Andy Baxter
I'm working on a web-based database project written in perl, which is a
virtual library for people in the town where I live to share details of
books and videos they have - the idea is if you find a book you like you
click on 'borrow', and you are given details of how to contact the person
who owns it.
I'm not sure exactly where to ask for this, as I'm looking for help with
finding a good algorithm, rather than details of coding, but I've chosen
this group as it's written in perl, and more of a coding problem than a
cgi problem.
I have decided on a categorisation system for the items, where each item
can belong to many categories, and each category can also be grouped as a
subcategory under one or more higher level categories. Everything is
stored in a mysql database, and the links between books and categories,
and between categories and higher level categories are done through two
linking tables, which have entries like:
CategoryMembership
------------------
CatID ItemId
1 1
2 1
1 2
1 3
3 3
1 4
4 4
7 5
CategoryGrouping
----------------
CatID SubCatId SortOrder Stop
1 2 1
1 3 2
1 4 3
3 4 1 1 (yes.)
4 5 1
4 6 2
4 7 3
I.e. item 1 is in cats 1 & 2, item 2 is in cat 1, item 3 is in cat 3, and
item 4 is in cats 1 & 4. Categories 2 and 3 are in Cat 1, and Category 4
is in cat 3 and cat 1 etc.
I want any set of category memberships or groupings of categories that can
be represented like this without leading to loops to be allowed. (I.e. the
only thing that isn't allowed is things like category 1 is grouped under
cat 2 which is grouped under cat 3 which is grouped under cat 1, and more
complex variations on this.)
I have written some code which reads these tables and generates a
tree-like data structure, where each category is represented by a hash
like this:
{Name=>"category name",
Id=>"Id on database",
DownList=>ref of array with list of links to subcategories,
UpList=>ref of array with list of links to higher categories
}
The links between categories are hashes like this:
{Up=>ref to category hash,
Down=>ref to (sub) category hash,
Stop=>stop listing subcategories at this point?,
Sort=>sort order when displaying this subcategory}
This structure is then used by the scripts which render the pages
displaying the categories. This bit is working fine.
The bit I'm finding difficult is how to count how many items there are
under a given category, so that a count can be shown in the category index
page.
The code I have at the moment is here: (This code is in a module called
LibMod.pm)
sub CountCatItems($$) {
# Count how many items there are under this category, either directly or indirectly.
my ($mod,$pCat, # pointer to the category hash.
$pStack # pointer to stack which
)= @_;
my ($query, # query handle.
$catId, # category Id no.
$pLink # ref to link between cats.
);
$catId=${$pCat}{Id};
DPrint "counting ${$pCat}{Name} - $catId = ";
$query=$db->prepare("select count(ItemId) from CategoryMembership where " .
"CategoryId=$catId group By CategoryId");
$query->execute;
my ($dCount) = $query->fetchrow_array();
$query->finish;
${$pCat}{CountDirect}=$dCount;
DPrint "$dCount<br>"; # debug print.
foreach $pLink ( @{${$pCat}{DownList}} ) {
LibMod->CountCatItems(${$pLink}{Down}); # call self recursively
};
};
This works, but only counts the number of items directly included in a
category, not the ones which belong to a subcategory.
The thing that makes it hard is the multiple category membership plus
multiple category grouping, but I want to keep this if I possibly can,
because one of the problems with traditional non-computerised indexes is
that you can't do this even though books can be relevant to several
different topics.
My thoughts at the moment are either:
- give each category a hash where the keys are all the items included in
the category. Read these in from the database, then each time a
subcategory is counted, the code goes back up the tree to the root, adding
the items that belong to it to all the higher level categories. When the
recursive calls to subcategories return, (i.e. end of this code fragment),
then count the keys in the hash. This would work I think, but it would
involve retrieving all the category memberships from the database, rather
than just the counts, and it might cause speed and memory problems if a
lot of items were added to the system.
- or when you put an item in a category, category memberships are
automatically created in the database for all the higher level categories.
This would be quicker to count, but I'm thinking that if someone has
chosen to put an item in a given subcategory, and then later on I decide
to move that subcategory to a different group, the category memberships
should shift to follow this, whereas with this system the item would stay
in the old top-level categories.
Finally - my questions are:
- does anyone know of an algorithm for working out the counts just from
the counts in each category - i.e. without having to get details of
exactly which items belong to each category? (I have a feeling this may be
impossible)
- or generally, and maybe more useful in the long run, a good place to
start looking on the web for answers to problems like this when they crop
up.
- also if you have any thoughts on either of the solutions I've
described, or can think of a better way of doing it, I'd appreciate it.
andy.
virtual library for people in the town where I live to share details of
books and videos they have - the idea is if you find a book you like you
click on 'borrow', and you are given details of how to contact the person
who owns it.
I'm not sure exactly where to ask for this, as I'm looking for help with
finding a good algorithm, rather than details of coding, but I've chosen
this group as it's written in perl, and more of a coding problem than a
cgi problem.
I have decided on a categorisation system for the items, where each item
can belong to many categories, and each category can also be grouped as a
subcategory under one or more higher level categories. Everything is
stored in a mysql database, and the links between books and categories,
and between categories and higher level categories are done through two
linking tables, which have entries like:
CategoryMembership
------------------
CatID ItemId
1 1
2 1
1 2
1 3
3 3
1 4
4 4
7 5
CategoryGrouping
----------------
CatID SubCatId SortOrder Stop
1 2 1
1 3 2
1 4 3
3 4 1 1 (yes.)
4 5 1
4 6 2
4 7 3
I.e. item 1 is in cats 1 & 2, item 2 is in cat 1, item 3 is in cat 3, and
item 4 is in cats 1 & 4. Categories 2 and 3 are in Cat 1, and Category 4
is in cat 3 and cat 1 etc.
I want any set of category memberships or groupings of categories that can
be represented like this without leading to loops to be allowed. (I.e. the
only thing that isn't allowed is things like category 1 is grouped under
cat 2 which is grouped under cat 3 which is grouped under cat 1, and more
complex variations on this.)
I have written some code which reads these tables and generates a
tree-like data structure, where each category is represented by a hash
like this:
{Name=>"category name",
Id=>"Id on database",
DownList=>ref of array with list of links to subcategories,
UpList=>ref of array with list of links to higher categories
}
The links between categories are hashes like this:
{Up=>ref to category hash,
Down=>ref to (sub) category hash,
Stop=>stop listing subcategories at this point?,
Sort=>sort order when displaying this subcategory}
This structure is then used by the scripts which render the pages
displaying the categories. This bit is working fine.
The bit I'm finding difficult is how to count how many items there are
under a given category, so that a count can be shown in the category index
page.
The code I have at the moment is here: (This code is in a module called
LibMod.pm)
sub CountCatItems($$) {
# Count how many items there are under this category, either directly or indirectly.
my ($mod,$pCat, # pointer to the category hash.
$pStack # pointer to stack which
)= @_;
my ($query, # query handle.
$catId, # category Id no.
$pLink # ref to link between cats.
);
$catId=${$pCat}{Id};
DPrint "counting ${$pCat}{Name} - $catId = ";
$query=$db->prepare("select count(ItemId) from CategoryMembership where " .
"CategoryId=$catId group By CategoryId");
$query->execute;
my ($dCount) = $query->fetchrow_array();
$query->finish;
${$pCat}{CountDirect}=$dCount;
DPrint "$dCount<br>"; # debug print.
foreach $pLink ( @{${$pCat}{DownList}} ) {
LibMod->CountCatItems(${$pLink}{Down}); # call self recursively
};
};
This works, but only counts the number of items directly included in a
category, not the ones which belong to a subcategory.
The thing that makes it hard is the multiple category membership plus
multiple category grouping, but I want to keep this if I possibly can,
because one of the problems with traditional non-computerised indexes is
that you can't do this even though books can be relevant to several
different topics.
My thoughts at the moment are either:
- give each category a hash where the keys are all the items included in
the category. Read these in from the database, then each time a
subcategory is counted, the code goes back up the tree to the root, adding
the items that belong to it to all the higher level categories. When the
recursive calls to subcategories return, (i.e. end of this code fragment),
then count the keys in the hash. This would work I think, but it would
involve retrieving all the category memberships from the database, rather
than just the counts, and it might cause speed and memory problems if a
lot of items were added to the system.
- or when you put an item in a category, category memberships are
automatically created in the database for all the higher level categories.
This would be quicker to count, but I'm thinking that if someone has
chosen to put an item in a given subcategory, and then later on I decide
to move that subcategory to a different group, the category memberships
should shift to follow this, whereas with this system the item would stay
in the old top-level categories.
Finally - my questions are:
- does anyone know of an algorithm for working out the counts just from
the counts in each category - i.e. without having to get details of
exactly which items belong to each category? (I have a feeling this may be
impossible)
- or generally, and maybe more useful in the long run, a good place to
start looking on the web for answers to problems like this when they crop
up.
- also if you have any thoughts on either of the solutions I've
described, or can think of a better way of doing it, I'd appreciate it.
andy.