Justin C
We have a database of thousands of clothing items. Some of the items are
almost identical apart from their size. Consequently we use the same
image in our web-shop to advertise items of the same style, design and
colour.
In a program that fetches new images from the art guy's computer I end
up grepping the entire list of items $#(list-of-items) times; there must
be a better way. The file names are exactly the same as the style codes,
apart from the size suffix being dropped. I'm using File::Find.
Here's some code:
find(\&set_flag, keys %{ $stock_groups->{text2code} });

sub set_flag {
    return unless -f $_;

    # Turn the file name into a partial stock code: strip the ".jpg"
    # suffix, upper-case it, and turn underscores into slashes.
    (my $item_code_part = $_) =~ s/\.jpg$//;
    $item_code_part = uc $item_code_part;
    $item_code_part =~ s|_|/|g;

    # Flag every stock item whose code contains the partial code.
    # \Q...\E quotes the code so it is matched literally.
    my @matches = grep { /\Q$item_code_part\E/ } keys %{ $stock_items };
    foreach my $i (@matches) {
        $stock_items->{$i}{got_image} = 1;
    }
}
The 'find' iterates through two top-level directories, 36 second-level
directories under each of them, and about 20k files across the entire
72 second-level directories. It passes each file name to the sub, which
compares the file name (which is only a partial stock code, because one
image may match several items) with the keys of the stock_items hash,
pulling out the matches. Those matches are then iterated over, and each
item with an available image gets a got_image hash element set to 1.
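To make the file-name-to-code mapping concrete, here's a tiny
self-contained example; the codes are made up for illustration, the
real ones follow whatever scheme our stock system uses:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical codes: file "ab_123.jpg" should yield the partial code
# "AB/123", which matches every size of that style.
my $file = 'ab_123.jpg';
(my $partial = $file) =~ s/\.jpg$//;   # "ab_123"
$partial = uc $partial;                # "AB_123"
$partial =~ s|_|/|g;                   # "AB/123"

my @stock   = ('AB/123/S', 'AB/123/M', 'CD/456/S');
my @matches = grep { /\Q$partial\E/ } @stock;
print "@matches\n";   # the two AB/123 sizes, not CD/456/S
```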
I could probably combine the grep and the iteration over the matches
into a single map {...}, grep {...}, keys %{ $stock_items }; but I
don't think that would save much time, and I'm not certain how to do
it. I can see how to create a new hash, but I'm not sure whether
changing the hash while grep iterates through it is a good idea.
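For reference, the one-liner I have in mind is something like the
following untested sketch; $stock_items and $item_code_part stand for
the same variables as in set_flag above, with made-up codes so the
snippet runs on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-ins for the real data structures in set_flag.
my $stock_items    = { 'AB/123/S' => {}, 'AB/123/M' => {}, 'CD/456/S' => {} };
my $item_code_part = 'AB/123';

# Combined form: grep picks the matching keys, map sets the flag.
map { $stock_items->{$_}{got_image} = 1 }
    grep { /\Q$item_code_part\E/ } keys %{ $stock_items };
```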
The bottleneck, as I see it, is running grep 20k times, once for each
image found. Can anyone suggest a better way?
Justin.