Deleting records from DB_File doesn't decrease file size?

botfood

I am working on a DB cleanup tool that purges records from a tie()ed
file created with DB_File. I have a first pass of the tool done, and it
seems to be functioning. On a test DB it correctly reports that it
found about 2500 'old' records according to my criteria, and deleted
them...

The problem is that the file itself, which was about 10.3MB for 14k
records, did not change size after the 2500 records were deleted. I sort
of expected some reduction in file size?!

Is there some kind of 'compact' or cleanup utility that I need to run
on the database to squeeze out the empty holes or something?


snippets shown... not working code
=======
....
use DB_File;
tie ( %tempHash , 'DB_File' , "${cfgRelPath_cgi2DB}/${dbfile}" ) ;


then later in a loop I delete a specific record with something like
this

delete($tempHash{$tempKey}) ;

=================
 
paddy3118

Sometimes DB optimisations mean that a record 'deletion' only marks
the record as unused. This is rather like some file systems: when you
delete a file, it is no longer listed in the directory, but other
programs that look at the underlying disk structure may well be able
to pick up data that was in the deleted file.

One way to recover the space may be to just copy active records to
another DB file, double check the copy, then delete the original. (But
not if the DB can be accessed concurrently etc).
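
A minimal sketch of that copy-and-check step, in case it helps. It assumes the file is a DB_HASH database (the default when tie() is called without a type, as in the snippet above) and that nothing else is writing to it while the copy runs. The sub name compact_db and the path arguments are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl;      # O_RDONLY, O_RDWR, O_CREAT
use DB_File;

# Copy every record from $old_path into a brand-new file at $new_path,
# then verify the copy. Returns the number of records copied.
# Assumes a DB_HASH file and no concurrent writers.
sub compact_db {
    my ($old_path, $new_path) = @_;

    die "Refusing to overwrite existing file $new_path" if -e $new_path;

    tie my %old, 'DB_File', $old_path, O_RDONLY, 0644, $DB_HASH
        or die "Cannot open $old_path: $!";
    tie my %new, 'DB_File', $new_path, O_RDWR | O_CREAT, 0644, $DB_HASH
        or die "Cannot create $new_path: $!";

    # First pass: copy one record at a time.
    my $copied = 0;
    while (my ($key, $value) = each %old) {
        $new{$key} = $value;
        $copied++;
    }

    # Second pass: double-check the copy before the original is removed.
    while (my ($key, $value) = each %old) {
        die "Copy mismatch for key '$key'"
            unless exists $new{$key} && $new{$key} eq $value;
    }

    untie %new;
    untie %old;
    return $copied;
}
```

Deleting the original and renaming the new file into place is left to the caller, so the original stays untouched until the copy has been checked.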

- Paddy.
 
botfood

One way to recover the space may be to just copy active records to
another DB file, double check the copy, then delete the original. (But
not if the DB can be accessed concurrently etc).
----------------

seems to work!
I changed my utility to write the 'keepers' to a new DB and the 'old'
records to another, then deleted the original, renamed the purged one,
and archived the old-record DB. This procedure DID result in a smaller
file when I was done.

thanks.
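
For anyone finding this later, the procedure looks roughly like the sketch below. The sub name purge_db, the ".keep" and ".archive" suffixes, and the $is_old callback are placeholders for illustration (the callback stands in for whatever purge criteria apply), and it assumes a DB_HASH file with nothing else accessing it during the run.

```perl
use strict;
use warnings;
use Fcntl;      # O_RDONLY, O_RDWR, O_CREAT
use DB_File;

# Split the records in $db_path into a "keep" file and an "archive" file,
# then rename the keep file over the original.
# $is_old is a caller-supplied code ref (hypothetical) that receives
# ($key, $value) and returns true for records that should be purged.
sub purge_db {
    my ($db_path, $is_old) = @_;
    my $keep_path    = "$db_path.keep";
    my $archive_path = "$db_path.archive";

    for my $path ($keep_path, $archive_path) {
        die "Refusing to overwrite existing file $path" if -e $path;
    }

    tie my %src, 'DB_File', $db_path, O_RDONLY, 0644, $DB_HASH
        or die "Cannot open $db_path: $!";
    tie my %keep, 'DB_File', $keep_path, O_RDWR | O_CREAT, 0644, $DB_HASH
        or die "Cannot create $keep_path: $!";
    tie my %archive, 'DB_File', $archive_path, O_RDWR | O_CREAT, 0644, $DB_HASH
        or die "Cannot create $archive_path: $!";

    my ($kept, $archived) = (0, 0);
    while (my ($key, $value) = each %src) {
        if ($is_old->($key, $value)) {
            $archive{$key} = $value;
            $archived++;
        }
        else {
            $keep{$key} = $value;
            $kept++;
        }
    }

    untie %archive;
    untie %keep;
    untie %src;

    # On Unix-like systems rename() replaces the original in one step;
    # other platforms may need the original removed first.
    rename($keep_path, $db_path)
        or die "Cannot replace $db_path: $!";

    return ($kept, $archived);
}
```

A call might look like `my ($kept, $archived) = purge_db($file, sub { my ($k, $v) = @_; ... });` with the real age test inside the callback.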

The REASON this was required hasn't gone away, and I'm not quite sure
what to do about it long term. It's not a Perl issue, but a web
server memory issue.... I have a site running on Apache with this
pretty big database, which grew to about 20k records with a total file
size of around 12MB. It recently acted very strangely and would
not write any new records. My best guess at this point is that settings
like Apache::SizeLimit have something to do with it, but it remains
to be seen whether the host is willing to alter config files for me.

The Perl part of this is that I'm open to suggestions for
ways to reduce the memory used by any single process accessing the DB.
Especially if I have a report that needs to go through the whole DB, how
can I reduce the hit on the server? Would it work to split record
contents into a couple of different 'tables' and pull them in only if
required?

I had thought that by tie()ing to a file on disk, I'd avoid eating up
RAM and process memory, but I don't really understand memory
and paging and all that.....

comments? ideas?

d
 
J. Gleixner

botfood said:
The REASON this was required hasn't gone away, and I'm not quite sure
what to do about it long term. It's not a Perl issue, but a web
server memory issue.... I have a site running on Apache with this
pretty big database, which grew to about 20k records with a total file
size of around 12MB.

That sounds more like a pretty small database.

botfood said:
It recently acted very strangely and would
not write any new records. My best guess at this point is that settings
like Apache::SizeLimit have something to do with it, but it remains
to be seen whether the host is willing to alter config files for me.

They/You should be able to determine whether it's a memory issue before
blindly changing things.

botfood said:
The Perl part of this is that I'm open to suggestions for
ways to reduce the memory used by any single process accessing the DB.
Especially if I have a report that needs to go through the whole DB, how
can I reduce the hit on the server? Would it work to split record
contents into a couple of different 'tables' and pull them in only if
required?

I had thought that by tie()ing to a file on disk, I'd avoid eating up
RAM and process memory, but I don't really understand memory
and paging and all that.....

comments? ideas?

Only a small part of the DBM is in memory at any one time, so I doubt
it's a memory issue with DBM access. The file size of the DBM really
doesn't matter. More than likely, if it is a memory issue, it's how you
are using that data (e.g. looping through the records and storing them
in a data structure) that is causing the problem. Since you didn't
provide any code, that's only a guess.
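
To illustrate the kind of thing I mean, here is a rough sketch of a full-DB report that keeps only running totals. The sub name summarize_db is made up, and it assumes a DB_HASH file. The point is the each() loop: something like "foreach (keys %db)" builds a list of every key in memory before the loop starts, while each() hands back one record per call.

```perl
use strict;
use warnings;
use Fcntl;      # O_RDONLY
use DB_File;

# Walk the whole database one record at a time and keep only running
# totals, so memory use does not grow with the number of records.
# Returns (record count, total bytes of keys plus values).
sub summarize_db {
    my ($db_path) = @_;

    tie my %db, 'DB_File', $db_path, O_RDONLY, 0644, $DB_HASH
        or die "Cannot open $db_path: $!";

    my ($records, $bytes) = (0, 0);

    # each() fetches a single key/value pair per call.
    while (my ($key, $value) = each %db) {
        $records++;
        $bytes += length($key) + length($value);
    }

    untie %db;
    return ($records, $bytes);
}
```

If the report really does need per-record data, writing it out as you go rather than collecting it in a hash or array keeps the process small.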

Since it wouldn't write new records, it might have been corrupted, so be
sure to lock the DBM appropriately. See the documentation for "Locking:
The Trouble with fd" in perldoc DB_File for possible solutions.
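
As a rough sketch of one common approach (not the only one, and the modules mentioned in that perldoc section package this sort of thing up for you): take an exclusive flock on a separate lock file before tie()ing, and untie before releasing it. The sub name with_locked_db and the ".lock" suffix are my own placeholders, and this assumes every process touching the DBM goes through the same lock file.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);   # O_RDWR, O_CREAT, LOCK_EX
use DB_File;

# Run $code->(\%db) while holding an exclusive lock on a separate lock
# file. Returns whatever $code returns (in scalar context).
sub with_locked_db {
    my ($db_path, $code) = @_;
    my $lock_path = "$db_path.lock";   # hypothetical naming convention

    open(my $lock_fh, '>>', $lock_path)
        or die "Cannot open $lock_path: $!";
    flock($lock_fh, LOCK_EX)
        or die "Cannot lock $lock_path: $!";

    # Tie only after the lock is held, and untie before it is released,
    # so the DBM is never opened or flushed outside the lock.
    tie my %db, 'DB_File', $db_path, O_RDWR | O_CREAT, 0644, $DB_HASH
        or die "Cannot open $db_path: $!";

    my $result = eval { $code->(\%db) };
    my $error  = $@;

    untie %db;
    close($lock_fh);                   # closing the handle drops the lock

    die $error if $error;
    return $result;
}
```

Usage would be along the lines of `with_locked_db($file, sub { my ($db) = @_; $db->{$key} = $value; });`.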