How to secure documents on a server


The Natural Philosopher

AlmostBob said:
All I do is this:

SELECT id FROM table;
print "<img src=url/to/$id.jpg>";

Compared to your way:
- Simpler
- No need to start new php scripts to output raw binary stream for
every image
- No sockets
- No need to read heavy binary BLOB from DB
- No chance for possible cache attacks in MySQL, PHP, filesystem or
Apache

I don't want to sound religious, but I think my way is much better.

There is no better: it depends on the requirements.

With your way there is no way to protect the image directory from random
downloads, for example.

In my case the user may be someone with far greater access than the
general public, and have access to internal data - like plans, drawings
and specifications.

I don't want script kiddies stealing vital info: putting it in a
database is one giant leap in that sense.

Execution speed and efficiency are only one consideration among many.

In my case the above, plus a general requirement to try and get all
important corporate data into the database, under one backup regime, were
more significant. I especially did NOT want user-accessible image files
that might get deleted by accident. I could protect the database area by
making it accessible only to root or the mysql daemon: directly accessed
download areas had to be at least readable, and if uploaded to, writeable,
by the user the web server and PHP ran as.
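
Roughly, such a gated download script might look like the sketch below.
This is only an illustration of the idea, not the code used here: the
table name (documents), its columns (mime, data), the connection details
and the session-based access check are all assumptions.

<?php
// download.php?id=123 -- serve a stored document only to authorised users.
// Hypothetical schema: documents(id INT, mime VARCHAR(64), data LONGBLOB).
session_start();

if (empty($_SESSION['authorised'])) {        // assumed flag set at login
    header('HTTP/1.1 403 Forbidden');
    exit('Access denied');
}

$id  = isset($_GET['id']) ? (int) $_GET['id'] : 0;
$pdo = new PDO('mysql:host=localhost;dbname=intranet', 'dbuser', 'dbpass');

$stmt = $pdo->prepare('SELECT mime, data FROM documents WHERE id = ?');
$stmt->execute(array($id));
$row = $stmt->fetch(PDO::FETCH_ASSOC);

if (!$row) {
    header('HTTP/1.1 404 Not Found');
    exit('No such document');
}

header('Content-Type: ' . $row['mime']);
header('Content-Length: ' . strlen($row['data']));
echo $row['data'];
?>

With something along these lines the web server needs no read access to
any document directory at all; only the MySQL daemon touches the stored
data, which is exactly the access-control point made above.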


In practice, at moderate loads, the download speeds are far more dominant
than CPU or RAM limitations. And indeed the ability to make a special
download script that re-sizes the images on the fly turned out to be a
better way to go than storing thumbnails of varying sizes. One trades
disk space for processing overhead.
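
A rough sketch of such a resize-on-the-fly script, using the GD extension
on image bytes already fetched into $data (for instance from a BLOB query
like the one sketched earlier); the "w" parameter and the JPEG-only
assumption are illustrative:

<?php
// Resize an already-fetched JPEG ($data) to a requested maximum width
// and stream it, instead of keeping pre-built thumbnails on disk.
$maxWidth = isset($_GET['w']) ? (int) $_GET['w'] : 200;

$src = imagecreatefromstring($data);         // $data assumed fetched earlier
if ($src === false) {
    header('HTTP/1.1 500 Internal Server Error');
    exit('Stored data is not a valid image');
}

$w = imagesx($src);
$h = imagesy($src);

if ($w > $maxWidth) {                        // only shrink, never enlarge
    $newW = $maxWidth;
    $newH = (int) round($h * $maxWidth / $w);
    $dst  = imagecreatetruecolor($newW, $newH);
    imagecopyresampled($dst, $src, 0, 0, 0, 0, $newW, $newH, $w, $h);
    $src = $dst;
}

header('Content-Type: image/jpeg');
imagejpeg($src);                             // CPU spent instead of disk space
?>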

As a practicing engineer all my working life, I am still amazed that
people will always come up with what amounts to a religious statement
about a particular implementation: that it is universally 'better'.

If that were the case, it would be universally adopted instantly.

Jerry has (for once) made an extremely valid point about directory sizes
as well. Databases are far better at finding things quickly in large
amounts of data: far better than a crude directory search. Once the
overhead in scanning the directory exceeds the extra download
efficiency, you are overall on a loser with flat files.

AND if you run into CPU or RAM limitations, it's a lot easier to - say -
move your database to a honking new machine, or upgrade the one you have,
than to completely re-write all your applications to use a database when
they used to use flat files.

I am NOT claiming that a database is the 'right' answer in all cases,
just pointing out that it may be a decision you want to make carefully,
as it is somewhat hard to change later on, and in most cases the extra
overhead of using it is more than compensated for by the benefits,
particularly in access control.

Which was the primary concern of the OP.
 

Jerry Stuckle

Bart said:
Jerry said:
[...]
But don't count MS Access in there. Use a real database. MySQL
qualifies. And it has to be configured properly.

Not the real communism? [*] I partly agree for MS Access [**], but I
have reason to believe that my MySQL databases are set up properly.
This is not something I do myself; it is done by the sysadmins in one of
the giant datacenters, who stick to one config for the entire server farm.

Not necessarily. Sysadmins cannot correctly set up a system in the
dark. They need communication from the developers on what data is
being stored, how it is being handled, etc.

Unfortunately, most sysadmins know very little about how to tune a
database (not just MySQL), and the result is poor response.

I think the question is how BLOBs are handled. My situation is a
browser-based application that consists of many read actions (public
+intranet) and few update/delete actions (admin). Now suppose:

(1) Read actions without BLOB:
- Application does not load any BLOB data from the database.
- Application uses a var holding the system path (usr/my/path/to/pics/),
adds the ID to it, adds .jpg to it, and tests if the file exists (-e).
- If yes, use the URL path instead of the system path and output it
inside an <IMG> tag to the screen.
- No binary data has to be handled; the major memory use here (if any)
is the -e check for file existence, but even this could be skipped
with a workaround. (A sketch of this approach follows below.)

Wrong - binary data is still handled.
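
For comparison, a minimal sketch of approach (1) as listed above; the base
paths, the table name (hotels) and the column are illustrative, and note
that only a path string passes through PHP:

<?php
// Approach (1): only the id comes from the database; the web server
// itself delivers the image file referenced by the generated URL.
$sysPath = '/usr/my/path/to/pics/';          // system path (illustrative)
$urlPath = '/pics/';                         // matching URL path (illustrative)

$pdo = new PDO('mysql:host=localhost;dbname=site', 'dbuser', 'dbpass');

foreach ($pdo->query('SELECT id FROM hotels') as $row) {  // assumed table
    $file = $row['id'] . '.jpg';
    if (file_exists($sysPath . $file)) {                  // the "-e" check
        echo '<img src="' . $urlPath . $file . '" alt="">' . "\n";
    }
}
?>
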
(2) Read actions with BLOB:
- Load BLOB from column (already a memory-intensive task of its own).
- Store in some folder (id.).
- Output with <img>.

Not very intensive at all. And you don't store it in some folder.
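
Presumably the point is that the BLOB is streamed straight to the browser
rather than written out to a folder first; a minimal sketch, with a
hypothetical hotel_photos table:

<?php
// image.php?id=123 -- stream the BLOB directly; nothing is written to disk.
$id  = isset($_GET['id']) ? (int) $_GET['id'] : 0;
$pdo = new PDO('mysql:host=localhost;dbname=site', 'dbuser', 'dbpass');

$stmt = $pdo->prepare('SELECT data FROM hotel_photos WHERE id = ?');
$stmt->execute(array($id));
$data = $stmt->fetchColumn();

if ($data === false) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

header('Content-Type: image/jpeg');
echo $data;              // the page just prints <img src="image.php?id=123">
?>
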
(3) Update & delete actions without BLOB:
- Update/delete instructions stay out of DB, affects file system only.

Yep.

(4) Update & delete actions with BLOB:
- Update/delete instructions stay out of file system, affects DB only

Yep.

It is my experience that (1) has huge memory benefits compared to
(2).

Memory is nothing nowadays. Sure, you need more memory for the database
to effectively handle large blobs. But a few more megabytes is nothing.

The difference between (3) and (4) is not so clear, especially because
MySQL probably optimizes this process. I think in practice you would
see that (3) is faster for environment A, and (4) for environment B,
but never with really considerable differences.

And (1) and (2) are much more important since they account for 99.x% of
the queries in my case.

And the difference is much less than you claim.
[*] -"Communism is great." -"But look how things went in the USSR."
-"That was not the real communism."
[**] Many tendencies in MS Access are a good thermometer for general
database issues; MS Access is just the first that fails :)

Databases are optimized for retrieving data - especially from large
groups of data. File systems are just low level databases which handle
small amounts of data (a few files) very well.

One of the big differences is that as your data grows, the database
efficiency remains fairly static. However, file system performance
degrades. Eventually, the file system will actually perform worse than
the database does. Try putting 100K files in one directory. Good luck.
But a database handles 100M rows with ease.

And no, MS Access is not a real database, and is not a good thermometer
for anything other than how bad it really is. Real databases work in an
entirely different way and perform much differently.

 

Jerry Stuckle

Bart said:
All I do is this:

SELECT id FROM table;
print "<img src=url/to/$id.jpg>";

Compared to your way:
- Simpler
- No need to start new php scripts to output raw binary stream for
every image
- No sockets
- No need to read heavy binary BLOB from DB
- No chance for possible cache attacks in MySQL, PHP, filesystem or
Apache

I don't want to sound religious, but I think my way is much better.

It's easier for YOU. And you THINK your way is better. But you've
never really tried with lots of images, have you? In fact, I suspect
you've never really checked it at all with a real database which has
been designed and configured to do this type of operation.

So all you really have to go on is your opinion.

OTOH, some of us have been doing it for years (over 20, in my case,
starting with DB2 on mainframes), and have both designed databases and
configured RDBMS's to handle these operations efficiently. We've seen
the difference in performance, and it isn't what you claim.

 

Bart Van der Donck

Jerry said:
It's easier for YOU.  And you THINK your way is better.  But you've
never really tried with lots of images, have you?  

Yes I have, and the tests with BLOBs were disastrous for my case
(although I must admit this study was done already 9 years ago).

Perhaps you're right that my requirements were a bit particular; I'm
facing a read load of a few MB/sec and a modest update/delete load
peaking only during nightly cronjobs. Images are spread on the machine
over 57 directories, and the largest directory holds 22,241 images at
this moment. Maybe it's BSD or the running shell that makes this work
well (?); one thing I know - and tested well enough - is that my MySQL
cannot handle this kind of BLOB "abuse" under such conditions.

I can understand it might be desirable that the URL to the image remain
unknown, as The Natural Philosopher said, or that there are other
requirements which make one approach or the other preferable. In my case
the binaries are hotel photos, with the hotel's telephone number as the
name of the JPG. This level of protection is acceptable here; performance
criteria are more crucial.

In fact, I suspect you've never really checked it at all with
a real database which has been designed and configured to do
this type of operation.
So all you really have to go on is your opinion.

It's unwise to draw a conclusion from something you only suspect.

But you're right, it's my opinion, but one based on experience and
preceded by quite some study and benchmarks. I think that, for my
case, it was the best possible design under the given requirements.
 

Jones

Not necessarily. Sysadmins cannot correctly set up a system in the
dark. They need communication from the developers on what data is
being stored, how it is being handled, etc.

Once upon a time the term "system analyst" actually meant something.
And then Alan Sugar started selling desktop PC's to everyone and now
everyone thinks they're a "software engineer" just because they can hack
a few lines of PHP or type ./configure.

The "developers" should have worked it all out before the project even started.
That's the REAL problem - here presumably and elsewhere for certain.
 

Jerry Stuckle

Jones said:
Once upon a time the term "system analyst" actually meant something.
And then Alan Sugar started selling desktop PC's to everyone and now
everyone thinks they're a "software engineer" just because they can hack
a few lines of PHP or type ./configure.

The "developers" should have worked it all out before the project even started.
That's the REAL problem - here presumably and elsewhere for certain.

No, there are still sysadmins who are responsible for system tuning.
It isn't just the needs of the database developers which need to be
taken into consideration - there are others, also.

Of course, you're right - nowadays there are too many "system
administrators" who only hold that title because they failed Programming
101.


 

Jerry Stuckle

Bart said:
Yes I have, and the tests with BLOBs were disastrous for my case
(although I must admit this study was done already 9 years ago).

How many is a lot? I've done it with over 50M images (several terabytes
- but that was a mainframe) in a database with no performance
degradation. But the database and RDBMS were designed to do it, also.

And this was under live conditions, averaging > 10K queries/second.

Perhaps you're right that my requirements were a bit particular; I'm
facing a read load of a few MB/sec and a modest update/delete load
peaking only during nightly cronjobs. Images are spread on the machine
over 57 directories, and the largest directory holds 22,241 images at
this moment. Maybe it's BSD or the running shell that makes this work
well (?); one thing I know - and tested well enough - is that my MySQL
cannot handle this kind of BLOB "abuse" under such conditions.

Do it all in one directory. That's what the database effectively does.
And it means you don't need to sort images into different directories,
create new directories when the images get too large...

I can understand it might be desirable that the URL to the image remain
unknown, as The Natural Philosopher said, or that there are other
requirements which make one approach or the other preferable. In my case
the binaries are hotel photos, with the hotel's telephone number as the
name of the JPG. This level of protection is acceptable here; performance
criteria are more crucial.


It's unwise to draw a conclusion from something you only suspect.

But you're right, it's my opinion, but one based on experience and
preceded by quite some study and benchmarks. I think that, for my
case, it was the best possible design under the given requirements.

Yep, but your "study" and "benchmarks" were not necessarily accurate.
So neither are your conclusions.

Tune the RDBMS and design the database correctly, and there is virtually
no overhead. After all, all a file system is is a dumb dbms.

 

Jerry Stuckle

Geoff said:
Don't you mean, a file system is a database?

No, the files are a database. A file system is a dump database
management system.

 

Jerry Stuckle

Jerry said:
No, the files are a database. A file system is a dump database
management system.

Whoops - mistype. That should be "A file system is a dumB database
management system". But come to think of it, it is kind of a dump, also :)

 
