How to best update remote compressed, encrypted archives incrementally?

robert

Hello,

I want to put changed/new files from a big file tree,
"directly, compressed and password-only-encrypted", onto a remote backup
server incrementally via FTP, SFTP or DAV. At best within a closed
algorithm inside Python without extra shell tools.
(The method should work with any protocol which allows somehow read,
write & seek to a remote file.)
On the server and the transmission line there should never be
unencrypted data.

Usually one would create a big archive, then compress, then encrypt
(e.g. with gpg -c file), then transfer. However, that method needs
a lot of free temporary disk space and, most costly of all, always
transfers the complete archive.
With proven block-file encryption methods like GPG I don't get the
flexibility needed for my task, I guess?

The ZIP2 format allows encryption. (Is this ZIP encryption method
supported by Python in any way?) It should be possible to navigate
within a remote ZIP (e.g. over FTP). But ZIP encryption is also
known to be very weak and can be cracked within some hours of computing
time, at least when every file uses the same password.
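
As a side note on the Python question: the standard-library zipfile module can read members protected with the traditional ZIP password scheme (via a pwd argument) but cannot create encrypted archives. Navigating inside a remote ZIP is plausible because ZipFile accepts any seekable file-like object. The sketch below is only an illustration under assumptions: CountingFile is a made-up stand-in for a remote file (a real FTP/SFTP adapter would map seek() and read() to ranged transfers), and the sample archive is built in memory and is not encrypted.

```python
import io
import zipfile


class CountingFile:
    """Seekable read-only wrapper that counts how many bytes are fetched.

    Hypothetical stand-in for a remote file reached over FTP/SFTP/DAV;
    a real adapter would turn seek() + read() into ranged requests.
    """

    def __init__(self, raw):
        self._raw = raw
        self.bytes_read = 0

    def seekable(self):
        return True

    def seek(self, offset, whence=io.SEEK_SET):
        return self._raw.seek(offset, whence)

    def tell(self):
        return self._raw.tell()

    def read(self, size=-1):
        data = self._raw.read(size)
        self.bytes_read += len(data)
        return data


def build_sample_archive():
    """Create a small unencrypted ZIP in memory to play the 'remote' archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name in ("a.txt", "b.txt", "c.txt"):
            zf.writestr(name, name.encode() * 10_000)
    return buf.getvalue()


archive = build_sample_archive()
remote = CountingFile(io.BytesIO(archive))

with zipfile.ZipFile(remote) as zf:
    names = zf.namelist()                 # served from the central directory
    listing_cost = remote.bytes_read      # bytes fetched just to list members
    payload = zf.read("b.txt")            # fetches this member only
    member_cost = remote.bytes_read - listing_cost
```

Comparing listing_cost and member_cost with len(archive) shows that listing and extracting one member touch only part of the file, which is what would make a remote bridge worthwhile.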

Another method would be to produce slice files: create incremental
TAR/ZIP archives, encrypt them locally with "gpg -c" and upload them as
separate files. Still a fragile setup, which allows only rough control,
needs a common archive time stamp (comparing individual file attributes
is not possible), and needs external tools.

Very nice would be a method which can directly compare against and update
a single consistent file like
ftp://..../archive.zip.gpg

Is something like this possible?

Robert
 
Steven D'Aprano

> Hello,
>
> I want to put changed/new files from a big file tree,
> "directly, compressed and password-only-encrypted", onto a remote backup
> server incrementally via FTP, SFTP or DAV. At best within a closed
> algorithm inside Python without extra shell tools.

What do you mean by "closed algorithm"?

The only thing I can think of is you mean a secret algorithm, one which
nobody but yourself will know. So let's get this straight... you are
asking a public newsgroup dedicated to an open-source language for
somebody to tell you a secret algorithm that only you will know?

Please tell me I've misunderstood.

> (The method should work with any protocol which allows somehow read,
> write & seek to a remote file.)
> On the server and the transmission line there should never be
> unencrypted data.

Break the job into multiple pieces. Your task is:

- transmit information to the remote server;

Can you use SSH for that? SSH will use industrial strength encryption,
likely better than anything you can create.

- you want to update the files at the other end;

Sounds like a job for any number of already existing technologies, like
rsync (which, by the way, already uses ssh for the encrypted transmission
of data).
 
robert

Steven said:
> What do you mean by "closed algorithm"?
>
> The only thing I can think of is you mean a secret algorithm, one which
> nobody but yourself will know. So let's get this straight... you are
> asking a public newsgroup dedicated to an open-source language for
> somebody to tell you a secret algorithm that only you will know?
>
> Please tell me I've misunderstood.

No, I meant it in the sense of 'cohesive': a Python solution without a lot
of other tools. (Only the password has to be secret.)

> Break the job into multiple pieces. Your task is:
>
> - transmit information to the remote server;
>
> Can you use SSH for that? SSH will use industrial strength encryption,
> likely better than anything you can create.

Yes, SFTP (over SSH) and FTP with TLS (= SSL) are good protocols. They can
also read/navigate within a remote file and append to a file. But how about
incremental + encrypted?

> - you want to update the files at the other end;
>
> Sounds like a job for any number of already existing technologies, like
> rsync (which, by the way, already uses ssh for the encrypted transmission
> of data).

As far as I know, rsync cannot update compressed + encrypted data inside
an existing file (set)?
In any case, with rsync I would have to keep a duplicate of the backup
file layout on the local machine (taking up additional space on the order
of the files themselves)?

That's why I ask: how to get all these tasks into a cohesive encrypted
backup solution that does not waste disk space and network bandwidth?

Robert
 
Steven D'Aprano

> As far as I know, rsync cannot update compressed + encrypted data inside
> an existing file (set)?
> In any case, with rsync I would have to keep a duplicate of the backup
> file layout on the local machine (taking up additional space on the order
> of the files themselves)?

Let me see if I understand you.

On the remote machine, you have one large file, which is compressed and
encrypted. Call the large file "Archive". Archive is made up of a number
of virtual files, call them A, B, ... Z. Think of Archive as a compressed
and encrypted tar file.

On the local machine, you have some, but not all, of those smaller
files, let's say B, C, D, and E. You want to modify those smaller files,
compress them, encrypt them, transmit them to the remote machine, and
insert them in Archive, replacing the existing B, C, D and E.

Is that correct?

> That's why I ask: how to get all these tasks into a cohesive encrypted
> backup solution that does not waste disk space and network bandwidth?

What's your budget for developing this solution? $100? $1000? $10,000?
Stop me when I get close. Remember, your time is money, and if you are a
developer, every hour you spend on this is costing your employer anything
from AUD$25 to AUD$150. (Of course, if you are working for yourself, you
might value your time as Free.)

If you have an unlimited budget, you can probably create a solution to do
this, keeping in mind that compressed/encrypted and modify-in-place
*rarely* go together.

If you have a lower budget, I'd suggest you drop the "single file"
requirement. Hard disks are cheap, less than an Australian dollar a
gigabyte, so don't get trapped into the false economy of spending $100 of
developer time to save a gigabyte of data. Using multiple files makes it
*much* simpler to modify-in-place: you simply replace the modified file.
Of course the individual files can be compressed and encrypted, or you can
use a compressed/encrypted file system.

Lastly, have you considered that your attempted solution is completely the
wrong way to solve the problem? If you explain _what_ you are wanting to
do, rather than _how_ you want to do it, perhaps there is a better way.
 
robert

Steven D'Aprano wrote:

> Let me see if I understand you.
>
> On the remote machine, you have one large file, which is compressed and
> encrypted. Call the large file "Archive". Archive is made up of a number
> of virtual files, call them A, B, ... Z. Think of Archive as a compressed
> and encrypted tar file.
>
> On the local machine, you have some, but not all, of those smaller
> files, let's say B, C, D, and E. You want to modify those smaller files,
> compress them, encrypt them, transmit them to the remote machine, and
> insert them in Archive, replacing the existing B, C, D and E.
>
> Is that correct?

Yes, that is it. In addition, a possibility for (fast) comparison of
individual files would be optimal.

> What's your budget for developing this solution? $100? $1000? $10,000?
> Stop me when I get close. Remember, your time is money, and if you are a
> developer, every hour you spend on this is costing your employer anything
> from AUD$25 to AUD$150. (Of course, if you are working for yourself, you
> might value your time as Free.)
>
> If you have an unlimited budget, you can probably create a solution to do
> this, keeping in mind that compressed/encrypted and modify-in-place
> *rarely* go together.
>
> If you have a lower budget, I'd suggest you drop the "single file"
> requirement. Hard disks are cheap, less than an Australian dollar a
> gigabyte, so don't get trapped into the false economy of spending $100 of
> developer time to save a gigabyte of data. Using multiple files makes it
> *much* simpler to modify-in-place: you simply replace the modified file.
> Of course the individual files can be compressed and encrypted, or you can
> use a compressed/encrypted file system.
>
> Lastly, have you considered that your attempted solution is completely the
> wrong way to solve the problem? If you explain _what_ you are wanting to
> do, rather than _how_ you want to do it, perhaps there is a better way.

So, there seems to be a big barrier for that task when encryption is on
the whole archive. A complex block navigation within a block cipher
would be required, and obviously there is no such (handy) code already
existing. Or is there an encryption/decryption method which you can
use like a file pipe _and_ which supports 'seek'?
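
On the question of a cipher that supports seek: counter-style stream constructions do allow it in principle, because each keystream block depends only on the key, a nonce and the block index. AES in CTR mode from a vetted crypto library would be the usual choice. What follows is only a toy sketch built from the standard library, using HMAC-SHA256 as the keystream generator, to show the principle. The names keystream_block and xor_at, the demo key and the nonce are made up for illustration, and the sketch provides no integrity protection.

```python
import hashlib
import hmac

BLOCK = 32  # SHA-256 digest size: one keystream block per counter value


def keystream_block(key: bytes, nonce: bytes, index: int) -> bytes:
    """Keystream block number `index`, computable without any other block."""
    counter = index.to_bytes(8, "big")
    return hmac.new(key, nonce + counter, hashlib.sha256).digest()


def xor_at(key: bytes, nonce: bytes, offset: int, data: bytes) -> bytes:
    """Encrypt or decrypt `data` as if it sat at byte `offset` of the stream."""
    out = bytearray()
    pos = offset
    end = offset + len(data)
    while pos < end:
        index, skip = divmod(pos, BLOCK)
        block = keystream_block(key, nonce, index)[skip:]
        take = min(len(block), end - pos)
        src = data[pos - offset:pos - offset + take]
        out += bytes(a ^ b for a, b in zip(src, block))
        pos += take
    return bytes(out)


# Illustration only: a real key would come from a proper password KDF.
key = hashlib.sha256(b"demo key, not a real KDF").digest()
nonce = b"archive-0001"  # assumed value; must never repeat under the same key
plaintext = bytes(range(256)) * 4
ciphertext = xor_at(key, nonce, 0, plaintext)

# Random access: decrypt bytes 100..149 without touching the rest.
window = xor_at(key, nonce, 100, ciphertext[100:150])
```

Seekability alone does not make in-place updates safe: overwriting a region with new plaintext under the same key and nonce reuses keystream and leaks the XOR of old and new content, and compression changes member lengths, so offsets shift.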

Thus, a simple method would use a common threshold timestamp or
archive bits and create multiple archive slices. (Unstable when the file
set is dynamic and older files are copied into the tree.)

Two nearly optimal solutions which allow comparing individual files:

1st:
+ an (s)ftp(s)-to-zip/tar bridge seems to be possible. E.g. by hooking
ZipFile to use a virtual self.fp
+ the files would be individually encrypted by a password
- an external tool like "gpg -c" is necessary; (or is there a good
encryption with a native python module? Is PGP (password only) possible
with a native python module? )
- the filenames would be visible

2nd:
+ manage a dummy file tree locally for speedy comparison (with 0-length
files)
+ create encrypted archive slices for upload with iterated filenames
- an external tool like "gpg -c" is necessary
- extra file tree or file attribute database
- unrolling status from multiple archive slices is arduous
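
The "file attribute database" variant of the second approach can be sketched as a manifest of per-file attributes that is diffed between runs. This is a minimal sketch under assumptions, not a finished design: scan_tree and diff_manifests are hypothetical names, and size plus modification time is assumed to be a good enough change indicator. Because paths are compared individually, an older file copied into the tree is still detected as new, which a single threshold timestamp would miss.

```python
import os
import tempfile


def scan_tree(root):
    """Map each relative file path under `root` to its (size, mtime_ns)."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            manifest[rel] = (st.st_size, st.st_mtime_ns)
    return manifest


def diff_manifests(old, new):
    """Return (changed_or_new, deleted) relative paths, each sorted."""
    changed = sorted(p for p, attrs in new.items() if old.get(p) != attrs)
    deleted = sorted(p for p in old if p not in new)
    return changed, deleted


def write_text(root, name, text):
    with open(os.path.join(root, name), "w") as fh:
        fh.write(text)


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    write_text(root, "keep.txt", "same")
    write_text(root, "edit.txt", "v1")
    write_text(root, "gone.txt", "bye")
    before = scan_tree(root)

    write_text(root, "edit.txt", "version 2")  # size changes
    write_text(root, "new.txt", "hello")
    os.remove(os.path.join(root, "gone.txt"))
    after = scan_tree(root)

changed, deleted = diff_manifests(before, after)
```

Only the paths in changed would need to be compressed, encrypted and uploaded; deleted tells the uploader which remote entries are stale.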

Robert
 
Steven D'Aprano

> So, there seems to be a big barrier for that task when encryption is on
> the whole archive. A complex block navigation within a block cipher
> would be required, and obviously there is no such (handy) code already
> existing. Or is there an encryption/decryption method which you can
> use like a file pipe _and_ which supports 'seek'?

[snip]

Let's try again: rather than you telling us what technology you want to
use, tell us what your aim is. I suspect you are too close to the trees to
see the forest -- you are focusing on the fine detail. Let's hear the big
picture: what is the problem you are trying to solve? Because, frankly, as
far as I can see, the solution you are looking for doesn't exist. But
maybe I'm too far from the forest to see the individual trees.

"I need encryption that supports seek" -- no, that's you telling us _how_
you want to solve your problem.

Perhaps you can tick some/all of the following requirements:

- low bandwidth usage when updating the remote site

- transmission needs to be secure

- data on the remote site needs to be secure in case of theft or break-ins

- remote site is under the control of untrusted parties;
or remote site is trusted

- remote site is an old machine with limited processing power and very
small disk storage;
or remote site can be any machine we choose

- local site needs to run Windows/Macintosh/Linux/BSD/all of the above

- remote site runs on Windows/Macintosh/Linux/BSD/anything we like

- we are updating text files/binary files

- anything else you can tell us about the nature of your problem
 
robert

The main requirement is that it has to become a cohesive, reusable,
portable (FTP/SFTP standard) piece of functionality, as mentioned in the OP.
A Python module at best, for integration in a bigger Python app, not a
one-time admin hack with a bunch of tools to be fiddled together on each
user machine. So the 'how' is mostly the 'what'. It's a Python question so far.

The last 2 methods I mentioned already are maybe a way to a compromise,
(if integrated one-stream encryption cannot be managed)

The only issue remaining: A native Python module for pgp-(pwd
only)-encryption or another kind of good (commonly supported)
encryption. ZIP2-encryption itself seems to be too weak? (Still so in
recent ZIP formats? what about the mode of 7zip etc?) But I found no
python modules for either.

http://www.amk.ca/python/code/gpg just calls into an external gpg
installation.

Can the functionality of "gpg -c" maybe be pieced together with PyCrypto
easily? (variable-length key/password only, no public key stuff required)

And what about ZIP password-only encryption itself? Are there maybe any
usable improvements?

And: when there are many files encrypted with the same password (both
PGP and ZIP), will this decrease the strength of encryption?
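
On reusing one password for many files: a common way to avoid encrypting everything under the same key is to derive a separate key per file from the password and a random per-file salt, and to store the salt next to the ciphertext. The sketch below uses hashlib.pbkdf2_hmac from the standard library. The function names and the iteration count are assumptions for illustration, and this shows key derivation only, not a complete replacement for "gpg -c".

```python
import hashlib
import os

ITERATIONS = 200_000  # assumed work factor; tune for the target hardware
SALT_BYTES = 16


def derive_file_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from the password and a per-file salt."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )


def new_file_key(password: str):
    """Pick a fresh random salt and return (salt, key) for one file."""
    salt = os.urandom(SALT_BYTES)
    return salt, derive_file_key(password, salt)


password = "correct horse battery staple"  # example value only

# Two files, same password: independent salts give unrelated keys.
salt_a, key_a = new_file_key(password)
salt_b, key_b = new_file_key(password)

# For decryption, the salt stored with the file reproduces the key.
key_a_again = derive_file_key(password, salt_a)
```

The salt is not secret, so it can sit in a small header in front of each encrypted file; only the password has to stay private.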

Robert
 
robert

> Would rsync into a remote encrypted filesystem work for you?

The sync (selection) is custom anyway. The remote filesystem is
generic/unknown. FTP(S) / SFTP is the only standard given.
 
