making sure a file isn't being written/copied before moving

Paul Archer

So, I'm (still) working on some scripts to rename and reorganize my image
files.
Right now, I have a script on my laptop that pulls images off the CF card
and stores them locally. I have another script that rsyncs the images to
my server at home whenever there's a connection available. These scripts
don't step on each other because they look at the process list for
instances of rsync.

But I need to write two more scripts that take the image files from an
incoming directory, rename them, and drop them into a directory for me to
work on. Once I've done whatever I'm going to do (cull, keyword, etc.), I
put them into a directory for archiving. The images are then pulled from
that directory, put in an archive directory, and given the immutable
extended attribute (xattr).

My problem/issue is that I don't want to do anything with a file that is
in the process of being moved into one of these directories. I have to be
sure that the file is not still being moved/copied. Now, these directories
are all on the same filesystem, so it *should* be an atomic change by the
filesystem (technically a rename of the file from one directory/file name
to another). But I want to be sure--plus I may be dropping files into the
incoming directory from elsewhere from time to time. (I plan on going
through my backlog of old untagged files eventually.)

Can someone suggest an easy (or at least reliable) way to make sure that
any file I'm about to modify isn't being touched by another program?

Paul
 
Brian Candler

Paul said:
Can someone suggest an easy (or at least reliable) way to make sure that
any file I'm about to modify isn't being touched by another program?

As you said yourself: rely on the atomic semantics of the filesystem.
Rename it to an extension which the other program will not recognise, or
into another directory which the other program won't be looking in.

This is how Maildir works, so maybe reading up on the semantics of
Maildir will help you.

http://www.qmail.org/qmail-manual-html/man5/maildir.html
 
Paul Archer

Brian said:
As you said yourself: rely on the atomic semantics of the filesystem.
Rename it to an extension which the other program will not recognise, or
into another directory which the other program won't be looking in.

Perhaps you missed it in my original post (or perhaps I simply wasn't
clear), but I may be operating on files that are added arbitrarily. My
concern is that the script starts acting on a file that is still being
copied. Renaming it won't help there.

Paul
 
Gary Wright

Paul said:
Perhaps you missed it in my original post (or perhaps I simply
wasn't clear), but I may be operating on files that are added
arbitrarily. My concern is that the script starts acting on a file
that is still being copied. Renaming it won't help there.

Just use the renaming semantics in the first program (the one doing
the copying) also. Basically you are using the filename appearance as
a synchronization mechanism between the multiple processing steps.

If you can't control the name of the file itself, then control the
directory in which it appears.
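
On the reading side, this could look something like the sketch below. It assumes a made-up convention where whatever is copying a file writes it as "name.part" and renames it to "name" when finished; the suffix and the ready_files helper are illustrative, not something from Paul's scripts. It needs Ruby 2.5 or later for Dir.children.

```ruby
# Hypothetical convention: producers write "name.part" and rename it to
# "name" only once the copy is complete.
TEMP_SUFFIX = ".part"

# Return the full paths of files in dir that are safe to process,
# skipping dotfiles, in-progress names, and anything that is not a file.
def ready_files(dir)
  Dir.children(dir).sort.reject { |name|
    name.start_with?(".") || name.end_with?(TEMP_SUFFIX)
  }.map { |name|
    File.join(dir, name)
  }.select { |path|
    File.file?(path)
  }
end
```

The consumer never needs to ask whether a file is still open: if the name is visible without the suffix, the producer has already finished with it.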

Gary Wright
 
Paul Archer

Gary said:
Just use the renaming semantics in the first program (the one doing the
copying) also. Basically you are using the filename appearance as a
synchronization mechanism between the multiple processing steps.

If you can't control the name of the file itself, then control the directory
in which it appears.

That still leaves me with the same problem: I have to read out of a
directory. Plus, there isn't just going to be a script putting files in my
incoming directory. I'll be doing that myself as I clean up all my old
files.

Paul
 
Paul Archer

I think I solved my problem. I was looking at inotify in order to avoid
having the script poll the directories on a regular basis. Turns
out that it can report when a file is closed for writing, *and* return the
path and basename of the file.
The only downside is that ruby-inotify doesn't (as far as I can tell) do
recursive checks of the directory, so I'm using Open3 to call inotifywait,
and parsing its output.

Here's my test program:

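#!/usr/bin/ruby -w

require 'open3'
require 'fileutils'  # replaces 'ftools', which was removed in Ruby 1.9

# Yield each CLOSE_WRITE event line that inotifywait reports under path.
def inwait(path)
  # Passing the arguments as a list avoids shell quoting problems.
  # popen2 leaves stderr alone, so inotifywait's startup messages
  # cannot fill an unread pipe.
  Open3.popen2("inotifywait", "-m", "-r", path) { |stdin, stdout, wait_thr|
    while line = stdout.gets
      next unless line.include?("CLOSE_WRITE")
      yield line
    end
  }
end

inwait("/tmp") do |line|
  # Limit the split to three fields so file names with spaces stay whole.
  path, action, file = line.chomp.split(" ", 3)
  puts "path: \t #{path}"
  puts "action: \t #{action}"
  puts "file: \t #{file}"
  # Files already sitting directly in /tmp have nowhere to move to.
  next if File.expand_path(path) == "/tmp"
  FileUtils.mv(File.join(path, file), "/tmp")
end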


Paul
 
Brian Candler

Paul said:
Perhaps you missed it in my original post (or perhaps I simply wasn't
clear), but I may be operating on files that are added arbitrarily. My
concern is that the script starts acting on a file that is still being
copied. Renaming it won't help there.

The program which drops files into the directory has to work the same
way:

- open temporary file
- write to it
- close it
- fsync if you want to be sure it's on disk even if power is pulled
- rename it to final location

That's why I said to look at Maildir semantics - adding new E-mails to a
maildir works like this. (They are written into the tmp/ directory, and
then renamed into the new/ directory)
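
A minimal sketch of those steps in Ruby, assuming the temporary directory and the destination are on the same filesystem (otherwise File.rename raises Errno::EXDEV instead of renaming atomically). The atomic_drop name and the pid-prefixed temporary file name are made up for illustration. The fsync is issued before the close here, since it needs an open file handle.

```ruby
# Copy src into dest_dir so that it only ever appears there complete.
# tmp_dir must be on the same filesystem as dest_dir.
def atomic_drop(src, dest_dir, tmp_dir)
  base = File.basename(src)
  # Hypothetical naming scheme: prefix with the pid to avoid collisions.
  tmp_path = File.join(tmp_dir, "#{Process.pid}.#{base}")

  File.open(src, "rb") do |input|
    File.open(tmp_path, "wb") do |out|
      IO.copy_stream(input, out)  # write the data to the temporary file
      out.fsync                   # force it to disk before it becomes visible
    end                           # the block closes the file
  end

  final_path = File.join(dest_dir, base)
  File.rename(tmp_path, final_path)  # atomic within one filesystem
  final_path
end
```

Something like atomic_drop("/media/cf/IMG_0001.jpg", "incoming", "incoming-tmp") would then leave the reader with either no file or a whole one, never a partial copy (the paths are placeholders).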
 
Paul Archer

Brian said:
The program which drops files into the directory has to work the same
way:

- open temporary file
- write to it
- close it
- fsync if you want to be sure it's on disk even if power is pulled
- rename it to final location

That's why I said to look at Maildir semantics - adding new E-mails to a
maildir works like this. (They are written into the tmp/ directory, and
then renamed into the new/ directory)
--

I see what you're saying. My issue was that I will be moving files into
this directory by hand as I go through my old, unmanaged digital images
and put them in this directory to be renamed and start the DAM (digital
asset management) process.

Of course, this is all moot now that I've found inotify will solve the
problem for me. Actually, it solves three problems:

1) It blocks, so I don't have to poll the directory.
2) It lets me know when a file has been moved to or written in the
directory (even if it's in a subdirectory).
3) It tells me the name of the file, so I don't have to go out and find
it.
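
For point 2, a file that is moved in with a rename on the same filesystem shows up as a MOVED_TO event rather than CLOSE_WRITE, so a watcher probably wants to ask for both. Here is a rough sketch using inotifywait's -e and --format options; the tab-separated format string and the helper names are my own choices, and it assumes file names contain no tabs or newlines.

```ruby
require 'open3'

# Events that mean "this file is complete": closed after writing,
# or renamed into a watched directory.
EVENTS = %w[close_write moved_to]

# Tab-separated output: watched directory, event names, file name.
FORMAT = "%w\t%e\t%f"

# Build the inotifywait argument list for a recursive, quiet monitor.
def inotify_command(dir)
  ["inotifywait", "-m", "-r", "-q"] +
    EVENTS.flat_map { |e| ["-e", e] } +
    ["--format", FORMAT, dir]
end

# Turn one output line into a hash, or nil if it has no file name.
def parse_event(line)
  dir, events, name = line.chomp.split("\t", 3)
  return nil if name.nil? || name.empty?
  { path: File.join(dir, name), events: events.split(",") }
end

# Block on inotifywait and yield one hash per finished file.
def watch(dir)
  Open3.popen2(*inotify_command(dir)) do |_stdin, stdout, _wait_thr|
    stdout.each_line do |line|
      event = parse_event(line)
      yield event if event
    end
  end
end

# Usage (blocks until interrupted):
#   watch("/tmp/incoming") { |ev| puts "#{ev[:events].join(',')} #{ev[:path]}" }
```

Asking inotifywait for only these events also means there is no need to filter the output by searching each line for "CLOSE_WRITE".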

Paul
 
