making sure a file isn't being written/copied before moving

Paul Archer · Jul 16, 2009

So, I'm (still) working on some scripts to rename and reorganize my image
files.
Right now, I have a script on my laptop that pulls images off the CF card
and stores them locally. I have another script that rsyncs the images to
my server at home whenever there's a connection available. These scripts
don't step on each other because they look at the process list for
instances of rsync.
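Here is a minimal sketch of that kind of process-list check in Ruby. It assumes Linux with /proc mounted. The helper name process_running? is hypothetical, and from the shell, pgrep -x rsync does the same job.

```ruby
# Hypothetical helper: true if any process whose command name is exactly
# +name+ is running. Linux-only: it reads /proc/<pid>/comm, which the
# kernel truncates to 15 characters.
def process_running?(name)
  Dir.glob("/proc/[0-9]*/comm").any? do |comm_file|
    begin
      File.read(comm_file).chomp == name
    rescue Errno::ENOENT, Errno::EACCES, Errno::ESRCH
      false # the process exited (or is unreadable) between glob and read
    end
  end
end
```

A cron-driven transfer script could then return early whenever process_running?("rsync") is true. That is the same coordination described above.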

But I need to write two more scripts that take the image files from an
incoming directory, rename them, and drop them into a directory for me to
work on. Once I've done whatever I'm going to do (cull, keyword,
etc.), I put them into a directory for archiving. The images are going
to be pulled from that directory, put in an archive directory, and have
the immutable attribute set (with chattr +i).
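The archive step might look something like this. It is only a sketch. The method name archive and its run: parameter are hypothetical. It shells out to chattr(1), so it is Linux-specific, and setting +i normally requires root.

```ruby
require 'fileutils'

# Hypothetical archive step: move +file+ into +archive_dir+, then mark it
# immutable with chattr(1). +run+ executes the command; it defaults to
# Kernel#system and can be replaced (e.g. in tests) with any callable
# that returns true on success.
def archive(file, archive_dir, run: ->(*cmd) { system(*cmd) })
  dest = File.join(archive_dir, File.basename(file))
  FileUtils.mv(file, dest)
  run.call("chattr", "+i", dest) or raise "chattr +i failed for #{dest}"
  dest
end
```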

My problem/issue is that I don't want to do anything with a file that is
in the process of being moved into one of these directories. I have to be
sure that the file is not still being moved/copied. Now, these directories
are all on the same filesystem, so it *should* be an atomic change by the
filesystem (technically a rename of the file from one directory/file name
to another). But I want to be sure. Plus, I may be dropping files into the
incoming directory from elsewhere from time to time. (I plan on going
through my backlog of old untagged files eventually.)
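In Ruby, File.rename is a direct call to rename(2), so within one filesystem it gives exactly the atomic behaviour described here. FileUtils.mv, by contrast, silently falls back to copy-and-delete when the source and destination are on different filesystems, and that is not atomic. Below is a minimal sketch of a move that refuses to do that. The name atomic_move is hypothetical.

```ruby
# Move +src+ to +dest+ with a single rename(2). Within one filesystem other
# processes see either the old name or the new one, never a half-moved
# file. Across filesystems rename(2) fails with EXDEV; raise instead of
# falling back to a (non-atomic) copy + delete like FileUtils.mv does.
def atomic_move(src, dest)
  File.rename(src, dest)
rescue Errno::EXDEV
  raise ArgumentError, "#{src} and #{dest} are on different filesystems"
end
```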

Can someone suggest an easy (or at least reliable) way to make sure that
any file I'm about to modify isn't being touched by another program?

Paul

Brian Candler · Jul 16, 2009

Paul said:
Can someone suggest an easy (or at least reliable) way to make sure that
any file I'm about to modify isn't being touched by another program?

As you said yourself: rely on the atomic semantics of the filesystem.
Rename it to an extension which the other program will not recognise, or
into another directory which the other program won't be looking in.

This is how Maildir works, so maybe reading up on the semantics of
Maildir will help you.

http://www.qmail.org/qmail-manual-html/man5/maildir.html
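On the consuming side, the same idea could be sketched like this. Skip any name that still carries an "in progress" suffix, and claim each finished file by renaming it into a work directory before touching it, much as a Maildir reader moves messages from new/ to cur/. The .part suffix, the directory layout and the helper name claim_files are assumptions for illustration. The only real requirement is that both directories live on the same filesystem.

```ruby
require 'fileutils'

# Hypothetical suffix a writer uses while a file is still being written.
IN_PROGRESS_SUFFIX = ".part"

# Claim every finished file in +incoming+ by renaming it into +work+.
# Returns the claimed paths. Both directories must be on the same
# filesystem so that each rename is atomic.
def claim_files(incoming, work)
  FileUtils.mkdir_p(work)
  Dir.children(incoming).sort.filter_map do |name|
    next if name.start_with?(".") || name.end_with?(IN_PROGRESS_SUFFIX)
    src = File.join(incoming, name)
    next unless File.file?(src)
    dest = File.join(work, name)
    begin
      File.rename(src, dest) # atomic: only one claimant can succeed
      dest
    rescue Errno::ENOENT
      nil # another process claimed (or removed) it first
    end
  end
end
```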

Paul Archer · Jul 16, 2009

Brian said:
As you said yourself: rely on the atomic semantics of the filesystem.
Rename it to an extension which the other program will not recognise, or
into another directory which the other program won't be looking in.

Perhaps you missed it in my original post (or perhaps I simply wasn't
clear), but I may be operating on files that are added arbitrarily. My
concern is that the script starts acting on a file that is still being
copied. Renaming it won't help there.

Paul

Gary Wright · Jul 17, 2009

Paul said:
Perhaps you missed it in my original post (or perhaps I simply wasn't
clear), but I may be operating on files that are added arbitrarily. My
concern is that the script starts acting on a file that is still being
copied. Renaming it won't help there.

Just use the renaming semantics in the first program (the one doing
the copying) also. Basically you are using the filename appearance as
a synchronization mechanism between the multiple processing steps.

If you can't control the name of the file itself, then control the
directory in which it appears.

Gary Wright

Paul Archer · Jul 17, 2009

Gary said:
Just use the renaming semantics in the first program (the one doing the
copying) also. Basically you are using the filename appearance as a
synchronization mechanism between the multiple processing steps.

If you can't control the name of the file itself, then control the directory
in which it appears.

That still leaves me with the same problem: I have to read out of a
directory. Plus, there isn't just going to be a script putting files in my
incoming directory. I'll be doing that myself as I clean up all my old
files.
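When the writer can't be controlled at all (a file-manager copy, say), a common fallback that wasn't raised in this thread is to wait until the file stops changing. This is only a heuristic, because a copy that stalls for longer than the interval will look finished. In the sketch below, the name settled? and the default interval are arbitrary choices.

```ruby
# Hypothetical heuristic: treat +path+ as "settled" once its size and
# modification time are unchanged across +checks+ samples taken
# +interval+ seconds apart. A stalled copy can still fool it.
def settled?(path, interval: 2.0, checks: 2)
  last = nil
  checks.times do |i|
    sleep(interval) if i > 0
    st = File.stat(path)
    current = [st.size, st.mtime]
    return false if last && current != last
    last = current
  end
  true
rescue Errno::ENOENT
  false # the file vanished (e.g. was renamed away) while being watched
end
```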

Paul

Paul Archer · Jul 17, 2009

I think I solved my problem. I was looking at inotify so that the script
wouldn't have to poll the directories on a regular basis. It turns out that
it can report when a file is closed for writing, *and* return the path and
basename of the file.
The only downside is that ruby-inotify doesn't (as far as I can tell) do
recursive watches of a directory, so I'm using Open3 to call inotifywait
and parse its output.

Here's my test program:

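```ruby
#!/usr/bin/ruby -w

require 'open3'
require 'fileutils' # 'ftools' (and its File.move) is gone from Ruby 1.9 on

# Run inotifywait recursively on +path+ and yield one line per file that
# was closed after writing. Passing an argument list avoids the shell;
# -e keeps only CLOSE_WRITE events, -q drops the startup chatter, and the
# --format string puts a literal tab between directory, events and
# filename so names containing spaces still split correctly. popen2
# leaves stderr on the terminal, so an unread stderr pipe can't block us.
def inwait(path)
  cmd = ["inotifywait", "-m", "-r", "-q", "-e", "close_write",
         "--format", "%w\t%e\t%f", path]
  Open3.popen2(*cmd) do |stdin, stdout, _wait_thr|
    stdin.close
    stdout.each_line { |line| yield line }
  end
end

# Test directories: the destination must be outside the watched tree,
# otherwise "moving" a file there would be a no-op or an error.
watch_dir = "/tmp/incoming"
dest_dir  = "/tmp/processed"
FileUtils.mkdir_p([watch_dir, dest_dir])

inwait(watch_dir) do |line|
  path, action, file = line.chomp.split("\t", 3)
  puts "path: \t #{path}"
  puts "action: \t #{action}"
  puts "file: \t #{file}"
  FileUtils.mv(File.join(path, file), dest_dir)
end
```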

Paul

Brian Candler · Jul 17, 2009

Paul said:
Perhaps you missed it in my original post (or perhaps I simply wasn't
clear), but I may be operating on files that are added arbitrarily. My
concern is that the script starts acting on a file that is still being
copied. Renaming it won't help there.

The program which drops files into the directory has to work the same
way:

- open temporary file
- write to it
- close it
- fsync if you want to be sure it's on disk even if power is pulled
- rename it to final location

That's why I said to look at Maildir semantics - adding new E-mails to a
maildir works like this. (They are written into the tmp/ directory and
then renamed into the new/ directory.)
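Brian's steps translate to Ruby roughly as shown below. This is a sketch under assumptions. The method name deliver, the temporary-name pattern and the separate tmp directory are illustrative, mirroring Maildir's tmp/ and new/. Note that fsync needs an open file handle, so in practice it comes just before the close rather than after it.

```ruby
require 'fileutils'

# Hypothetical writer: deliver +data+ as +name+ into +dest_dir+ by writing
# a temporary file in +tmp_dir+ (same filesystem as +dest_dir+), fsyncing
# and closing it, then renaming it into place. A reader watching
# +dest_dir+ only ever sees the complete file.
def deliver(data, name, tmp_dir, dest_dir)
  tmp_path = File.join(tmp_dir, "#{name}.#{Process.pid}.tmp")
  File.open(tmp_path, "wb") do |f|
    f.write(data)
    f.fsync # flush to disk before the rename, in case power is lost
  end       # the block form closes the file here
  dest = File.join(dest_dir, name)
  File.rename(tmp_path, dest) # atomic hand-off to the reader
  dest
rescue
  FileUtils.rm_f(tmp_path) if tmp_path # don't leave half-written temp files
  raise
end
```

Strictly speaking, making the rename itself survive a power cut also requires fsyncing the destination directory. The sketch leaves that out.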

Paul Archer · Jul 18, 2009

Brian said:
The program which drops files into the directory has to work the same
way:

- open temporary file
- write to it
- close it
- fsync if you want to be sure it's on disk even if power is pulled
- rename it to final location

That's why I said to look at Maildir semantics - adding new E-mails to a
maildir works like this. (They are written into the tmp/ directory and
then renamed into the new/ directory.)

I see what you're saying. My issue is that I'll be moving files into
this directory by hand as I go through my old, unmanaged digital images,
so that they get renamed and start the DAM (digital asset management)
process.

Of course, this is all moot now that I've found inotify will solve the
problem for me. Actually, it solves three problems:

1) It blocks, so I don't have to poll the directory.
2) It lets me know when a file has been moved to or written in the
directory (even if it's in a subdirectory).
3) It tells me the name of the file, so I don't have to go out and find
it.
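There is one caveat on point 2. The test program above only asks for CLOSE_WRITE events. Moving a file into the watched tree from elsewhere on the same filesystem is a rename, and inotify reports it as MOVED_TO with no CLOSE_WRITE at all, so files mv'd in by hand would be missed. The sketch below listens for both. It assumes inotifywait from inotify-tools is installed, and the helper names and the "%e %w%f" output format are illustrative choices.

```ruby
require 'open3'

# Events meaning "a complete file has arrived": written and closed in
# place, or renamed into the tree (a same-filesystem mv).
ARRIVAL_EVENTS = %w[close_write moved_to].freeze

# Build the inotifywait command line. "%e %w%f" prints the event list
# (comma-separated, no spaces) followed by the full path.
def inotify_command(dir)
  events = ARRIVAL_EVENTS.flat_map { |e| ["-e", e] }
  ["inotifywait", "-m", "-r", "-q", *events, "--format", "%e %w%f", dir]
end

# Split one output line into [event names, path]; the limit of 2 keeps
# spaces inside the filename intact.
def parse_event(line)
  events, path = line.chomp.split(/ /, 2)
  [events.split(","), path]
end

# Yield the path of each file that arrives under +dir+ (runs forever).
def each_arrival(dir)
  Open3.popen2(*inotify_command(dir)) do |stdin, stdout, _wait_thr|
    stdin.close
    stdout.each_line do |line|
      events, path = parse_event(line)
      yield path unless events.include?("ISDIR") # skip directory moves
    end
  end
end
```

For example, each_arrival("/tmp/incoming") { |path| puts path } prints each file as it lands (the directory is a placeholder). Files inside a directory that is moved in wholesale do not generate events of their own.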

Paul
