This weekend, I finally got around to cleaning up a single-file script I wrote to automatically process files as they are transferred into a local directory via rsync.

The fact that I was able to do this in a short script is largely thanks to pyinotify, which takes care of interfacing with the Linux kernel's inotify subsystem, which tracks file changes - so the hardest part was already done for me. Hooray for open source!

That said, I would have thought rsync transfers were a prime use case for this sort of tool, and yet I couldn't find any good examples on the web, which is why I thought it might be worth writing up (and pull-requesting). Handling rsync transfers is a little more complex than the standard examples, because rsync writes each incoming file to a temporary name and only renames it to its final name once the transfer is complete. If you only want to catch completed new files (my particular use case), you need to track these temporary files as they are created and renamed - but this is easily done with a few lines of Python.
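The tracking logic itself is independent of pyinotify, so here is a minimal sketch of it as a plain generator. The event names mirror inotify's, and the hidden `.name.XXXXXX` temporary filename is the pattern rsync uses by default (the exact suffix varies per transfer - this one is made up for illustration):

```python
def track_rsync_events(events):
    """Consume (event_name, path) pairs; yield paths of completed files.

    rsync writes each file to a hidden temporary name, then renames it
    into place, so a completed transfer shows up as an IN_MOVED_TO whose
    rename source (the preceding IN_MOVED_FROM) was a temp file we saw
    created earlier.
    """
    pending = set()          # temp paths currently being written
    last_moved_from = None   # source path of the most recent rename
    for name, path in events:
        if name == "IN_CREATE":
            pending.add(path)
        elif name == "IN_MOVED_FROM":
            last_moved_from = path
        elif name == "IN_MOVED_TO":
            if last_moved_from in pending:
                pending.discard(last_moved_from)
                yield path   # final filename of a completed transfer
            last_moved_from = None


# A typical rsync transfer, as seen by the watcher:
events = [
    ("IN_CREATE", "/watch/.report.pdf.Gx12ab"),
    ("IN_MODIFY", "/watch/.report.pdf.Gx12ab"),
    ("IN_MOVED_FROM", "/watch/.report.pdf.Gx12ab"),
    ("IN_MOVED_TO", "/watch/report.pdf"),
]
completed = list(track_rsync_events(events))  # → ["/watch/report.pdf"]
```

Files created by other means (no temp-file rename) simply never match, which is exactly the filtering you want when only rsync deliveries should trigger processing.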

Anyway, for future reference: coding up custom file-tracking behaviour with pyinotify is not too hard:

  1. Profile your file transfer behaviour by running:

    python -m pyinotify /path/to/folder_to_watch
    

    Try manually transferring a single file, then pick out the sequence of events you want to track.

  2. Create a class inheriting from pyinotify.ProcessEvent that will perform the required state-tracking and job-dispatching. Customising its behaviour is just a case of overriding the process_EVENT methods (e.g. process_IN_CREATE) corresponding to the events you picked out in step 1.

  3. Profit! That's it, basically. Now you just have to plug your custom class into one of the standard pyinotify usage examples.

More details can be found at the pyinotify tutorial, in the pyinotify examples dir, and (if you want to track rsync'ed files, or use asynchronous pool processing) by reading through my script.