26

My Déjà Dup backups have become quite large, and I noticed they contain a huge number of unnecessary files (e.g. *.pyc files, __pycache__ folders and other build-related temporary stuff).

I know that I can ignore specific folders, but is there a way to exclude files and/or folders by pattern?

I thought there might be more options available through a configuration file, but Déjà Dup doesn't use one. So I looked at duplicity (the CLI it is based on), but the man page doesn't mention a configuration file either. I know that duplicity can ignore files and folders based on patterns (--exclude, --exclude-filelist), but I have no idea how to combine this with Déjà Dup.

Do I have to ditch Déjà Dup and use duplicity manually? Or is there a way to set the needed options, so that they are used automatically, when duplicity is used by Déjà Dup?

Brutus

12 Answers

7

You can edit the exclude list like:

gsettings get org.gnome.DejaDup exclude-list
# remove comment to execute
# gsettings set org.gnome.DejaDup exclude-list "['path1', 'path2']"

Source: https://answers.launchpad.net/deja-dup/+question/280954

I tried to add patterns like '**/.git' and '**/build' into that list, like this:

gsettings get org.gnome.DejaDup exclude-list > exclude-list
gedit exclude-list
gsettings set org.gnome.DejaDup exclude-list "`cat exclude-list`"

But it seems to me that the **'s were not passed to duplicity. So instead I ended up doing searches like

locate "/home/*/.svn"
locate "/home/*/build"

and added the results to the exclude-list manually.

  • I made a script to export a long list of the (~350) excluded paths. However, it seems this really slows down the backup, though CPU is about the same. – jessexknight Sep 24 '20 at 13:04
  • When you set the list, you need to enclose it in quotes. – ying May 14 '21 at 20:02
6

Using ** patterns does not work (any longer) because deja-dup escapes the [, ? and * characters in the duplicity command. See https://git.launchpad.net/deja-dup/tree/libdeja/tools/duplicity/DuplicityJob.vala#n303 :

  string escape_duplicity_path(string path)
  {
    // Duplicity paths are actually shell globs.  So we want to escape anything
    // that might fool duplicity into thinking this isn't the real path.
    // Specifically, anything in '[?*'.  Duplicity does not have escape
    // characters, so we surround each with brackets.
    string rv;
    rv = path.replace("[", "[[]");
    rv = rv.replace("?", "[?]");
    rv = rv.replace("*", "[*]");
    return rv;
  }

  void process_include_excludes()
  {
    expand_links_in_list(ref includes, true);
    expand_links_in_list(ref excludes, false);

    // We need to make sure that the most specific includes/excludes will
    // be first in the list (duplicity uses only first matched dir).  Includes
    // will be preferred if the same dir is present in both lists.
    includes.sort((CompareFunc)cmp_prefix);
    excludes.sort((CompareFunc)cmp_prefix);

    foreach (File i in includes) {
      var excludes2 = excludes.copy();
      foreach (File e in excludes2) {
        if (e.has_prefix(i)) {
          saved_argv.append("--exclude=" + escape_duplicity_path(e.get_path()));
          excludes.remove(e);
        }
      }
      saved_argv.append("--include=" + escape_duplicity_path(i.get_path()));
      //if (!i.has_prefix(slash_home_me))
      //  needs_root = true;
    }
    foreach (File e in excludes) {
      saved_argv.append("--exclude=" + escape_duplicity_path(e.get_path()));
    }

    // TODO: Figure out a more reasonable way to order regexps and files.
    // For now, just stick regexps in the end, as they are more general.
    foreach (string r in exclude_regexps) {
      saved_argv.append("--exclude=" + r);
    }

    saved_argv.append("--exclude=**");
  }
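
To see what this escaping does to a glob from the exclude list, the escape_duplicity_path function above can be mirrored in a few lines of Python. This is only an illustrative port of the quoted Vala code, not part of Déjà Dup, and the sample path is made up:

```python
def escape_duplicity_path(path: str) -> str:
    """Illustrative Python port of the Vala helper quoted above."""
    # "[" has to be handled first; otherwise the brackets added by the
    # next two replacements would themselves be escaped again.
    rv = path.replace("[", "[[]")
    rv = rv.replace("?", "[?]")
    rv = rv.replace("*", "[*]")
    return rv


# Hypothetical exclude entry containing a "**" glob.
escaped = escape_duplicity_path("/home/leo/**/node_modules")
print(escaped)  # /home/leo/[*][*]/node_modules
```

Each asterisk ends up inside its own bracket expression, so duplicity only matches a literal * character at that position and the pattern no longer acts as a wildcard.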
6
  1. Install dconf-editor:
sudo apt install dconf-editor
  2. Run dconf-editor as a normal user (don't use sudo):
dconf-editor
  3. Locate org -> gnome -> deja-dup -> exclude-list.
  4. Set Custom value to the following (replace leo with your user name):
['$TRASH', '$DOWNLOAD', '/home/leo/.anaconda', '/home/leo/**/node_modules', '/home/leo/**/__pycache__', '/home/leo/**/*.pyc']
  5. You might need to reboot or sign in again. I ran Screenshot, which automatically updated the value. I don't know why; maybe someone else can explain.

[Screenshots: the dconf-editor custom value (replace 'leo' with your user name) and how the result should look.]

damadam
LeoZ
  • 4
    I tried this and the ~/**/node_modules does show in the 'Folder to ignore', but still they are backed-up..., so does not seem to work... – musicformellons Jan 21 '20 at 19:22
  • 1
    This answer doesn't work because Deja Dup escapes the asterisks when passing the exclude list to the underlying tool duplicity, cf. https://gitlab.gnome.org/World/deja-dup/-/issues/112#note_912936 – Salim B Jan 31 '22 at 22:43
6

There is no way currently with Deja Dup to do advanced filtering like that. See upstream bug https://bugs.launchpad.net/deja-dup/+bug/374274

5

I came up with a working workaround for this. The problem seems to be that duplicity does not itself expand wildcards (except for ** apparently) but relies on the shell to do that, and when it is run from deja-dup there is no shell involvement, which is why that now blocks configuring wildcard excludes. Sure, you can use dconf-editor to force them into the saved excludes list, but they don't work (from the monitoring in the script below I actually found that deja-dup would drop excludes containing a '*' and not pass them to duplicity at all).

To make it work we need shell expansion of the wildcards. You can do that manually and insert the results via dconf-editor, as suggested here, but this solution does it automatically at backup run time.

First find where duplicity is on your path ("which duplicity") then find a path location ahead of it in the path ("echo $PATH"). In my case it is /usr/bin/duplicity and /usr/local/bin comes ahead of that, which is perfect. Create a text file named duplicity in that latter path location (e.g. /usr/local/bin/duplicity), make it executable (chmod +x ...) and put this content in there:

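#! /bin/bash
#
# Shim script run from deja-dup in place of duplicity, to add in file/pattern
# exclude arguments for duplicity.
#
# The excludes are read from ~/.config/deja-dup-excludes (one-per-line).

ARGS="$*"

# Strip comments and leading whitespace, then drop empty lines.
EXCLUDES=$(cat "$HOME/.config/deja-dup-excludes" | sed -e 's/#.*$//' -e 's/^[ \t]*//' -e '/^$/d')

# Only add excludes when deja-dup itself passes excludes (i.e. for backup runs).
# The "--" stops grep from treating the pattern as one of its own options.
if ( echo "$ARGS" | grep -q -- '--exclude'); then
    for EXCL in $EXCLUDES
    do
        # $EXCL is left unquoted on purpose so that the shell expands the wildcards.
        EXCL_ARG=$(find $EXCL -printf '--exclude %p ')
        ARGS="$EXCL_ARG$ARGS"
    done
fi

#echo "$ARGS" >>/tmp/dup.out

/usr/bin/duplicity $ARGS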

Make sure that the last line has the correct path for the real duplicity on your machine, and you can un-comment the echo statement if you want to check your work.

Then create a file .config/deja-dup-excludes under your home directory, with the excludes listed one-per-line, e.g.:

# Exclude files/patterns for deja-dup
# (used by the /usr/local/bin/duplicity script).

/home/Ian/core.*
/etc/postfix/sasl_passwd*

Any lines beginning with a '#' will be taken as comment lines and ignored.

deja-dup will now execute that script instead of the real duplicity, and it will add in the necessary --exclude arguments before calling the latter.

A hack, admittedly, but it works a charm.
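
For readers who find the sed and find steps hard to follow, here is a rough Python sketch of the same idea: read the excludes file, strip comments and blank lines, expand the wildcards, and turn each match into an --exclude argument. It is a simplified illustration rather than a drop-in replacement for the shim. The function names are made up, and unlike the find call in the script it emits only the matched paths themselves instead of everything below them. The demo runs against a temporary directory so that it does not touch any real files.

```python
import glob
import os
import tempfile


def read_patterns(text: str) -> list[str]:
    """Strip '#' comments and surrounding whitespace, drop empty lines."""
    patterns = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            patterns.append(line)
    return patterns


def exclude_args(patterns: list[str]) -> list[str]:
    """Expand each wildcard pattern and emit one --exclude per existing match."""
    args = []
    for pattern in patterns:
        for match in sorted(glob.glob(pattern)):
            args.append(f"--exclude={match}")
    return args


# Demo with throwaway files standing in for real paths.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("core.1", "core.2", "notes.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    demo_file = f"# crash dumps\n  {demo_dir}/core.*\n\n"
    print(exclude_args(read_patterns(demo_file)))
```

Because the expansion happens when the script runs, newly created files that match a pattern are picked up at the next backup without editing the excludes file.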

  • Thank you for the idea. This particular implementation was not working for me, but I added a simplified version that works on my system (Ubuntu 18). See https://askubuntu.com/a/1378432/347084 – Bruno Bossola Nov 30 '21 at 16:38
5

I am using a variation of Ian Puleston's answer, much simplified, as the proposed script was not working for me. Refer to his answer, but use this content instead for the duplicity shim script:

#!/bin/bash
#
# Shim script run from deja-dup in place of duplicity
# which forces the exclusions read from ~/.config/deja-dup-excludes
# using the standard filelist exclusion mechanism

/usr/bin/duplicity --include=$HOME/.cache/deja-dup/metadata --exclude-filelist=$HOME/.config/deja-dup-excludes "$@"

This is tested on Ubuntu 18.04 and 20.04, and it uses duplicity's standard --exclude-filelist option, which supports some extended shell globbing patterns (for older duplicity versions you can swap it with --exclude-globbing-filelist). It's important that the exclusion appears first in order for everything to work. Be careful with what you exclude; no liability accepted, use at your own risk.

Notes about --include=$HOME/.cache/deja-dup/metadata

--include=$HOME/.cache/deja-dup/metadata is hardcoded in deja-dup's invocation of duplicity. The file is created during backup and used for basic sanity checking, so it must be included in the backup. If it's not, deja-dup fails with the following error at the end of the backup creation:

'metadata' file not found when creating backup ("Could not restore ‘/home/user/.cache/deja-dup/metadata’: File not found in backup")

Thus the --include=$HOME/.cache/deja-dup/metadata argument is required (and must come first) in the above shim script in case your exclusion rules in ~/.config/deja-dup-excludes happen to exclude $HOME/.cache (e.g. via /**/.cache). See the FILE SELECTION section in duplicity's manpage for details.

Salim B
2

I tried Jacob Nordfalk's method, but it did not work for me (maybe the syntax changed).

However, I was able to change the setting using dconf-editor. You can modify the list at path /org/gnome/deja-dup/exclude-list

jost21
2

So I've written a small script to deal with the issue using a .dejadupignore file. But before diving into it, I have to say it is at least as worthwhile to dive into the duplicity docs and use an include_list.txt, as Justin Solms' answer suggests and as further explained in this post.

The script first fetches the already existing ignores, then uses the locate command on every regex line specified in .dejadupignore and adds all found locations to a single array. It finally calls the gsettings command as proposed by Paul Smith to add all found files to ignore to the DejaDup exclude list.

A major setback however is the ulimit which defaults to 8192 on Linux. And even after I expanded it, it couldn't deal with the amount of __pycache__ locations as an argument to the gsettings command.

Anyway, hope it is of use to anyone.

import subprocess, os
from ast import literal_eval
from subprocess import PIPE


raw_lines = subprocess.run('gsettings get org.gnome.DejaDup exclude-list'.split(' '), stdout=PIPE, stderr=PIPE).stdout.decode('utf-8')
ignore_lines = literal_eval(raw_lines)

with open('.dejadupignore', 'r') as f:
    contents = f.readlines()

lines = [l.rstrip('\n') for l in contents]
for line in lines:
    line = os.path.expanduser(line)
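    # Pass the pattern as a list argument and leave out shell=True: with a list
    # plus shell=True, only "locate" is run and the pattern never reaches it.
    p = subprocess.run(["locate", line], stdout=PIPE, stderr=PIPE)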
    pstring = p.stdout.decode('utf-8')
    to_ignore = pstring.split('\n')
    to_ignore = [i for i in to_ignore if i != '']
    ignore_lines.extend(to_ignore)

command = 'gsettings set org.gnome.DejaDup exclude-list'.split(' ') 
command.extend([str(ignore_lines)])

for line in ignore_lines:
    print(line)

def smallfunk():
    i = input("Do you want to add these lines to the DejaDup exclude list? \n Press 'y' for yes and 'n' for no \n")
    if i == 'y':
        subprocess.run(command)
    elif i == 'n':
        pass
    else:
        print('Not an option')
        smallfunk()


smallfunk()

With the .dejadupignore file being:

~/*/venv
~/*/.pyc
~/*/git
~/*/.git
1

I successfully achieve exclusion using my include_list.txt file containing:

- /home/justin/**/.insync-trash
- /home/justin/**/__pycache__
- /home/justin/**/*.pyc
- /home/justin/**/node_modules
- /home/justin/**/Google Photos
+ /home/justin/Documents
- /home/justin/*

The /**/ is important to match through to any directory depth.

Rule 1: The order is important. Be specific first and general later.

Rule 2: What has already been matched in a line (include or exclude) cannot be changed by subsequent matches in later lines. The documentation mentions this; but in terribly confusing English. Hope mine is better ;) The lines above achieve:

  • Line 2: exclude any __pycache__ at any depth.
  • Line 3: exclude any file with extension .pyc.
  • Line 6: include my specific and only Documents folder.
  • Line 7: exclude all my other home folders such as Pictures, Videos, Downloads, etc. Note that this cannot stop Documents from being included as it was already matched in Line 6! Order matters!
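
The first-match-wins behaviour described by the two rules can be sketched in Python. This is a deliberately simplified model written for illustration, not duplicity's actual matcher, and real duplicity may apply additional rules. The assumptions are that ** may cross directory separators while * and ? may not, that a rule also covers everything below a matched directory, and that paths matching no rule are included. The function names are made up.

```python
import re

# The rules from the file list above, in order: (sign, glob).
RULES = [
    ("-", "/home/justin/**/.insync-trash"),
    ("-", "/home/justin/**/__pycache__"),
    ("-", "/home/justin/**/*.pyc"),
    ("-", "/home/justin/**/node_modules"),
    ("-", "/home/justin/**/Google Photos"),
    ("+", "/home/justin/Documents"),
    ("-", "/home/justin/*"),
]


def glob_to_regex(glob: str) -> re.Pattern:
    """Translate a glob: '**' crosses '/', while '*' and '?' do not."""
    parts = []
    i = 0
    while i < len(glob):
        if glob.startswith("**", i):
            parts.append(".*")
            i += 2
        elif glob[i] == "*":
            parts.append("[^/]*")
            i += 1
        elif glob[i] == "?":
            parts.append("[^/]")
            i += 1
        else:
            parts.append(re.escape(glob[i]))
            i += 1
    # Assumption: a rule also applies to everything below a matched directory.
    return re.compile("".join(parts) + "(/.*)?$")


def is_included(path: str, rules=RULES) -> bool:
    """Return the verdict of the first rule that matches; later rules are ignored."""
    for sign, glob in rules:
        if glob_to_regex(glob).match(path):
            return sign == "+"
    return True  # assumption: unmatched paths are backed up


for sample in (
    "/home/justin/Documents/report.txt",
    "/home/justin/Documents/proj/__pycache__/mod.pyc",
    "/home/justin/Pictures/cat.jpg",
):
    print(sample, is_included(sample))
```

In this model the __pycache__ file under Documents is rejected by the second rule before the Documents include is ever consulted, while report.txt reaches the include on line 6 and is therefore unaffected by the catch-all exclude on line 7.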
1

Get the current exclude list with:

$ gsettings get org.gnome.DejaDup exclude-list

which produces something like:

['', '/home/me/delete_me', '/home/me/eclipse', '/home/me/Music', '/home/me/R', '/home/me/Videos']

Then set your new list by wrapping the old output in quotes and adding your changes:

$ gsettings set org.gnome.DejaDup exclude-list "[ '', '/home/me/delete_me', '/home/me/eclipse', '/home/me/Music', '/home/me/R', '/home/me/Videos', '/home/me/**/.git']"

and run the get again to verify your changes.
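
If you would rather build the quoted list programmatically than edit it by hand, a small Python helper can do the formatting. This is a sketch under the assumption that the schema and key are org.gnome.DejaDup and exclude-list as in the commands above; the helper names are made up. Passing the value as a single argv element sidesteps the shell quoting problem entirely.

```python
def format_exclude_list(paths):
    """Render paths as a GVariant string-array literal such as ['a', 'b']."""
    if not paths:
        # An empty array needs an explicit type annotation in GVariant text form.
        return "@as []"
    quoted = []
    for p in paths:
        escaped = p.replace("\\", "\\\\").replace("'", "\\'")
        quoted.append(f"'{escaped}'")
    return "[" + ", ".join(quoted) + "]"


def gsettings_set_command(paths):
    """Build the argv for gsettings; no shell is involved, so no extra quoting."""
    return ["gsettings", "set", "org.gnome.DejaDup", "exclude-list",
            format_exclude_list(paths)]


cmd = gsettings_set_command(["/home/me/Music", "/home/me/Videos"])
print(cmd[-1])  # ['/home/me/Music', '/home/me/Videos']
# To actually apply it: import subprocess; subprocess.run(cmd, check=True)
```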

Paul Smith
1

This answer is a variation of Daan Koetsenruijter's response, which built upon half the thread here, and of LeoZ's response. So this is a remix of most of this thread. I remixed it because I wanted more control over what's going on, and I don't like letting a probably little-tested (Python) script that I can only partly read mess with my settings. That's not criticizing Daan, it's my problem. Thanks a lot Daan (and LeoZ, too), you basically solved it for me!

So, what I did is write a little command line that dumps the files I want to exclude and copy that to the respective value into the dconf editor (see LeoZ's response!).

And this is an example of such a command line:

find /home/myUserName -name node_modules -printf "'%p',\n" | grep -v 'node_modules/.*/node_modules' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'

This does three things. The first command, "find" finds all directories I want to exclude (node_modules in this case) and formats it into the required format using "-printf". Don't do "find . [...]" because then the output will also be relative paths but you want absolute (and replace my username, obviously). Note that -printf adds a newline. This is in order to simplify the next command, the newline will be removed with the last command. Depending on your use case, you may simplify this.

The second command "grep" removes nested node_modules directories, in my case reducing the number of exclude entries from 15,000 to 250 or so.

Finally "sed" removes the newlines again. This can be simplified by not adding the newlines in the first place and using a smarter regExp in grep, but I couldn't be bothered :-)

I recommend removing the "sed" command and reviewing the exclude list first.

Caveats: If you have ' characters in the file-/dir-names you want to exclude, you need to solve another problem, but dconf editor probably saves you from breaking your settings (it checks the syntax). I guess there's more that can go wrong, so take care.
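
The find and grep steps can also be approximated in Python, which avoids the quoting pitfalls of a shell pipeline. The sketch below walks a directory tree, collects every directory with a given name as an absolute path, and does not descend into a match, so nested node_modules directories never show up. The function name is made up, and the demo uses a temporary tree rather than a real home directory.

```python
import os
import tempfile


def top_level_matches(root, name="node_modules"):
    """Return absolute paths of directories called `name`, skipping nested ones."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(os.path.abspath(root)):
        if name in dirnames:
            found.append(os.path.join(dirpath, name))
            # Prune in place so os.walk does not descend into the match.
            dirnames.remove(name)
    return sorted(found)


# Demo on a throwaway tree with one nested node_modules directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "app", "node_modules", "dep", "node_modules"))
    os.makedirs(os.path.join(root, "lib", "node_modules"))
    for path in top_level_matches(root):
        print(path)
```

The resulting list still has to be reviewed and copied into the exclude-list value, exactly as with the shell version.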

Thanks everybody in this thread for your help!

0

Sadly neither Duplicity nor Déjà Dup uses a configuration file :( But there might be a possible workaround, the user @mterry mentioned the following in the bug report linked above:

"if you gconf-edit the exclude-list and add patterns like "**/parts", the pattern is passed to duplicity and everything works as expected..."

Now, where are those gconf settings stored these days?

Brutus