Delete oldest files recursively until max limit is reached

Question

I'm trying to modify Sergiy's script to act recursively (currently it only removes 1 file, but I would like it to remove all files until the max file limit is reached). But I cannot figure it out. I feel like I should be able to modify the first function ("delete oldest") to read multiple timestamps and pass that to the following line, but I'm unfamiliar with this command and cannot find any info on it. Any ideas would be appreciated!

It doesn't remove only 1 file. It removes the oldest files which are over the max limit you set. That means if you only have 4 files and set the max limit to 3, it will only delete one. If you want it to act recursively, add another check for e.g. an R and create a function that loops over the main with the different folders it finds. — Ziazis, Oct 27 '17 at 08:27
It only deletes one file @Ziazis, there's no loop to delete more. I think the OP doesn't mean recursively in the sense of recursing through subdirectories, but in the sense of deleting multiple files. — Arronical, Oct 27 '17 at 08:48
Ah, yeah you are right, just needs a loop around the filecount_above_limit. — Ziazis, Oct 27 '17 at 08:51

score 4 · Accepted Answer · answered Oct 27 '17 at 08:46

If you want the script to still operate on one directory, without recursing through subdirectories, the counting and deletion can be done in a while loop. The last section of the main function should be altered to look like this:

    local file_inodes=$(get_files)
    while filecount_above_limit
    do
        printf "@@@ File count in %s is above %d. " "$directory" "$max_files"
        printf "Will delete oldest\n"
        sort -k1 -n <<< "$file_inodes" | delete_oldest
        # Re-read the file list so the loop condition sees the new count
        local file_inodes=$(get_files)
    done
    printf "@@@ File count in %s is below %d. " "$directory" "$max_files"
    printf "Exiting normally\n"

Warning!
The problem with this simple alteration is that, if you haven't uncommented the delete line near the top of the script, it will loop endlessly, because it recalculates the number of files after each pass. If a file isn't actually deleted, the file count stays the same and the loop never exits.

It is possible to alter the script in a more complex way: remove each deleted file's inode from the file_inodes list and decrement the file_count variable, rather than repeating the local file_inodes=$(get_files) line. This would also handle the dry-run case in which nothing is actually deleted, but I'll leave that to someone else.
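
The decrement approach described above could look roughly like the following. This is a minimal standalone sketch, not a patch to Sergiy's script: trim_oldest and its dry_run flag are hypothetical names, it orders files by modification time instead of working from the inode list, and it assumes GNU find and sort. Because the loop counts down a local file_count rather than re-reading the directory, it terminates even when nothing is deleted.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script):
#   trim_oldest DIR MAX_FILES [DRY_RUN]   DRY_RUN defaults to 1 (delete nothing)
trim_oldest() {
    local directory=$1 max_files=$2 dry_run=${3:-1}
    local -a files=()
    local entry

    # Collect regular files, oldest first. NUL delimiters keep odd names intact.
    while IFS= read -r -d '' entry; do
        files+=("${entry#*$'\t'}")          # strip the "mtime<TAB>" prefix
    done < <(find "$directory" -maxdepth 1 -type f -printf '%T@\t%p\0' | sort -z -n)

    local file_count=${#files[@]} index=0
    while (( file_count > max_files )); do
        if (( dry_run )); then
            printf 'Would delete: %s\n' "${files[index]}"
        else
            rm -- "${files[index]}"
        fi
        # Count down locally instead of re-scanning the directory
        index=$((index + 1))
        file_count=$((file_count - 1))
    done
}
```

Calling trim_oldest /some/dir 10 only prints what would be removed; passing 0 as the third argument performs the deletion.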

Thanks, I had actually tried to put in this while loop, but didn't get the "local file_inodes=$(get_files)" line, so it was endlessly looping even when deleting the file. It would be nice to have the more elegant method described so that it works in "safe" mode as well, but for now this will do. — MysticEagle, Oct 28 '17 at 08:07

pa4080 · Answer 2 · 2017-10-31T06:21:00.057

I would suggest another solution that walks recursively through the destination directory tree and, in each directory, deletes all files except a predefined number of the newest ones. That solution is based on: (1) Recursive bash script and (2) Explaining a shell script to recursively print full directory tree.

1. Create an executable script file called walkr (walk and remove), located in /usr/local/bin so that it is accessible as a shell command (more detailed steps).

2. The content of the script walkr is pretty simple:

#!/bin/bash

[[ -z "${NFK}" ]] && NFK='7' || NFK="$NFK"
[[ -z "${1}" ]] && ABS_PATH="${PWD}" || cd "${1}" && ABS_PATH="${PWD}"

file_operations() {
        local IFS=$'\t\n' # Change the value of the Internal Field Separator locally
        rm $(ls -lt | grep -Po '^-.*[0-9]{2}:[0-9]{2} \K.*' | tail -n +"$((NFK+1))") 2>/dev/null
}

walk() {
        cd "$1" && file_operations # Change directory to the destination path and call the above function
        for item in "$1"/*; do [[ -d "$item" ]] && walk "$item"; done # Make the recursion
}

walk "${ABS_PATH}"

3. Explanation:

At the beginning, the script checks whether the variable $NFK (which determines the number of files to be kept) is set in advance - the condition [[ -z "${NFK}" ]]. If it is not set, the default value is 7.
Next the script deals with the destination path (the first positional argument of the command). If it is not provided - the condition [[ -z "${1}" ]] - the script will work in the current directory.
Finally the main function walk() will be executed.
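
For comparison, the same defaults can be written with Bash parameter expansion. This is only a sketch of an alternative spelling, not the form used in walkr; resolve_settings and target are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Sketch only: the same defaults as walkr, written with parameter expansion.
resolve_settings() {                 # hypothetical helper name
    NFK="${NFK:-7}"                  # keep 7 files unless NFK is already set and non-empty
    local target="${1:-$PWD}"        # first argument, or the current directory
    # Resolve to an absolute path in a subshell, so the caller's directory is unchanged
    ABS_PATH="$(cd "$target" && pwd)" || return 1
}

resolve_settings "$@" || exit 1
printf 'Keeping %s files per directory under %s\n' "$NFK" "$ABS_PATH"
```

Unlike the original lines, this version resolves the path inside a subshell, so it does not change the working directory of the calling shell.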

The function walk():
- Initially it changes the directory to the destination path with cd "$1", and then it calls the function file_operations(), which works inside that directory.
- Further, for each $item within the current directory ("$1"/*) that is also a directory ([[ -d "$item" ]]), the function walk() is executed again; this creates the recursion.
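
To see the recursion pattern on its own, here is a harmless sketch that only prints each directory it visits instead of removing anything. walk_demo is a hypothetical name; the loop has the same shape as in walk().

```shell
#!/bin/bash
# Sketch: the traversal used by walk(), with the file operations replaced by a print.
walk_demo() {
    printf '%s\n' "$1"                         # "visit" the directory
    local item
    for item in "$1"/*; do
        # Recurse only into entries that are directories (symlinks to directories included)
        [[ -d "$item" ]] && walk_demo "$item"
    done
    return 0
}
```

Running walk_demo on an absolute path lists that directory and every subdirectory below it, in glob (alphabetical) order. Hidden directories are skipped, because the * glob does not match names that start with a dot by default.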
The function file_operations():
- Initially it sets a local value for the internal Bash variable $IFS, so that <the list of the files to be removed> is handled properly even when there are spaces inside the individual file names.
- Further, the command rm $(<the list of the files to be removed>) is executed. The redirection of the errors 2>/dev/null is for the cases when there is nothing to remove. <the list of the files to be removed> is obtained in this way:
  - The command ls -lt lists the content of the current directory in long listing format (-l), sorted by modification time, newest first (-t). This list is piped (|) to the next command.
  - The next command grep -Po '^-.*[0-9]{2}:[0-9]{2} \K.*' selects the lines that begin (^) with - †, and strips everything from the start of the line up to and including the pattern [0-9]{2}:[0-9]{2} followed by a space ‡. The option -P enables Perl-compatible regular expressions, and the option -o prints only the matched part of each line. The \K discards everything that was matched before it, so only the file name is printed. (source - this useful answer)
    
    †Thus we will get only the names of the files from the list. In the output of ls -l, the lines that describe directories start with d, and those for regular files start with -. (source of the idea)
    
    ‡This pattern matches the time format 00:00.
  - Finally the command tail -n +"$((NFK+1))" skips the first $NFK lines of our file list (the newest files, which are kept) and outputs the rest. The +1 is needed because tail -n +K starts its output at line K.
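
The pipeline described above can be tried safely on a fixed listing. In the sketch below, sample_listing and files_to_remove are hypothetical helpers, and the listing is hard-coded sample data in the shape of ls -lt output, so nothing on disk is read or removed. It assumes GNU grep, which provides the -P option.

```shell
#!/bin/bash
# Hard-coded sample in the shape of `ls -lt` output (newest first); illustrative data only.
sample_listing() {
    cat <<'EOF'
total 4
-rw-r--r-- 1 user user    0 Oct 27 10:05 new file
drwxr-xr-x 2 user user 4096 Oct 27 10:04 dirA
-rw-r--r-- 1 user user    0 Oct 27 09:05 older_file
-rw-r--r-- 1 user user    0 Oct 27 08:05 oldest_file
EOF
}

# Same grep and tail stages as in walkr; $1 is the number of files to keep.
files_to_remove() {
    local nfk=$1
    sample_listing | grep -Po '^-.*[0-9]{2}:[0-9]{2} \K.*' | tail -n +"$((nfk + 1))"
}

files_to_remove 2    # prints: oldest_file
```

The directory line and the total line are dropped by grep, the two newest files are skipped by tail, and only the remaining name is printed. Note that GNU ls -l prints a year instead of HH:MM for files older than about six months, in which case this pattern would not match those lines.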

4. Examples of usage:

To run walkr for the current directory:

walkr      # No argument is needed,
walkr ./   # but you can also use this format

To run walkr for any child directory:

walkr <directory name>
walkr ./<directory name>
walkr <directory name>/<sub directory>

To run walkr for any other directory:
```
walkr /full/path/to/<directory name>
```
To change the number of files to be kept (to 3, for example), use this format:
```
NFK=3 walkr
NFK=3 walkr /full/path/to/<directory name>
# etc.
```

5. Let's play with the script walkr:

We can use the command touch file.name -d "1 hour ago" to create an empty file dated one hour ago. So we can use the following commands to create a test directory structure:

mkdir -p ~/temp/dir{A..C} && cd ~/temp ;\
DEST='';      touch ${DEST}new_file{1..7} && touch ${DEST}older_file{1..7} -d "1 hour ago" && touch ${DEST}oldest_file{1..7} -d "2 hour ago" ;\
DEST='dirA/'; touch ${DEST}new_file{1..7} && touch ${DEST}older_file{1..7} -d "1 hour ago" && touch ${DEST}oldest_file{1..7} -d "2 hour ago" ;\
DEST='dirB/'; touch ${DEST}new_file{1..7} && touch ${DEST}older_file{1..7} -d "1 hour ago" && touch ${DEST}oldest_file{1..7} -d "2 hour ago" ;\
DEST='dirC/'; touch ${DEST}new_file{1..7} && touch ${DEST}older_file{1..7} -d "1 hour ago" && touch ${DEST}oldest_file{1..7} -d "2 hour ago"

Now we can perform some tests:
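
One way to check the result of a run is to count the regular files left in each directory. The helper below is a hypothetical sketch, not part of walkr, and assumes GNU find and sort.

```shell
#!/bin/bash
# Hypothetical helper: print "<count><TAB><directory>" for a tree, one line per directory.
count_files_per_dir() {
    local root=${1:-.} dir count
    while IFS= read -r -d '' dir; do
        # Print one dot per regular file and count the dots (safe for unusual names)
        count=$(find "$dir" -maxdepth 1 -type f -printf '.' | wc -c)
        printf '%s\t%s\n' "$((count))" "$dir"
    done < <(find "$root" -type d -print0 | sort -z)
}

# Usage, assuming the test tree created above:
#   count_files_per_dir ~/temp
```

With the test tree above, each of the four directories should report 21 files before walkr is run, and no more than the value of NFK afterwards.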

Update of the script's functionality. Here is an updated version of the above script:

#!/bin/bash

[[ -z "${1}" ]] && ABS_PATH="${PWD}" || cd "$1" && ABS_PATH="${PWD}"
[[ -z "${2}" ]] && NFK='7' || NFK="$2"  # Number of the files to be kept
[[ -z "${3}" ]] && REC='1' || REC="$3"  # REC='1' - work recursively
[[ -z "${4}" ]] && VRB='1' || VRB="$4"  # VRB='1' - work in verbose mode

file_operations() {
        local IFS=$'\t\n' # Change the value of the Internal Field Separator locally

        if [ "$VRB" == "1" ]
        then # Verbose mode:
                rm -v $(ls -lt | grep -Po '^-.*[0-9]{2}:[0-9]{2} \K.*' | tail -n +"$((NFK+1))") 2>/dev/null && printf " -from: '%s' \n" "$1" || echo "nothing to remove in: '$1'"
        else # Quiet mode:
                rm $(ls -lt | grep -Po '^-.*[0-9]{2}:[0-9]{2} \K.*' | tail -n +"$((NFK+1))") 2>/dev/null
        fi
}

walk() {
        # Change directory to the destination path and call the above function, pass $1 for the verbose mode
        cd "$1" && file_operations "$1"
        # If REC='1': recursive mode, make the recursion; otherwise work only on the current level
        if [ "$REC" == "1" ]; then for item in "$1"/*; do [[ -d "$item" ]] && walk "$item"; done; fi
}

walk "${ABS_PATH}"

This version of the script can handle a few more positional arguments. It has quiet and verbose modes and can work non-recursively.
The full format is:
```
walkr '<destination path>' '<number of files to be kept>' '<no recursion>' '<quiet>'
```
Here any non-empty value other than 1 for <no recursion> or <quiet> overrides the default behaviour: the script only checks whether $3 and $4 equal 1, so their exact content does not otherwise matter.

thanks for this alternative method, it looks quite nice. I'll have to check it out in more detail as it deviates significantly from what I already have implemented, but could be a really nice solution. I think the filename issue was why the original script I started with chose to go with the inode route instead: Does your last comment fix that issue? Thanks again! — MysticEagle, Oct 28 '17 at 08:12
Hi, @MysticEagle, I've updated the answer as best as I can. The script now handles names with spaces properly. Additionally, I've added a more complete version of the script. I've added it as a separate script at the bottom of the answer, with the intention of keeping the part above as simple as possible. — pa4080, Oct 28 '17 at 16:50
