Remove bulk files

Question

I need to remove all *.html files. I have a parent folder with many child folders, and every child folder contains an HTML file. I can remove the files by visiting each child folder one at a time, but I want to remove every HTML file directly from the parent folder.

In addition to the answer already provided below, there is another way of doing it, with -exec or by piping to rm. In case of curiosity, here is the link: https://unix.stackexchange.com/a/167824/264443 — , Dec 12 '20 at 13:20

Rinzwind · Accepted Answer · 2020-12-28T09:29:07.737

10

Easiest method would be:

find . -type f -name '*.html' -delete

Run this from the directory where you want the search to start.

find is a command to search for something
. means start from the directory you are currently in
-type f means only search for files (d is for directories)
-name '*.html' means search for names ending in .html
-delete is the find action that deletes each match
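
To see what the command would hit before anything is removed, the same expression can be run once without -delete. The following is only a sketch in a scratch directory; the folder and file names are made up for illustration, and -delete is assumed to be available (it is in GNU and BSD find).

```shell
# Build a throwaway tree so nothing real is touched (names are made up).
demo=$(mktemp -d)
mkdir -p "$demo/child1" "$demo/child2"
touch "$demo/child1/index.html" "$demo/child2/page.html" "$demo/child2/notes.txt"

cd "$demo" || exit 1

# 1) Preview: same tests, no action given, so find only prints the matches.
find . -type f -name '*.html'

# 2) Delete: the identical expression with -delete appended.
find . -type f -name '*.html' -delete

# Collect what is left; only the non-HTML file should remain.
remaining=$(find . -type f)
echo "$remaining"
```

Running the preview first costs one extra command and shows exactly which paths the second run will delete.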

edited Dec 28 '20 at 09:29

answered Dec 12 '20 at 12:51


hc_dev · Answer 2 · 2020-12-13T19:28:33.987

Before giving you a ready-to-paste solution, I would like to explain some best practices and automation patterns for the Linux shell. The same concepts also apply in other IT fields:

divide & conquer: split a problem into (isolated) parts to solve easily
separation of concerns: here delete as command; find as query (CQRS)
pipes and filters: the combination of best tools in a chain

Preview and pipe explained

Although the following recipe doesn't work for subdirectories (recursive), it illustrates a safe and modular method combining commands.

a) simple list and remove (preview first before action):

list all HTML files (just list, not recursive): ls *.html
then remove: rm *.html

This is a safety-first approach. You preview and check the list of files before removing. When sure to proceed, just swap the command (ls becomes rm).

Alternatively you can easily extend your workflow using pipe:

b) directly combined with xargs

printf '%s\0' *.html | xargs -r0 rm

The xargs is necessary because rm can't read from stdin. See: http://stackoverflow.com/questions/20307299/ddg#20307392

printf '%s\0' is used instead of ls because xargs -0 expects NUL-separated input, while ls separates names with newlines. The -r0 combines two options: -0 splits the input on the NUL character, which robustly handles corner-case filenames (containing newlines, spaces or quotes, like it's done.html). The -r skips running rm if no input is given.

See: How can I recursively delete all files of a specific extension in the current directory?

and: https://unix.stackexchange.com/a/83712
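
As a sketch of why the NUL separator matters, the following builds a scratch directory with awkward filenames and hands them to rm through xargs. The file names are invented for the demo, and -r is assumed to be supported by your xargs (it is in GNU findutils).

```shell
# Scratch directory with deliberately awkward names (all hypothetical).
demo=$(mktemp -d)
cd "$demo" || exit 1
touch "plain.html" "it's done.html" "two words.html" "keep.txt"

# printf is a shell builtin; '%s\0' emits each glob match followed by NUL.
# xargs -0 splits on NUL only, so quotes and spaces stay inside one argument.
# The -- tells rm that everything after it is a file name, not an option.
printf '%s\0' *.html | xargs -r0 rm --

# Collect what is left in the directory.
left=$(ls)
echo "$left"
```

Without -0, xargs would split on whitespace and treat the apostrophe as a quote character, so these names would not reach rm intact.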

Directly remove using glob

In some shells (bash since version 4, ksh, zsh) you can recursively remove files by using wildcards (globstar) in path.

See: https://unix.stackexchange.com/questions/23576/how-do-i-recursively-delete-directories-with-wildcard

and how to check/enable the globstar

directly remove (dangerous):

rm **/*.html

These direct remove commands are dangerous because a small typo can have a severe impact. For example, accidentally inserting a blank into rm *.html turns it into rm * .html. Unfortunately that removes all files in the current directory and may print an error about not finding the second path '.html'.
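
One way to reduce that risk is to expand the same glob with a harmless command first. This is a sketch assuming bash 4 or later is installed; globstar is a bash option, so the relevant lines are run through bash explicitly, and the directory layout is made up.

```shell
# Throwaway tree with HTML files at three depths (names are hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/a/b"
touch "$demo/top.html" "$demo/a/mid.html" "$demo/a/b/deep.html" "$demo/a/keep.txt"

# Preview: enable globstar, then let printf show what **/*.html expands to.
matches=$(cd "$demo" && bash -c 'shopt -s globstar; printf "%s\n" **/*.html')
echo "$matches"

# Remove: the same glob, now handed to rm (-- ends option parsing).
(cd "$demo" && bash -c 'shopt -s globstar; rm -- **/*.html')

# Collect what is left; only the non-HTML file should remain.
left=$(find "$demo" -type f)
echo "$left"
```

In an interactive bash session, shopt -s globstar only needs to be run once, and shopt globstar reports whether it is currently on.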

Combine: find and remove

find (search for files in a directory hierarchy) and remove

using a pipe (does not depend on find supporting -delete):

find . -type f -name "*.html" -print0 | xargs -r0 rm

directly executing a following command (like the trash-cli, for later undelete):

find . -type f -name "*.html" -exec trash {} \;

directly using convenient actions (reliable: avoid spawning and race-conditions):

find . -type f -name "*.html" -print0 -delete

Explanation:

-type f only looks for regular files (not directories d, not symlinks l)
-name filters for names matching the given pattern (can be combined with -or, or replaced by -regex)
-print0 prints the found file paths (robustly NUL-separated) before they are deleted
-exec trash {} \; executes the given command (here trash) once per found file, substituting the file path for the curly braces; the \; marks the end of the command
-delete directly removes the found file (performant because it runs in the same process, although not supported by all find versions)
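
The following sketch runs two of these variants against a scratch tree. echo stands in for trash so that the -exec line deletes nothing, the paths are invented, and -print0 and xargs -r0 are assumed to be available (they are in GNU findutils).

```shell
# Scratch tree; one directory and one file contain spaces on purpose.
demo=$(mktemp -d)
mkdir -p "$demo/sub dir"
touch "$demo/a.html" "$demo/sub dir/b c.html" "$demo/sub dir/keep.css"

# -exec needs a terminator: '\;' runs the command once per file,
# '+' passes many paths to a single invocation.
# echo is a stand-in for trash or rm, so this line removes nothing.
find "$demo" -type f -name '*.html' -exec echo would-remove: {} +

# NUL-separated hand-off to xargs; paths with spaces arrive as one argument.
find "$demo" -type f -name '*.html' -print0 | xargs -r0 rm --

# Collect what is left; only the stylesheet should remain.
left=$(find "$demo" -type f)
echo "$left"
```

Swapping echo would-remove: for the real command only after the printed list looks right follows the same preview-first idea as above.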

See notable warnings: How can I recursively delete all files of a specific extension in the current directory?

More in the GNU manual for Find, Deleting Files

Caution with mass delete: Linux has no built-in undelete

So first check what's found before deleting in batch. Hence I recommend these safety measures:

use the interactive option instead of the force option: rm -i instead of rm -f
list or find first, then check, then delete (e.g. by appending pipes or first running find without an action)
use trash-folder tools that support later undo, like trash-cli or gvfs-trash
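
The "find first, check, then delete" habit can be wrapped in a small function that only deletes when asked explicitly. remove_html and its --yes flag are hypothetical names chosen for this sketch, not an existing tool, and -delete is assumed to be supported by your find.

```shell
# remove_html DIR [--yes]
# Hypothetical helper: a dry run by default, deletes only with --yes.
remove_html() {
    dir=$1
    confirm=${2:-}
    if [ "$confirm" = "--yes" ]; then
        find "$dir" -type f -name '*.html' -delete
    else
        # Dry run: print the matches and touch nothing.
        find "$dir" -type f -name '*.html' -print
    fi
}

# Demo in a scratch directory (names are made up).
demo=$(mktemp -d)
mkdir -p "$demo/child"
touch "$demo/child/index.html" "$demo/child/style.css"

preview=$(remove_html "$demo")   # lists the match, deletes nothing
echo "$preview"

remove_html "$demo" --yes        # now actually delete

after=$(find "$demo" -type f)    # what is left afterwards
echo "$after"
```

Making the destructive path opt-in means a forgotten flag results in a listing instead of data loss.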

@steeldriver Notable advice against intuition: some commands only accept arguments (thus | xargs rm). Also differs when interpreting a given path (no recursion), but you can workaround expressing the path itself as kind of vector using globs if supported by your shell Thanks for contributing! — hc_dev, Dec 13 '20 at 17:39
OK they're better now - but there's really no need to use constructs like ls *.html | xargs ... when you can more robustly use printf '%s\0' *.html | xargs -r0 .... Similarly the find version could use -print0 | xargs -r0 rm (although if you're going to use find, the accepted answer is better). — steeldriver, Dec 13 '20 at 17:43
