Find is a beautiful tool

I have blogged before that knowledge of command-line tools is essential to take the next step in programming productivity. I think it would be useful to provide simple tutorials for these powerful tools, starting with find. I hope you agree, and would appreciate your feedback via the contact page or in the comments.

Tutorial

If you’re on Windows, I would recommend installing Cygwin to bring the power of a real shell to your OS. Let us start with a simple example and build upon it:

find . -name "*.css"

This will list all CSS files (and directories ending with ".css") under the current directory (represented by "."). We only want to match files so we’ll go ahead and change it to this:

find . -type f -name "*.css"

Now we will only match CSS files. Nothing special? Fine, I see how it is. Let’s find all CSS files that do something with your HTML ID #content next:

find . -name "*.css" -exec grep -l "#content" {} \;

Here we combine find with grep (covered in detail later) using the -exec option, which runs the given command once for each match and substitutes the file’s path for {}.

We’re starting to get productive now, so let’s keep going. Suppose we want to change every reference to the color #FF0000 (red) to #00FF00 (green). Normally you would have your editor search and replace across all the files, if it even has that capability. Even then it’s slow; this statement is fast:

find . -name "*.css" -exec sed -i -r 's/#(FF0000|F00)\b/#0F0/g' {} \;

Gasp! Wait a minute, I just searched for both ways to specify red and replaced it with green in my CSS!! How long would that have taken otherwise? Do you see now how you can code faster by automating it and combining powerful tools? Let’s look at some other cool search options find has to offer:

Other Examples

# find files changed in the last 1 day
find . -ctime -1 -type f 

# find files larger than 1 MB in /tmp
find /tmp -size +1M -type f

# find files newer than main.css in ~/src
find ~/src -newer main.css
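
If you want to try the size and -newer tests without touching real files, here is a small sandbox sketch. Every file name and timestamp in it is made up for the demo, and the M size suffix is an extension offered by GNU and BSD find rather than something POSIX guarantees, so treat it as an assumption about your system.

```shell
# Scratch directory; every name and timestamp below is invented for the demo.
demo=$(mktemp -d)

# One 2 MiB file and one tiny file.
dd if=/dev/zero of="$demo/big.log" bs=1024 count=2048 2>/dev/null
printf 'x\n' > "$demo/small.log"

# +1M means "more than 1 MiB", so only big.log should be listed.
larger=$(cd "$demo" && find . -type f -size +1M)
echo "larger than 1M: $larger"

# Three CSS files with different modification times (touch -t takes CCYYMMDDhhmm).
touch -t 202001010000 "$demo/old.css"
touch -t 202101010000 "$demo/main.css"
touch -t 202201010000 "$demo/new.css"

# -newer keeps files modified more recently than the reference file.
newer=$(cd "$demo" && find . -type f -name "*.css" -newer main.css)
echo "newer than main.css: $newer"

rm -rf "$demo"
```

Running the find commands inside a subshell with cd keeps the printed paths short and leaves your own working directory alone.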

Conclusion

By itself, find is only as good as say… Google Desktop. The real power, as with other shell tools, is the ability to combine with other tools seamlessly. Effective use of tools like find very often makes the difference between an average programmer and one who is 10x more effective (actual multiples up for debate).

These are just some of the basic features of find. Take advice from Chris Coyier and use your new power responsibly. Find is a beautiful tool.


Responses (83)

  1. For anyone who’s developing in Linux I suggest you get to know your command line very well. This is a perfect example of how commands can make doing simple stuff much easier.

    Find works extremely well and so does grep.

  2. Awesome article. I use Cygwin religiously for basic WHOIS and DNS stuff. This will help me branch out more with it.

    Thank you Eric!

  3. Oh, and I also think that Kim Kardashian is a beautiful tool too!

  4. You’re hilarious, David! Yeah, I once modified and checked in several hundred files with this stuff; what would otherwise have taken an entire day took only 10 minutes :)

  5. Good article, but if I may paraphrase Henry Spencer; “Those who don’t understand xargs are doomed to use exec, poorly.”

  6. Yes, find (and all the other Unix tools) are great.

    Beware though, the following example from above also replaces “#F00FFF” (some pink):
    sed -ie 's/#(FF0000|F00)/#0F0/'

    You may want to use “sed -r” and “\b” to match a word delimiter:
    sed -ier 's/#(FF0000|F00)\b/#0F0/'
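
To see the #F00FFF problem for yourself, here is a minimal sketch that runs both patterns over a throwaway file. It assumes GNU sed (for -r and \b), and colors.css with its three rules is invented for the demo. Nothing is edited in place, so it is safe to run anywhere.

```shell
demo=$(mktemp -d)
printf 'a { color: #F00; }\nb { color: #FF0000; }\nc { color: #F00FFF; }\n' > "$demo/colors.css"

# Without \b, the leading "#F00" of "#F00FFF" gets rewritten as well.
loose=$(sed -r 's/#(FF0000|F00)/#0F0/' "$demo/colors.css")

# With \b, the match has to end at a word boundary, so "#F00FFF" is left alone.
strict=$(sed -r 's/#(FF0000|F00)\b/#0F0/' "$demo/colors.css")

printf 'without \\b:\n%s\n\nwith \\b:\n%s\n' "$loose" "$strict"
rm -rf "$demo"
```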

  7. @Michael and Daniel:
    Thanks for the updates. Sorry I was careless about the color thing, I think just adding [ ;] after F00) would also work. I think the readers get my point about using regex to match multiple things, though. Thanks!

  8. Some Dev in Holland

    But how should you change your examples if you have your files in a subversion checkout-folder where there are many folders named “.svn” in which you want nothing to be changed?

  9. Nice to see find getting some love from the Windows crowd; now let’s streamline one of those examples a bit. This should work fine as Cygwin is a port of the GNU utils…but I’m assuming that pipes work fine (and had dang well better. ;-)

    find . -name "*.css" -print0 | xargs -0 grep -l "#content"

    I know that looks longer, but this should be a lot faster. Here’s why. In the original example, find executes grep for each and every file. In the second, find runs once and passes its output to xargs, which hands the list to grep as arguments. The -print0 option makes find end each file name with a NUL character (\0) instead of a newline, and xargs -0 splits its input on those NULs. If you don’t use -print0, xargs will split on whitespace and treat a file name containing spaces as several separate arguments.

    It may not sound like much, but if you’re processing thousands of files, it can make the difference between seconds and minutes, if not worse.
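
Here is a small sketch of the whitespace problem, using two made-up files in a scratch directory. It assumes your find and xargs support -print0 and -0, which GNU and BSD versions do.

```shell
demo=$(mktemp -d)
printf '#content { margin: 0; }\n' > "$demo/my styles.css"   # note the space
printf '#content { margin: 0; }\n' > "$demo/plain.css"

# Newline-separated names: xargs splits "./my styles.css" into "./my" and
# "styles.css", grep cannot open either, and the file silently goes missing.
naive=$(cd "$demo" && find . -name "*.css" | xargs grep -l "#content" 2>/dev/null | LC_ALL=C sort)

# NUL-separated names arrive at grep intact.
safe=$(cd "$demo" && find . -name "*.css" -print0 | xargs -0 grep -l "#content" | LC_ALL=C sort)

printf 'plain pipe:\n%s\n\nwith -print0 / -0:\n%s\n' "$naive" "$safe"
rm -rf "$demo"
```

The error messages from the plain pipe are sent to /dev/null here only to keep the output readable; in real use they are your hint that something was split wrongly.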

  10. @regeya:

    Thanks for giving an example using xargs. I knew it was better but I thought I should stick with the features of find for this post.

    Would you mind breaking each part of xargs down for the readers. If not I will probably have a separate post for it ;)

    I’m not in a place where I have Windows but I’m sure you all who do have Windows can verify.

    Thanks again!

  11. You don’t need find to do a grep across files…
    grep "#content" *.css
    will do fine.

    The advantage of using find here is that it will search sub folders as well - something grep will not do by itself.

  12. With zsh’s recursive globbing you can do recursive grepping without find.

    grep "#content" **/*.css

  13. @dok:

    Nice! I have not really used zsh so very cool!

  14. addition:

    grep -r --include '*.css' "#content" .

    also works without zsh …

  15. You can do logic in find. “!” negates (needs to be escaped in most shells), “-a” means AND, “-o” means OR, and “(” and “)” are for grouping (also need to be escaped in most shells).

    Many finds also provide a -path option in addition to a -name, where the wildcards apply to the path instead of just the filename.

    So to eliminate SVN directories, something along the lines of:

    find . \! -path '*.svn*' -type f

    Using \!, -a, -o, \( and \), and the other options can be fiddly; it’s generally a good idea to work in a small subtree and grow your find command one expression at a time, or at least to pipe the output through more or less so that you can easily abort the search.

    When using xargs, it’s important to use the -print0 and -0 trick as described above, unless you ensure that find doesn’t return files with whitespace ("\! -name '* *'" or somesuch) embedded in them. Some finds (Solaris comes to mind) have +exec in addition to -exec, which turns out to be much faster than find+xargs, but does the same thing.

    Of course, if all you’re looking at are filenames, “locate” is even faster; “locate `pwd`” is much faster than “find . -print” on nontrivial file trees (unless you’ve made recent changes), because the file tree structure has been cached.
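
The negation and grouping operators are easier to trust once you have watched them work on a tiny tree. This is only a sketch, and the directory layout and file names are invented for it.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/.svn" "$demo/css" "$demo/js"
touch "$demo/.svn/entries" "$demo/css/main.css" "$demo/js/app.js" "$demo/notes.txt"

# NOT: drop anything whose path contains ".svn".
no_svn=$(cd "$demo" && find . \! -path '*.svn*' -type f | LC_ALL=C sort)

# OR inside escaped parentheses, implicitly ANDed with -type f.
css_or_js=$(cd "$demo" && find . -type f \( -name '*.css' -o -name '*.js' \) | LC_ALL=C sort)

printf 'without .svn:\n%s\n\ncss or js:\n%s\n' "$no_svn" "$css_or_js"
rm -rf "$demo"
```

Without the parentheses, -o binds more loosely than the implicit AND, so -type f would apply only to the first -name test.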

  16. Paul Childs

    Thanks for the article. I’m new to Linux and I found it helpful. It led me to do a little Googling and I found this http://www.kalamazoolinux.org/tech/find.html

  17. grep will do recursive searches just fine with the “-r” flag.

    I use this pretty often:
    grep -Hrn

    -H prints the filename
    -r does a recursive search from the path
    -n prints the linenumber of each hit

  18. grrr stupid last comment got messed up. lets try again:

    grep will do recursive searches just fine with the “-r” flag.

    I use this pretty often:
    grep -Hrn PATTERN PATH

    -H prints the filename
    -r does a recursive search from the path
    -n prints the linenumber of each hit

    grep will also print each line there is a hit. ie-
    grep -Hrn css .
    ./website/install.html:9:<link href="install_styles.css" rel="stylesheet" type="text/css">

  19. @SJS and Mark:
    Thanks for the additional info! I plan on having separate posts for grep and sed, etc.

    One of my favorites is grep -inH …

    @bogdan:
    Ack sure is sweet. I don’t mention it yet because this tutorial is aimed more at command-line beginners. I encourage those savvy enough to check out ack though :)

  20. I would check out REBOL (http://www.rebol.com) for this type of thing — it’s a very tiny shell and doesn’t require cygwin. Here’s an example.

    files: read %./
    remove-each f files [%.css suffix? f]
    foreach f files [if find f "#content" [print f]]

    More info: http://www.rebol.com/index-lang.html

  21. David Karr

    @Some Dev:

    Using pipes and xargs, excluding certain patterns is easy, but this technique can’t use “print0”:

    find . -name "*.css" | grep -v \.svn | xargs grep -l "#content"

  22. Please have a look at ack at http://petdance.com/ack/

    Your first example:
    $ find . -type f -name "*.css"
    $ ack -f --css

    The next:
    $ find . -name "*.css" -exec grep -l "#content" {} \;
    $ ack -l --css "#content"

    The find/grep combinations here are as old as Unix, and belong in everyone’s toolbox. ack takes the drudgery out of them, and it’s a single program you can drop in a ~/bin anywhere, no compilation necessary.

  23. Actually, if you’re on Windows, I recommend PowerShell to get a ‘real shell’. People still use cmd.exe?

  24. You can use perl instead of sed:

    find . -name "*.css" | xargs perl -pi.bak -e 's/#(FF0000|F00)\b/#0F0/gi'

    Having the .bak will create a backup of each file with .bak appended to the file name. This isn’t necessary (nor suggested if you have good revision control).

  25. A couple of points.

    1. You don’t necessarily need xargs for efficiency.
    The POSIX spec has defined a syntax for find to
    execute a command on sets of paths like xargs does.
    In your example you do:
    find . -name "*.css" -exec grep -l "#content" {} \;
    Changing the last ; to + means the args are processed in sets:
    find . -name "*.css" -exec grep -l "#content" {} +

    2. Someone asked for how best to ignore .svn directories etc. You might find my findrepo script of use:
    http://www.pixelbeat.org/scripts/findrepo

    very pretty site BTW
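
One way to watch the difference between \; and + is to have each invocation report how many paths it was handed. This is just a sketch; the three file names are made up, and a small sh -c one-liner stands in for grep.

```shell
demo=$(mktemp -d)
touch "$demo/a.css" "$demo/b.css" "$demo/c.css"

# With \; the command is started once per file, so each run sees one path.
per_file=$(cd "$demo" && find . -name "*.css" -exec sh -c 'echo "$# file(s)"' sh {} \;)

# With + the paths are collected and passed in one batch (or a few large ones).
batched=$(cd "$demo" && find . -name "*.css" -exec sh -c 'echo "$# file(s)"' sh {} +)

printf 'with \\;\n%s\n\nwith +\n%s\n' "$per_file" "$batched"
rm -rf "$demo"
```

The extra sh after the quoted script fills $0, so that $# counts only the paths find appended.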

  26. @Garth and Padraig Brady:
    I did not know either of those things. Great information, thanks!

  27. ‘find’ can be dangerous if applied incorrectly… or correctly, for that matter.

    About ten years ago, I started working for a Very Big Corporation. Their web site was, at that time, several thousand static pages - no includes, no java, very little CGI.

    At about the same time that I started, they also acquired an intern, who happened to sit a few desks away from me. She was given the tedious task of going into each and every HTML file on the system (a printout listing them had been thoughtfully provided) and changing the copyright date on each of them.

    “Why not just do a global replace?” I said to the intern, after she’d been spending a day or two on the project. I sat at her terminal, typed out a “find” command, and piped it to “sed”. “There - you’re all done, and it took two minutes.”

    The next day, I got called on the carpet in the boss’s office. As it turned out, this had been a busywork project - they now had absolutely nothing for the intern to do, and didn’t want to give her any real responsibilities or let her anywhere near the actual web content… I had ruined a plan that was meant to keep an inexperienced person busy for weeks.

  28. I am learning Groovy right now and it seems to me that the -exec option to find is an example of a closure. Any thoughts?

  29. yo you have a cool blog mayne

  30. @Matt:
    Yeah, cuz it really sucks when those interns learn something and become productive ;)

    @John:
    Groovy is awesome. I would say -exec is not much of a closure because I think it can only operate on one command, but you can have multiple -exec(s). Since anything can be piped to anything, pretty much everything on the command-line is a sorta-closure, right?

    @branvs:
    Thanks!

  31. great blog. and yes, find is a beautiful tool. i deal with unix everyday and i can attest to it being a really useful command
    [howshouse.com]

  32. If you’re on Windows, just fire up VS.Net and use CTRL+H.

    Much easier!

  33. ^
    |
    |
    LOL.

  34. Re: skipping .svn directories. I use the -prune option:

    find . -path '*/.svn' -prune -o -type f -print
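
A quick sandbox sketch of what -prune buys you, using a made-up checkout with .svn directories at two levels. Only the real working file should be listed.

```shell
demo=$(mktemp -d)
mkdir -p "$demo/.svn/text-base" "$demo/css/.svn"
touch "$demo/.svn/entries" "$demo/.svn/text-base/main.css.svn-base" \
      "$demo/css/.svn/entries" "$demo/css/main.css"

# When a path matches */.svn, -prune stops find from descending into it;
# everything else falls through -o to the -type f -print branch.
pruned=$(cd "$demo" && find . -path '*/.svn' -prune -o -type f -print | LC_ALL=C sort)

echo "$pruned"
rm -rf "$demo"
```

The explicit -print matters: without it, find applies an implicit -print to the whole expression and lists the pruned .svn directories themselves.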

  35. “I think it would be useful to provide simple tutorials for these powerful tools, starting with find. I hope you agree, and would appreciate your feedback via the contact page or in the comments.”

    YES PLEASE! This tutorial was great! Especially for a beginner like me. Incredibly helpful - hope to see some more soon!

  36. You should use “sed -ri” instead of “sed -ier”. The latter uses “er” as the backup extension, so you’ll get a bunch of “*.csser” files.

    Of course you can use find to delete them. :-P
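
A quick way to see the backup-extension effect for yourself, sketched under the assumption that GNU sed is installed (other seds may parse these options differently). The file name style.css is made up.

```shell
demo=$(mktemp -d)
printf 'a { color: #F00; }\n' > "$demo/style.css"

# -ier is read as -i with the backup suffix "er"; -r is never switched on,
# so the parentheses and | in the pattern are taken literally.
(cd "$demo" && sed -ier 's/#(FF0000|F00)\b/#0F0/' style.css)
after_ier=$(cd "$demo" && ls)
echo "$after_ier"

# -ri turns on extended regexes and edits in place without a backup.
(cd "$demo" && sed -ri 's/#(FF0000|F00)\b/#0F0/' style.css)
fixed=$(cat "$demo/style.css")
echo "$fixed"

rm -rf "$demo"
```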

  37. @Thomas and Paolo:
    Good things to know! I personally use the -prune option as well.

    @inlimbo:
    Glad you liked it. Another one is in the works :)

  38. Duh. Pádraig Brady is right… it’s “-exec command {} +” instead of “+exec…” — I blame xpilot, which uses + instead of - for an option to toggle behavior.

    Wait, that’s not right. I blame my brain for being so easily reprogrammed by xpilot. It’s not xpilot’s fault.

    Kudos to Brady for getting it right.

  39. Rich Kraft

    Two other options that are important to know are -mtime and -mmin which will find files modified in a specified time range.

    “find . -type f -mtime -1” will find all files modified in the last 24 hours.

    “find . -type f -mtime +1” will find all files modified at least 48 hours ago, because find counts whole days and +1 means more than one.

    -mmin works the same way, but in minutes, not days.
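
The day arithmetic is easy to get wrong, so here is a sketch that backdates three made-up files and checks which test picks up which. It assumes GNU touch (for relative dates with -d) and GNU find, which counts whole 24-hour periods and drops the fraction; other implementations may round differently.

```shell
demo=$(mktemp -d)
touch "$demo/fresh.txt"                          # modified just now: age 0 days
touch -d '36 hours ago' "$demo/yesterday.txt"    # 1.5 days, counted as 1
touch -d '5 days ago' "$demo/old.txt"            # counted as 5

last_day=$(cd "$demo" && find . -type f -mtime -1)    # fewer than 1 whole day
exactly_one=$(cd "$demo" && find . -type f -mtime 1)  # exactly 1 whole day
over_one=$(cd "$demo" && find . -type f -mtime +1)    # more than 1, so 2 or more

printf '%s\n' "-mtime -1: $last_day" "-mtime 1: $exactly_one" "-mtime +1: $over_one"
rm -rf "$demo"
```

Note that the 36-hour-old file matches neither -1 nor +1, which is the gap people tend to trip over.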

  40. @Rich:
    Yes, those are useful. I didn’t want my post to just be a giant list of find statements so I had to cut those out. Perhaps I’ll have a follow-up “advanced” post.

  41. don’t you mean “find /tmp -size +1M -type f”?

  42. @allyn:
    In all of my environments, the “+” before 1M is not necessary. Is this not the case in your shell?

  43. Mark Wilden

    @Mark: “grep will do recursive searches just fine with the “-r” flag.”

    The problem with grep’s -r option is that if none of the “pattern” files are found in the current directory, grep stops looking.

    In other words, if I do ” grep -Hrn playlist *.rb”, it won’t return anything if there aren’t any .rb files in the current directory, even if there are .rb files in subdirectories.

  44. Cygwin’s a bit heavy if you only want a few GNU utilities. Check out UnxUtils: http://unxutils.sourceforge.net/

  45. … whoops, I forgot to add “If you’re using a windows environment”

  46. Nice tricks here, but for changing file internals you should really look into perl oneliners. They are much more concise and readable than the find -exec | sed alternatives.

  47. How would find’s options be configured to find extensionless files? I’ve tried using ls -various -commands | grep regex but can not seem to output files lacking extensions. I’m using Fedora Core 6 and have been for the past 2 months. I’ve come up with a method that I think is overkill - using a Bash script with file name parsing [FILE##], etc.

  48. @Gabe: find -type f ! -name "*.*"

  49. @Pádraig Brady

    Thanks a lot. That is one efficient solution. I either didn’t read the entire man for find, skipped the negation, or not bright enough to negate what was found.

  50. @Mark Wilden: “..if I do ” grep -Hrn playlist *.rb”, it won’t return anything if there aren’t any .rb files in the current directory, even if there are .rb files in subdirectories.”

    Just use grep -Hrn playlist . | grep .rb

    Admittedly, that might be slower than
    find . -name "*.rb" | xargs grep -Hn playlist

  51. Mark Wilden

    @John: grep -Hrn playlist . | grep .rb

    That would work, but as you say, it would be slower than using find. Potentially, much, much slower. I don’t know if grep is smart enough to ignore binaries, but even if it is, I don’t want it searching gigabyte-sized log files.

    I don’t really understand why -r doesn’t work “properly.” If I ask grep to search in folders under and including the specified folder, what does it matter that there are no .rb files in some of those folders? grep is saying “if a directory doesn’t contain any of the specified files, I’ll assume none of its subdirectories do, either.” Doesn’t make sense to me.

    ///ark

  52. Sadly, sed -i doesn’t work on Solaris :-( I use perl -e s///g -i instead.

  53. (1) find is extremely useful when I have 1000’s of logfiles.

    for file in `ls *.log` # won’t work if too many files
    do
    rm $file
    done

    This is because the shell expands `ls *.log` into 1000’s of individual arguments, causing this to barf.

    A better solution is:

    find . \( -type d ! -name . -prune \) -o \( -name "*.log" -print \) | while read file
    do
    rm "$file"
    done

    (2) I often use find along with perl to do mass replacements:

    find . -type f -name "*.txt" | xargs grep -l FOO | xargs perl -i -p -e 's/FOO/BAR/g'

    Cheers, Neil

  54. David Karr

    @Neil Laurance: Concerning your example 1, I would have done that this way:

    ls *.log | xargs -n100 rm -f

    Much more concise.

  55. [ack](http://petdance.com/ack/) is a beautiful tool

  56. It’s amazing

  57. Lovely)

  58. I totally agree about find. I use it all the time in bash scripts and like to combine it on the same line with grep. It can be pretty powerful, especially once you learn all the arguments for it.

    Explore, Learn, Review: (programming languages)
    http://www.codesplunk.com

  59. Oh! Great topic :)
