grep is a beautiful tool

Global Regular Expression Print is a staple of every command-line user’s toolbox. As with find, it derives a lot of power from being combined with other tools and can increase your productivity significantly.

Following is a simple tutorial that will help you realize the power of this simple and most useful command. If you are on Windows and haven’t already, download and install Cygwin. If you are also new to regular expressions (regex), here is a great regular expressions reference to get you started.

Tutorial

Suppose we want to search for duplicate functions in all of our JavaScript files. Let’s start basic and work up to it. This technique can be used to search for a TON of duplicate items like:

  • Duplicate HTML IDs
  • Check how many times a CSS class is used
  • Duplicate Java classes
  • many, many more…

# Search JS files in this directory for "function"
grep function *.js

The above command will print the lines containing "function" in all JavaScript files in the current directory (NOT subdirectories). Printing out line contents would be much more helpful if we knew what files they come from and their line numbers:

# Print filenames, line #s, and lines that start with optional white space followed by a function definition
grep -EHn "^\s*(function \w+|\w+ = function)" *.js

Depending on how you format your JavaScript files, something like this will omit comments, anonymous functions, and also words like "functionality", giving you better results.
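To see what that pattern does and does not match, here is a small sketch run against a throwaway file. The file name sample.js and its contents are made up for illustration, and the pattern is spelled with POSIX character classes ([[:space:]] and [[:alnum:]_]) instead of \s and \w, on the assumption that not every grep supports those shorthand escapes.

```shell
# Scratch directory holding a hypothetical sample file
tmp=$(mktemp -d)
cat > "$tmp/sample.js" <<'EOF'
// function in a comment
function save() {}
  load = function () {};
var functionality = 1;
setTimeout(function () {}, 10);
EOF

# Portable spelling of the pattern used above
pattern='^[[:space:]]*(function [[:alnum:]_]+|[[:alnum:]_]+ = function)'

# -H prints the file name, -n the line number
matches=$(cd "$tmp" && grep -EHn "$pattern" *.js)
printf '%s\n' "$matches"
rm -r "$tmp"
```

Only the named function and the assignment should be printed; the comment, the "functionality" line, and the anonymous callback are skipped. Note that a line such as "var load = function" would also be skipped, because the pattern expects the name to be the first thing on the line.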

# Print a list of: function <function-name> and sort it
grep -Eho "^\s*function \w+" *.js | sort

-o prints only the part of each line that matches the regular expression, -E enables extended regex, and -h suppresses printing of the file name. I am then piping to sort, which sorts the output so it is a list of function <function-name>. If you don’t have a lot of files/functions to go through, you can just scan the list and note the duplicate function names you see. Let’s go a step further for those that DO have a big list:

# Print only duplicate function names
grep -hEo "^\s*function \w+" *.js | sort | uniq -d

There we go! That will list only the duplicated functions. I know that we can expand this with awk or other stuff and get the file names and line numbers of the duplicates, but I don’t want to explain the details of awk ;). I actually had it in this article and then removed it, so leave a comment or contact me if you want the code for that.
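For the curious, here is one way to get file names and line numbers without awk. This is a hedged sketch, not the code that was removed from this article: the function name find_duplicate_functions and the sample files are made up for illustration, and it assumes function names contain only letters, digits, and underscores.

```shell
# Print file:line:content for every "function <name>" that is defined more than once.
# Hypothetical helper; pass it the files to scan.
find_duplicate_functions() {
  grep -hEo '^[[:space:]]*function [[:alnum:]_]+' "$@" |
    sed 's/^[[:space:]]*//' |   # strip indentation so differently indented duplicates compare equal
    sort | uniq -d |
    while IFS= read -r dup; do
      # Search again for this one name; the trailing group stops "init" from matching "initAll"
      grep -EHn "^[[:space:]]*$dup([^[:alnum:]_]|\$)" "$@"
    done
}

# Try it on two throwaway files
tmp=$(mktemp -d)
printf 'function init() {}\nfunction save() {}\n' > "$tmp/a.js"
printf '  function init() {}\nfunction initAll() {}\n' > "$tmp/b.js"
dups=$(cd "$tmp" && find_duplicate_functions *.js)
printf '%s\n' "$dups"
rm -r "$tmp"
```

The second grep pass is slower than a single awk script on a large code base, but it keeps everything in terms of the flags already covered above.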

Other Examples

# Count the lines containing "function" in each JS file
grep -c function *.js

# Print lines that DO NOT have "function"
grep -v function *.js
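One thing worth knowing about -c is that it counts matching lines per file, not total occurrences. The sketch below, using made-up files one.js and two.js, shows the per-file counts next to two ways of getting a single total.

```shell
# Two hypothetical files; the second line of one.js holds two matches
tmp=$(mktemp -d)
printf 'function a() {}\nfunction b() {} function c() {}\n' > "$tmp/one.js"
printf 'var x = 1;\nfunction d() {}\n' > "$tmp/two.js"

# Matching lines, reported per file
per_file=$(cd "$tmp" && grep -c function *.js)

# Matching lines across all files combined
total_lines=$(cat "$tmp"/*.js | grep -c function)

# Individual occurrences: -o prints each match on its own line
# (tr removes the padding some wc implementations add)
total_hits=$(cat "$tmp"/*.js | grep -o function | wc -l | tr -d ' ')

printf '%s\n' "$per_file" "lines: $total_lines" "occurrences: $total_hits"
rm -r "$tmp"
```

With these sample files the per-file counts are 2 and 1, the combined line count is 3, and the occurrence count is 4.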

# List processes that match "pidgin" (non-Windows)
ps -ef | grep pidgin

Conclusion

grep is one of the most used command-line tools, and other commands are often piped to it to filter their output. Understanding it is essential to increasing productivity on the command line. There is so much more to grep than what I’ve shown here, and it would be cool to see your best uses in the comments!

Welcome to Eric Wendelin’s Blog! You’re guaranteed to get more quality tutorials like this one easily if you subscribe via RSS.

Responses (68)

  1. Binny V A says:

    Waiting for the sed and awk posts :-)
    Here are some more useful grep commands

  2. Here is an interesting trick. It highlights the match in the results, and makes “grep foo *” less noisy:

    alias egrep='egrep --color=tty -d skip'

    alias fgrep='fgrep --color=tty -d skip'

    alias grep='grep --color=tty -d skip'

  3. [...] Now that you’ve mastered find, programmer Eric Wendelin describes several ways in which grep (Global Regular Expression Print) can make you more productive at the command line. [...]

  4. [...] Now that you’ve mastered find, programmer Eric Wendelin describes several ways in which grep (Global Regular Expression Print) can make you more productive at the command line. [...]

  5. sharfah says:

    Here is my rgrep (recursive grep) script which I use all the time to grep within subfolders:
    dir=.
    pattern="$*"

    dirscan()
    {
    grep -- "$pattern" "$dir"/*
    for file in "$dir"/*
    do
    if [ -d "$file" ]; then
    dir=$file
    dirscan
    fi
    done;
    }
    dirscan

  6. andrea says:

    Great tutorial.
    A stupid question: which WordPress plugin do you use for syntax highlighting?

    Thank you,

    Andrea

  7. Saibot says:

    sharfah, couldn’t you accomplish the same functionality with:
    find . -name <filenameSnippet> -exec grep <searchParameter> {} \;
    as described in Eric’s last post?

  8. [...] 14, 2008 · No Comments The title of this blog post is oh-so-very true, “grep is a beautiful tool.” In his post, Eric Wendelin has put [...]

  9. ohxten says:

    Heck yes, grep is an awesome tool. It’s especially invaluable when searching for patterns in a bunch of different code files.

  10. naf says:

    sharfah:

    try this instead:
    find ./ -exec grep pattern {} \;

    ##example – find all js files containing “function”. show the filename and the line on which the find occurred

    find ./ -name \*.js -exec grep -Hn function {} \;

  11. Vulcan Eager says:

    Andrea, click on the ‘?’ on the top-left of any code block to get the name of the plug-in.

  12. David Byron says:

    @Andrea re Syntax Highlighting in Wordpress:

    It looks as if he’s using

    http://wordpress.org/extend/plugins/syntaxhighlighter/

    For syntax highlighting.

  13. Vulcan Eager says:

    Sorry, not top-left; the end of the header line of each code-block.

  14. @Binny and Paolo:
    Thanks for the input! I actually alias “grep --color=tty” to “g” because I use grep so much.

    @Andrea:
    I use SyntaxHighlighter-plus. The only problem with it is it includes a lot of javascript files and can be a bit slow at times.

    @naf:
    Yes! You can check out more find stuff at my “find is a beautiful tool” post

  15. Ha! Three comments about it while I was answering. Thanks for your input and the link!

  16. [...] Now that you’ve mastered find, programmer Eric Wendelin describes several ways in which grep (Global Regular Expression Print) can make you more productive at the command line. [...]

  17. Benjamin says:

    sharfah, you could also just do:
    find . | xargs grep $pattern

  18. David V. says:

    Maybe one of you can help me figure this out.

    I’ve been trying to run a grep -io on various text files. It works very erratically. In my example text, it’ll always find some words, and never find others. It works fine with a grep -i or a grep -o, but I can’t get it to consistently work with a grep -io.

    Any ideas would be greatly appreciated.

  19. pollian says:

    What’s the difference between the find-based commands people are recommending and grep -l (which returns file names), or in Sharfah’s case grep -rl, which returns file names and searches recursively?

  20. andy says:

    i also find combining find with xargs to be very powerful. for example:

    $ find . -name \*.c | xargs grep function_to_find

    is a basic recursive grep on C source files. i typically don’t use the -exec flag to find because i don’t like the syntax. sometimes, though, it can’t be avoided, like when you’re passing a command to a function in bash and the `|’ in the pipeline will just confuse the heck out of the poor function.

    another one of my favorite flags to grep is the -o flag, which says to print only what matches the specified regex. for example, to print all anchor tags in index.html:

    $ grep -o '<a[^>]*>' index.html

    will print only the anchor tags, not the whole line. also, it will find multiple anchor tags on a single line. neat!

    also, one of my favorite things about Cygwin is that its grep supports the -P option, which enables perl-compatible regexes. i can’t find a way to get a -P grep on other *nixes, though, like Ubuntu, without compiling my own grep. anyone know the special sauce to install a -P grep from a package manager?

  21. seems to be a helpful tool, but is it just for linux?

  22. rp says:

    use Perl;

    I mean it. You get all of grep, including -P, a lot of awk, and a lot more specific to Perl, with just about the same syntax. Why settle for less??

  23. naf says:

    @pollian:
    you are right, grep -rl would do the same thing.

    @United Voices:
    grep is available “native” on linux and every other unix distribution, including Mac OS X. for windows, you can find 3rd party grep programs for the command line, or better yet, download Cygwin to have a very unix-like shell in windows.

    @rp:
    i find the quick and dirty syntax of grep, combined with sed or awk to be much nicer than the command line syntax of perl, particularly for simple one-off tasks. for anything more complicated, perl is awesome. also, youd need to use something like Find.pm in order to do the recursive search, which would require a script file or a much more complicated command line. keep in mind that many of the people reading and commenting on this forum aren’t familiar with grep or other command line utilities, so the jump to perl might be too much learning curve right off the bat.

  24. [...] Не зря я начал читать Regular Expression HOWTO и попалась мне статья grep is a beautiful tool [...]

  25. @rp:
    Perl is definitely more powerful should one be savvy enough for it. The problem is that it may be tougher to fully grasp for beginners and may not always be available. Personally, I love Perl because it’s ugly like my puppy but a lovable kind of ugly :)

  26. Rick says:

    If you’re using grep on code, you want ack instead.

    http://petdance.com/ack/

  27. Rick says:

    What’s with all the grep recursion helpers (rgrep function and find)? grep has a -r, people. :)

  28. @Rick:

    Mark pointed out the caveat to “grep -r” in his comment on the find post:
    http://eriwen.com/productivity/find-is-a-beautiful-tool/#comment-739


    The problem with grep’s -r option is that if none of the “pattern” files are found in the current directory, grep stops looking.

    In other words, if I do ” grep -Hrn playlist *.rb”, it won’t return anything if there aren’t any .rb files in the current directory, even if there are .rb files in subdirectories.

    grep -r is cool but doesn’t always do what we want.

  29. Rick says:

    Ah, right. Good point. Ack is good for this, too.

    ack --ruby playlist

  30. [...] – grep is a beautiful tool “Global Regular Expression Print is a staple of every command-line user’s toolbox.” [...]

  31. [...] grep is a beautiful tool (tags: linux grep shell bash reference commands tips tools) [...]

  32. blackbelt_jones says:

    Thanks for the tutorial! Here’s the only use of grep I know, for thinning out apt searches

    apt-cache search term1 | grep term2

  33. Guest says:

    For recursive grep use the -r option

  34. Michael says:

    If you want a better way at grep-ping processes, use pgrep. Should be installed by default in Linux distros. Say you’re looking for process(es) which contains the string “apache”, you do a:

    pgrep -f apache

    That’ll display all matching PIDs which have apache in the process string. There are a lot more args you can play with.

    Have fun!

  35. Matt says:

    the p in grep stands for parser, not print.

  36. @Guest:
    Yes, -r is cool but see the caveat I posted in a comment above.

    @Michael:
    Yes, I think this does come on the latest Ubuntu but you have to install it specifically in Cygwin.

    @Matt:
    I thought that too, but Wikipedia set me straight. I trust it just a bit more than myself

  37. David C says:

    big problem with grep: avoid non-text files (especially searching without specifying file extensions). My solution is a function named txt that uses file(1) to test the textness of a file:
    ========
    txt ()
    {
    if [ -z "$*" ]; then
    file -f -;
    else
    file "$@";
    fi | awk -F: '$2 ~ /text/ {print $1}'
    }
    ========

    then insert this between a find(1) and while loop containing grep:

    ========
    find . -type f | txt | while IFS= read -r f
    do
    grep -H patt "$f"
    done
    ========

  38. Michael says:

    @Eric: Not sure how pgrep works on Cygwin. Haven’t tried it yet :) But I know it comes installed in Ubuntu, RHEL3/4.

  39. anon says:

    To avoid matching NON-TEXT files, just use the -I flag.

  40. [...] Quickies: grep Filed under: Linux — 0ddn1x @ 2008-07-15 20:02:17 +0000 http://eriwen.com/tools/grep-is-a-beautiful-tool/ [...]

  41. fsdaily.com says:

    Story added…

    This story has been submitted to fsdaily.com! If you think this story should be read by the free software community, come vote it up and discuss it here:

    http://www.fsdaily.com/HighEnd/grep_is_a_beautiful_tool...

  42. rp says:

    Yes, Perl doesn’t have -r, but otherwise it’s pretty much drop-in compatible with grep (and available, too). But I didn’t know about ack! Thanks.

  43. [...] Now that you’ve mastered find, programmer Eric Wendelin describes several ways in which grep (Global Regular Expression Print) can make you more productive at the command line. [...]

  44. Pascal says:

    I completely agree; I use grep for everything! Great site.

    Check this out for learning languages:
    http://www.codesplunk.com

  45. [...] grep is a beautiful tool (tags: commandline search tools reference) [...]

  46. [...] grep is a beautiful tool (tags: grep linux shell tutorial bash) [...]

  47. [...] Some terminal commands spit back a bit too much information, and that’s where grep comes in. Need to manually kill a faltering Thunderbird? Punch in ps aux | grep bird, and you’ll get back the specific number to kill. Need to know which files don’t have your company name in them? grep -v DataCorp *.doc. Programmer Eric Wendelin explains grep more in-depth. [...]

  48. [...] Some terminal commands spit back a bit too much information, and that’s where grep comes in. Need to manually kill a faltering Thunderbird? Punch in ps aux | grep bird, and you’ll get back the specific number to kill. Need to know which files don’t have your company name in them? grep -v DataCorp *.doc. Programmer Eric Wendelin explains grep more in-depth. [...]

  49. [...] should now be getting pretty proficient with sed. Use it along with find and grep and you will find yourself feeling much more comfortable on the [...]

  50. prasanna says:

    hi,

    i am trying to get all the classes in a .css file.

    I am able to print all the lines having the classes, but I need the exact word.

    Is it possible to print just the words?

    The command which I used to print the lines is

    grep -E "\\.[[:alnum:]]*\s*,|{" Style.css | grep -v "^#" | sort | uniq -d

    prasanna.

  51. [...] grep – הסבר ודוגמאות. [...]

  52. [...] we want to split into several files depending on the value in the last cell. We could do this with grep, or awk (coming soon), but with sed we can do it with more [...]

  53. [...] Now that you’ve mastered find, programmer Eric Wendelin describes several ways in which grep (Global Regular Expression Print) can make you more productive at the command line. [...]

  54. [...] Find is a beautiful toolgrep is a beautiful toolWhat I wanted to know before I left college: A programmer reflectsEarly Adopters: Your Firefox 3 [...]

  55. Vasudev Ram says:

    Since someone above asked, for non UNIX/Linux platforms – at least Windows – an alternative to installing Cygwin – which is somewhat big – if you don’t need the rest of its functionality, is to search for and download any one of a number of DOS / Windows grep clones. One simple way to get a good one is to download Turbo C 2.x from the Borland Museum (Borland’s development tools division is now CodeGear which is now part of Embarcadero Technologies). Google for “Borland Museum” or search around on CodeGear.com to get the download link. Turbo C comes with a GREP.COM utility – though it was for DOS, it still works at the Windows COMMAND prompt (which is basically a DOS emulation in Windows). And it has a “-d” option, IIRC, which is equivalent to “grep -r”. You can delete the installed Turbo C if you don’t want it, after copying the GREP.COM somewhere else in your PATH.

    Similarly, I’ve found and used DOS/Windows clones of awk and sed. The GNUWin32 project also has a collection of ports (to Windows) of some of the classic UNIX command-line tools, IIRC. For example, here is its port of grep:

    http://gnuwin32.sourceforge.net/packages/grep.htm

    - Vasudev Ram
    http://www.dancingbison.com

  56. [...] you are there check out some of his other articles. There are some great ones on grep, sed, awk, and lots [...]

  57. henrikb says:

    Don’t think anyone mentioned the --include option of grep, which avoids matches in unrelated files such as svn-stuff, etc.

    grep -r ~/src --include=*.java.

    You can include multiple --include= statements and the complement --exclude is also available.

    As a side note, for Emacs users, there is a nice *grep* buffer which contains the result of your last grep, provided you execute via ‘ESC x grep’. When running grep like this from Emacs, you also have access to a searchable(!) grep-history with your previous grep commands. You navigate in the grep history by using ESC p (previous), ESC n (next) and ESC r (reverse search).

  58. henrikb says:

    Noticed that my example was a bit unclear, so here is a clarified version

    grep -r -n --include=*.java synchronized ~/src

    which finds all occurrences of the synchronized keyword.

    Since I love to promote Emacs as the perfect editor, I should also mention that you, of course(!), can grep for the word under the cursor by supplying an argument to the ‘ESC x grep’ command. This is easily done by executing ‘C-u M-x grep’, which replaces the pattern of the last grep command with the word under the cursor. Very efficient…

  59. [...] This blog post offers a great introduction to grep. I’m definitely going to have to give it a close read, and try to improve my skills with this tool. [...]

  60. richard says:

    I would like to have a conversion doc to transcribe from OS 9 arabic fonts to OS X arabic fonts. I wrote a page of equivalences a=^ and in BBedit noticed grep has a base there but how would I write a command line such that all the individual letters change at once? Or can i pay someone to write it?
    thanks

  61. [...] grep is a beautiful tool – Eric Wendelin’s Blog Global Regular Expression Print is a staple of every command-line user’s toolbox. As with find, it derives a lot of power from being combined with other tools and can increase your productivity significantly. Following is a simple tutorial that will help you realize the power of this simple and most useful command. (tags: reference howto tutorial linux unix bash shell commands grep) [...]

  62. Bob says:

    Is there a way to search multiple file extensions without having to type multiple ‘--include’ statements? It would be nice to be able to do something like this:

    grep -r --include=*.pas;*.dfm 'object' .

    But this doesn’t seem to work. Any insights?

  63. Vasudev Ram says:

    Actually, on UNIX / Linux and variants (and maybe also on Windows with Cygwin – though you may have to test that), this works for what Bob wants (and is simpler, since you don’t need to use find with grep for this):

    grep 'object' *.pas *.dfm *.any_other_number_of_extensions_or_patterns

    # Doesn’t even have to be a filename extension, e.g.:

    grep 'object' *abc* def* *ghi jkl*mno* # and so on …

    And this – i.e. the ability to pass any number of any kind of wildcard patterns in the arguments to a command – works for ALL commands at the *nix command line – the reason for it is a little known fact – that the wildcard (and more generally, *nix metacharacter expansion) is handled by the _shell_ (bash, ksh, etc.) – BEFORE the shell invokes the command at all. That is, the shell does all the expansions of metacharacters and then invokes the command with the expanded command line. (Contrast this with DOS where the wildcard expansion is handled by individual commands and so works only for those commands which implement it internally.) See Kernighan and Pike – The UNIX Programming Environment – for more on this.

    HTH
    Vasudev

  64. Vasudev Ram says:

    find with xargs and any other command is quite powerful, though, of course.
