grep is a beautiful tool

grep (Global Regular Expression Print) is a staple of every command-line user’s toolbox. As with find, it derives a lot of power from being combined with other tools and can increase your productivity significantly.

Following is a simple tutorial that will help you realize the power of this simple and most useful command. If you are on Windows and haven’t already, download and install Cygwin. If you are also new to regular expressions (regex), here is a great regular expressions reference to get you started.

Tutorial

Suppose we want to search for duplicate functions in all of our JavaScript files. Let’s start basic and work up to it. This technique can be used to search for a TON of duplicate items like:

  • Duplicate HTML IDs
  • Check how many times a CSS class is used
  • Duplicate java classes
  • many, many more…

# Search JS files in this directory for "function"
grep function *.js

The above command will print the lines containing "function" in all JavaScript files in the current directory (NOT subdirectories). Printing out line contents would be much more helpful if we knew what files they come from and their line numbers:

# Print filenames, line #s, and lines that start with optional whitespace followed by a named function definition
grep -EHn "^\s*(function \w+|\w+ = function)" *.js

Depending on how you format your JavaScript files, something like this will omit comments, anonymous functions, and also words like "functionality" giving you better results.

# Print a list of: function <function-name> and sort it
grep -Eho "^\s*function \w+" *.js | sort

-o prints only the part of the line that matches the regular expression, -E enables extended regular expressions, and -h suppresses printing of the file name. I then pipe to sort, which sorts the output so that it is a list of function <function-name>. If you don’t have a lot of files/functions to go through, you can just scan the list and note the duplicate function names you see. Let’s go a step further for those that DO have a big list:

# Print only duplicate function names
grep -hEo "^\s*function \w+" *.js | sort | uniq -d

There we go! That will list only the duplicated functions. I know that we can expand this with awk or other stuff and get the file names and line numbers of the duplicates, but I don’t want to explain the details of awk ;). I actually had it in this article and then removed it, so leave a comment or contact me if you want the code for that.
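
For the curious, here is a minimal sketch of what such an awk extension could look like. It is not the code that was removed from the article, just one possible approach. The demo files (a.js, b.js) are made up for illustration, and the pattern uses the POSIX classes [[:space:]] and [[:alnum:]_] in place of \s and \w so it does not depend on GNU grep extensions.

```shell
# Hypothetical demo files: "foo" is defined in both of them
tmpdir=$(mktemp -d)
printf 'function foo() {}\nfunction bar() {}\n' > "$tmpdir/a.js"
printf 'function foo() {}\nfunction baz() {}\n' > "$tmpdir/b.js"

# grep -Hno prints file:line:match; awk groups the locations by match text
# and prints only the names that were seen more than once
dupes=$(cd "$tmpdir" && grep -EHno '^[[:space:]]*function [[:alnum:]_]+' *.js |
  awk -F: '{ count[$3]++; where[$3] = where[$3] " " $1 ":" $2 }
           END { for (name in count) if (count[name] > 1) print name " ->" where[name] }')
echo "$dupes"

rm -rf "$tmpdir"
```

Splitting on ":" assumes that file names contain no colons, which is good enough for a quick scan of a source directory.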

Other Examples

# Count the lines containing "function" in each JS file
grep -c function *.js
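
Note that -c reports a count of matching lines per file, not a grand total. A small sketch of the difference, using made-up demo files (one.js, two.js); piping everything through cat first is just one way to get a single number:

```shell
# Hypothetical demo files
tmpdir=$(mktemp -d)
printf 'function a() {}\nvar x = 1;\nfunction b() {}\n' > "$tmpdir/one.js"
printf 'function c() {}\n' > "$tmpdir/two.js"

# One count per file, prefixed with the file name
per_file=$(cd "$tmpdir" && grep -c function *.js)
echo "$per_file"

# A single total: concatenate the files so grep sees one input stream
total=$(cd "$tmpdir" && cat *.js | grep -c function)
echo "$total"

rm -rf "$tmpdir"
```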

# Print lines that DO NOT have "function"
grep -v function *.js

# List processes that match "pidgin" (non-Windows)
ps -ef | grep pidgin
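
One wrinkle with the ps example is that the grep process itself usually shows up in the results, because its own command line contains "pidgin". A common workaround is to wrap one letter of the pattern in brackets. The sketch below simulates ps output with made-up user names and PIDs so that it does not depend on what is actually running:

```shell
# Simulated `ps -ef` output (hypothetical users and PIDs)
plain_listing='user 101 1 pidgin
user 202 150 grep pidgin'
bracket_listing='user 101 1 pidgin
user 203 150 grep [p]idgin'

# The plain pattern also matches the grep process line
plain_count=$(printf '%s\n' "$plain_listing" | grep -c pidgin)

# [p]idgin still matches "pidgin", but not the literal text "[p]idgin"
bracket_count=$(printf '%s\n' "$bracket_listing" | grep -c '[p]idgin')

echo "$plain_count $bracket_count"
```

In real use this would be: ps -ef | grep '[p]idgin'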

Conclusion

grep is one of the most used command-line tools, often piped to for filtering output. Understanding it is essential to increasing productivity on the command-line. There is so much more to grep than what I’ve shown here, and it would be cool to see your best uses in the comments!


Responses (56)

  1. Waiting for the sed and awk posts :-)
    Here are some more useful grep commands

  2. Here is an interesting trick. It highlights the match in the results, and makes “grep foo *” less noisy:

    alias egrep='egrep --color=tty -d skip'

    alias fgrep='fgrep --color=tty -d skip'

    alias grep='grep --color=tty -d skip'

  3. Here is my rgrep (recursive grep) script which I use all the time to grep within subfolders:
    {code}
    dir=.
    pattern="$*"

    dirscan()
    {
    grep "$pattern" "$dir"/*
    for file in "$dir"/*
    do
    if [ -d "$file" ]; then
    dir=$file
    dirscan
    fi
    done;
    }
    dirscan
    {code}

  4. Great tutorial.
    A stupid question: which wordpress plugin do you use for syntax highlighting?

    Thank you,

    Andrea

  5. sharfah, couldn’t you accomplish the same functionality with:
    find . -name <filenameSnippet> -exec grep <searchParameter> {} \;
    as described in Eric’s last post?

  6. Heck yes grep is an awesome tool. It’s especially invaluable when searching for patterns in a bunch of different code files.

  7. sharfah:

    try this instead:
    find ./ -exec grep pattern {} \;

    ##example - find all js files containing “function”. show the filename and the line on which the find occurred

    find ./ -name \*.js -exec grep -Hn function {} \;
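
A small variation on the command above, offered as a sketch rather than a correction: ending -exec with + instead of \; lets find pass many file names to a single grep invocation, which is usually faster on large trees. The directory layout below (app.js, lib/util.js) is made up for the demo.

```shell
# Hypothetical directory tree with one JS file at the top and one nested
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/lib"
printf 'function top() {}\n' > "$tmpdir/app.js"
printf '// helper\nfunction nested() {}\n' > "$tmpdir/lib/util.js"

# "{} +" batches the file names into as few grep runs as possible;
# sort only makes the demo output order predictable
matches=$(cd "$tmpdir" && find . -name '*.js' -exec grep -Hn function {} + | sort)
echo "$matches"

rm -rf "$tmpdir"
```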

  8. Vulcan Eager

    Andrea, click on the ‘?’ on the top-left of any code block to get the name of the plug-in.

  9. @Andrea re Syntax Highlighting in Wordpress:

    It looks as if he’s using

    http://wordpress.org/extend/plugins/syntaxhighlighter/

    For syntax highlighting.

  10. Vulcan Eager

    Sorry, not top-left; the end of the header line of each code-block.

  11. @Binny and Paolo:
    Thanks for the input! I actually alias “grep --color=tty” to “g” because I use grep so much.

    @Andrea:
    I use SyntaxHighlighter-plus. The only problem with it is it includes a lot of javascript files and can be a bit slow at times.

    @naf:
    Yes! You can check out more find stuff at my “find is a beautiful tool” post

  12. Ha! Three comments about it while I was answering. Thanks for your input and the link!

  13. sharfah, you could also just do:
    find . | xargs grep $pattern
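
One caveat with the plain find | xargs form is that file names containing spaces get split into several arguments. A hedged sketch of the usual workaround, assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do); the directory and file names are made up for the demo:

```shell
# Hypothetical tree with spaces in both the directory and the file name
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/my notes"
printf 'needle here\n' > "$tmpdir/my notes/todo list.txt"
printf 'nothing\n' > "$tmpdir/other.txt"
pattern='needle'

# NUL-separated names survive spaces; -l lists only the matching files
found=$(cd "$tmpdir" && find . -type f -print0 | xargs -0 grep -l "$pattern")
echo "$found"

rm -rf "$tmpdir"
```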

  14. Maybe one of you can help me figure this out.

    I’ve been trying to run a grep -io on various text files. It works very erratically. In my example text, it’ll always find some words, and never find others. It works fine with a grep -i or a grep -o, but I can’t get it to consistently work with a grep -io.

    Any ideas would be greatly appreciated.

  15. What’s the difference between the find-based commands people are recommending and grep -l (which returns filenames), or in Sharfah’s case grep -rl, which returns file names and searches recursively?

  16. i also find combining find with xargs to be very powerful. for example:

    $ find . -name \*.c | xargs grep function_to_find

    is a basic recursive grep on C source files. i typically don’t use the -exec flag to find because i don’t like the syntax. sometimes, though, it can’t be avoided, like when you’re passing a command to a function in bash and the `|’ in the pipeline will just confuse the heck out of the poor function.

    another one of my favorite flags to grep is the -o flag, which says to print only what matches the specified regex. for example, to print all anchor tags in index.html:

    $ grep -o '<a[^>]*>' index.html

    will print only the anchor tags, not the whole line. also, it will find multiple anchor tags on a single line. neat!

    also, one of my favorite thing about Cygwin is that its grep supports the -P option, which enables perl-compatible regexes. i can’t find a way to get a -P grep on other *nixes, though, like Ubuntu, without compiling my own grep. anyone know the special sauce to install a -P grep from a package manager?

  17. seems to be a helpful tool, but is it just for linux?

  18. use Perl;

    I mean it. You get all of grep, including -P, a lot of awk, and a lot more specific to Perl, with just about the same syntax. Why settle for less??

  19. @pollian:
    you are right, grep -rl would do the same thing.

    @United Voices:
    grep is available “native” on linux and every other unix distribution, including Mac OS X. for windows, you can find 3rd party grep programs for the command line, or better yet, download Cygwin to have a very unix-like shell in windows.

    @rp:
    i find the quick and dirty syntax of grep, combined with sed or awk to be much nicer than the command line syntax of perl, particularly for simple one-off tasks. for anything more complicated, perl is awesome. also, youd need to use something like Find.pm in order to do the recursive search, which would require a script file or a much more complicated command line. keep in mind that many of the people reading and commenting on this forum aren’t familiar with grep or other command line utilities, so the jump to perl might be too much learning curve right off the bat.

  20. @rp:
    Perl is definitely more powerful should one be savvy enough for it. The problem is that it may be tougher to fully grasp for beginners and may not always be available. Personally, I love Perl because it’s ugly like my puppy but a lovable kind of ugly :)

  21. If you’re using grep on code, you want ack instead.

    http://petdance.com/ack/

  22. What’s with all the grep recursion helpers (rgrep function and find). grep has a -r, people. :)

  23. @Rick:

    Mark pointed out the caveat to “grep -r” in his comment on the find post:
    http://eriwen.com/productivity/find-is-a-beautiful-tool/#comment-739


    The problem with grep’s -r option is that if none of the “pattern” files are found in the current directory, grep stops looking.

    In other words, if I do ” grep -Hrn playlist *.rb”, it won’t return anything if there aren’t any .rb files in the current directory, even if there are .rb files in subdirectories.

    grep -r is cool but does not always do what we want.
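
The underlying reason is that the shell expands *.rb before grep ever runs, so grep only receives names that match in the current directory. A sketch of one way around it, assuming a grep that supports the --include option (GNU and BSD grep do): start the recursion at "." and let grep do the file name filtering. The files below are made up for the demo.

```shell
# Hypothetical tree: no .rb files at the top level, one in a subdirectory
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/lib"
printf 'playlist = []\n' > "$tmpdir/lib/player.rb"
printf 'playlist notes\n' > "$tmpdir/README.txt"

# Recurse from ".", but only look inside files whose names match *.rb
hits=$(cd "$tmpdir" && grep -Hrn --include='*.rb' playlist .)
echo "$hits"

rm -rf "$tmpdir"
```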

  24. Ah, right. Good point. Ack is good for this, too.

    ack --ruby playlist

  25. blackbelt_jones

    Thanks for the tutorial! Here’s the only use of grep I know, for thinning out apt searches

    apt-cache search term1 | grep term2

  26. For recursive grep use the -r option

  27. If you want a better way at grep-ping processes, use pgrep. Should be installed by default in Linux distros. Say you’re looking for process(es) which contains the string “apache”, you do a:

    pgrep -f apache

    That’ll display all matching PIDs which have apache in the process string. There are a lot more args you can play with.

    Have fun!

  28. the p in grep stands for parser, not print.

  29. @Guest:
    Yes, -r is cool but see the caveat I posted in a comment above.

    @Michael:
    Yes, I think this does come on the latest Ubuntu but you have to install it specifically in Cygwin.

    @Matt:
    I thought that too, but Wikipedia set me straight. I trust it just a bit more than myself

  30. big problem with grep: avoid non-text files (especially searching without specifying file extensions). My solution is a function named txt that uses file(1) to test the textness of a file:
    ========
    txt ()
    {
    if [ -z "$*" ]; then
    file -f -;
    else
    file "$@";
    fi | awk -F: '$2 ~ /text/ {print $1}'
    }
    ========

    then insert this between a find(1) and while loop containing grep:

    ========
    find . -type f | txt | while IFS= read -r f
    do
    grep -H patt "$f"
    done

  31. @Eric: Not sure how pgrep works on Cygwin. Haven’t tried it yet :) But I know it comes installed in Ubuntu, RHEL3/4.

  32. To avoid matching NON-TEXT files, just use the -I flag.
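
A quick sketch of the -I flag in action, assuming GNU grep (where -I treats files that look binary as non-matching). The two demo files are made up; the "binary" one simply contains a NUL byte.

```shell
# Hypothetical files: one plain text, one containing a NUL byte
tmpdir=$(mktemp -d)
printf 'patt in text\n' > "$tmpdir/notes.txt"
printf 'patt\000binary\n' > "$tmpdir/blob.bin"

# -r recurses, -l lists matching files, -I skips files that look binary
text_only=$(cd "$tmpdir" && grep -rlI patt .)
echo "$text_only"

rm -rf "$tmpdir"
```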

  33. Yes, Perl doesn’t have -r, but otherwise it’s pretty much drop-in compatible with grep (and available, too). But I didn’t know about ack! Thanks.

  34. I completely agree; I use grep for everything! Great site.

    Check this out for learning languages:
    http://www.codesplunk.com

  35. hi,

    i am trying to get all the classes in a .css file.

    I am able to print all the lines having the classes, but i need the exact word.

    Is it possible to print the words?

    The command which i used to print the words is

    grep -E "\\.[[:alnum:]]*\s*,|{" Style.css | grep -v "^#" | sort | uniq -d

    prasanna.
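
One possible approach to the question above, as a rough sketch: the -o flag from the article prints only the matched text, so a pattern that matches just the class selector yields the bare names. The stylesheet contents are made up for the demo, and the pattern is a simplification (it would also pick up things like a file extension inside a url(...) value).

```shell
# Hypothetical stylesheet
tmpdir=$(mktemp -d)
cat > "$tmpdir/Style.css" <<'EOF'
.header, .nav { color: red; }
#main .nav a:hover { color: blue; }
.footer { margin: 0; }
EOF

# A dot followed by a letter, underscore or hyphen, then name characters
class_pattern='\.[[:alpha:]_-][[:alnum:]_-]*'

# Every distinct class name
all_classes=$(grep -Eo "$class_pattern" "$tmpdir/Style.css" | sort -u)
echo "$all_classes"

# Only the class names that appear more than once
repeated=$(grep -Eo "$class_pattern" "$tmpdir/Style.css" | sort | uniq -d)
echo "$repeated"

rm -rf "$tmpdir"
```

Swapping uniq -d for uniq -c would show how many times each class is used.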
