awk is a beautiful tool

AWK is a powerful programming language that we can use on the command line for advanced text processing. I’d like to provide a guide so you can get started using it. I’ll cover the basics of AWK (named after Alfred Aho, Peter Weinberger, and Brian Kernighan) and provide some useful examples.

Tutorial

To best introduce awk I’d like to start with a practical example. Most of the applications for awk that I’ve dealt with involve formatting some output or data into something cleaner and more usable. This is certainly not the limit of awk; it is a full-fledged language with all the power and responsibility that go with it.

Awk operates on one "record" at a time; by default, each line is a record. Each record is split into "fields", which are separated by runs of whitespace (spaces or tabs) by default, or by another separator defined with the -F option.
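To make records and fields concrete, here is a minimal sketch. The input lines are made up for illustration, and the variable names (default_out, colon_out) are only there to hold the results.

```shell
# Default splitting: runs of spaces and tabs separate fields.
default_out=$(printf 'alpha   beta\tgamma\n' | awk '{ print $2 }')
echo "$default_out"    # beta

# With -F ':' the colon becomes the field separator instead.
colon_out=$(printf 'app.js:12:function init\n' | awk -F ':' '{ print $1, $2 }')
echo "$colon_out"      # app.js 12
```

The comma in print joins its arguments with a single space, which is why the second command prints "app.js 12" rather than "app.js:12".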

We’re going to print the file names, line-numbers, and function names of all duplicate functions so they’re super easy to find and remove. Suppose we have some output from grep with file names and line numbers using a command like this (pulled from my grep tutorial):

# Prints javascript functions like this - <file>:<line-num>:<line-content>
grep -EnH "^\s*function \w+" *.js | sort

That’s all well and good, but it doesn’t quite give us what we want in a clear manner: it prints every function and its location in a rather kludgy fashion. We can clean this up with a bit of awk. First, let’s learn a basic awk command:

awk '/^$/ { print "blank line" }' myfile

This snippet prints "blank line" for every line that matches the regex ^$ (an empty line). The pattern (between the two slashes, inclusive) is optional, as we’ll see in the next example:

awk 'BEGIN { print "Hello, awk!" } { print $2, $1 } END { print "Goodbye, awk!"}' myfile

How this reads is: Before processing (BEGIN), print "Hello, awk!" and then print the second field ($2) and then the first ($1) for each line of myfile, then after processing (END) print "Goodbye, awk!". The BEGIN and END clauses are optional.

As explained above, a field is a sequence of non-whitespace characters. So if a line contained "foo bar other stuff", awk would print "bar foo".
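To check this without creating a file, you can pipe the sample line straight into the same program. This is just a sketch; the variable name swap_out is arbitrary.

```shell
# Feed the sample line from the text to the BEGIN / main / END program.
swap_out=$(printf 'foo bar other stuff\n' |
  awk 'BEGIN { print "Hello, awk!" } { print $2, $1 } END { print "Goodbye, awk!" }')
echo "$swap_out"
# Hello, awk!
# bar foo
# Goodbye, awk!
```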

One more thing before continuing: awk scripts can get ugly so it is useful to know you can read a file for awk commands:

#My awk file: foo.awk
BEGIN { print "Hello, awk!" }
{
    print $2, $1
    # This is an awk comment
}
END { print "Goodbye, awk!" }

#Invoke foo.awk
awk -f foo.awk myfile

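OK, let’s add some power to that old grep command. We can use awk’s -F option to specify our delimiter, so let’s add to our grep command like so (grep’s -o flag prints only the matched part of each line, and -i makes the match case-insensitive):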

grep -Eoni "^\s*function \w+" *.js | awk -F ':' '{print $3," ",$1,$2}' | sort

Now we have a sorted list of function names, each followed by its file name and line number, separated by spaces. We print $3 first so that we can easily sort by function name and look for duplicates, but we’re not going to do that manually… oh noooooo way.
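A quick way to see what this first awk stage produces is to feed it a few lines in grep’s file:line:match format. The three sample lines and the variable names below are made up for illustration; real grep output will of course differ.

```shell
# Three made-up lines in grep's <file>:<line-num>:<match> format.
sample='b.js:3:function save
a.js:10:function init
a.js:42:function save'

# Same awk stage as in the text: function text first, then file name and line number.
stage1=$(printf '%s\n' "$sample" | awk -F ':' '{print $3," ",$1,$2}' | sort)
echo "$stage1"
```

After sorting, the two "function save" lines sit next to each other, which is what makes the duplicates easy to pick out in the next step.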

Let’s make it a bit prettier and uniq-ify our list of functions by function:

# New part of our command
awk '{print $3,"line",$4,$2}' | uniq -f 3 -D

# Full command
grep -Eoni "^\s*function \w+" *.js | awk -F ':' '{print $3," ",$1,$2}' | sort | awk '{print $3,"line",$4,$2}' | uniq -f 3 -D

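Here we pipe the output of our last command back into awk, printing <filename> line <line-num> <function-name>, and then run uniq. The -f 3 option tells uniq to skip the first 3 fields, so only the function name is compared, and -D (an extension found in GNU uniq rather than a POSIX option) prints every line that has a duplicate.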
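If your uniq does not support -D, one possible alternative is to let awk find the duplicates itself with an associative array. Note that this is a different technique from the sort and uniq pipeline above, offered only as a sketch: the sample input and the names count, where, and dupes are made up for illustration.

```shell
# Made-up input in grep's <file>:<line-num>:<match> format.
sample='a.js:10:function init
a.js:42:function save
b.js:3:function save'

dupes=$(printf '%s\n' "$sample" | awk -F ':' '
  {
    name = $3
    sub(/^[ \t]*function[ \t]+/, "", name)   # keep only the function name
    count[name]++                            # how many times this name was seen
    where[name] = where[name] sprintf("%s line %s %s\n", $1, $2, name)
  }
  END {
    # Print the collected locations of every name seen more than once.
    for (name in count)
      if (count[name] > 1)
        printf "%s", where[name]
  }')
echo "$dupes"
# a.js line 42 save
# b.js line 3 save
```

Because the array is keyed by function name, no sorting is needed first. The order in which different duplicated names are printed is unspecified, though, so pipe the result through sort if that matters.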

Other awk examples

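# Back up all JavaScript files with a .bak extension -- replace 'bash' with your shell
# (note: this breaks on file names that contain spaces or shell metacharacters)
ls *.js | awk '{print "cp "$0" "$0".bak"}' | bash
# Print the number of lines that contain "function" (i + 0 prints 0 when nothing matches)
awk '/function/ { i++ } END { print i + 0 }' myfile.js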
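One detail worth knowing about counters like this: an awk variable that is never assigned prints as an empty string, so a file with no matches yields a blank line rather than 0. Adding 0 forces a numeric result. The sketch below uses made-up input and the arbitrary names count and none.

```shell
# Three sample lines, two of which contain "function".
count=$(printf 'function a() {}\nvar x = 1;\nfunction b() {}\n' |
  awk '/function/ { n++ } END { print n + 0 }')
echo "$count"    # 2

# With no matches n is never set; n + 0 prints 0 instead of an empty line.
none=$(printf 'var x = 1;\n' | awk '/function/ { n++ } END { print n + 0 }')
echo "$none"     # 0
```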

Conclusion

AWK is admittedly a ten ton gorilla of a tool, so there is no way I could cover everything that it can do in one post. If there is enough demand I can write about some of the more advanced features like conditionals, variables, and formatting. Stay tuned!

Further Reading

To see how far this rabbit hole goes, you should check out a couple of my favorite references:

I hope you found this informative and want to know more. I know you all probably have some awk gems to share, so let’s see them!


Responses (17)

  1. Interesting use of awk. I agree, awk is a great tool - I’ve used it a ton over the years on UNIX and Linux - it enhances one’s productivity a lot.

    Minor nitpick - in these words:

    “Suppose we have some output from grep with line names and file numbers”

    it should be “file names and line numbers” :)

    - Vasudev

  2. @Vasudev:

    Wait you haven’t heard of “line names” and “file numbers” before? Just kidding, my editor has been sacked, though. :)

  3. ls *.js | awk '{print "cp "$0" "$0".bak"}' | bash

    ???

    # for act in *js; do cp "$act" "$act.bak"; done

  4. @teki:
    That’s the beauty of the command line :). There are many ways to do the same thing.

  5. Great writing, Eric!

    May I suggest my Awk Cheat Sheet?

    It includes:

    Predefined Variable Summary, which lists all the predefined variables and which awk versions (original awk, nawk or gawk) have it built in.
    GNU Awk command line argument summary
    I/O statements
    Numeric functions
    Bit manipulation functions
    I18N (internationalization) functions
    String functions, and
    Time functions

    There are also Awk One-Liners by Eric Pement, just like with sed. Once you go through those, you basically learn Awk.

    And I see that you added me to the list of the blogs you follow! Thanks :)

    Peteris

  6. Nice post Eric. Definitely awk is a great asset to have in your toolbox.

    And, yeah … nice site(blog) design :) good work.

  7. @Peteris:
    Thanks! I hadn’t gotten around to checking out all of your site yet, so thanks for the heads up. I’ll add it to the list in the article :)

    @xk0der:
    Thanks, I’m glad you like it!

  8. Eric, thanks for linking!

    ps. I also wrote a youtube video downloader in Gnu Awk to learn networking in Gnu Awk and binary file I/O.

  9. @Eric: Or, it may have been a bug in your awk script that switched the words :-) Kidding too.

  10. Nice post :)

    I knew sed and egrep but never knew awk before (maybe I never needed it so badly). But I’ll try to use it from now on whenever I have to deal with columns. Thanks for the beautiful posts, keep them coming.

  11. Awk, sed, perl is the great Unix trilogy.

  12. Here are some more awk (and other UNIX-related) resources that I think are pretty useful and interesting:

    1. The book “The UNIX Programming Environment” by Brian Kernighan and Rob Pike. A classic. Has a lot of material on awk, sed, grep, and many other command-line UNIX tools and how to use them well together (in pipelines and shell scripts), along with the shell (it mainly uses /bin/sh, but most/all of it should work with /bin/bash). I’ve read it more than once, and learned more stuff even on the 2nd and later readings. I once conducted a UNIX training course for a company, and used this book as the main course material. Both the participants and me had a great time - I ended up solving most of the exercises in the book for them (with some help from a few of the students :-), including some marked as * (hard). There was one involving bash that was particularly tricky …

    2. The book “The AWK Programming Language” by the authors of the awk tool itself. (Haven’t actually read this one, it may be out of print, but is likely to be quite good, since all other books I’ve read by Kernighan (and anyone else writing with him) are excellent - and that includes “The UNIX Programming Environment”, “The C Programming Language” and “The Practice of Programming”.)

    Link for the book:

    http://netlib.bell-labs.com/cm/cs/awkbook/

    The link above also says this:

    “Alternative books that contain significant amounts of AWK include Effective AWK Programming, 3rd edition by Arnold Robbins (O’Reilly, 2001, ISBN 0-596-00070-7), and Sed & Awk, 2nd edition by Dale Dougherty and Arnold Robbins (O’Reilly, 1997, ISBN 1-56592-225-5).”

    I’ve not read the “Effective AWK Programming” book, but have read some parts of the “Sed and Awk” book - it is also good. If I’m not wrong, Arnold Robbins is/was the maintainer of GNU awk.

    See:

    Link for “Effective AWK Programming” book:
    http://www.oreilly.com/catalog/awkprog3

    Link for “Sed and Awk” book:
    http://www.oreilly.com/catalog/sed2

    About Arnold Robbins (the “Books” tab on that page shows he’s written some other O’Reilly UNIX books too):
    http://www.oreillynet.com/pub/au/459

    3. The books “Programming Pearls” and “More Programming Pearls” by Jon Bentley.
    Either or both of them (can’t remember which, right now) has got somewhat advanced awk material, particularly on awk’s associative arrays - quite powerful code examples in relatively very few lines - which would be a whole lot longer and more difficult to write in, say, C.

    [ BTW - Jon also wrote "Writing Effective Programs", a really good book about performance tuning. It's out of print now, I think, but very worth reading if you can get hold of it. ]

    HTH,
    Vasudev

  13. Oops, mistake in my previous comment:

    >Jon also wrote “Writing Effective Programs”

    That should be “Writing Efficient Programs”.

    (Hmm - has anyone ever written an ineffective program?
    That should be something to see … :)

    I know this is off topic for your post, but since I think that book is really good, writing a bit more:

    Here are some relevant links about the book:

    http://www.hipecc.wichita.edu/bentley.htm

    http://portal.acm.org/citation.cfm?id=539147

    Interestingly, contrary to what one might think (due to the book’s topic being performance tuning), it is not dry or boring at all - there are lots of “war stories” sprinkled throughout it. Two of my favorites are the one about how quicksort on a supercomputer was made many times faster - by tuning at multiple levels of both the hardware and software stacks, and about a guy who made it possible to run programs in very little memory by writing “an interpreter for an interpreter” :)

  14. Hey Eric,

    This is Peter again. Wanted to let you know that I just wrote the 1st part of a 3 part article on the famous awk one-liner file awk1line.txt.

    I will explain all of the one-liners there.

    The post is here:

    Famous Awk One-Liners Explained

    Sincerely,
    Peteris

  15. @Peteris: That article on awk one-liners looks interesting. Scanned it a bit, got to check it out more …

    Coincidentally, I just blogged about an awk one-liner that I came up with for killing a hung Firefox process on Linux. The post is here.
