Get sed savvy - part 2

Now that you know a bit about the Stream EDitor from the last sed tutorial, we are going to expand our knowledge of substitution and line printing with an interesting scenario.

Suppose we want to let someone else know what kinds of functions are in a given JavaScript file. Think of it as a simple sort of Javadoc for CSS or JavaScript. The plan is to look at all of the files modified in the last day, extract the comments from them, and put them somewhere (on a wiki, perhaps?). Done well, this kind of automation keeps the whole team up to date on what changed without anyone having to write it up by hand.

Tutorial

Download and install Cygwin if you’re on Windows to follow along.

# Single-line comments: grep '//' is simpler, but sed works too
sed -n '/\/\//p' blah.js > /tmp/comments.out

# Multi-line comments
sed -n '/\/\*/,/\*\//p' blah.js >> /tmp/comments.out

The sed commands above are tricky, so here is how to read them. The -n option turns off sed's automatic printing, so nothing is output unless a command asks for it. An address such as /\/\// selects the lines that match the pattern (the slashes inside it are escaped with backslashes because / is also the delimiter). The comma [,] between two patterns creates a range: it selects every line from one that matches the first pattern (/*) through the next line that matches the second (*/), inclusive. Finally, the p command prints each selected line in full.
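Here is a minimal run of both commands, assuming GNU sed as installed by Cygwin or most Linux distributions. The contents of blah.js are made up for illustration; only the file name and the output path come from the commands above.

```shell
# Work in a scratch directory so nothing real is touched
cd "$(mktemp -d)"

# Hypothetical sample file: one single-line comment, one multi-line comment
cat > blah.js <<'EOF'
// Adds two numbers
function add(a, b) {
  return a + b;
}
/*
 * Subtracts b from a
 */
function sub(a, b) { return a - b; }
EOF

# Single-line comments: print only lines containing //
sed -n '/\/\//p' blah.js > /tmp/comments.out

# Multi-line comments: print every line from a /* line through the next */ line
sed -n '/\/\*/,/\*\//p' blah.js >> /tmp/comments.out

cat /tmp/comments.out
```

The final cat shows the // line followed by the three lines of the block comment. The code lines are skipped because -n suppressed automatic printing and neither address selected them.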

We can combine these two commands into one sed script file and run them both in a single pass.

# sedscr: sed script file, one command per line
/\/\//p
/\/\*/,/\*\//p

# Use the sed script to print all comments
sed -n -f sedscr blah.js > /tmp/comments.out

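Now we have a nice little summary of our JavaScript files that we can post to a wiki, or diff against another version to see what was added. Note that the p command prints the whole line, so if a comment sits at the end of a line of code you will get the code at the beginning of that line too. Also, sed only starts looking for the closing */ on the line after the one that opened the range, so a /* comment */ that opens and closes on a single line makes the range run on until the next */ further down the file (or to the end of the file). Not a perfect solution, but something quick and easy!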
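The opening scenario asked for the comments from every file modified in the last day, which the script above does not handle on its own. One way to get there, sketched below under the assumption of GNU find, touch and sed (all available in Cygwin), is to let find choose the files and pass them to sed. The src directory and file names are hypothetical.

```shell
cd "$(mktemp -d)"

# The script from above: single-line comments, then block comments
cat > sedscr <<'EOF'
/\/\//p
/\/\*/,/\*\//p
EOF

# Hypothetical project files
mkdir -p src
printf '%s\n' '// fresh helper' 'function a() {}' > src/new.js
printf '%s\n' '// old helper' 'function b() {}' > src/old.js
# Backdate old.js by two days so it falls outside the window (GNU touch -d syntax)
touch -d '2 days ago' src/old.js

# -mtime -1: modified less than a day ago
# -exec ... + hands many files to one sed call; GNU sed's -s treats them as
# separate files, so a block comment left open in one file can't spill into the next
find src -name '*.js' -mtime -1 -exec sed -s -n -f sedscr {} + > /tmp/comments.out
cat /tmp/comments.out
```

Only the comment from src/new.js ends up in /tmp/comments.out, ready to paste into the wiki.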

Other Examples

# Print lines longer than 80 characters
sed -n '/^.\{81\}/p' myfile

# Delete empty lines (lines containing only spaces or tabs are kept)
sed '/^$/d' myfile

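# Substitute only on lines that match /Yahoo/; the empty regex in s// reuses that pattern,
# and skipping the s command on non-matching lines can make sed faster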
sed '/Yahoo/ s//Not Microhoo/g' myfile
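To try the three one-liners without a real file, the sketch below builds a throwaway myfile whose contents are invented for the demo; the comments say what each command prints.

```shell
cd "$(mktemp -d)"

# Hypothetical myfile: a short line, a 90-character line, an empty line, a Yahoo line
printf '%s\n' 'short line' > myfile
printf '%090d\n' 0 >> myfile      # 90 zeros
printf '\n' >> myfile
printf '%s\n' 'Yahoo and Yahoo' >> myfile

# Prints only the 90-character line
sed -n '/^.\{81\}/p' myfile

# Prints the three non-empty lines
sed '/^$/d' myfile

# The last line becomes: Not Microhoo and Not Microhoo
sed '/Yahoo/ s//Not Microhoo/g' myfile
```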

Conclusion

You should now be getting pretty proficient with sed. Use it along with find and grep and you will find yourself feeling much more comfortable on the command-line.

I encourage you to experiment a bit and use sed even where you know it’s not strictly necessary, just to get the hang of it. In the long run, these powerful tools will make you more productive.

Get sed savvy - part 1

Today I’ll continue the series on command-line tools for productivity with sed. The Stream EDitor is the most complicated tool so far, an entire language in its own right. It is much too big to cover completely in one post, so I’m going to split it across a few posts covering the major parts of sed.

The bread and butter of sed is its search-and-replace functionality. Let’s start with that and then throw in some other fun commands.

Tutorial

As with the previous posts, if you are on Windows you’ll want to install Cygwin or one of the other tools suggested in the comments on those posts. sed also uses regular expressions, so you’ll want to keep your regex reference handy. From Wikipedia:

[sed] reads input files line by line (sequentially), applying the operation which has been specified via the command line (or a sed script), and then outputs the line.

sed 's/#FF0000/#0000FF/g' main.css

We can read this like so: substitute [s/] red [#FF0000] with blue [#0000FF], globally [/g], in main.css. Two notes here: 1) this does not actually modify the file; it prints what the file would look like after the replacement, and 2) if we left off the "g" at the end, only the first occurrence on each line would be replaced. So let’s modify the file this time.
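Before that, here is note 2 in action on a made-up line of CSS, piped in rather than read from main.css:

```shell
# Without g: only the first match on each line is replaced
printf '%s\n' 'a { color: #FF0000; border: 1px solid #FF0000; }' | sed 's/#FF0000/#0000FF/'
# a { color: #0000FF; border: 1px solid #FF0000; }

# With g: every match on the line is replaced
printf '%s\n' 'a { color: #FF0000; border: 1px solid #FF0000; }' | sed 's/#FF0000/#0000FF/g'
# a { color: #0000FF; border: 1px solid #0000FF; }
```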

sed -i -r 's/#(FF0000|F00)\b/#0F0/g' main.css

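This is an example from the find tutorial that replaces all instances of red (written as #FF0000 or #F00) with green in our CSS file. The -r option turns on extended regular expressions, which is what lets the parentheses and | work without backslashes, and \b (a GNU word-boundary match) keeps #F00 from matching inside a longer color like #F00BA4. As Sheila mentioned in the find post, -i does not work with the sed on Solaris, and she suggests Perl instead; a working form is perl -pi -e 's/foo/bar/g' main.css (the -p is what makes Perl loop over the lines of the file).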
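A throwaway check of that command, assuming GNU sed (for -r, -i and \b) and a made-up main.css, shows which reds change and which look-alike color is left alone:

```shell
cd "$(mktemp -d)"

# Hypothetical stylesheet: both spellings of red, plus a color that merely starts with F00
cat > main.css <<'EOF'
h1 { color: #FF0000; }
h2 { color: #F00; }
h3 { color: #F00BA4; }
EOF

# Edit in place; \b requires a word boundary right after the color
sed -i -r 's/#(FF0000|F00)\b/#0F0/g' main.css
cat main.css
```

After the edit, the h1 and h2 rules read #0F0 and the h3 rule still reads #F00BA4.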

Suppose we want to change a whole color scheme, though. For that, a sed script file works best:

# sedscript - one command per line
s/#00CC00/#9900CC/g
s/#990099/#000000/g
s/#0000FF/#00FF00/g
...

# use sedscript with -f
sed -i -f sedscript *.css

sedscript is a new file we have created. Note that the commands in the file are not quoted. Running it with -i changes the color scheme in all of our CSS files at once.
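One thing to keep in mind with script files: sed runs every command, in order, on each line, so a later command sees the output of an earlier one. That bites when two colors trade places. The sketch below uses a made-up stylesheet and a made-up placeholder, #SWAPTMP (any string that never appears in your CSS will do); the naive script turns everything red, while the placeholder version swaps correctly.

```shell
cd "$(mktemp -d)"

# Hypothetical stylesheet with one red rule and one blue rule
printf '%s\n' 'a { color: #FF0000; }' 'b { color: #0000FF; }' > main.css

# Naive swap: line a turns blue and then straight back to red; line b turns red
printf '%s\n' 's/#FF0000/#0000FF/g' 's/#0000FF/#FF0000/g' > naive.sed
sed -f naive.sed main.css

# Swap through a temporary placeholder instead
cat > swap.sed <<'EOF'
s/#FF0000/#SWAPTMP/g
s/#0000FF/#FF0000/g
s/#SWAPTMP/#0000FF/g
EOF
sed -f swap.sed main.css
```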

Other Examples

# Trim whitespace from the beginning and end of each line
# [[:blank:]] matches a space or a tab in any POSIX sed; some seds don't understand \t
sed 's/^[[:blank:]]*//;s/[[:blank:]]*$//' myfile

# Delete all occurrences of foo
sed 's/foo//g' myfile
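A quick check of both one-liners on piped-in text. The trim uses the POSIX class [[:blank:]] (a space or a tab), which avoids the \t question entirely; the input strings are made up.

```shell
# Leading and trailing spaces/tabs removed; inner spaces kept
printf ' \t indented text \t \n' | sed 's/^[[:blank:]]*//;s/[[:blank:]]*$//'
# indented text

# Every foo removed, including the one inside foobar
printf 'foo bar foobar\n' | sed 's/foo//g'
#  bar bar
```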

Conclusion

You should start to see how a simple sed one-liner can make a lot of changes. Using it effectively can really speed up some tasks.

Here are some good references you should bookmark (including this page of course ;)

I’d say that 90% of the time I use sed for search-and-replace, so you’ve got a good start here. As I mentioned earlier, there is a LOT more to sed. Later, I’ll show you how to make deletions, add line numbers to files, print specific lines by line number, and much more. Stay tuned, and share your favorite one-liners in the comments!

grep is a beautiful tool

Global Regular Expression Print is a staple of every command-line user’s toolbox. As with find, it derives a lot of power from being combined with other tools and can increase your productivity significantly.

Following is a short tutorial that will help you see the power of this simple and very useful command. If you are on Windows and haven’t already, download and install Cygwin. If you are also new to regular expressions (regex), here is a great regular expressions reference to get you started.

Tutorial

Suppose we want to search for duplicate functions in all of our JavaScript files. Let’s start basic and work up to it. This technique can be used to search for a TON of duplicate items like:

  • Duplicate HTML IDs
  • How many times a CSS class is used
  • Duplicate Java classes
  • Many, many more…
# Search JS files in this directory for "function"
grep function *.js

The above command will print the lines containing "function" in all JavaScript files in the current directory (NOT subdirectories). Printing out line contents would be much more helpful if we knew what files they come from and their line numbers:

# Print file names, line numbers, and lines that define a function:
# "(white space)function name" or "name = function"
grep -EHn '^\s*(function \w+|\w+ = function)' *.js

Depending on how you format your JavaScript files, something like this will skip comments, anonymous functions, and words like "functionality", giving you better results.
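To see what that pattern does and doesn’t catch, here is a small run against a made-up app.js, assuming GNU grep (\s and \w are GNU extensions):

```shell
cd "$(mktemp -d)"

# Hypothetical app.js with one line of each kind discussed above
cat > app.js <<'EOF'
function init() {}
  helper = function () {};
// this function is only mentioned in a comment
var functionality = 1;
var other = function () {};
EOF

# -H forces the file name even though only one file matches *.js
grep -EHn '^\s*(function \w+|\w+ = function)' *.js
```

Only lines 1 and 2 are printed, each prefixed with app.js: and its line number. The comment and the word functionality are skipped as intended, but so is var other = function, because the pattern expects the name right at the start of the line; adjust it to match how your code declares functions.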

# Print a list of: function <function-name> and sort it
grep -Eho "^\s*function \w+" *.js | sort

-o prints only the part of the line that matches the regular expression, -E turns on extended regex, and -h suppresses the file names. Piping the result to sort gives a sorted list of function <function-name> entries. If you don’t have a lot of files/functions to go through, you can just scan the list and note the duplicate function names you see. Let’s go a step further for those that DO have a big list:

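# Print only duplicate function names
# (sed strips the indentation -o keeps, so nested definitions are compared too)
grep -hEo '^\s*function \w+' *.js | sed 's/^[[:space:]]*//' | sort | uniq -d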

There we go! That will list only the duplicated functions. I know we can expand this with awk or other tools to get the file names and line numbers of the duplicates, but I don’t want to get into the details of awk ;). I actually had it in this article and then removed it, so leave a comment or contact me if you want the code for that.
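One possible sketch of that idea (not the version the post originally had), assuming GNU grep and a POSIX awk: grep reports file:line:match for every definition, and awk groups the matches by function name and prints only the names seen more than once. find_dups and the sample files are hypothetical, and file names containing colons would confuse the field splitting.

```shell
cd "$(mktemp -d)"

# Hypothetical files: save() is defined twice, once indented
printf '%s\n' 'function save() {}' 'function load() {}' > one.js
printf '%s\n' '// other helpers' '  function save() {}' > two.js

# find_dups is a made-up name for this sketch
find_dups() {
  # -H file name, -n line number, -o only the match, e.g. two.js:2:  function save
  grep -EHno '^\s*function \w+' *.js |
    awk -F: '
      {
        name = $3
        sub(/^[ \t]*function /, "", name)      # keep just the function name
        where[name] = where[name] " " $1 ":" $2
        count[name]++
      }
      END {
        for (name in count)
          if (count[name] > 1) print name ":" where[name]
      }' |
    sort
}

find_dups
# save: one.js:1 two.js:2
```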

Other Examples

# Count lines containing "function" in each JS file (one count per file)
grep -c function *.js

# Print lines that DO NOT have "function"
grep -v function *.js

# List processes that match "pidgin" (non-Windows)
# The [p] bracket keeps grep from matching its own command line
ps -ef | grep '[p]idgin'
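A small demo of how -c counts, with two made-up files. grep -c reports a separate count for each file, and it counts matching lines rather than functions, so a comment that merely mentions "functions" is counted too:

```shell
cd "$(mktemp -d)"

# Hypothetical files; two.js only mentions "functions" in a comment
printf '%s\n' 'function a() {}' 'function b() {}' > one.js
printf '%s\n' '// helper functions live in one.js' > two.js

# One count per file
grep -c function *.js
# one.js:2
# two.js:1

# A single total across all files: concatenate them first
cat *.js | grep -c function
# 3
```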

Conclusion

grep is one of the most used command-line tools, often placed at the end of a pipe to filter another command’s output. Understanding it is essential to increasing productivity on the command line. There is so much more to grep than what I’ve shown here, and it would be cool to see your best uses in the comments!