Comparing Apples and Oranges
When you have used your UNIX machine for a while, you have piles of files (say that six times quickly) lying around. Often, many of the files are duplicates, or near duplicates, of each other. Two programs can help sort out this mess: cmp and diff.
The simplest comparison program is cmp; it just tells you whether two files are the same or different. To use cmp to compare two files, type this line:
cmp onefile anotherfile
You replace onefile and anotherfile with the names of the files you want to compare, of course. If the contents of the two files are the same, cmp doesn’t say anything (in the finest UNIX tradition). If they’re different, cmp tells how far into the files it got before it found something different. You can compare any two files, regardless of whether they contain text, programs, databases, or whatever, because cmp cares only whether they’re identical.
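To see the silent-when-identical behavior for yourself, here is a small sketch you can run in a scratch directory. The file names and contents are made up for illustration, and the script assumes your system has mktemp -d, which most Linux and BSD systems do. The exact wording of the message cmp prints for differing files varies a little between versions, so it is not shown here.

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched.
# (Assumes mktemp -d is available; the file names are hypothetical.)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'apples\noranges\n' > onefile
cp onefile samefile
printf 'apples\nbananas\n' > anotherfile

# Identical files: cmp prints nothing and exits with status 0.
same_status=0
cmp onefile samefile || same_status=$?
echo "identical files, exit status: $same_status"

# Different files: cmp reports where the first difference is
# and exits with status 1.
diff_status=0
cmp onefile anotherfile || diff_status=$?
echo "different files, exit status: $diff_status"
```

The exit status is handy in scripts: a test such as if cmp -s onefile anotherfile runs cmp with no output at all (that is what -s does) and lets the script decide what to do next. When you are finished, remove the scratch directory with rm -rf.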
A considerably more sophisticated comparison program is diff. This program attempts to tell you not only whether two files are different but also how different they are. The files must be plain text, not word processor documents or anything else, or else diff becomes horribly confused. Here’s an example that uses two versions of a story one of us wrote. We compared files tse1 and tse2 by typing this command:
diff tse1 tse2
Enter the name of the older file first and the name of the new, improved file second. The diff program responds:
45c45
< steered back around, but the sheep screamed in panic and reared back.
---
> steered back around, but the goats screamed in panic and reared back.
46a47
> handlebars and landed safely in the snow.
The changes between tse1 and tse2 are that, in line 45, the sheep changed to goats, and a new line 47 was added after line 46.
diff reports, in its first line of output (45c45), that changes (that’s what the c stands for) were made in lines 45 through 45 (that is, just line 45). Then it displays the line in the first file, starting with a <, and the line in the second file, starting with a >. We think of this as diff’s way of saying that you took out the lines starting with < and inserted the lines starting with >. Then diff reports (46a47; the a stands for added) that a new line was inserted after line 46 of the original file, where it becomes line 47 of the new file, and it shows the line that was inserted. This is a great way of seeing what changes were made when you get a new revision of a document you wrote.
Most versions of diff can also show you the context (a few lines around each change) if you give an option such as -c; some versions also accept a number, as in -3, to ask for three lines of context.
Tip: BSD and GNU versions of diff (GNU diff is the version that usually runs under Linux) can compare two directories to tell you which files are present in one and not in the other, and to show you the differences between files with corresponding names in the two directories. Run diff and give it the names of the two directories.
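If you want to try the output format, the context option, and the directory comparison without risking your own files, the following sketch builds two tiny directories and runs diff on them. The directory and file names (old, new, story, notes) are invented for this example, and the script again assumes mktemp -d is available. The -c option shown here is the standard way to ask for context; whether a bare number such as -3 works depends on your version of diff.

```shell
#!/bin/sh
# Build two small directories to compare.
# (All names here are hypothetical; assumes mktemp -d is available.)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir old new
printf 'one\ntwo\nthree\n' > old/story
printf 'one\nTWO\nthree\nfour\n' > new/story
printf 'a file that exists only in new\n' > new/notes

# diff exits with status 1 when it finds differences; "|| true"
# keeps that from stopping a script that runs under "set -e".

# Plain output: older file first, newer file second.
# Line 2 changed (2c2) and a line was added after line 3 (3a4).
diff old/story new/story || true

# Context output: the same changes with surrounding lines shown.
diff -c old/story new/story || true

# Directory comparison: lists files found in only one directory
# and shows the differences between files with matching names.
diff old new || true
```

In the first run, look for the lines 2c2 and 3a4, which follow the same pattern as 45c45 and 46a47 in the story example. Remove the scratch directory with rm -rf when you are done.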