I use the “diff” command almost every day to compare files and see the differences, but I had never thought about its fancier features until today.
Here is the story. I copied 203 huge files, 12 TB in total, over the network and wanted to see whether any of them were corrupted. So I ran an MD5 checksum on each of them and, no surprise, a bunch of files were corrupted. Using awk, sed, sort, etc., I put the checksum output in a file with the file name as the first column and the MD5 checksum as the second column. I ended up with two files, one from the source and one from the destination, both sorted by file name, like the following:
/backup/PRODTMP/00unil95_1_1 73091b76be5133e76df3b1e925551e78
/backup/PRODTMP/01unim2v_1_1 76e5d8e74c0f1e6dd95baa4d2ad4f399
/backup/PRODTMP/02unimtm_1_1 682208fea2bf915a3e5b88bbd315f7eb
/backup/PRODTMP/03uninn5_1_1 b39969c28b427877644740482632f139
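For anyone who wants to reproduce a listing like this, here is a minimal sketch, not the exact commands I ran. It assumes GNU md5sum and file names without whitespace; the scratch directory, the two small files and the `$out` variable are made up for illustration and stand in for /backup/PRODTMP and md5_sorted_pri.txt.

```shell
#!/bin/sh
# Hypothetical stand-in for /backup/PRODTMP: a scratch directory with two small files.
dir=$(mktemp -d)
printf 'alpha\n' > "$dir/00unil95_1_1"
printf 'beta\n'  > "$dir/01unim2v_1_1"

# Write the listing outside the directory so the glob below cannot pick it up.
out=$(mktemp)

# md5sum prints "<checksum>  <file>"; swap the two columns so the file name
# comes first, then sort by file name. Assumes no whitespace in file names.
md5sum "$dir"/* | awk '{print $2, $1}' | sort > "$out"

cat "$out"
```

Run the same thing on the source and on the destination and you get the two sorted files to compare.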
Now I need to present the differences to other folks. How can I do that with “diff”? After reading its man page, I found that yes, there is a way to do it with these two options:
-y (--side-by-side) Put the output side by side
--suppress-common-lines Do not output common lines.
So the command will be something like this:
diff -y --suppress-common-lines md5_sorted_pri.txt md5_sorted_std.txt|awk '{print $1"\t"$2"\t"$5}'
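And the result looks very nice: one line per mismatched file, with the file name, the source checksum and the destination checksum in tab-separated columns.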
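The pipeline can be tried on two tiny listings before pointing it at the real files. This is only a sketch: the second checksum in the destination listing is made up to simulate corruption, the `src`, `dst` and `report` variables are illustrative, and the `-W 200` option (output width) is my addition, since “diff -y” defaults to 130 columns and truncates longer lines.

```shell
#!/bin/sh
# Two tiny listings; the all-zero checksum is fake and simulates a corrupted copy.
src=$(mktemp)
dst=$(mktemp)
printf '%s\n' \
  '/backup/PRODTMP/00unil95_1_1 73091b76be5133e76df3b1e925551e78' \
  '/backup/PRODTMP/01unim2v_1_1 76e5d8e74c0f1e6dd95baa4d2ad4f399' > "$src"
printf '%s\n' \
  '/backup/PRODTMP/00unil95_1_1 73091b76be5133e76df3b1e925551e78' \
  '/backup/PRODTMP/01unim2v_1_1 00000000000000000000000000000000' > "$dst"

# For a changed line, diff -y prints: <name> <md5> | <name> <md5>
# so awk fields 1, 2 and 5 are the file name, source md5 and destination md5.
report=$(diff -y --suppress-common-lines -W 200 "$src" "$dst" |
  awk '{print $1"\t"$2"\t"$5}')

printf '%s\n' "$report"
```

One caveat: the awk field numbers only line up for lines that exist in both files. A file missing on one side shows up with a “<” or “>” marker instead of “|”, which shifts the columns.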

So don’t forget the man pages, even though googling has become a habit for lots of us these days.