Linux grep a string within a ZIP file containing multiple files — zgrep or zipgrep?

From time to time, you might need to search for a string inside a zip archive that contains multiple files on a Linux system. If you have never done this, you might ask: which tool should I use? Are there existing commands, so that I don’t have to write my own script to unzip the archive and search it?

Fortunately, the answer is yes: there are commands such as zgrep and zipgrep. So what’s the difference between them?

When I got such a request to grep for a string within an archive, I didn’t know which command suited which scenario. I first tried zgrep because I wasn’t aware of zipgrep.

Using zgrep to search a zip file doesn’t really work, for two reasons:

  1. It can only search the first file within the zip file.
  2. It doesn’t list the file name.

For example, I created two zip files that contain the same two plain text files in different orders: aaa.log contains the string “jli” I was looking for, and bbb.log doesn’t.

test_grep1.zip had aaa.log as the first one while test_grep.zip had bbb.log as the first file within the archive.

zip test_grep1.zip aaa.log bbb.log
zip test_grep.zip bbb.log aaa.log

root@jlitest:/var/log# unzip -l test_grep.zip
Archive:  test_grep.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
   376908  03-24-2022 20:52   bbb.log
        8  03-24-2022 20:52   aaa.log
---------                     -------
   376916                     2 files

root@jlitest:/var/log# unzip -l test_grep1.zip
Archive:  test_grep1.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        8  03-24-2022 20:52   aaa.log
   376908  03-24-2022 20:52   bbb.log
---------                     -------
   376916                     2 files
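
If you want to reproduce this setup, a sketch along the following lines should work. The helper name make_fixtures and the filler text written to bbb.log are my own assumptions; only the file names, the two “jli” lines in aaa.log (8 bytes) and the member order come from the listings above.

```shell
# make_fixtures DIR: hypothetical helper that recreates the two test archives
# in DIR. Requires the zip command.
make_fixtures() {
  dir=$1
  mkdir -p "$dir" || return 1
  (
    cd "$dir" || exit 1
    # aaa.log: two lines of "jli", 8 bytes in total
    printf 'jli\njli\n' > aaa.log
    # bbb.log: filler text (an assumption) that never contains "jli"
    : > bbb.log
    i=0
    while [ "$i" -lt 1000 ]; do
      echo "line $i: nothing to see here" >> bbb.log
      i=$((i + 1))
    done
    rm -f test_grep1.zip test_grep.zip
    # Same two files, added in opposite orders
    zip -q test_grep1.zip aaa.log bbb.log
    zip -q test_grep.zip bbb.log aaa.log
  )
}
```

Running make_fixtures /tmp/zgrep-demo and then unzip -l on each archive should show the same two files in opposite orders.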

When searching for the string “jli”, zgrep found it in test_grep1.zip but not in test_grep.zip.

root@jlitest:/var/log# zgrep jli test_grep1.zip
jli
jli

root@jlitest:/var/log# zgrep jli test_grep.zip

Then I took a closer look at zgrep. It is actually just a shell script wrapping grep and gzip (using gzip’s “-c” and “-d” options to decompress to stdout). As its man page states: “zgrep - a wrapper around a grep program that decompresses files as needed”

root@jlitest:/var/log# which zgrep
alias zgrep='zgrep --color=auto'
        /usr/bin/zgrep

No wonder it doesn’t work well with zip files, as the following example shows:

root@jlitest:/var/log# gzip -d -c test_grep1.zip
jli
jli
gzip: test_grep1.zip has more than one entry--rest ignored

gzip stops after the first file because it expects only one compressed file per archive.

Then I realized there is another command, zipgrep (a shell script again), which wraps egrep and unzip: exactly what I was looking for.

Its man page says “zipgrep: Use unzip and egrep to search the specified members of a zip archive for a string or pattern.”

root@jlitest:/var/log# zipgrep jli test_grep1.zip
aaa.log:jli
aaa.log:jli
root@jlitest:/var/log# zipgrep jli test_grep.zip
aaa.log:jli
aaa.log:jli
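
To avoid picking the wrong tool in the first place, a small wrapper can look at the first bytes of the file and choose between zipgrep, zgrep and plain grep. This is only a sketch: pick_grep_tool and smart_grep are hypothetical names, and the check assumes an ordinary zip file starting with the local file header signature “PK\003\004” (empty or self-extracting archives start differently) or a gzip file starting with the bytes 1f 8b.

```shell
# pick_grep_tool FILE: hypothetical helper that prints the name of the
# command suited to FILE, judged by the file's first bytes.
pick_grep_tool() {
  # Read up to four bytes and join them into one hex string
  magic=$(od -An -tx1 -N4 < "$1" | tr -d '[:space:]')
  case $magic in
    504b0304) echo zipgrep ;;  # "PK\003\004": zip local file header
    1f8b*)    echo zgrep ;;    # gzip magic number
    *)        echo grep ;;     # anything else: treat as uncompressed
  esac
}

# smart_grep PATTERN FILE: run whichever tool pick_grep_tool suggests
smart_grep() {
  "$(pick_grep_tool "$2")" "$1" "$2"
}
```

With this in place, smart_grep jli test_grep.zip would call zipgrep, while the same call on a .gz file would call zgrep.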

Again, the idea behind zipgrep is to unzip the files within a zip archive to stdout and egrep for the pattern there. It uses unzip’s “-p” option to extract files (only the file data) to a pipe (stdout).

unzip has another option, “-c”, which extracts files to stdout/screen. It is similar to “-p” but also prints the name of each file.

So you can use simple commands to do what zipgrep does.

Using “-p”, no file name is printed.

root@jlitest:/var/log# unzip -p test_grep1.zip|egrep "extracting|inflating|jli"
jli
jli

Using “-c”, the file name is printed.

root@jlitest:/var/log# unzip -c test_grep1.zip|egrep "extracting|inflating|jli"
 extracting: aaa.log
jli
jli
  inflating: bbb.log

root@jlitest:/var/log# unzip -c test_grep.zip|egrep "extracting|inflating|jli"
  inflating: bbb.log
 extracting: aaa.log
jli
jli
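
Putting these pieces together, here is a minimal sketch of a zipgrep-like function that prefixes every match with the member name. It is not the real zipgrep script; mini_zipgrep is a hypothetical name, and the sketch assumes member names without newlines or unzip wildcard characters such as “*” or “[”.

```shell
# mini_zipgrep PATTERN ARCHIVE: hypothetical helper, a simplified stand-in
# for zipgrep. Prints matching lines as "member:line".
mini_zipgrep() {
  pattern=$1
  archive=$2
  # unzip -Z1 lists the member names, one per line
  unzip -Z1 "$archive" | while IFS= read -r member; do
    case $member in */) continue ;; esac  # skip directory entries
    # Extract one member to stdout, filter it, then prefix each match
    unzip -p "$archive" "$member" < /dev/null \
      | grep -E -e "$pattern" \
      | while IFS= read -r line; do
          printf '%s:%s\n' "$member" "$line"
        done
  done
}
```

Because each member is extracted separately, the position of a file inside the archive no longer matters, which is exactly where zgrep fell short.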

Extracting files to stdout is also useful in another scenario: what if you only want to search for a string in one specific file within a zip file? By default, zipgrep scans every file in the archive, which is more work than you need. (zipgrep does accept member names after the archive name, but piping unzip’s output into grep yourself lets you use plain grep with any options you like.) For example, if you have a file named aaa.log within logs.zip and want to search only aaa.log for the string “jli”, you can use unzip’s “-p” option as in the example above:

root@joetest:~/Service_Tools# unzip -p logs.zip aaa.log|grep jli
jli
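
If you do this often, the pipeline can be wrapped in a small function. The sketch below is one possible shape, with grep_member as a hypothetical name; it adds grep’s “-n” option to show line numbers within the extracted data, which is my own choice rather than part of the example above.

```shell
# grep_member PATTERN ARCHIVE MEMBER...: hypothetical helper that searches
# only the named members of a zip archive.
grep_member() {
  pattern=$1
  archive=$2
  shift 2
  # The remaining arguments are member names; unzip also accepts
  # wildcards here, e.g. '*.log' (quoted so the shell leaves them alone)
  unzip -p "$archive" "$@" | grep -n -e "$pattern"
}
```

For the example above, the call would be grep_member jli logs.zip aaa.log. When several members are given, the line numbers count across the concatenated output, not per file.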

Have fun searching your zip files!
