How to ZIP and UNZIP folder in Ubuntu

Zip is an archive file format that is used to store compressed data. A single archive can hold any number of files. The format was originally created in 1989 by Phil Katz.

Introduction to unzip

unzip can list, test, and extract files from a ZIP archive, a format commonly found on MS-DOS systems. Its default behavior is to extract every file from the given ZIP archive into the current directory. A companion program, zip, builds ZIP archives; both programs are compatible with archives built by PKWARE's PKZIP and PKUNZIP for MS-DOS. However, the default behavior and options of the programs differ in several cases.

Arguments of Unzip

  • file[.zip]: It is the path of the ZIP archive. If the file specification is a wildcard, every matching file is processed in an order determined by the operating system. Only the filename can be a wildcard; the path itself cannot. Wildcard expressions are similar to those supported in commonly used Unix shells and may include:
    • *: It matches a sequence of zero or more characters.
    • ?: It matches exactly one character.
    • [...]: It matches any single character found inside the brackets; ranges are specified by a beginning character, a hyphen, and an ending character.
  • [file(s)]: It is an optional list of archive members to be processed, separated by spaces. Wildcard expressions may be used to match multiple members.
  • [-x xfile(s)]: It is an optional list of archive members to be excluded from processing. Because wildcard characters normally match directory separators ('/'), this option can be used to exclude files that are inside subdirectories.
  • [-d exdir]: It is an optional directory into which the files are extracted. By default, every file and subdirectory is recreated in the current directory; this option allows extraction into an arbitrary directory. It does not need to appear at the end of the command line.

Options of Unzip

Note: To support obsolescent hardware, the usage screen of unzip is limited to 22 or 23 lines, so it should be considered only a reminder of the basic unzip syntax rather than an exhaustive list of every possible flag.

The fuller list is below:

  • -Z: It selects zipinfo mode. If the first option on the command line is -Z, the remaining options are treated as zipinfo options.
  • -A: It shows extended help for the programming interface (API) of the DLL.
  • -c: It extracts files to stdout/screen. This option is similar to the -p option, except that the name of each file is printed as it is extracted, the -a option is allowed, and ASCII-EBCDIC conversion is performed automatically if appropriate. This option is not listed in the usage screen of unzip.
  • -f: It freshens existing files, i.e., it extracts only those files that already exist on disk and that are newer than the disk copies. By default, unzip asks before overwriting; the -o option may be used to suppress these queries.

Note: Under many operating systems, the TZ (timezone) environment variable must be set correctly for -f and -u to work properly. The reasons are somewhat subtle, but they have to do with the differences between DOS-format file times and Unix-format times and the need to compare the two. A typical TZ value is "PST8PDT".

  • -l: It lists the archive files. The names, uncompressed sizes, and modification dates and times of the specified files are displayed, along with totals for all the files specified. If unzip was compiled with OS2_EAS defined, the -l option also lists columns for the sizes of stored OS/2 extended attributes and OS/2 access control lists. In addition, the zipfile comment and individual file comments are shown.
  • -p: It extracts files to a pipe (stdout). Nothing but the file data is sent to stdout, and the files are always extracted in binary format, exactly as they are stored.
  • -t: It tests the archive files. It extracts each specified file in memory and compares the CRC of the expanded file with the CRC value stored for the original file.
  • -T: It sets the timestamp of the archive to that of the newest file inside it. This option corresponds to the -go option of zip, except that it can also be used on wildcard zipfiles and is much faster.
  • -u: It updates existing files and creates new ones if needed. It performs the same function as the -f option, extracting files that are newer than the files with the same name on disk, and in addition it extracts the files that do not already exist on disk.
  • -v: It lists the archive files verbosely or displays diagnostic version information. This option has evolved and now acts as both an option and a modifier.
  • -z: It shows only the archive comment.

Modifiers of unzip

  • -a: It converts text files. Ordinarily, every file is extracted exactly as it was stored. This option causes files identified as text files by zip to be extracted as such automatically, converting line endings, end-of-file characters, and the character set as necessary.
  • -b: It treats every file as binary (no text conversions). It is a shortcut for the ---a option.
  • -B: It saves a backup copy of each overwritten file. The backup file gets the name of the target file with a tilde and, optionally, a unique sequence number appended. The sequence number is applied whenever another file with the original name plus tilde already exists.
  • -C: It uses case-insensitive matching when selecting archive entries from the command-line list of extract selection patterns. Because some file systems are fully case-sensitive, and because both ZIP archives and unzip itself are portable across platforms, the default behavior of unzip is to match both wildcard and literal filenames case-sensitively.
  • -D: It skips timestamp restoration for extracted items. Normally, unzip tries to restore all the meta-information for extracted items that is supplied in the ZIP archive. A single -D option suppresses timestamp restoration for directories explicitly created from ZIP archive entries.
  • -DD: It forces suppression of timestamp restoration for every extracted entry (files and directories). As a result, the timestamp of every extracted entry is set to the current time.
  • -E: It shows the contents of the MacOS extra field during the restore operation.
  • -F: (Acorn only) It suppresses removal of the NFS filetype extension from stored filenames.
  • -i: It ignores filenames that are stored in MacOS extra fields.
  • -j: It junks paths. The directory structure of the archive is not recreated; every file is deposited in the extraction directory.
  • -J: It junks file attributes. Under BeOS, the file attributes are not restored, only the data of the file.
  • -K: It retains the SUID/SGID/Tacky (sticky-bit) file attributes. Without this option, these attribute bits are cleared for security reasons.
  • -L: It converts to lowercase any filename that originates on an uppercase-only operating system or file system.
  • -M: It pipes all output through an internal pager similar to the Unix more command.
  • -n: It never overwrites existing files.
  • -N: It extracts file comments as Amiga filenotes.
  • -o: It overwrites existing files without asking. It is a risky option, so it needs to be used carefully.
  • -P: It takes a password that is used to decrypt encrypted zipfile entries. Passing a password on the command line can expose it to other users of the system.
  • -q: It performs operations quietly.
  • -s: It converts spaces in filenames to underscores. By default, unzip extracts filenames with spaces intact because all PC operating systems allow spaces in filenames.
  • -S: Under VMS, it converts text files into the Stream_LF record format rather than the text-file default, the variable-length record format.
  • -U: It modifies or disables UTF-8 handling.
  • -V: It retains the (VMS) version numbers of the file.
  • -W: It modifies the pattern matching routine so that '*' and '?' do not match the '/' directory separator character.
  • -X: It restores owner/protection information under VMS, user and group information under Unix, ACLs (access control lists) under certain network-enabled releases of OS/2, or security access control lists under Windows NT.
  • -Y: Under VMS, it treats archived file name endings of ".nnn" as if they were VMS version numbers.

What is a Zip file?

Zip can be described as an archive file format. A Zip file may include one or more files or directories that may have been compressed. The file format allows several compression algorithms, of which DEFLATE is the most common. The zip format was originally created in 1989, and it was first implemented in the PKZIP utility of PKWARE, Inc. as a replacement for the older ARC format.

After that, the Zip format was quickly supported by many software utilities other than PKZIP. Microsoft has included built-in Zip support in versions of Microsoft Windows since 1998, starting with "Plus! 98".

History of ZIP

The .ZIP format was developed by Phil Katz of PKWARE and Gary Conway of Infinity Design Concepts. The format was created after SEA (Systems Enhancement Associates) filed a lawsuit against PKWARE claiming that the latter's archiving products, named PKARC, were derivatives of SEA's ARC archiving system.
On 14 February 1989, PKWARE and Infinity Design Concepts issued a joint press release that released the .ZIP format into the public domain.

ZIP version history

The .ZIP File Format Specification has its own version number, which does not necessarily correspond to the version number of the PKZIP tool, especially with PKZIP 6 or later. At various times, PKWARE has added preliminary features that allow PKZIP products to extract archives that use advanced features, but PKZIP products that create such archives are not made available until the next major release. Other companies and organizations support the PKWARE specifications at their own pace.
Formally, the specification of the .zip file format is called "APPNOTE - .ZIP File Format Specification".

It has been published on the PKWARE.com website since the late 1990s. Several versions of the specification were not published. Specifications of some features, including BZIP2 compression, the Strong Encryption Specification (SES), and others, were published by PKWARE a few years after they were created. The URL of the online specification has changed several times on the PKWARE website.

A summary of the key advances in various versions of the PKWARE specification:

  • 2.0 (1993): File entries can be compressed with DEFLATE and can use traditional PKWARE encryption (ZipCrypto).
  • 2.1 (1996): Deflate64 compression.
  • 4.5 (2001): Documented the 64-bit zip format.
  • 4.6 (2001): BZIP2 compression.
  • 5.0 (2002): DES, Triple DES, RC2, and RC4 supported for encryption.
  • 5.2 (2003): AES encryption support for SES and AES from WinZip; a corrected version of RC2-64 supported for SES encryption.
  • 6.1 (2004): Documented certificate storage.
  • 6.2.0 (2004): Documented Central Directory Encryption.
  • 6.3.0 (2006): Documented Unicode (UTF-8) filename storage. Expanded the list of supported compression algorithms (LZMA and PPMd+), encryption algorithms (Blowfish and Twofish), and hashes.
  • 6.3.1 (2007): Corrected the standard hash values for SHA-256/384/512.
  • 6.3.2 (2007): Documented compression method 97 (WavPack).
  • 6.3.3 (2012): Document formatting changes to make it easier to reference the PKWARE Application Note from other standards, using methods such as the JTC 1 Referencing Explanatory Report (RER) as directed by JTC 1/SC 34 N 1621.
  • 6.3.4 (2014): Updated the office address of PKWARE, Inc.
  • 6.3.5 (2018): Documented compression methods 16, 96, and 99 and the DOS timestamp epoch and precision, added extra fields for keys and decryption, and fixed typos and added clarifications.
  • 6.3.6 (2019): Corrected a typographical error.

Standardization

In April 2010, ISO/IEC JTC 1 initiated a ballot to decide whether a project should be started to create an ISO/IEC International Standard format compatible with ZIP. The proposed project, named Document Packaging, envisaged a ZIP-compatible 'minimal compressed archive format' suitable for use with several existing standards, including OpenDocument, Office Open XML, and EPUB.

In 2015, ISO/IEC 21320-1 "Document Container File - Part 1: Core" was published. It states that document container files are conforming ZIP files, and it requires the following main restrictions of the ZIP file format:

  • Archives may not span multiple volumes or be segmented.
  • The "patched data" features are prohibited.
  • The digital signature features are prohibited.
  • The encryption features are prohibited.
  • Files in ZIP archives may only be stored uncompressed or compressed using the "deflate" method.

Design of ZIP

A .ZIP file is an archive that stores multiple files. It allows the contained files to be compressed with many different methods, or simply stored without compression. Each file is stored separately, which allows different files in the same archive to be compressed with different methods. Because the files in a ZIP archive are compressed individually, it is possible to extract them or add new ones without applying decompression or compression to the whole archive. This contrasts with the compressed tar file format, for which such random-access processing is not easily possible.

A directory is placed at the end of a ZIP file. It identifies which files are in the ZIP and where in the ZIP each file is located. This allows ZIP readers to load the list of files without reading the whole ZIP archive. ZIP archives can also include extra data that is not related to the ZIP archive. This allows a ZIP archive to be made into a self-extracting archive by prepending the program code to the ZIP archive and marking the file as executable. Storing the catalog at the end also makes it possible to hide a zipped file by appending it to an innocuous file, such as a GIF image file.

The ZIP format uses a 32-bit CRC algorithm and includes two copies of the metadata of each entry to provide better protection against data loss.

Structure of ZIP

A ZIP file is correctly identified by the presence of an end of central directory record, which is located at the end of the archive structure in order to allow new files to be appended easily. If the end of central directory record indicates a non-empty archive, the name of each file or directory in the archive should be specified in a central directory entry, along with other metadata about the entry and an offset into the ZIP file that points to the actual entry data.

  • This allows a file listing of the archive to be produced quickly, because the whole archive does not need to be read to find the list of files.
  • The entries in the ZIP file also contain this information, for redundancy, in a local file header.
  • Because ZIP files may be appended to, only the files named in the central directory at the end of the file are valid.
  • Scanning a ZIP file for local file headers is invalid (except when recovering a corrupted archive), because the central directory may declare that some files have been deleted and other files have been updated.
  • The order of the file entries in the central directory need not correspond to the order of the file entries in the archive.

Each entry stored in a ZIP archive is introduced by a local file header with details of the file, including file name, file size, and comment, followed by optional "extra" data fields, and then the possibly compressed, possibly encrypted file data. These "Extra" data fields are the key to the extensibility of the ZIP format.

"Extra" data fields are used to support file attributes, WinZip-compatible AES encryption, the ZIP64 format, and higher-resolution Unix and NTFS file timestamps. Many other extensions are possible through the "Extra" data field. ZIP tools are required by the specification to ignore Extra fields that they do not recognize.

The ZIP format uses specific four-byte "signatures" to mark the various structures inside the file. Each file entry is marked by a specific signature. The end of central directory record is indicated by its own signature, and each entry in the central directory starts with the four-byte central file header signature.

In this tutorial, we will zip and unzip a directory using the Ubuntu terminal.

First, check which directories are present in the current directory.

Currently, we have a javatpoint directory, which we will zip in the next step.

ZIP Directory

Use the zip command with the -r (recursive) option to zip the javatpoint directory together with everything inside it.


List the contents of the current directory again. Alongside the javatpoint directory, there is now one more entry: the newly created zip file.

Just as we zipped the directory, we can unzip the zipped directory.

Unzip Directory

Use the unzip command, followed by the name of the zip file, to unzip the zipped directory.



Extra Info

The zip command has various flags that control how the zip file is created. To learn more about these flags, ask zip for its built-in help.

