The HPC team is updating this page. Check back for new information.

Compressibility by File Type

Avoid compressing file types with an average ratio ≥0.80, since little or no space is saved, unless the files are part of a directory that is being archived.

Type              Extension                                       Avg Ratio (compressed size / original size)
binary            <null>/.bin/.exe                                0.25-1+
java binary       .jar                                            0.75-0.90
docx              .docx                                           0.80-0.85
gif               .gif                                            0.80-0.95
compressed files  .gzip/.zip/.bz2                                 >1
image files       .jpg/.jpeg/.png                                 0.93-1+
data files        .json/.xml                                      0.30-0.60
audio files       .mp3/.ogg/.mp4                                  0.80-0.95
pdf               .pdf                                            0.50-0.95
svg               .svg                                            0.30-0.57
fonts             .ttf                                            0.46-0.71
txt               <null>/.txt                                     0.32-0.55
wav               .wav                                            0.45-0.95
source files      .c/.cpp/.h/.java/.js/.py/.html/.css/.hpp/.lua   0.10-0.45
library files     .so                                             0.25-0.45
log files         .log                                            0.05-0.25

Source
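
The ratio in the table is compressed size divided by original size, so lower values mean more space saved. To check where one of your own files falls, a minimal sketch along these lines may help. It assumes gzip as the compressor, and `compression_ratio` is a hypothetical helper name used only for illustration; it is not part of any HPC tooling.

```shell
#!/bin/sh
# compression_ratio FILE
# Prints (gzip-compressed size / original size) for FILE, to two decimals.
# Hypothetical helper for illustration; the file itself is left untouched.
compression_ratio() {
    orig=$(wc -c < "$1")            # original size in bytes
    comp=$(gzip -c "$1" | wc -c)    # compressed size in bytes, written to stdout only
    # awk does the floating-point division; empty files have no meaningful ratio
    awk -v o="$orig" -v c="$comp" \
        'BEGIN { if (o == 0) print "n/a"; else printf "%.2f\n", c / o }'
}

# Demo on a throwaway, highly repetitive text file
tmp=$(mktemp)
yes "sample log line" | head -n 10000 > "$tmp"
echo "ratio: $(compression_ratio "$tmp")"
rm -f "$tmp"
```

A result at or above 0.80 suggests the file is not worth compressing on its own.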

To determine file type:

% ls -l
-rw-r--r-- 1 root root 2625604 Jun 15  2022 mstflint-4.16.0-1.53100.x86_64.rpm
-rwx------ 1 root root    1415 Mar 16 10:35 weka_install.sh

% file mstflint-4.16.0-1.53100.x86_64.rpm
mstflint-4.16.0-1.53100.x86_64.rpm: RPM v3.0 bin i386/x86_64 mstflint-4.16.0-1.53100

% file weka_install.sh
weka_install.sh: POSIX shell script, ASCII text executable
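
The `file` check above can also be scripted to flag files that are already in a compressed format. The following is a rough sketch under stated assumptions: `is_precompressed` is a hypothetical helper, and its list of MIME types is illustrative rather than exhaustive. It relies on the `--mime-type` and `-b` options of the common `file` utility.

```shell
#!/bin/sh
# is_precompressed FILE
# Succeeds (exit 0) if `file` reports a format that is already compressed.
# Hypothetical helper; extend the MIME list to suit your own data.
is_precompressed() {
    mime=$(file --mime-type -b "$1")
    case "$mime" in
        application/gzip|application/x-gzip|application/zip|application/x-bzip2|image/jpeg|image/png)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Classify every file named on the command line
for f in "$@"; do
    if is_precompressed "$f"; then
        echo "$f: already compressed, skip"
    else
        echo "$f: candidate for compression"
    fi
done
```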

...

# List directory
% ls -l
drwxr-xr-x  5 aaa0000 Domain_Users       4096 Jun 22  2018 data1
drwxr-xr-x  5 aaa0000 Domain_Users       4096 Jun 22  2018 data2
drwxr-xr-x  5 aaa0000 Domain_Users       4096 Jun 22  2018 data3

# Option 1: make compressed tarballs directly
% tar czf data1.tgz  data1
% tar czf data2.tgz  data2
% tar czf data3.tgz  data3

# Option 2: make uncompressed tarballs first (no z flag)
% tar cf data1.tar  data1
% tar cf data2.tar  data2
% tar cf data3.tar  data3

# ...then compress them (gzip replaces each .tar with .tar.gz)
% gzip data1.tar
% gzip data2.tar
% gzip data3.tar

It is VERY IMPORTANT that you remove the original copy after tarring or compressing a file or directory. tar leaves the original in place, so until it is removed both copies take up space.
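
One cautious way to do this is to confirm the archive is readable before deleting anything. The sketch below is one possible approach, not an official procedure: `archive_and_remove` is a hypothetical helper, and the `data1` directory is created in a temporary location purely for the demo.

```shell
#!/bin/sh
# archive_and_remove DIR
# Creates DIR.tgz, verifies it, and only then removes DIR.
# Hypothetical helper for illustration.
archive_and_remove() {
    dir=${1%/}                      # drop a trailing slash, if any
    archive="$dir.tgz"
    tar czf "$archive" "$dir" || return 1
    # gzip -t checks the compressed stream; tar tzf reads the full listing
    if gzip -t "$archive" && tar tzf "$archive" > /dev/null; then
        rm -rf "$dir"
    else
        echo "verification failed; keeping $dir" >&2
        return 1
    fi
}

# Demo in a scratch directory
work=$(mktemp -d)
cd "$work" || exit 1
mkdir data1
echo "hello" > data1/a.txt
archive_and_remove data1
ls -l
```

If verification fails, the original directory is kept so nothing is lost.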

...