Analyze File

An example of how to use the analyzer to find compression settings for a file.

Imports

For this example we only need xarray, Path from pathlib, and analyze_files from enstools-compression.

[1]:
import xarray
from pathlib import Path
from enstools.compression.analyzer import analyze_files
WARNING: eccodes c-library not found, grib file support not available!

Path to the destination file

The analyzer needs the path to a file. In this example we define a path and then write some data to it. We use a Path object here, but a plain string would work just as well.

[2]:
file_path = Path("dummy.nc")
[3]:
dataset_name = "air_temperature"
dataset = xarray.tutorial.open_dataset(dataset_name)
dataset.to_netcdf(file_path)
/tmp/ipykernel_1095/3206858703.py:3: SerializationWarning: saving variable air with floating point data as an integer dtype without any _FillValue to use for NaNs
  dataset.to_netcdf(file_path)

Analyze file using default constraints

Use analyze_files to obtain the compression specification that guarantees the quality constraints while maximising the compression ratio. If the constrains argument is not provided, the defaults are used: "correlation_I:5,ssim_I:2".

Note:

correlation_I is computed as -log10(1 - pearson_correlation), i.e. the number of nines of the correlation.

correlation_I:5 == correlation:0.99999

Similarly, ssim_I is computed as -log10(1 - ssim).
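
The conversion in the note can be checked with a few lines of plain Python. This is only a sketch of the formula as stated above; the helper names to_index and from_index are made up for illustration and are not part of enstools-compression.

```python
import math


def to_index(value: float) -> float:
    """Convert a score below 1 into its "number of nines": -log10(1 - value)."""
    return -math.log10(1.0 - value)


def from_index(index: float) -> float:
    """Inverse conversion: recover the raw score from its index."""
    return 1.0 - 10.0 ** (-index)


# A correlation of 0.99999 has five nines, so its index is (about) 5.
print(to_index(0.99999))

# The default constraints expressed as raw scores.
print(from_index(5))  # threshold implied by correlation_I:5
print(from_index(2))  # threshold implied by ssim_I:2
```

Read this way, the default ssim_I:2 corresponds to an SSIM of at least 0.99.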

[4]:
encoding, metrics = analyze_files(file_path)

Analyzing files to determine optimal compression options for compressor None with mode None to fulfill the following constrains:
correlation_I:5,ssim_I:2


WARNING: Only absolute and norm2 modes properly tested
Compression options:
air: lossy,sz,pw_rel,0.000549

The function returns two dictionaries, one containing the best encoding and another containing the resulting metrics.

[5]:
encoding
[5]:
{'air': 'lossy,sz,pw_rel,0.000549'}
[6]:
metrics
[6]:
{'air': {'correlation_I': 5.064244624947207,
  'ssim_I': 3.856431344802704,
  'compression_ratio': 7.868889904065783}}
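
As a sanity check, the returned metrics can be compared against the constraint string. The sketch below assumes that each constraint is a minimum the metric has to reach, which matches the output above but is not confirmed by the library documentation here; parse_constrains and unmet are hypothetical helpers, and the metric values are copied from the output of the previous cell.

```python
def parse_constrains(text: str) -> dict:
    """Turn "name:value,name:value" into {"name": value}."""
    thresholds = {}
    for item in text.split(","):
        name, value = item.split(":")
        thresholds[name.strip()] = float(value)
    return thresholds


def unmet(metrics: dict, thresholds: dict) -> list:
    """Return (variable, metric) pairs whose value is below its threshold."""
    return [
        (variable, name)
        for variable, values in metrics.items()
        for name, minimum in thresholds.items()
        if values[name] < minimum  # assumption: constraints are lower bounds
    ]


# Values copied from the output above.
example_metrics = {
    "air": {
        "correlation_I": 5.064244624947207,
        "ssim_I": 3.856431344802704,
        "compression_ratio": 7.868889904065783,
    }
}

thresholds = parse_constrains("correlation_I:5,ssim_I:2")
print(unmet(example_metrics, thresholds))  # prints []
```

An empty list means every variable meets every constraint. Here correlation_I sits just above its threshold of 5 while ssim_I exceeds its threshold of 2 by a wide margin, which suggests the correlation constraint is the one limiting the compression for this file.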

Analyze file using custom constraints

To use different constraints, pass them through the constrains argument:

[7]:
encoding, metrics = analyze_files(file_path, constrains="correlation_I:3,ssim_I:1")

Analyzing files to determine optimal compression options for compressor None with mode None to fulfill the following constrains:
correlation_I:3,ssim_I:1


WARNING: Only absolute and norm2 modes properly tested
Compression options:
air: lossy,sz3,abs,1.67
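
To compare the two results, the specification strings can be split into their parts. The sketch below assumes the four-field layout seen in the two strings of this example (kind, compressor, mode, numeric parameter); the field names and the LossySpec and parse_lossy_spec helpers are labels chosen for illustration, not part of enstools-compression, and other forms of specification are not handled.

```python
from typing import NamedTuple


class LossySpec(NamedTuple):
    compressor: str   # e.g. "sz" or "sz3"
    mode: str         # e.g. "pw_rel" or "abs"
    parameter: float  # numeric value attached to the mode


def parse_lossy_spec(spec: str) -> LossySpec:
    """Split a "lossy,<compressor>,<mode>,<parameter>" string into fields."""
    kind, compressor, mode, parameter = spec.split(",")
    if kind != "lossy":
        raise ValueError(f"expected a lossy specification, got {kind!r}")
    return LossySpec(compressor, mode, float(parameter))


# The two specifications found in this example.
for spec in ("lossy,sz,pw_rel,0.000549", "lossy,sz3,abs,1.67"):
    print(parse_lossy_spec(spec))
```

With the looser constraints the analyzer picked a different compressor (sz3 instead of sz) and a different mode (abs instead of pw_rel), so the two numeric parameters are not directly comparable.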

Delete temporary file

[8]:
if file_path.exists():
    file_path.unlink()
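
An alternative to deleting the file by hand is to create it inside a temporary directory, which is removed automatically. This is a minimal sketch using only the standard library; write_bytes stands in for the dataset.to_netcdf call so that the snippet runs without xarray.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_path = Path(tmp_dir) / "dummy.nc"
    # Stand-in for dataset.to_netcdf(tmp_path) followed by the analysis.
    tmp_path.write_bytes(b"placeholder")
    existed_inside = tmp_path.exists()

# The directory and everything in it are gone once the block exits.
print(existed_inside, tmp_path.exists())  # prints True False
```

On Python 3.8 or newer, file_path.unlink(missing_ok=True) is a shorter form of the explicit existence check used above.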