Analyze File
An example of how to use the analyzer to analyze a file.
Imports
For this example we will use only xarray, pathlib and analyze_files from enstools-compression.
[1]:
import xarray
from pathlib import Path
from enstools.compression.analyzer import analyze_files
WARNING: eccodes c-library not found, grib file support not available!
Path to the destination file
We need to provide the path to a file. In this example we define a path and then write some data to it. Although we use a Path object here, a plain string would work just as well.
[2]:
file_path = Path("dummy.nc")
[3]:
dataset_name = "air_temperature"
dataset = xarray.tutorial.open_dataset(dataset_name)
dataset.to_netcdf(file_path)
/tmp/ipykernel_1095/3206858703.py:3: SerializationWarning: saving variable air with floating point data as an integer dtype without any _FillValue to use for NaNs
dataset.to_netcdf(file_path)
Analyze file using default constraints
Use analyze_files to obtain the compression specification that guarantees the quality constraints while maximising the compression ratio. If the argument constrains is not provided, the default constraints are used, which are "correlation_I:5,ssim_I:2".
Note:
correlation_I is computed as $-\log_{10}(1 - r)$, where $r$ is the Pearson correlation coefficient, i.e. it is the number of nines of the correlation: correlation_I:5 == correlation:0.99999.
Similarly, ssim_I is computed as $-\log_{10}(1 - \mathrm{SSIM})$.
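The two conversions in the note can be checked with a few lines of plain Python. This is a minimal sketch using only the standard library; the helper names `index_from_value`, `value_from_index` and `pearson_correlation` are hypothetical and not part of enstools-compression, and the perturbed "reconstruction" is made-up illustration data.

```python
import math
from typing import Sequence


def index_from_value(value: float) -> float:
    """Convert a similarity in [0, 1) to its "number of nines": -log10(1 - value)."""
    return -math.log10(1.0 - value)


def value_from_index(index: float) -> float:
    """Inverse conversion: 1 - 10**(-index)."""
    return 1.0 - 10.0 ** (-index)


def pearson_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# correlation_I:5 corresponds to a correlation of 0.99999 (up to rounding)
print(value_from_index(5))
print(index_from_value(0.99999))

# Hypothetical data: a ramp and a copy with small alternating errors,
# standing in for an original field and its decompressed version.
original = [float(i) for i in range(100)]
reconstructed = [v + (0.01 if i % 2 else -0.01) for i, v in enumerate(original)]
r = pearson_correlation(original, reconstructed)
print(index_from_value(r))  # correlation_I of the perturbed copy
```

Working in "number of nines" keeps thresholds such as 0.99999 readable and makes each extra nine one unit of the index.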
[4]:
encoding, metrics = analyze_files(file_path)
Analyzing files to determine optimal compression options for compressor None with mode None to fulfill the following constrains:
correlation_I:5,ssim_I:2
WARNING: Only absolute and norm2 modes properly tested
Compression options:
air: lossy,sz,pw_rel,0.000549
The function returns two dictionaries, one containing the best encoding and another containing the resulting metrics.
[5]:
encoding
[5]:
{'air': 'lossy,sz,pw_rel,0.000549'}
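The encoding is a compact, comma-separated specification string. Judging from the two results in this notebook ('lossy,sz,pw_rel,0.000549' and 'lossy,sz3,abs,1.67'), its fields appear to be the compression type, the compressor, the error-bound mode and the error-bound value. The sketch below splits such a string into named fields; the field names, the `CompressionSpec` class and `parse_spec` are our own assumptions and not part of enstools-compression, so the library documentation remains the authority on the format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompressionSpec:
    """Hypothetical container for a lossy compression specification string."""
    kind: str          # e.g. "lossy"
    compressor: str    # e.g. "sz" or "sz3"
    mode: str          # assumed error-bound mode, e.g. "pw_rel" or "abs"
    parameter: float   # assumed error-bound value


def parse_spec(spec: str) -> CompressionSpec:
    """Split a 'lossy,<compressor>,<mode>,<parameter>' string into its fields."""
    kind, compressor, mode, parameter = spec.split(",")
    if kind != "lossy":
        raise ValueError(f"this sketch only handles lossy specifications, got {spec!r}")
    return CompressionSpec(kind, compressor, mode, float(parameter))


encoding = {"air": "lossy,sz,pw_rel,0.000549"}  # value returned in cell [5]
for variable, spec in encoding.items():
    parsed = parse_spec(spec)
    print(variable, parsed.compressor, parsed.mode, parsed.parameter)
# prints: air sz pw_rel 0.000549
```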
[6]:
metrics
[6]:
{'air': {'correlation_I': 5.064244624947207,
'ssim_I': 3.856431344802704,
'compression_ratio': 7.868889904065783}}
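To see why this result is accepted, the returned metrics can be compared with the constraint string. The sketch below parses a constraint string of the form "name:threshold,name:threshold" and checks that every constrained metric reaches its threshold. The helpers `parse_constraints` and `fulfills` are hypothetical, and we assume that larger is better for every constrained metric, which holds for correlation_I and ssim_I as defined above.

```python
from typing import Dict


def parse_constraints(constraints: str) -> Dict[str, float]:
    """Turn 'correlation_I:5,ssim_I:2' into {'correlation_I': 5.0, 'ssim_I': 2.0}."""
    parsed = {}
    for item in constraints.split(","):
        name, threshold = item.split(":")
        parsed[name.strip()] = float(threshold)
    return parsed


def fulfills(variable_metrics: Dict[str, float], constraints: str) -> bool:
    """True if every constrained metric is at least as large as its threshold."""
    return all(
        variable_metrics[name] >= threshold
        for name, threshold in parse_constraints(constraints).items()
    )


# Metrics returned in cell [6]
metrics = {
    "air": {
        "correlation_I": 5.064244624947207,
        "ssim_I": 3.856431344802704,
        "compression_ratio": 7.868889904065783,
    }
}

for variable, values in metrics.items():
    print(variable, fulfills(values, "correlation_I:5,ssim_I:2"))
# prints: air True
```

Here correlation_I sits only slightly above its threshold while ssim_I has a wide margin, which suggests the correlation constraint is the one limiting the compression ratio for this variable.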
Analyze file using custom constraints
To specify different constraints, pass them through the constrains argument:
[7]:
encoding, metrics = analyze_files(file_path, constrains="correlation_I:3,ssim_I:1")
Analyzing files to determine optimal compression options for compressor None with mode None to fulfill the following constrains:
correlation_I:3,ssim_I:1
WARNING: Only absolute and norm2 modes properly tested
Compression options:
air: lossy,sz3,abs,1.67
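Constraints are expressed on the "number of nines" scale, so a plain threshold such as a correlation of 0.999 has to be converted before it can be passed as constrains. A minimal sketch, assuming every metric name maps to a `<name>_I` counterpart as correlation and ssim do here; the helper `constraints_from_thresholds` is hypothetical and not part of enstools-compression.

```python
import math
from typing import Dict


def constraints_from_thresholds(thresholds: Dict[str, float]) -> str:
    """Build a constraint string such as 'correlation_I:3,ssim_I:1'.

    Each threshold t in [0, 1) is converted to -log10(1 - t). The result is
    rounded to 6 decimals to absorb floating-point noise (1 - 0.999 is not
    exactly 0.001 in binary floating point).
    """
    parts = []
    for name, value in thresholds.items():
        index = round(-math.log10(1.0 - value), 6)
        parts.append(f"{name}_I:{index:g}")
    return ",".join(parts)


constraint_string = constraints_from_thresholds({"correlation": 0.999, "ssim": 0.9})
print(constraint_string)  # correlation_I:3,ssim_I:1
# The string could then be used as in cell [7]:
# encoding, metrics = analyze_files(file_path, constrains=constraint_string)
```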
Delete temporary file
[8]:
if file_path.exists():
    file_path.unlink()
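Deleting the file by hand works, but the cleanup cell is easily skipped if an earlier cell raises an exception. One alternative, sketched here with only the standard library, is to create the file inside a `tempfile.TemporaryDirectory`, which is removed together with its contents when the `with` block exits. The empty placeholder write stands in for `dataset.to_netcdf(file_path)` and the `analyze_files` call, which are not repeated here.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / "dummy.nc"
    # Placeholder: in the notebook this would be dataset.to_netcdf(file_path)
    # followed by analyze_files(file_path).
    file_path.write_bytes(b"")
    print(file_path.exists())  # True while inside the block

print(file_path.exists())  # False: the directory and its contents are gone
```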