Analyze Dataset

An example of how to use the analyzer to analyze a dataset.

Imports

For this example we will use only xarray and analyze_dataset from enstools-compression.

[1]:
import xarray
from enstools.compression.analyzer.analyzer import analyze_dataset
WARNING: eccodes c-library not found, grib file support not available!
[2]:
dataset_name = "air_temperature"
dataset = xarray.tutorial.open_dataset(dataset_name)
dataset
[2]:
<xarray.Dataset>
Dimensions:  (lat: 25, time: 2920, lon: 53)
Coordinates:
  * lat      (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 25.0 22.5 20.0 17.5 15.0
  * lon      (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
  * time     (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
Data variables:
    air      (time, lat, lon) float32 ...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

Analyze dataset using default constraints

Use analyze_dataset to obtain the compression specification that guarantees the quality constraints while maximising the compression ratio. If the constrains argument is not provided, the default constraints are used: "correlation_I:5,ssim_I:2".

Note:

correlation_I is computed as -log10(1 - pearson_correlation), i.e. the number of nines of correlation. For example:

correlation_I:5 == correlation:0.99999

Similarly, ssim_I is computed as -log10(1 - ssim), so ssim_I:2 == ssim:0.99.
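
The conversion between an index and the underlying metric can be sketched in a few lines of plain Python. The helper names below (index_to_value, value_to_index, parse_constraints) are hypothetical and are not part of enstools-compression; they only illustrate the formula in the note and the "name:threshold" layout of the constraint string.

```python
import math


def index_to_value(index: float) -> float:
    """Invert index = -log10(1 - value) to recover the raw metric."""
    return 1.0 - 10.0 ** (-index)


def value_to_index(value: float) -> float:
    """Compute the 'number of nines' index of a metric in [0, 1)."""
    return -math.log10(1.0 - value)


def parse_constraints(spec: str) -> dict:
    """Split a 'name:threshold,name:threshold' string into a dict.

    Hypothetical helper: it mirrors the layout of the constraint string,
    not the library's own parser.
    """
    result = {}
    for item in spec.split(","):
        name, threshold = item.split(":")
        result[name.strip()] = float(threshold)
    return result


defaults = parse_constraints("correlation_I:5,ssim_I:2")
for name, index in defaults.items():
    # Drop the "_I" suffix to get the name of the underlying metric.
    raw_name = name[:-2] if name.endswith("_I") else name
    print(f"{name} >= {index} means {raw_name} >= {index_to_value(index):.6f}")
```

Working with the index rather than the raw value makes thresholds close to 1 easier to read and compare: each extra unit corresponds to one more nine.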

[3]:
encoding, metrics = analyze_dataset(dataset=dataset)
WARNING: Only absolute and norm2 modes properly tested

The function returns two dictionaries keyed by variable name: one containing the best encoding found and another containing the resulting metrics.

[4]:
encoding
[4]:
{'air': 'lossy,sz,pw_rel,0.000549'}
[5]:
metrics
[5]:
{'air': {'correlation_I': 5.064244624947207,
  'ssim_I': 3.856431344802704,
  'compression_ratio': 7.868889904065783}}
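
As a quick sanity check, the returned dictionaries can be inspected with plain Python. The sketch below copies the values shown above and assumes that the four comma-separated fields of the encoding string are the kind, the compressor, the mode and the parameter; that reading is an assumption, and the helper parse_lossy_spec is hypothetical. Encodings with a different number of fields are not handled.

```python
# Values copied from the outputs above.
encoding = {"air": "lossy,sz,pw_rel,0.000549"}
metrics = {
    "air": {
        "correlation_I": 5.064244624947207,
        "ssim_I": 3.856431344802704,
        "compression_ratio": 7.868889904065783,
    }
}
# Default constraints, written as a dict.
constraints = {"correlation_I": 5.0, "ssim_I": 2.0}


def parse_lossy_spec(spec: str) -> dict:
    """Split a four-field encoding string (assumed field meanings)."""
    kind, compressor, mode, parameter = spec.split(",")
    return {
        "kind": kind,
        "compressor": compressor,
        "mode": mode,
        "parameter": float(parameter),
    }


parsed = {name: parse_lossy_spec(spec) for name, spec in encoding.items()}

# A variable passes when every constrained metric reaches its threshold.
satisfied = {
    name: all(values[key] >= threshold for key, threshold in constraints.items())
    for name, values in metrics.items()
}

# Fraction of storage saved, derived from the compression ratio.
space_saved = {
    name: 1.0 - 1.0 / values["compression_ratio"]
    for name, values in metrics.items()
}

for name in encoding:
    print(name, parsed[name], satisfied[name], f"{space_saved[name]:.1%}")
```

Here the correlation index (about 5.06) sits just above its threshold of 5, while the SSIM index (about 3.86) has a wide margin over 2. This suggests that the correlation constraint is the one limiting the compression for this variable.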

Analyze dataset using custom constraints

To specify different constraints, pass them through the constrains argument:

[6]:
encoding, metrics = analyze_dataset(dataset=dataset, constrains="correlation_I:3,ssim_I:1")
WARNING: Only absolute and norm2 modes properly tested