GES DISC - MERRA2
Reading MERRA2 Data Using a Kerchunk Reference File
Many of NASA's current and legacy data collections are archived in netCDF4 format. By itself, netCDF4 is not cloud optimized, and reading these files can take as long from a personal/local work environment as it does from a working environment deployed in the cloud. Using Kerchunk, we can treat these files as cloud-optimized assets by creating metadata JSON files that describe the existing netCDF4 files, their chunks, and where to access them. The JSON reference files can then be read with Zarr and Xarray for efficient reads and fast processing.
Requirements
1. AWS instance running in us-west-2
NASA Earthdata Cloud data in S3 can be directly accessed via temporary credentials; this access is limited to requests made within the US West (Oregon) (code: us-west-2) AWS region.
2. Earthdata Login
An Earthdata Login account is required to access data, as well as discover restricted data, from the NASA Earthdata system. Thus, to access NASA data, you need Earthdata Login. Please visit https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account. This account is free to create and only takes a moment to set up.
3. netrc File
You will need a netrc file containing your NASA Earthdata Login credentials in order to execute the notebooks. A netrc file can be created manually within a text editor and saved to your home directory. For additional information see: Authentication for NASA Earthdata.
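For example, a minimal netrc file for Earthdata Login contains a single entry (substitute your own username and password):
machine urs.earthdata.nasa.gov
    login <your_earthdata_username>
    password <your_earthdata_password>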
Import required packages
import requests
import xarray as xr
import ujson
import s3fs
import fsspec
from tqdm import tqdm
from glob import glob
import os
import pathlib
import hvplot.xarray
from kerchunk.hdf import SingleHdf5ToZarr
from kerchunk.combine import MultiZarrToZarr

# The xarray produced from the reference file throws a SerializationWarning for each variable. Will need to explore why
import warnings
warnings.simplefilter("ignore")
Create a Dask client to process the output JSON files in parallel
Generating the Kerchunk reference files can take some time depending on the internal structure of the data. Dask allows us to execute the reference file generation process in parallel, speeding up the overall process.
import dask
from dask.distributed import Client
client = Client(n_workers=4)
client
Client
Connection method: Cluster object | Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status | Workers: 4 | Total threads: 4 | Total memory: 7.48 GiB
Get temporary S3 credentials
Temporary S3 credentials need to be passed to AWS. Note that these credentials expire and must be refreshed after one hour.
s3_cred_endpoint = {
    'podaac':   'https://archive.podaac.earthdata.nasa.gov/s3credentials',
    'lpdaac':   'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials',
    'ornldaac': 'https://data.ornldaac.earthdata.nasa.gov/s3credentials',
    'gesdisc':  'https://data.gesdisc.earthdata.nasa.gov/s3credentials'
}

def get_temp_creds():
    temp_creds_url = s3_cred_endpoint['gesdisc']
    return requests.get(temp_creds_url).json()

temp_creds_req = get_temp_creds()
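Since the credentials expire after an hour, a long-lived session may need to re-request them before continuing. Below is a minimal sketch of one way to handle that; the refresh_temp_creds helper and the 50-minute cutoff are illustrative additions, not part of the original workflow.
import time

temp_creds_time = time.time()   # remember when the credentials were issued

def refresh_temp_creds(max_age=50 * 60):
    """Re-request temporary credentials if the cached ones are close to expiry."""
    global temp_creds_req, temp_creds_time
    if time.time() - temp_creds_time > max_age:
        temp_creds_req = get_temp_creds()
        temp_creds_time = time.time()
    return temp_creds_req
Note that any s3fs or fsspec filesystem object built from the old credentials would also need to be recreated after a refresh.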
Directly access a single netCDF4 file
Pass temporary credentials to our filesystem object to access the S3 assets
fs = s3fs.S3FileSystem(
    anon=False,
    key=temp_creds_req['accessKeyId'],
    secret=temp_creds_req['secretAccessKey'],
    token=temp_creds_req['sessionToken']
)

url = 's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190501.nc4'

s3_file_obj = fs.open(url, mode='rb')
Time how long it takes to directly access a cloud asset, for comparison later.
%%time
xr_ds = xr.open_dataset(s3_file_obj, chunks='auto', engine='h5netcdf')
xr_ds
CPU times: user 2.9 s, sys: 228 ms, total: 3.13 s
Wall time: 7.53 s
<xarray.Dataset>
Dimensions:  (lon: 576, lat: 361, time: 24)
Coordinates:
  * lon      (lon) float64 -180.0 -179.4 -178.8 -178.1 ... 178.1 178.8 179.4
  * lat      (lat) float64 -90.0 -89.5 -89.0 -88.5 ... 88.5 89.0 89.5 90.0
  * time     (time) datetime64[ns] 2019-05-01T00:30:00 ... 2019-05-01T23:30:00
Data variables: (12/47)
    CLDPRS   (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    CLDTMP   (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    DISPH    (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    H1000    (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    H250     (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    H500     (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    ...       ...
    V250     (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    V2M      (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    V500     (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    V50M     (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    V850     (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
    ZLCL     (time, lat, lon) float32 dask.array<chunksize=(24, 361, 576), meta=np.ndarray>
Attributes: (12/30)
    History:                 Original file generated: Sat May 11 22...
    Comment:                 GMAO filename: d5124_m2_jan10.tavg1_2d...
    Filename:                MERRA2_400.tavg1_2d_slv_Nx.20190501.nc4
    Conventions:             CF-1
    Institution:             NASA Global Modeling and Assimilation ...
    References:              http://gmao.gsfc.nasa.gov
    ...                      ...
    Contact:                 http://gmao.gsfc.nasa.gov
    identifier_product_doi:  10.5067/VJAFPLI1CSIV
    RangeBeginningDate:      2019-05-01
    RangeBeginningTime:      00:00:00.000000
    RangeEndingDate:         2019-05-01
    RangeEndingTime:         23:59:59.000000
Specify a list of S3 URLs
Data Collection: MERRA2_400.tavg1_2d_slv_Nx
Time Range: 05/01/2019 - 05/31/2019
urls = ['s3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190501.nc4',
        's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190502.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190503.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190504.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190505.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190506.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190507.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190508.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190509.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190510.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190511.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190512.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190513.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190514.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190515.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190516.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190517.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190518.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190519.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190520.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190521.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190522.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190523.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190524.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190525.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190526.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190527.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190528.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190529.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190530.nc4',
's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4/2019/05/MERRA2_400.tavg1_2d_slv_Nx.20190531.nc4']
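Typing out 31 URLs by hand is tedious and error-prone; the same list can also be built programmatically. A sketch, assuming the bucket layout and MERRA2_400 file naming shown above:
from datetime import date, timedelta

base = 's3://gesdisc-cumulus-prod-protected/MERRA2/M2T1NXSLV.5.12.4'
start, end = date(2019, 5, 1), date(2019, 5, 31)
# One URL per day in the range, following the YYYY/MM/...YYYYMMDD.nc4 pattern
urls = [f"{base}/{d:%Y}/{d:%m}/MERRA2_400.tavg1_2d_slv_Nx.{d:%Y%m%d}.nc4"
        for d in (start + timedelta(days=i) for i in range((end - start).days + 1))]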
Generate the Kerchunk reference files
Define a function to generate the Kerchunk reference files. These files can take a little time to generate.
def gen_json(u):
    so = dict(
        mode="rb",
        anon=False,
        default_fill_cache=False,
        default_cache_type="none"
    )
    # Open the file on S3, scan its HDF5 internals, and write the
    # chunk references out as a JSON file named after the granule
    with fs.open(u, **so) as infile:
        h5chunks = SingleHdf5ToZarr(infile, u, inline_threshold=300)
        with open(f"jsons/{u.split('/')[-1]}.json", 'wb') as outf:
            outf.write(ujson.dumps(h5chunks.translate()).encode())
Create output jsons directory if one does not exist.
pathlib.Path('./jsons/').mkdir(exist_ok=True)
Use the Dask delayed function to create the Kerchunk reference file for each URL from the list of URLs in parallel.
%%time
reference_files = []
for url in urls:
    ref = dask.delayed(gen_json)(url)
    reference_files.append(ref)

reference_files_compute = dask.compute(*reference_files)
CPU times: user 29 s, sys: 11.1 s, total: 40 s
Wall time: 11min 6s
Create a Python list with the paths to the reference files.
reference_list = sorted(glob('./jsons/*.json'))
reference_list
['./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190501.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190502.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190503.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190504.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190505.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190506.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190507.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190508.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190509.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190510.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190511.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190512.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190513.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190514.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190515.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190516.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190517.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190518.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190519.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190520.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190521.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190522.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190523.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190524.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190525.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190526.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190527.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190528.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190529.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190530.nc4.json',
'./jsons/MERRA2_400.tavg1_2d_slv_Nx.20190531.nc4.json']
Read a single netCDF4 file using a Kerchunk reference file
Open the first reference file and read it into an Xarray dataset.
with open(reference_list[0]) as j:
    reference = ujson.load(j)
Set configuration options
s_opts = {'skip_instance_cache': True}   # options for the reference (JSON) filesystem
r_opts = {'anon': False,
          'key': temp_creds_req['accessKeyId'],
          'secret': temp_creds_req['secretAccessKey'],
          'token': temp_creds_req['sessionToken']}   # options for the remote netCDF4 files on S3

fs_single = fsspec.filesystem("reference",
                              fo=reference,
                              ref_storage_args=s_opts,
                              remote_protocol='s3',
                              remote_options=r_opts)
Read in a single reference object. We get a lot of SerializationWarnings, which are ignored here using the warnings package.
NOTE: the fill value, data range, min value, and max value DO NOT match the source file. Will need to look into this more.
%%time
= fs_single.get_mapper("")
m = xr.open_dataset(m, engine="zarr", backend_kwargs={'consolidated':False}, chunks={})
ds_single ds_single
CPU times: user 142 ms, sys: 3.29 ms, total: 146 ms
Wall time: 354 ms
<xarray.Dataset>
Dimensions:  (time: 24, lat: 361, lon: 576)
Coordinates:
  * lat      (lat) float64 -90.0 -89.5 -89.0 -88.5 ... 88.5 89.0 89.5 90.0
  * lon      (lon) float64 -180.0 -179.4 -178.8 -178.1 ... 178.1 178.8 179.4
  * time     (time) datetime64[ns] 2019-05-01T00:30:00 ... 2019-05-01T23:30:00
Data variables: (12/47)
    CLDPRS   (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    CLDTMP   (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    DISPH    (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    H1000    (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    H250     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    H500     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    ...       ...
    V250     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    V2M      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    V500     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    V50M     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    V850     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    ZLCL     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
Attributes: (12/30)
    Comment:                           GMAO filename: d5124_m2_jan10.tavg1_2d...
    Contact:                           http://gmao.gsfc.nasa.gov
    Conventions:                       CF-1
    DataResolution:                    0.5 x 0.625
    EasternmostLongitude:              179.375
    Filename:                          MERRA2_400.tavg1_2d_slv_Nx.20190501.nc4
    ...                                ...
    TemporalRange:                     1980-01-01 -> 2016-12-31
    Title:                             MERRA2 tavg1_2d_slv_Nx: 2d,1-Hourly,Ti...
    VersionID:                         5.12.4
    WesternmostLongitude:              -180.0
    identifier_product_doi:            10.5067/VJAFPLI1CSIV
    identifier_product_doi_authority:  http://dx.doi.org/
Read multiple netCDF4 files using Kerchunk reference files
Combine the individual reference files into a single time series reference object
%%time
ds_k = []
for ref in reference_list:
    fs = fsspec.filesystem("reference",
                           fo=ref,
                           ref_storage_args=s_opts,
                           remote_protocol='s3',
                           remote_options=r_opts)
    m = fs.get_mapper("")
    ds_k.append(xr.open_dataset(m, engine="zarr", backend_kwargs={'consolidated': False}, chunks={}))

ds_multi = xr.concat(ds_k, dim='time')

ds_multi
CPU times: user 8.93 s, sys: 174 ms, total: 9.1 s
Wall time: 14.9 s
<xarray.Dataset>
Dimensions:  (time: 744, lat: 361, lon: 576)
Coordinates:
  * lat      (lat) float64 -90.0 -89.5 -89.0 -88.5 ... 88.5 89.0 89.5 90.0
  * lon      (lon) float64 -180.0 -179.4 -178.8 -178.1 ... 178.1 178.8 179.4
  * time     (time) datetime64[ns] 2019-05-01T00:30:00 ... 2019-05-31T23:30:00
Data variables: (12/47)
    CLDPRS   (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    CLDTMP   (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    DISPH    (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    H1000    (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    H250     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    H500     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    ...       ...
    V250     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    V2M      (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    V500     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    V50M     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    V850     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
    ZLCL     (time, lat, lon) float32 dask.array<chunksize=(1, 91, 144), meta=np.ndarray>
Attributes: (12/30)
    Comment:                           GMAO filename: d5124_m2_jan10.tavg1_2d...
    Contact:                           http://gmao.gsfc.nasa.gov
    Conventions:                       CF-1
    DataResolution:                    0.5 x 0.625
    EasternmostLongitude:              179.375
    Filename:                          MERRA2_400.tavg1_2d_slv_Nx.20190501.nc4
    ...                                ...
    TemporalRange:                     1980-01-01 -> 2016-12-31
    Title:                             MERRA2 tavg1_2d_slv_Nx: 2d,1-Hourly,Ti...
    VersionID:                         5.12.4
    WesternmostLongitude:              -180.0
    identifier_product_doi:            10.5067/VJAFPLI1CSIV
    identifier_product_doi_authority:  http://dx.doi.org/
Again, the fill value, data range, min value, and max value DO NOT match the source file. TODO: explore why the values are different.
ds_multi['T500']
<xarray.DataArray 'T500' (time: 744, lat: 361, lon: 576)>
dask.array<concatenate, shape=(744, 361, 576), dtype=float32, chunksize=(1, 91, 144), chunktype=numpy.ndarray>
Coordinates:
  * lat      (lat) float64 -90.0 -89.5 -89.0 -88.5 -88.0 ... 88.5 89.0 89.5 90.0
  * lon      (lon) float64 -180.0 -179.4 -178.8 -178.1 ... 178.1 178.8 179.4
  * time     (time) datetime64[ns] 2019-05-01T00:30:00 ... 2019-05-31T23:30:00
Attributes:
    fmissing_value:  999999986991104.0
    long_name:       air_temperature_at_500_hPa
    standard_name:   air_temperature_at_500_hPa
    units:           K
    valid_range:     [-999999986991104.0, 999999986991104.0]
    vmax:            999999986991104.0
    vmin:            -999999986991104.0
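As a quick check on the mismatch noted above, the attributes read through the reference can be compared directly against the same variable in the source file opened earlier (xr_ds):
# Compare selected attributes between the direct read and the Kerchunk read
for attr in ['vmin', 'vmax', 'valid_range', 'fmissing_value']:
    print(attr, '| source:', xr_ds['T500'].attrs.get(attr),
          '| kerchunk:', ds_multi['T500'].attrs.get(attr))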
# Commented out for the Quarto site render
# ds_multi['T500'].hvplot.image(x='lon', y='lat')
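Note that MultiZarrToZarr, imported at the top of this notebook but not used above, offers an alternative to concatenating the per-file datasets with xr.concat: it combines the individual reference JSONs into a single reference object that can be opened in one call. A sketch under the same credentials (the exact keyword arguments vary slightly across Kerchunk versions):
mzz = MultiZarrToZarr(reference_list,          # per-file JSONs created earlier
                      remote_protocol='s3',
                      remote_options=r_opts,
                      concat_dims=['time'])
multi_ref = mzz.translate()                    # a single combined reference dict

fs_combined = fsspec.filesystem("reference", fo=multi_ref,
                                remote_protocol='s3', remote_options=r_opts)
ds_combined = xr.open_dataset(fs_combined.get_mapper(""), engine="zarr",
                              backend_kwargs={'consolidated': False}, chunks={})
The combined reference dict can also be written out with ujson, like the per-file references above, and reused in later sessions without regenerating anything.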
References
- https://github.com/fsspec/kerchunk
- https://medium.com/pangeo/fake-it-until-you-make-it-reading-goes-netcdf4-data-on-aws-s3-as-zarr-for-rapid-data-access-61e33f8fe685
- https://medium.com/pangeo/cloud-performant-reading-of-netcdf4-hdf5-data-using-the-zarr-library-1a95c5c92314