# Data Structures


The core functionality of UXarray revolves around three data structures, which are used for interacting with unstructured grids and the data variables that reside on them.

1. **[`uxarray.Grid`](https://uxarray.readthedocs.io/en/latest/user_api/generated/uxarray.Grid.html)**: Stores the grid representation (e.g. coordinates and connectivity information).
2. **[`uxarray.UxDataset`](https://uxarray.readthedocs.io/en/latest/user_api/generated/uxarray.UxDataset.html)**: One or more data variables that reside on a grid.
3. **[`uxarray.UxDataArray`](https://uxarray.readthedocs.io/en/latest/user_api/generated/uxarray.UxDataArray.html)**: A single data variable that resides on a grid.


In [None]:
import xarray as xr

import uxarray as ux

## Grid and Data Files


When working with unstructured grid datasets, the grid definition is typically stored separately from any data variables. 

For example, the dataset used in this example is made up of two files: a single grid definition and a single data file.


```
quad-hexagon
│   grid.nc
│   data.nc
```

In [None]:
grid_path = "../../test/meshfiles/ugrid/quad-hexagon/grid.nc"
data_path = "../../test/meshfiles/ugrid/quad-hexagon/data.nc"

Additionally, there may be multiple data files that are mapped to the same unstructured grid (such as the case with climate model output). Using our sample dataset, this may look something like this:

```
quad-hexagon
│   grid.nc
│   data1.nc
│   data2.nc
│   data3.nc
```

We can store these paths as a list. In this case, we simply repeat the original data file to imitate having three separate data files.

In [None]:
multiple_data_paths = [data_path for _ in range(3)]
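
With real model output, the data files usually have distinct names, so the list is more often built from a directory listing than by repetition. The following is a minimal sketch using only the standard library. It creates a temporary directory with empty placeholder files; the names `data1.nc`, `data2.nc`, and `data3.nc` mirror the layout above and are assumptions, since they are not part of the sample dataset.

```python
import tempfile
from pathlib import Path

# Build a throwaway directory that mimics the layout shown above.
# The files are empty placeholders, not real NetCDF files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "quad-hexagon"
    root.mkdir()
    for name in ("grid.nc", "data1.nc", "data2.nc", "data3.nc"):
        (root / name).touch()

    # Collect every data file, sorted so the order is reproducible.
    globbed_data_paths = sorted(str(p) for p in root.glob("data*.nc"))
    globbed_names = [Path(p).name for p in globbed_data_paths]

print(globbed_names)  # ['data1.nc', 'data2.nc', 'data3.nc']
```

Sorting matters here because the order of the paths determines the order in which the files are later combined.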

## Grid

The `Grid` class is used for storing variables associated with an unstructured grid's topology. This includes dimensions, coordinates, and connectivity variables.

### Creating a Grid

The recommended way to construct a `Grid` is with the `ux.open_grid()` function, which takes a grid file path, detects the grid format, and parses and encodes the provided coordinates and connectivity in the UGRID conventions. Details on supported grid formats and which variables are parsed can be found in other parts of this user guide.

In [None]:
uxgrid = ux.open_grid(grid_path)
uxgrid

### Accessing Variables

As we saw above when printing the `Grid` instance, many variables are associated with a single grid. In addition to the general repr, we can obtain the stored dimensions, coordinates, and connectivity variables through the following attributes.



In [None]:
uxgrid.dims

In [None]:
uxgrid.sizes

In [None]:
uxgrid.coordinates

In [None]:
uxgrid.connectivity

We can access any desired quantity by either calling an attribute by the same name or by indexing a `Grid` like a dictionary.

In [None]:
uxgrid.node_lon

In [None]:
uxgrid["node_lon"]

### Constructing Additional Variables

Looking at `Grid.connectivity` one more time, we can see that there are only two available variables. 

In [None]:
uxgrid.connectivity

These are the variables that could be parsed from the input grid file and encoded in the UGRID conventions.

In addition to parsing variables, we can construct additional variables by calling the attribute or indexing the Grid with the desired name. For example, if we wanted to construct the `face_edge_connectivity`, we would do the following:

In [None]:
uxgrid.face_edge_connectivity

If we look at `Grid.connectivity` again, we can see that it now contains our new connectivity variable.

In [None]:
uxgrid.connectivity

All grid variables can be accessed using an attribute. When the user accesses the attribute (in the above example, `uxgrid.face_edge_connectivity`), the `Grid` checks whether the variable is already present. If it is, it is returned directly; otherwise, it is constructed first. The snippet below shows how this works internally.

```Python
@property
def face_edge_connectivity(self) -> xr.DataArray:
    """Indices of the edges that surround each face.

    Dimensions: ``(n_face, n_max_face_edges)``
    """
    if "face_edge_connectivity" not in self._ds:
        _populate_face_edge_connectivity(self)

    return self._ds["face_edge_connectivity"]
```
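
The construct-on-first-access pattern can be tried out without any grid file. The following is a minimal sketch in plain Python, not UXarray's implementation: `ToyGrid`, its dictionary-backed `_ds`, and the `build_count` counter are hypothetical names introduced only for illustration. It also shows one way dictionary-style indexing could forward to the attribute of the same name.

```python
class ToyGrid:
    """Hypothetical stand-in for a grid; not part of UXarray."""

    def __init__(self, face_node_connectivity):
        # Variables "parsed from the file" live in a plain dictionary here.
        self._ds = {"face_node_connectivity": face_node_connectivity}
        self.build_count = 0  # counts how often the derived variable is built

    @property
    def n_nodes_per_face(self):
        # Construct the variable only if it is not stored yet.
        if "n_nodes_per_face" not in self._ds:
            self.build_count += 1
            self._ds["n_nodes_per_face"] = [
                len(face) for face in self._ds["face_node_connectivity"]
            ]
        return self._ds["n_nodes_per_face"]

    def __getitem__(self, name):
        # Dictionary-style indexing forwards to the attribute of the same name.
        return getattr(self, name)


toy = ToyGrid([[0, 1, 2], [1, 2, 3, 4]])
print(sorted(toy._ds))          # ['face_node_connectivity']
print(toy.n_nodes_per_face)     # [3, 4]  (constructed on first access)
print(toy["n_nodes_per_face"])  # [3, 4]  (returned from storage)
print(toy.build_count)          # 1
```

The counter stays at 1 after the second access, which is the point of the pattern: a derived variable is computed once and reused afterwards.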


## UxDataset

Up to this point, we've exclusively looked at the unstructured grid without any data variables mapped to it. Working with a standalone `Grid` has its applications, such as grid debugging and analysis. More commonly, however, an unstructured grid is paired with data variables that are mapped to it.

The `UxDataset` class is used for pairing one or more data variables with an unstructured grid. It operates similarly to an `xarray.Dataset`, with the addition of unstructured-grid-specific functionality, and is linked to an instance of a `Grid`.

```{info}
More information about `xarray.Dataset` can be found [here](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.html).
```


### Opening a Single Data File

We can load a pair of grid and data files using the `ux.open_dataset()` method.


In [None]:
uxds = ux.open_dataset(grid_path, data_path)
uxds

### Opening Multiple Data Files

When working with multiple data paths, we can open them using the `ux.open_mfdataset()` method. 

In [None]:
uxds_multi = ux.open_mfdataset(
    grid_path, multiple_data_paths, combine="nested", concat_dim="time"
)
uxds_multi

## Grid Accessor

Each `UxDataset` (and, in the next section, each `UxDataArray`) is linked to a `Grid` instance, which contains the unstructured grid information.

In [None]:
uxds.uxgrid

Everything discussed in the `Grid` sections above can also be done through the `uxgrid` attribute.

In [None]:
uxds.uxgrid.dims

## UxDataArray



While a `UxDataset` represents one or more data variables linked to some unstructured grid, a `UxDataArray` represents a single data variable. Alternatively, one can think of a `UxDataset` as a collection of one or more `UxDataArray` instances.

```{info}
More information about `xarray.DataArray` can be found [here](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.html).
```

In our sample dataset, we have a variable called `t2m`, which can be used to index our `UxDataset`.


In [None]:
uxds["t2m"]

We can see the relationship between a `UxDataset` and `UxDataArray` by checking the type.

In [None]:
type(uxds), type(uxds["t2m"])

As mentioned before, each `UxDataArray` is linked to a `Grid` instance.

In [None]:
uxds["t2m"].uxgrid

This Grid is identical to the one linked to the `UxDataset`. Regardless of the number of data variables present in the `UxDataset`, they all share a single `Grid` instance. 

In [None]:
uxds["t2m"].uxgrid == uxds.uxgrid

### Functionality

Just like with Xarray, we can perform various operations on our data variable.


In [None]:
uxds["t2m"].min()

In [None]:
uxds["t2m"].mean()

UXarray also provides custom data analysis operators, which are explored in later sections of this user guide.

In [None]:
uxds["t2m"].gradient()

## Inheritance from Xarray

For those who are familiar with Xarray, the naming of the methods and data structures will look familiar. UXarray aims to provide an Xarray-like experience by inheriting from `xr.Dataset` and `xr.DataArray` and linking each object to an instance of the `Grid` class to provide grid-aware implementations.

We can observe this inheritance by checking for subclassing.

In [None]:
issubclass(ux.UxDataset, xr.Dataset)

In [None]:
issubclass(ux.UxDataArray, xr.DataArray)
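
The combination of subclassing and a shared grid link can be illustrated without Xarray. The following is a minimal sketch in plain Python, not UXarray's implementation: every class name is hypothetical, and `dict` and `list` merely stand in for the parent container and array classes. Only the attribute name `uxgrid` is taken from the text above.

```python
class GridAwareArray(list):
    """Hypothetical array type: a list plus a link to a grid."""

    def __init__(self, values, uxgrid=None):
        super().__init__(values)
        self.uxgrid = uxgrid  # link to the shared grid object


class GridAwareDataset(dict):
    """Hypothetical dataset type: a dict plus a link to a grid."""

    def __init__(self, variables, uxgrid=None):
        super().__init__(variables)
        self.uxgrid = uxgrid

    def __getitem__(self, name):
        # Wrap the stored values and hand the same grid object to the result.
        return GridAwareArray(super().__getitem__(name), uxgrid=self.uxgrid)


grid = object()  # placeholder for a grid instance
toy_ds = GridAwareDataset({"var": [1.0, 2.0, 3.0]}, uxgrid=grid)
toy_da = toy_ds["var"]

print(issubclass(GridAwareDataset, dict))  # True
print(toy_da.uxgrid is toy_ds.uxgrid)      # True
print(max(toy_da))                         # 3.0 (inherited list behavior)
```

Because the subclass only adds the grid link, everything the parent class already supports keeps working on the result.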

## Overloaded Methods

With subclassing, all methods are directly inherited from the parent classes (`xr.Dataset` and `xr.DataArray`). Most Xarray functionality works directly on UXarray's data structures. However, certain methods have been overloaded to make them unstructured-grid aware.

One example of this is the plotting functionality of a `ux.UxDataArray`, which was re-implemented to support visualizations of unstructured grids. A detailed overview of plotting functionality can be found in the next sections.

In [None]:
uxds["t2m"].plot(cmap="viridis", backend="bokeh")
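
For readers less familiar with method overloading in a subclass, the following is a minimal sketch of the general Python pattern. It is not UXarray's plotting code: `ParentArray`, `ChildArray`, `describe`, and `n_face` are hypothetical names chosen for illustration.

```python
class ParentArray:
    """Hypothetical parent class with two generic methods."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)

    def describe(self):
        return f"{len(self.values)} values"


class ChildArray(ParentArray):
    """Hypothetical subclass that carries extra grid information."""

    def __init__(self, values, n_face):
        super().__init__(values)
        self.n_face = n_face

    def describe(self):
        # Overloaded: extend the parent's result with grid information.
        return f"{super().describe()} on {self.n_face} faces"


arr = ChildArray([1.0, 2.0, 3.0, 4.0], n_face=4)
print(arr.mean())      # 2.5 (inherited unchanged)
print(arr.describe())  # 4 values on 4 faces (overloaded)
```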