mascdb package#
Submodules#
mascdb.api module#
MASCDB API.
- class mascdb.api.MASC_DB(dir_path)[source][source]#
Bases: object
Read MASCDB database from a specific directory.
- Parameters:
dir_path (str) – Path to a directory storing a MASCDB. Five entries are expected in the directory:
- MASCdb_cam<0/1/2>.parquet
- MASCdb_triplet.parquet
- MASCdb.zarr
- Returns:
MASCDB class instance.
- Return type:
MASCDB
Initialize MASC_DB object.
It reads 4 parquet databases as well as the zarr database of MASC greyscale images.
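Before constructing the object, it can help to confirm that the directory holds the entries listed under dir_path. The following is a minimal sketch using only the standard library; the helper name missing_mascdb_entries is hypothetical and not part of mascdb.

```python
from pathlib import Path

# Entry names taken from the dir_path description above.
EXPECTED_ENTRIES = (
    "MASCdb_cam0.parquet",
    "MASCdb_cam1.parquet",
    "MASCdb_cam2.parquet",
    "MASCdb_triplet.parquet",
    "MASCdb.zarr",
)


def missing_mascdb_entries(dir_path):
    """Return the expected entries that are absent from dir_path (hypothetical helper)."""
    root = Path(dir_path)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty list means every expected entry is present; it does not validate the content of the files.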
- add_cam_columns(cam0, cam1, cam2, force=False, complete=True)[source][source]#
Safely add columns to the cam dataframes of MASCDB.
- Parameters:
cam0 (pandas.DataFrame) – pd.DataFrame with index ‘flake_id’.
cam1 (pandas.DataFrame) – pd.DataFrame with index ‘flake_id’.
cam2 (pandas.DataFrame) – pd.DataFrame with index ‘flake_id’.
force (bool, optional) – Whether to overwrite existing columns of MASCDB. The default is False.
complete (bool, optional) – Whether to merge only when the cam dataframes have the same ‘flake_id’ values as the current MASCDB. The default is True.
- Returns:
MASCDB class instance
- Return type:
MASCDB
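The interplay of force and complete can be pictured as two checks made before merging. The sketch below works on plain lists of column names and flake_id values; the helper check_new_columns is hypothetical, and raising ValueError on a violation is an assumption, since the reference above does not state how the library reports these cases.

```python
def check_new_columns(existing_columns, existing_ids, new_columns, new_ids,
                      force=False, complete=True):
    """Validate an addition and return the columns that would be overwritten.

    Hypothetical helper: the error behaviour is assumed, not taken from mascdb.
    """
    clashes = sorted(set(existing_columns) & set(new_columns))
    if clashes and not force:
        # force=False: refuse to overwrite columns that already exist
        raise ValueError(f"columns already present: {clashes}")
    if complete and set(new_ids) != set(existing_ids):
        # complete=True: the new data must cover exactly the current flake_id values
        raise ValueError("flake_id values differ from the current database")
    return clashes
```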
- add_triplet_columns(df, force=False, complete=True)[source][source]#
Safely add columns to the triplet dataframe of MASCDB.
- Parameters:
df (pandas.DataFrame) – pd.DataFrame with index ‘flake_id’.
force (bool, optional) – Whether to overwrite existing columns of MASCDB. The default is False.
complete (bool, optional) – Whether to merge only when the provided dataframe has the same ‘flake_id’ values as the current MASCDB triplet dataframe. The default is True.
- Returns:
MASCDB class instance
- Return type:
MASCDB
- arrange(expression, decreasing=True)[source][source]#
Reorder the MASCDB based on the DataFrame column values specified with expression.
- Parameters:
expression (str) – Expression specifying the DataFrame and column used to sort the MASCDB. The expression must follow the pattern ‘<df_name>.<column_name>’. Valid df_names are: [‘cam0’, ‘cam1’, ‘cam2’, ‘triplet’, ‘bs’, ‘env’, ‘gan3d’, ‘flake’, ‘labels’].
decreasing (bool, optional) – If True, sort MASCDB by decreasing values of the DataFrame column; if False, by increasing values. The default is True.
- Returns:
MASCDB object sorted.
- Return type:
MASCDB
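The ‘<df_name>.<column_name>’ pattern can be checked before it is passed to arrange. This is a minimal sketch: the helper parse_expression is hypothetical, the list of valid names is copied from the parameter description, and the column names used in the usage lines are placeholders.

```python
# Valid DataFrame names as listed in the 'expression' parameter description.
VALID_DF_NAMES = ("cam0", "cam1", "cam2", "triplet", "bs", "env", "gan3d", "flake", "labels")


def parse_expression(expression):
    """Split '<df_name>.<column_name>' into (df_name, column_name) (hypothetical helper)."""
    df_name, sep, column = expression.partition(".")
    if not sep or not column:
        raise ValueError("expression must look like '<df_name>.<column_name>'")
    if df_name not in VALID_DF_NAMES:
        raise ValueError(f"invalid df_name {df_name!r}; valid names: {VALID_DF_NAMES}")
    return df_name, column


# Placeholder column names, used only to illustrate the pattern.
df_name, column = parse_expression("cam0.Dmax")
```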
- compute_2Dimage_descriptors(fun, labels, fun_kwargs=None, force=False, dask='parallelized')[source][source]#
Compute and add user-specific image descriptors to the CAM dataframes.
It requires a function (‘fun’) that takes the 2D image array and returns the descriptor value(s). It also requires the expected descriptor names (‘labels’).
- Parameters:
fun (callable) – A function computing the descriptor(s) of a 2D image. The function must expect a grayscale 2D array and return the descriptor value(s).
labels ((str, list)) – String or list of strings specifying the names of the descriptors computed by ‘fun’. These labels become the columns added to the cam dataframes.
fun_kwargs (dict, optional) – Optional arguments to be passed to ‘fun’. The default is None.
force (bool, optional) – If True, overwrite descriptors already present in the cam dataframes. The default is False.
dask (str, optional) – Option to be passed to xr.apply_ufunc. The default is “parallelized”.
- Returns:
MASCDB class instance with new descriptors in cam dataframes.
- Return type:
MASCDB
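A descriptor function of the kind described above might look like the following sketch. The function brightness_stats, its threshold argument, and the label names are illustrative assumptions. Whether the library expects a tuple, list, or array as return value is not stated here, so the tuple return is also an assumption.

```python
import numpy as np


def brightness_stats(img, threshold=0):
    """Return (mean, max) intensity of the pixels brighter than `threshold`."""
    img = np.asarray(img)
    pixels = img[img > threshold]
    if pixels.size == 0:
        # No pixel above the threshold: report missing values
        return np.nan, np.nan
    return float(pixels.mean()), float(pixels.max())


labels = ["mean_brightness", "max_brightness"]  # one label per returned value
fun_kwargs = {"threshold": 10}                  # forwarded to the function

# Quick check on a toy 2x2 grayscale image before running on the full database.
toy_image = np.array([[0, 20], [40, 5]], dtype=np.uint8)
values = brightness_stats(toy_image, **fun_kwargs)
```

Testing the function on a single small array first makes shape or dtype problems visible before any parallel computation starts.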
- discard_melting_class(values, method='Praz2017', df_source='triplet')[source][source]#
Discard MASCDB data with specific melting classes.
- Parameters:
values ((str, int, list)) – Values specifying the melting classes to discard. If integers, it assumes melting_class_id. If strings, it assumes melting_class_name. Valid values can be retrieved by calling ‘mascdb.utils_aux.get_melting_class_name_dict(method)’.
method (str, optional) – Method used to determine melting_class. The default is ‘Praz2017’.
df_source (str, optional) – The dataframe from which to retrieve the class. Either ‘cam0’, ‘cam1’, ‘cam2’ or ‘triplet’. The default is ‘triplet’.
- Returns:
MASCDB class instance without the specified melting classes.
- Return type:
MASCDB
- discard_precip_class(values, method='Schaer2020')[source][source]#
Discard MASCDB data with specific precipitation types.
- Parameters:
values ((str, int, list)) – Values specifying the precipitation classes to discard. If integers, it assumes bs_precip_class_id. If strings, it assumes bs_precip_class_name. Valid values can be retrieved by calling ‘mascdb.utils_aux.get_precip_class_name_dict(method)’.
method (str, optional) – Method used to determine bs_precip_class. The default is ‘Schaer2020’.
- Returns:
MASCDB class instance without the specified precipitation classes.
- Return type:
MASCDB
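The values argument accepts class IDs, class names, or a list. The sketch below normalizes such input into a list of IDs, using the “Schaer2020” name-to-ID mapping documented for get_precip_class_name_dict. The helper to_class_ids is hypothetical, and accepting a mix of names and IDs in one list is a choice of this sketch, not documented library behaviour.

```python
# Mapping documented for the "Schaer2020" method.
PRECIP_NAME_TO_ID = {"undefined": 0, "precip": 1, "mixed": 2, "blowing_snow": 3}


def to_class_ids(values, name_to_id):
    """Normalize a str, int, or list of either into a sorted list of class IDs."""
    if isinstance(values, (str, int)):
        values = [values]
    ids = set()
    for value in values:
        if isinstance(value, str):
            # Strings are interpreted as class names
            if value not in name_to_id:
                raise ValueError(f"unknown class name: {value!r}")
            ids.add(name_to_id[value])
        else:
            # Integers are interpreted as class IDs
            if value not in name_to_id.values():
                raise ValueError(f"unknown class id: {value!r}")
            ids.add(value)
    return sorted(ids)
```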
- discard_riming_class(values, method='Praz2017', df_source='triplet')[source][source]#
Discard MASCDB data with specific riming classes.
- Parameters:
values ((str, int, list)) – Values specifying the riming classes to discard. If integers, it assumes riming_class_id. If strings, it assumes riming_class_name. Valid values can be retrieved by calling ‘mascdb.utils_aux.get_riming_class_name_dict(method)’.
method (str, optional) – Method used to determine riming_class. The default is ‘Praz2017’.
df_source (str, optional) – The dataframe from which to retrieve the class. Either ‘cam0’, ‘cam1’, ‘cam2’ or ‘triplet’. The default is ‘triplet’.
- Returns:
MASCDB class instance without the specified riming classes.
- Return type:
MASCDB
- discard_snowflake_class(values, method='Praz2017', df_source='triplet')[source][source]#
Discard MASCDB data with specific snowflake classes.
- Parameters:
values ((str, int, list)) – Values specifying the snowflake classes to discard. If integers, it assumes snowflake_class_id. If strings, it assumes snowflake_class_name. Valid values can be retrieved by calling ‘mascdb.utils_aux.get_snowflake_class_name_dict(method)’.
method (str, optional) – Method used to determine snowflake_class. The default is ‘Praz2017’.
df_source (str, optional) – The dataframe from which to retrieve the class. Either ‘cam0’, ‘cam1’, ‘cam2’ or ‘triplet’. The default is ‘triplet’.
- Returns:
MASCDB class instance without the specified snowflake classes.
- Return type:
MASCDB
- drop_cam_columns(columns)[source][source]#
Safely remove columns from all cam dataframes of MASCDB.
- Parameters:
columns (list) – List of column names to remove from the MASCDB cam dataframes.
- Returns:
MASCDB class instance
- Return type:
MASCDB
- drop_triplet_columns(columns)[source][source]#
Safely remove columns from the MASCDB triplet dataframe.
- Parameters:
columns (list) – List of column names to remove from the MASCDB triplet dataframe.
- Returns:
MASCDB class instance
- Return type:
MASCDB
- ds_images(cam_id=None, campaign=None, img_id='img_id')[source][source]#
Return xarray DataArray of images.
- get_var_explanation(varname)[source][source]#
Get verbose explanation of a given variable.
It includes DOI of reference paper whenever relevant.
- isel(idx)[source][source]#
Positional-index subsetting of MASCDB DataArray and MASCDB DataFrames.
- Parameters:
idx ((numpy.ndarray, list, int)) – List or np.ndarray of integer/boolean values used as positional indices for subsetting.
- Returns:
MASCDB class instance subsetted (or reordered according to the indices).
- Return type:
MASCDB
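Positional subsetting with integer or boolean indices can be illustrated on a plain list. This sketch only mirrors the idea; the helper isel_rows is hypothetical, the flake IDs are placeholders, and the real method applies the same selection to the image array and all dataframes at once.

```python
def isel_rows(rows, idx):
    """Select rows by position using an int, a list of ints, or a boolean mask."""
    if isinstance(idx, int):
        idx = [idx]
    idx = list(idx)
    if idx and all(isinstance(i, bool) for i in idx):
        # Boolean mask: keep the rows flagged True
        if len(idx) != len(rows):
            raise ValueError("boolean mask must match the number of rows")
        return [row for row, keep in zip(rows, idx) if keep]
    # Integer positions: the output follows the order of the indices
    return [rows[i] for i in idx]


flake_ids = ["flake_a", "flake_b", "flake_c"]  # placeholder identifiers
```

Integer indices given out of order reorder the rows, which is why the return description mentions reordering.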
- property labels[source]#
Dataframe of hydrometeor classification, riming and melting attributes (Praz et al 2017).
- plot_flake(cam_id=None, index=None, random=False, enhancement='histogram_equalization', zoom=True, squared=True, ax=None, **kwargs)[source][source]#
Plotting routine to display a specific MASC snowflake image.
By default:
- The image is enhanced with histogram_equalization and zoomed.
- ‘random’ is effective only if ‘index’ is not specified.
- If ‘index’ is unspecified, it plots an image of the first MASCDB triplet.
- Parameters:
cam_id (int, optional) – The camera from which to display the snowflake image. If not specified, the camera is randomly chosen. Valid cam_id values are 0, 1 and 2. The default is None.
index (int, optional) – Row index of the MASCDB triplet image to display. The default is None.
random (bool, optional) – Specify if the displayed MASCDB image must be chosen randomly. It’s effective only if ‘index’ is not specified. The default is False.
enhancement (str, optional) – Type of enhancement used to improve the image quality. Valid enhancements are: [None, “histogram_equalization”, “contrast_stretching”, “local_equalization”]. The default is “histogram_equalization”.
zoom (bool, optional) – Specify if zooming close to the snowflake bounding box. The image shape is defined by selecting the smallest possible shape to include the entire snowflake. The default is True.
squared (bool, optional) – Specify if the zoomed images must have equal height,width. The default is True.
ax (matplotlib.axes.Axes, optional) – Optional matplotlib axis on which to plot the image. The default is None.
**kwargs (dict) – Optional arguments to be passed to DataArray.plot.
- plot_flakes(cam_id=None, indices=None, random=False, n_images=9, col_wrap=3, enhancement='histogram_equalization', zoom=True, squared=True, hspace=0.1, wspace=0.1, **kwargs)[source][source]#
Plotting routine to display MASC snowflake images.
By default:
- Images are enhanced with histogram_equalization and zoomed.
- ‘n_images’ and ‘random’ are effective only if ‘indices’ are not specified.
- If ‘indices’ are unspecified:
  * If cam_id is unspecified: it displays the first ‘n_images’ from a randomly selected camera of MASCDB.
  * If cam_id specifies 1 camera: it displays the first ‘n_images’ of the specified camera of MASCDB.
  * If cam_id specifies more than 1 camera: it displays the first ‘n_images’ of each of the specified cameras of MASCDB.
- Parameters:
cam_id ((int, list), optional) – The camera(s) from which to display the snowflake images. If not specified, a single camera is randomly chosen. If specified, it can be any subset of the 3 cameras. Valid cam_id values are 0, 1 and 2. The default is None.
indices ((int, list), optional) – Integer list of rows to display. The default is None.
random (bool, optional) – Specify if the displayed MASCDB images must be chosen randomly. It’s effective only if ‘indices’ are not specified. The default is False.
n_images (int, optional) – Specify the number of MASCDB images to be displayed for each camera. It’s effective only if ‘indices’ are not specified. The default is 9.
enhancement (str, optional) – Type of enhancement used to improve the image quality. Valid enhancements are: [None, “histogram_equalization”, “contrast_stretching”, “local_equalization”]. The default is “histogram_equalization”.
zoom (bool, optional) – Specify if zooming close to the snowflake bounding box. The image shape is defined by selecting the smallest possible shapes across all the snowflakes to be plotted. The default is True.
squared (bool, optional) – Specify if the zoomed images must have equal height,width. The default is True.
hspace (float) – Define the space across images in the vertical dimension. The default is 0.1.
wspace (float) – Define the space across images in the horizontal dimension. The default is 0.1.
**kwargs (dict) – Optional arguments to be passed to DataArray.plot.
- Returns:
FacetGrid object for additional customization
- Return type:
- plot_triplets(indices=None, random=False, n_triplets=1, enhancement='histogram_equalization', zoom=True, squared=True, wspace=0.01, hspace=0.01, **kwargs)[source][source]#
Plotting routine to display specific triplets of MASC snowflake images.
By default:
- Images are enhanced with histogram_equalization and zoomed.
- ‘n_triplets’ and ‘random’ are effective only if ‘indices’ are not specified.
- If ‘indices’ are unspecified, the chosen triplets correspond to the first ‘n_triplets’ of MASCDB.
- Parameters:
indices ((int, list), optional) – Integer list of rows to display. The default is None.
random (bool, optional) – Specify if the displayed MASCDB triplets must be chosen randomly. It’s effective only if ‘indices’ are not specified. The default is False.
n_triplets (int, optional) – Specify the number of MASCDB triplets to be displayed. It’s effective only if ‘indices’ are not specified. The default is 1.
enhancement (str, optional) – Type of enhancement used to improve the image quality. Valid enhancements are: [None, “histogram_equalization”, “contrast_stretching”, “local_equalization”]. The default is “histogram_equalization”.
zoom (bool, optional) – Specify if zooming close to the snowflake bounding box. The image shape is defined by selecting the smallest possible shapes across all the snowflakes to be plotted. The default is True.
squared (bool, optional) – Specify if the zoomed images must have equal height,width. The default is True.
hspace (float) – Define the space across images in the vertical dimension. The default is 0.01.
wspace (float) – Define the space across images in the horizontal dimension. The default is 0.01.
**kwargs (dict) – Optional arguments to be passed to DataArray.plot.
- Returns:
FacetGrid object for additional customization
- Return type:
- redefine_events(max_interval_without_images=None, min_duration=None, max_duration=None, min_n_triplets=None, max_n_triplets=None, unit='ns')[source][source]#
Enable selection and custom definition of an ‘event’.
If <min/max>_<duration/n_triplets> are specified, the MASCDB will likely be subsetted.
- Parameters:
max_interval_without_images ((numpy.timedelta64, pandas.Timedelta), optional) – Maximum interval of time without images for consecutive images to be considered part of the same event. The default is np.timedelta64(4,’h’).
min_duration ((numpy.timedelta64, pandas.Timedelta), optional) – Minimum duration of an event to be retained. The default is numpy.timedelta64(0,’ns’).
max_duration ((numpy.timedelta64, pandas.Timedelta), optional) – Maximum duration of an event to be retained. The default is numpy.timedelta64(365,’D’).
min_n_triplets (int, optional) – Minimum number of triplets within an event to retain the event. The default is 0.
max_n_triplets (int, optional) – Maximum number of triplets within an event to retain the event. The default is Inf.
unit (str, optional) – Unit of timedelta to consider for events definition. The default is “ns”.
- Returns:
MASCDB class instance with the custom event definition.
- Return type:
MASCDB
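The role of max_interval_without_images can be sketched with the standard library: consecutive images share an event until the gap between them exceeds the limit. The helper assign_event_ids is hypothetical; starting event IDs at 0 and treating a gap exactly equal to the limit as the same event are assumptions of this sketch.

```python
from datetime import datetime, timedelta


def assign_event_ids(timestamps, max_interval_without_images=timedelta(hours=4)):
    """Return one event ID per timestamp, after sorting the timestamps."""
    event_ids = []
    current = 0
    previous = None
    for ts in sorted(timestamps):
        # A gap longer than the limit starts a new event
        if previous is not None and ts - previous > max_interval_without_images:
            current += 1
        event_ids.append(current)
        previous = ts
    return event_ids


times = [
    datetime(2020, 1, 1, 0),
    datetime(2020, 1, 1, 1),
    datetime(2020, 1, 1, 6),  # 5 hours after the previous image
    datetime(2020, 1, 1, 7),
]
event_ids = assign_event_ids(times)
```

With the 4-hour default, the 5-hour gap splits these four images into two events.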
- save(dir_path, force=False)[source][source]#
Save MASCDB object to disk into 4 parquet files and one Zarr store.
- sel(flake_ids)[source][source]#
Subset MASCDB based on specified flake_ids.
- Parameters:
flake_ids ((numpy.ndarray, list, str)) – List or np.ndarray of strings specifying the flake_id values to subset.
- Returns:
MASCDB class instance subsetted.
- Return type:
MASCDB
- select_events_longest(n=1)[source][source]#
Select MASCDB data corresponding to the ‘n’ events with longest duration.
- Parameters:
n (int, optional) – The number of events to retrieve. The default is 1.
- Returns:
MASCDB class instance
- Return type:
MASCDB
- select_events_shortest(n=1)[source][source]#
Select MASCDB data corresponding to the ‘n’ events with shortest duration.
- Parameters:
n (int, optional) – The number of events to retrieve. The default is 1.
- Returns:
MASCDB class instance
- Return type:
MASCDB
- select_events_with_duration(min=None, max=None)[source][source]#
Select events with duration between min and max.
- Parameters:
min ((numpy.timedelta64, pandas.Timedelta), optional) – Minimum duration. The default is 0 ns.
max ((numpy.timedelta64, pandas.Timedelta), optional) – Maximum duration. The default is 1 year.
- Returns:
MASCDB class instance
- Return type:
MASCDB
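Selecting events by duration amounts to computing the time span of each event and keeping those within the bounds. In this sketch the helper select_events_by_duration and the event dictionary are hypothetical; the arguments are named min_duration and max_duration to avoid shadowing Python built-ins, the defaults follow the documented 0 ns and 1 year, and inclusive bounds are an assumption.

```python
from datetime import datetime, timedelta


def select_events_by_duration(events, min_duration=timedelta(0),
                              max_duration=timedelta(days=365)):
    """Return {event_id: duration} for events whose duration lies within the bounds."""
    selected = {}
    for event_id, times in events.items():
        duration = max(times) - min(times)  # span from first to last image
        if min_duration <= duration <= max_duration:
            selected[event_id] = duration
    return selected


events = {
    0: [datetime(2020, 1, 1, 0), datetime(2020, 1, 1, 2)],       # lasts 2 hours
    1: [datetime(2020, 1, 2, 0), datetime(2020, 1, 2, 0, 30)],   # lasts 30 minutes
}
long_events = select_events_by_duration(events, min_duration=timedelta(hours=1))
```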
- select_events_with_n_triplets(min=0, max=inf)[source][source]#
Select events with number of triplets between min and max.
- select_max(expression, n=10)[source][source]#
Select ‘n’ triplets with maximum values of a given DataFrame column.
- select_melting_class(values, method='Praz2017', invert=False, df_source='triplet')[source][source]#
Select MASCDB data with specific melting classes.
- Parameters:
values ((str, int, list)) – Values specifying the melting classes to select. If integers, it assumes melting_class_id. If strings, it assumes melting_class_name. Valid values can be retrieved by calling ‘mascdb.utils_aux.get_melting_class_name_dict(method)’.
method (str, optional) – Method used to determine melting_class. The default is ‘Praz2017’.
invert (bool, optional) – If True, discard the specified melting classes instead of selecting them. The default is False.
df_source (str, optional) – The dataframe from which to retrieve the class. Either ‘cam0’, ‘cam1’, ‘cam2’ or ‘triplet’. The default is ‘triplet’.
- Returns:
MASCDB class instance with specific melting classes.
- Return type:
MASCDB
- select_min(expression, n=10)[source][source]#
Select ‘n’ triplets with minimum values of a given DataFrame column.
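Picking the ‘n’ rows with the largest or smallest values of a column can be sketched with the standard library. The helpers and the row dictionaries below are hypothetical stand-ins for a DataFrame, and the column name Dmax is a placeholder.

```python
import heapq


def select_max_rows(rows, column, n=10):
    """Return the n rows with the largest values in `column`, largest first."""
    return heapq.nlargest(n, rows, key=lambda row: row[column])


def select_min_rows(rows, column, n=10):
    """Return the n rows with the smallest values in `column`, smallest first."""
    return heapq.nsmallest(n, rows, key=lambda row: row[column])


rows = [
    {"flake_id": "a", "Dmax": 1.0},
    {"flake_id": "b", "Dmax": 3.0},
    {"flake_id": "c", "Dmax": 2.0},
]
top_two = [row["flake_id"] for row in select_max_rows(rows, "Dmax", n=2)]
bottom_one = [row["flake_id"] for row in select_min_rows(rows, "Dmax", n=1)]
```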
- select_precip_class(values, method='Schaer2020', invert=False)[source][source]#
Select MASCDB data with specific precipitation types.
- Parameters:
values ((str, int, list)) – Values specifying the precipitation classes to select. If integers, it assumes bs_precip_class_id. If strings, it assumes bs_precip_class_name. Valid values can be retrieved by calling ‘mascdb.utils_aux.get_precip_class_name_dict(method)’.
method (str, optional) – Method used to determine bs_precip_class. The default is ‘Schaer2020’.
invert (bool, optional) – If True, discard the specified precipitation classes instead of selecting them. The default is False.
- Returns:
MASCDB class instance with specific precipitation classes.
- Return type:
MASCDB
- select_riming_class(values, method='Praz2017', invert=False, df_source='triplet')[source][source]#
Select MASCDB data with specific riming classes.
- Parameters:
values ((str, int, list)) – Values specifying the riming classes to select. If integers, it assumes riming_class_id. If strings, it assumes riming_class_name. Valid values can be retrieved by calling ‘mascdb.utils_aux.get_riming_class_name_dict(method)’.
method (str, optional) – Method used to determine riming_class. The default is ‘Praz2017’.
invert (bool, optional) – If True, discard the specified riming classes instead of selecting them. The default is False.
df_source (str, optional) – The dataframe from which to retrieve the class. Either ‘cam0’, ‘cam1’, ‘cam2’ or ‘triplet’. The default is ‘triplet’.
- Returns:
MASCDB class instance with specific riming classes.
- Return type:
MASCDB
- select_snowflake_class(values, method='Praz2017', invert=False, df_source='triplet')[source][source]#
Select MASCDB data with specific snowflake classes.
- Parameters:
values ((str, int, list)) – Values specifying the snowflake classes to select. If integers, it assumes snowflake_class_id. If strings, it assumes snowflake_class_name. Valid values can be retrieved by calling ‘mascdb.utils_aux.get_snowflake_class_name_dict(method)’.
method (str, optional) – Method used to determine snowflake_class. The default is ‘Praz2017’.
invert (bool, optional) – If True, discard the specified snowflake classes instead of selecting them. The default is False.
df_source (str, optional) – The dataframe from which to retrieve the class. Either ‘cam0’, ‘cam1’, ‘cam2’ or ‘triplet’. The default is ‘triplet’.
- Returns:
MASCDB class instance with specific snowflake classes.
- Return type:
MASCDB
mascdb.pd_sns_accessor module#
Pandas DataFrame Seaborn Accessor.
- class mascdb.pd_sns_accessor.SeabornAccessor(pandas_obj)[source][source]#
Bases: object
Pandas DataFrame accessor for Seaborn plotting functionality.
This accessor provides convenient access to Seaborn plotting functions directly from pandas DataFrames using the .sns attribute. It dynamically adds all common Seaborn plotting methods and provides additional custom visualization methods.
Examples
>>> import pandas as pd
>>> df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
>>> df.sns.scatterplot(x="x", y="y")
>>> df.sns.corrplot()
Notes
Available Seaborn methods include: boxplot, violinplot, boxenplot, swarmplot, stripplot, pointplot, lmplot, pairplot, scatterplot, relplot, lineplot, displot, catplot, barplot, histplot, jointplot, and kdeplot.
- corrplot(vars=None, vmin=-0.3, vmax=0.3, center=0, cbar_kws=None, linewidths=0.5)[source][source]#
Create a correlation matrix heatmap with lower triangle display.
Computes the correlation matrix of the DataFrame and visualizes it as a heatmap with a mask for the upper triangle, showing only the lower triangle of correlations.
- Parameters:
vars (list of str, optional) – List of column names to include in the correlation matrix. If None, uses all numeric columns. Default is None.
vmin (float, optional) – Minimum value for colormap normalization. Default is -0.3.
vmax (float, optional) – Maximum value for colormap normalization. Default is 0.3.
center (float, optional) – Value at which to center the colormap. Default is 0.
cbar_kws (dict, optional) – Keyword arguments for the colorbar. Default is {“shrink”: 0.5}.
linewidths (float, optional) – Width of lines separating cells in the heatmap. Default is 0.5.
- Returns:
The matplotlib figure object containing the correlation heatmap.
- Return type:
Examples
>>> df.sns.corrplot()
>>> df.sns.corrplot(vars=["col1", "col2", "col3"], vmin=-1, vmax=1)
References
https://seaborn.pydata.org/examples/many_pairwise_correlations.html
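The lower-triangle display relies on a boolean mask over the correlation matrix. The sketch below builds such a mask with NumPy, following the approach of the referenced Seaborn example; the correlation values are made up, and whether corrplot also masks the diagonal, as done here, is an assumption.

```python
import numpy as np

# Made-up symmetric correlation matrix for three variables.
corr = np.array([
    [1.0, 0.2, -0.1],
    [0.2, 1.0, 0.4],
    [-0.1, 0.4, 1.0],
])

# True marks the cells to hide: the upper triangle including the diagonal.
mask = np.triu(np.ones_like(corr, dtype=bool))

# Replace hidden cells with NaN to inspect what remains visible.
lower = np.where(mask, np.nan, corr)
```

A mask of this shape can be handed to a heatmap routine that accepts a mask argument, so that only the cells below the diagonal are drawn.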
- kde_marginals(x, y, xlim=None, ylim=None, space=0, thresh=0, levels=100, cmap='rocket', hist_color='#03051A', hist_alpha=1, hist_bins=25)[source][source]#
Create a bivariate KDE plot with marginal histograms.
Produces a joint plot showing a smooth bivariate kernel density estimate (KDE) in the center with marginal histograms along the x and y axes.
- Parameters:
x (str) – Name of the column to plot on the x-axis.
y (str) – Name of the column to plot on the y-axis.
xlim (tuple of float, optional) – Limits for the x-axis as (min, max). Default is None.
ylim (tuple of float, optional) – Limits for the y-axis as (min, max). Default is None.
space (float, optional) – Space between the joint and marginal axes. Default is 0.
thresh (float, optional) – Threshold for the KDE contours. Values below this are not drawn. Default is 0.
levels (int, optional) – Number of contour levels for the KDE plot. Default is 100.
cmap (str, optional) – Colormap name for the KDE plot. Default is “rocket”.
hist_color (str, optional) – Color for the marginal histograms. Default is “#03051A”.
hist_alpha (float, optional) – Alpha (transparency) for the marginal histograms. Default is 1.
hist_bins (int, optional) – Number of bins for the marginal histograms. Default is 25.
- Returns:
The JointGrid object containing the bivariate and marginal plots.
- Return type:
Examples
>>> df.sns.kde_marginals(x="col1", y="col2")
>>> df.sns.kde_marginals(x="col1", y="col2", cmap="viridis", hist_bins=50)
References
https://seaborn.pydata.org/examples/smooth_bivariate_kde.html
- kde_ridgeplot(x, group, pal=None, bw_adjust=0.5, height=0.5, aspect=15, hspace=-0.25, linewidth=2)[source][source]#
Create a ridge plot (joyplot) showing KDE distributions for groups.
Produces overlapping kernel density estimates for different groups, creating a “ridge” or “joyplot” visualization useful for comparing distributions across multiple categories.
- Parameters:
x (str) – Name of the column containing the values to plot.
group (str) – Name of the column containing the grouping variable.
pal (list or palette, optional) – Color palette for the groups. If None, uses a cubehelix palette. Default is None.
bw_adjust (float, optional) – Bandwidth adjustment factor for the KDE. Higher values produce smoother curves. Default is 0.5.
height (float, optional) – Height of each facet in inches. Default is 0.5.
aspect (float, optional) – Aspect ratio of each facet (width/height). Default is 15.
hspace (float, optional) – Space between facets. Negative values create overlap. Default is -0.25.
linewidth (float, optional) – Width of the KDE lines. Default is 2.
- Returns:
The FacetGrid object containing the ridge plot.
- Return type:
Examples
>>> df.sns.kde_ridgeplot(x="value", group="category")
>>> df.sns.kde_ridgeplot(x="value", group="category", bw_adjust=1.0, hspace=-0.5)
References
mascdb.utils_aux module#
MASCDB auxiliary functions.
- mascdb.utils_aux.get_campaign_colors_dict()[source][source]#
Get a dictionary mapping campaign names to colors.
- mascdb.utils_aux.get_melting_class_id_colors_dict(method='Praz2017')[source][source]#
Get color mapping for melting class IDs.
Returns a dictionary mapping melting class IDs to color names for visualization.
- mascdb.utils_aux.get_melting_class_id_dict(method='Praz2017')[source][source]#
Get melting class name mapping from class ID.
Returns a dictionary mapping melting class IDs to their corresponding names. This is the inverse of get_melting_class_name_dict.
- mascdb.utils_aux.get_melting_class_name_colors_dict(method='Praz2017')[source][source]#
Get color mapping for melting class names.
Returns a dictionary mapping melting class names to color names for visualization.
- Parameters:
method (str, optional) – Hydrometeor classification method. Default is “Praz2017”.
- Returns:
Dictionary mapping class names (str) to color names (str).
- Return type:
- Raises:
ValueError – If the specified method is not available.
- mascdb.utils_aux.get_melting_class_name_dict(method='Praz2017')[source][source]#
Get melting class ID mapping from class name.
Returns a dictionary mapping melting class names to their corresponding integer IDs according to the specified classification method.
- Parameters:
method (str, optional) – Hydrometeor classification method. Default is “Praz2017”.
- Returns:
Dictionary mapping class names (str) to class IDs (int). For the “Praz2017” method, it includes:
- “dry”: 0
- “melting”: 1
- Return type:
- Raises:
ValueError – If the specified method is not available.
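The relation between the name dictionary and the ID dictionary, described as inverses of each other, can be written out with the “Praz2017” melting mapping documented above. The variable names are illustrative; the real dictionaries are obtained from the functions in this module.

```python
# Name-to-ID mapping documented for the "Praz2017" method.
MELTING_NAME_TO_ID = {"dry": 0, "melting": 1}

# Inverting the mapping gives the ID-to-name dictionary.
MELTING_ID_TO_NAME = {class_id: name for name, class_id in MELTING_NAME_TO_ID.items()}

# The inversion is lossless only if every ID is unique.
is_lossless = len(MELTING_ID_TO_NAME) == len(MELTING_NAME_TO_ID)
```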
- mascdb.utils_aux.get_precip_class_id_colors_dict(method='Schaer2020')[source][source]#
Get color mapping for precipitation class IDs.
Returns a dictionary mapping precipitation class IDs to color names for visualization.
- mascdb.utils_aux.get_precip_class_id_dict(method='Schaer2020')[source][source]#
Get precipitation class name mapping from class ID.
Returns a dictionary mapping precipitation class IDs to their corresponding names. This is the inverse of get_precip_class_name_dict.
- mascdb.utils_aux.get_precip_class_name_colors_dict(method='Praz2017')[source][source]#
Get color mapping for precipitation class names.
Returns a dictionary mapping precipitation class names to color names for visualization.
- Parameters:
method (str, optional) – Precipitation classification method. Default is “Praz2017”.
- Returns:
Dictionary mapping class names (str) to color names (str).
- Return type:
- Raises:
ValueError – If the specified method is not available.
- mascdb.utils_aux.get_precip_class_name_dict(method='Schaer2020')[source][source]#
Get precipitation class ID mapping from class name.
Returns a dictionary mapping precipitation class names to their corresponding integer IDs according to the specified classification method.
- Parameters:
method (str, optional) – Precipitation classification method. Default is “Schaer2020”.
- Returns:
Dictionary mapping class names (str) to class IDs (int). For the “Schaer2020” method, it includes:
- “undefined”: 0
- “precip”: 1
- “mixed”: 2
- “blowing_snow”: 3
- Return type:
- Raises:
ValueError – If the specified method is not available.
- mascdb.utils_aux.get_riming_class_id_colors_dict(method='Praz2017')[source][source]#
Get color mapping for riming class IDs.
Returns a dictionary mapping riming class IDs to color names for visualization.
- mascdb.utils_aux.get_riming_class_id_dict(method='Praz2017')[source][source]#
Get riming class name mapping from class ID.
Returns a dictionary mapping riming class IDs to their corresponding names. This is the inverse of get_riming_class_name_dict.
- mascdb.utils_aux.get_riming_class_name_colors_dict(method='Praz2017')[source][source]#
Get color mapping for riming class names.
Returns a dictionary mapping riming class names to color names for visualization.
- Parameters:
method (str, optional) – Hydrometeor classification method. Default is “Praz2017”.
- Returns:
Dictionary mapping class names (str) to color names (str).
- Return type:
- Raises:
ValueError – If the specified method is not available.
- mascdb.utils_aux.get_riming_class_name_dict(method='Praz2017')[source][source]#
Get riming class ID mapping from class name.
Returns a dictionary mapping riming class names to their corresponding integer IDs according to the specified classification method.
- Parameters:
method (str, optional) – Hydrometeor classification method. Default is “Praz2017” based on https://amt.copernicus.org/articles/10/1335/2017/.
- Returns:
Dictionary mapping class names (str) to class IDs (int). For the “Praz2017” method, it includes:
- “undefined”: 0
- “unrimed”: 1
- “rimed”: 2
- “densely_rimed”: 3
- “graupel-like”: 4
- “graupel”: 5
- Return type:
- Raises:
ValueError – If the specified method is not available.
- mascdb.utils_aux.get_snowflake_class_id_colors_dict(method='Praz2017')[source][source]#
Get color mapping for snowflake class IDs.
Returns a dictionary mapping snowflake class IDs to color names for visualization.
- mascdb.utils_aux.get_snowflake_class_id_dict(method='Praz2017')[source][source]#
Get snowflake class name mapping from class ID.
Returns a dictionary mapping snowflake class IDs to their corresponding names. This is the inverse of get_snowflake_class_name_dict.
- mascdb.utils_aux.get_snowflake_class_name_colors_dict(method='Praz2017')[source][source]#
Get color mapping for snowflake class names.
Returns a dictionary mapping snowflake class names to color names for visualization.
- Parameters:
method (str, optional) – Hydrometeor classification method. Default is “Praz2017”.
- Returns:
Dictionary mapping class names (str) to color names (str).
- Return type:
- Raises:
ValueError – If the specified method is not available.
- mascdb.utils_aux.get_snowflake_class_name_dict(method='Praz2017')[source][source]#
Get snowflake class ID mapping from class name.
Returns a dictionary mapping snowflake class names to their corresponding integer IDs according to the specified classification method.
- Parameters:
method (str, optional) – Hydrometeor classification method. Default is “Praz2017” based on https://amt.copernicus.org/articles/10/1335/2017/.
- Returns:
Dictionary mapping class names (str) to class IDs (int). For the “Praz2017” method, it includes:
- “small_particle”: 1
- “columnar_crystal”: 2
- “planar_crystal”: 3
- “aggregate”: 4
- “graupel”: 5
- “columnar_planar_combination”: 6
- Return type:
- Raises:
ValueError – If the specified method is not available.
- mascdb.utils_aux.get_vars_blowing_snow()[source][source]#
Retrieve the list of all blowing snow variables.
Returns variable names related to blowing snow classification and mixing indices.
- Returns:
List of blowing snow variable names including normalized angle, mixing index, and precipitation class identifiers.
- Return type:
References
- mascdb.utils_aux.get_vars_cam_descriptors()[source][source]#
Retrieve the list of MASC camera descriptors.
Returns variable names for geometric, morphological, and textural descriptors computed from individual camera ROI (Region of Interest) images.
- Returns:
Comprehensive list of descriptor variable names including shape features, texture features, symmetry features, and complexity measures.
- Return type:
References
Descriptors are detailed in Appendix A of: https://doi.org/10.5194/amt-10-1335-2017
- mascdb.utils_aux.get_vars_cam_info()[source][source]#
Retrieve the list of MASC camera information variables.
Returns variable names for metadata about individual camera captures.
- mascdb.utils_aux.get_vars_class()[source][source]#
Retrieve the list of all classification variables.
Returns variable names for snowflake classification including riming, melting, and hydrometeor type classifications with associated probabilities.
- mascdb.utils_aux.get_vars_class_ids()[source][source]#
Retrieve the list of class ID variables.
Returns variable names for integer class identifiers.
- mascdb.utils_aux.get_vars_class_names()[source][source]#
Retrieve the list of class name variables.
Returns variable names for string class labels.
- mascdb.utils_aux.get_vars_env()[source][source]#
Retrieve the list of all environmental variables.
Returns variable names for meteorological measurements in the proximity of the MASC instrument.
- mascdb.utils_aux.get_vars_gan3d()[source][source]#
Retrieve the list of all GAN3D variables.
Returns variable names related to 3D mass, volume, and gyration estimates from the GAN3D method.
- Returns:
List of GAN3D variable names: [‘gan3d_mass’, ‘gan3d_volume’, ‘gan3d_gyration’].
- Return type:
list
- mascdb.utils_aux.get_vars_location()[source][source]#
Retrieve the list of all location variables.
Returns variable names for spatiotemporal information about measurements.
- mascdb.utils_aux.var_explanations()[source][source]#
Get dictionary containing verbose explanations of MASC descriptors.
Provides detailed descriptions for all MASC (Multi-Angle Snowflake Camera) descriptor variables, including references to relevant publications.
- Returns:
Dictionary mapping variable names (str) to their detailed explanations (str). Explanations include physical meaning, calculation methods, and references to scientific publications where applicable.
- Return type:
dict
- mascdb.utils_aux.var_units()[source][source]#
Return dictionary containing units of MASC descriptors.
Provides a comprehensive mapping of all MASC (Multi-Angle Snowflake Camera) descriptor variable names to their physical units.
- Returns:
Dictionary mapping variable names (str) to their units (str). Common units include: ‘m’ (meters), ‘deg’ (degrees), ‘-’ (dimensionless), ‘pix’ (pixels), ‘class’ (classification), ‘boolean’, etc.
- Return type:
dict
mascdb.utils_env module#
MASCDB auxiliary environmental functions.
- mascdb.utils_env.wet_bulb_t(t, rh)[source][source]#
Return the wet-bulb temperature estimated from temperature (T) and relative humidity (RH).
- Parameters:
t (float, int, list or numpy.ndarray) – Temperature in degrees Celsius.
rh (float, int, list or numpy.ndarray) – Relative humidity in percent.
- Returns:
Wet-bulb temperature in °C, with the same data type as t/rh.
- Return type:
array-like
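The source does not say which approximation `wet_bulb_t` uses. As an illustration of estimating wet-bulb temperature from T and RH alone, the sketch below uses the Stull (2011) empirical fit, which is an assumption made here and may differ from the library's formula. The fit targets near sea-level pressure, RH between roughly 5% and 99%, and T between roughly -20 °C and 50 °C. The function name `wet_bulb_stull` is hypothetical.

```python
import math


def wet_bulb_stull(t_celsius, rh_percent):
    """Estimate wet-bulb temperature (deg C) with the Stull (2011) empirical fit.

    Assumption: used for illustration only; the documentation does not state
    which approximation mascdb.utils_env.wet_bulb_t implements.
    """
    t, rh = float(t_celsius), float(rh_percent)
    return (
        t * math.atan(0.151977 * math.sqrt(rh + 8.313659))
        + math.atan(t + rh)
        - math.atan(rh - 1.676331)
        + 0.00391838 * rh ** 1.5 * math.atan(0.023101 * rh)
        - 4.686035
    )


tw = wet_bulb_stull(20.0, 50.0)
print(round(tw, 1))
```

For list input, the scalar function can be applied element-wise, for example `[wet_bulb_stull(t, rh) for t, rh in zip(temps, rhs)]`.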
mascdb.utils_event module#
Utilities for event definition.
mascdb.utils_figs module#
MASCDB Visualization Utilities.
- mascdb.utils_figs.cm2inch(*tupl)[source][source]#
Convert centimeters to inches.
- Parameters:
*tupl (float or tuple of float) – Dimensions in centimeters. Can be individual numbers or a single tuple.
- Returns:
Dimensions converted to inches.
- Return type:
tuple of float
Examples
>>> cm2inch(10, 20)
(3.937, 7.874)
>>> cm2inch((10, 20))
(3.937, 7.874)
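The conversion itself is a division by 2.54. The sketch below shows one way to accept both call styles described above; `cm_to_inch` is a hypothetical stand-in, not the library's code.

```python
CM_PER_INCH = 2.54  # exact by definition


def cm_to_inch(*dims):
    """Hypothetical stand-in: accepts cm_to_inch(10, 20) or cm_to_inch((10, 20))."""
    # A single tuple argument is unpacked so both call styles behave the same.
    if len(dims) == 1 and isinstance(dims[0], tuple):
        dims = dims[0]
    return tuple(d / CM_PER_INCH for d in dims)


width, height = cm_to_inch(10, 20)
print(round(width, 3), round(height, 3))
```

A typical use is sizing a figure in metric units, for example passing the result as `figsize` to matplotlib.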
- mascdb.utils_figs.get_c_cmap_from_color_dict(color_dict, labels)[source][source]#
Create color mapping and colormap for scatter plots from a color dictionary.
This function converts a dictionary of label-to-color mappings into integer color indices and a matplotlib colormap suitable for use with plt.scatter.
- Parameters:
color_dict (dict) – Dictionary mapping labels to color names or hex values.
labels (array-like) – Array of labels corresponding to data points.
- Returns:
A list containing [c, cmap] where:
- c (numpy.ndarray) – Integer array of color indices for each label.
- cmap (matplotlib.colors.ListedColormap) – Colormap object with unique colors from the dictionary.
- Return type:
list
Examples
>>> color_dict = {"A": "red", "B": "blue", "C": "green"}
>>> labels = ["A", "B", "A", "C"]
>>> c, cmap = get_c_cmap_from_color_dict(color_dict, labels)
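The core idea, turning labels into integer indices that line up with an ordered color list, can be sketched in plain Python. `labels_to_color_indices` is a hypothetical helper, and the index order (dictionary insertion order) is an assumption; the library may order colors differently.

```python
def labels_to_color_indices(color_dict, labels):
    """Hypothetical helper: map each label to the position of its color."""
    # Assumption: indices follow the insertion order of color_dict.
    index_of = {label: i for i, label in enumerate(color_dict)}
    c = [index_of[label] for label in labels]  # KeyError on unknown labels
    colors = list(color_dict.values())
    return c, colors


color_dict = {"A": "red", "B": "blue", "C": "green"}
labels = ["A", "B", "A", "C"]
c, colors = labels_to_color_indices(color_dict, labels)
print(c, colors)
```

With matplotlib available, `colors` could be wrapped in `matplotlib.colors.ListedColormap` and passed with `c` to `plt.scatter`. Setting `vmin=0` and `vmax=len(colors) - 1` keeps the index-to-color mapping stable when some labels are absent from the data.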
- mascdb.utils_figs.get_colors_from_cmap(x, cmap_name='Spectral', vmin=None, vmax=None, nan_color=None)[source][source]#
Map numeric values to colors using a matplotlib colormap.
This function converts numeric values to hexadecimal color codes based on a specified colormap. It handles both arrays and dictionaries, and provides special handling for NaN values.
- Parameters:
x (array-like, dict, or float) – Numeric values to map to colors. Can be a single value, array, or dictionary where values are numeric.
cmap_name (str, optional) – Name of the matplotlib colormap to use. Default is “Spectral”. Examples: “Spectral”, “viridis”, “plasma”, “Wistia”, “Greens”.
vmin (float, optional) – Minimum value for colormap normalization. Default is None.
vmax (float, optional) – Maximum value for colormap normalization. Default is None.
nan_color (str, optional) – Hexadecimal color code to use for NaN values. If None, NaN values are marked as “NaN” string. Default is None.
- Returns:
Hexadecimal color codes corresponding to input values. Returns a dictionary if input was a dictionary, otherwise returns an array.
- Return type:
str, numpy.ndarray or dict
Examples
>>> get_colors_from_cmap(2, cmap_name="Spectral", vmin=0, vmax=8)
'#...'  # hex color
>>> get_colors_from_cmap([0, 2, 3, 5], cmap_name="Wistia", vmin=0, vmax=5)
array(['#...', '#...', '#...', '#...'], dtype='<U7')
>>> get_colors_from_cmap({"a": 1, "b": 5}, cmap_name="viridis", vmin=0, vmax=10)
{'a': '#...', 'b': '#...'}
Notes
NaN values in the input are preserved and can be assigned a custom color using the nan_color parameter.
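The normalize-then-look-up logic can be illustrated without matplotlib. The sketch below replaces the real colormap with a toy two-color linear ramp (blue to red), so the colors differ from any matplotlib colormap. The helper names are hypothetical, and the clipping of out-of-range values is an assumption.

```python
import math


def value_to_hex(value, vmin, vmax, low=(0, 0, 255), high=(255, 0, 0), nan_color=None):
    """Map a number to a hex color on a two-color linear ramp (toy colormap)."""
    if isinstance(value, float) and math.isnan(value):
        # Mirrors the documented behavior: "NaN" marker unless a color is given.
        return "NaN" if nan_color is None else nan_color
    # Normalize to [0, 1]; values outside [vmin, vmax] are clipped (assumption).
    frac = (value - vmin) / (vmax - vmin)
    frac = min(max(frac, 0.0), 1.0)
    rgb = [round(lo + frac * (hi - lo)) for lo, hi in zip(low, high)]
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def map_colors(x, vmin, vmax, nan_color=None):
    """Dictionaries keep their keys; other iterables become a list of hex codes."""
    if isinstance(x, dict):
        return {k: value_to_hex(v, vmin, vmax, nan_color=nan_color) for k, v in x.items()}
    return [value_to_hex(v, vmin, vmax, nan_color=nan_color) for v in x]


result = map_colors({"a": 0, "b": 10, "c": float("nan")}, vmin=0, vmax=10, nan_color="#808080")
print(result)
```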
- mascdb.utils_figs.get_legend_handles_from_colors_dict(colors_dict, marker='o')[source][source]#
Create legend handles from a color dictionary for matplotlib legends.
This function generates legend handles that can be used with matplotlib.pyplot.legend to create custom legend entries based on a color dictionary.
- Parameters:
colors_dict (dict) – Dictionary mapping labels to colors (color names or hex values).
marker (str, optional) – Marker style for legend entries. Default is “o” (filled circle). Options include:
- “o”: filled circle
- “s”: filled square
- “PATCH”: filled large rectangle
- Any valid matplotlib marker
- Returns:
List of matplotlib handle objects (Line2D or Patch) for use in legend.
- Return type:
list
Examples
>>> colors = {"Category A": "red", "Category B": "blue"}
>>> handles = get_legend_handles_from_colors_dict(colors, marker="s")
>>> plt.legend(handles=handles)
mascdb.utils_img module#
Utilities for image processing.
- mascdb.utils_img.apply_2Dimage_fun(da, fun, x='x', y='y', fun_kwargs=None)[source][source]#
Apply a function to each 2D image in a DataArray.
This function applies a user-defined function to 2D images stored in an xarray DataArray. It handles DataArrays with multiple dimensions by stacking and unstacking as needed, ensuring the function is applied to each 2D image independently.
- Parameters:
da (xarray.DataArray) – Input DataArray containing 2D images.
fun (callable) – Function to apply to each 2D image. Should accept a 2D numpy array and return a 2D numpy array.
x (str, optional) – Name of the width dimension. Default is “x”.
y (str, optional) – Name of the height dimension. Default is “y”.
fun_kwargs (dict, optional) – Additional keyword arguments to pass to the function. Default is None.
- Returns:
DataArray with the function applied to each 2D image, maintaining original dimensions.
- Return type:
xarray.DataArray
- Raises:
TypeError – If da is not an xarray.DataArray or if x/y are not strings.
ValueError – If x or y are not dimensions of the DataArray.
- mascdb.utils_img.xri_contrast_stretching(da, x='x', y='y', pmin=2, pmax=98)[source][source]#
Apply contrast stretching to 2D images using percentile-based intensity rescaling.
Contrast stretching improves image contrast by remapping pixel intensities based on specified percentiles, expanding the dynamic range of the image.
- Parameters:
da (xarray.DataArray) – Input DataArray containing 2D images.
x (str, optional) – Name of the width dimension. Default is “x”.
y (str, optional) – Name of the height dimension. Default is “y”.
pmin (float, optional) – Lower percentile for intensity remapping. Default is 2.
pmax (float, optional) – Upper percentile for intensity remapping. Default is 98.
- Returns:
DataArray with contrast-stretched images.
- Return type:
xarray.DataArray
Notes
Zero-valued pixels are preserved and not affected by the stretching operation.
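Percentile-based stretching on a single image can be sketched in plain Python with nested lists. This is not the library's implementation: the helper names are hypothetical, and computing the percentiles over non-zero pixels only is an assumption consistent with, but not confirmed by, the note above.

```python
def percentile(sorted_vals, p):
    """Linear-interpolated percentile of an already sorted list (p in 0..100)."""
    pos = (len(sorted_vals) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def stretch_contrast(img, pmin=2, pmax=98, out_max=255):
    """Rescale non-zero pixels so the [pmin, pmax] percentiles span [0, out_max]."""
    nonzero = sorted(v for row in img for v in row if v != 0)
    if not nonzero:
        return [row[:] for row in img]
    lo, hi = percentile(nonzero, pmin), percentile(nonzero, pmax)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]

    def rescale(v):
        if v == 0:  # background pixels stay zero
            return 0
        frac = min(max((v - lo) / (hi - lo), 0.0), 1.0)
        return round(frac * out_max)

    return [[rescale(v) for v in row] for row in img]


image = [[0, 50, 60], [70, 80, 0], [90, 100, 110]]
stretched = stretch_contrast(image, pmin=0, pmax=100)
print(stretched)
```

In this sketch, pixels at or below the lower percentile map to 0 and pixels at or above the upper percentile map to `out_max`, which is where the gain in dynamic range comes from.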
- mascdb.utils_img.xri_hist_equalization(da, x='x', y='y', nbins=256, adaptive=False, kernel_size=None, clip_limit=0.01)[source][source]#
Apply global or adaptive histogram equalization to 2D images.
Histogram equalization enhances image contrast by redistributing pixel intensities to approximate a uniform distribution. Adaptive equalization (CLAHE) computes histograms over local tile regions for better enhancement of local details.
- Parameters:
da (xarray.DataArray) – Input DataArray containing 2D images.
x (str, optional) – Name of the width dimension. Default is “x”.
y (str, optional) – Name of the height dimension. Default is “y”.
nbins (int, optional) – Number of bins for image histogram. Ignored for integer images where each integer is its own bin. Default is 256.
adaptive (bool, optional) – If False, uses classical histogram equalization. If True, uses Contrast Limited Adaptive Histogram Equalization (CLAHE). Default is False.
kernel_size (int or array-like, optional) – Shape of contextual regions used in CLAHE algorithm. By default, uses 1/8 of image height by 1/8 of image width.
clip_limit (float, optional) – Clipping limit for CLAHE, normalized between 0 and 1. Higher values give more contrast. Default is 0.01.
- Returns:
DataArray with histogram-equalized images.
- Return type:
xarray.DataArray
Notes
Zero-valued pixels are preserved and not affected by the equalization.
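Global (non-adaptive) equalization remaps each grey level through the cumulative distribution of pixel values. The following sketch shows that idea for one integer image stored as nested lists. `equalize_hist` here is a hypothetical, simplified stand-in, and excluding zero pixels from the histogram is an assumption based on the note above.

```python
def equalize_hist(img, levels=256):
    """Global histogram equalization of an integer image, ignoring zero pixels."""
    values = [v for row in img for v in row if v != 0]
    if not values:
        return [row[:] for row in img]
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    # Cumulative distribution function over the non-zero pixels.
    cdf, running = [0] * levels, 0
    for level in range(levels):
        running += hist[level]
        cdf[level] = running
    cdf_min = min(count for count in cdf if count > 0)
    denom = len(values) - cdf_min
    if denom == 0:  # a single grey level: nothing to equalize
        return [row[:] for row in img]

    def remap(v):
        if v == 0:  # background pixels stay zero
            return 0
        return round((cdf[v] - cdf_min) / denom * (levels - 1))

    return [[remap(v) for v in row] for row in img]


image = [[0, 10, 10], [20, 30, 0], [30, 30, 200]]
equalized = equalize_hist(image)
print(equalized)
```

The adaptive variant (CLAHE) applies the same principle per tile with a clipped histogram and interpolates between tiles; that part is not covered by this sketch.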
- mascdb.utils_img.xri_local_hist_equalization(da, x='x', y='y', footprint=None)[source][source]#
Equalize images using local histograms with a specified neighborhood footprint.
This function performs histogram equalization using local histograms computed over a neighborhood defined by the footprint parameter. This allows for better enhancement of local details compared to global histogram equalization.
- Parameters:
da (xarray.DataArray) – Input DataArray containing 2D images with dtype uint8 or uint16.
x (str, optional) – Name of the width dimension. Default is “x”.
y (str, optional) – Name of the height dimension. Default is “y”.
footprint (numpy.ndarray, optional) – The neighborhood expressed as an ndarray of 1’s and 0’s. By default, uses a rectangle of size 1/8 of image height and width. Custom footprints can be generated using skimage.morphology functions (e.g., rectangle, disk, square, star, diamond, octagon).
- Returns:
DataArray with locally equalized images.
- Return type:
xarray.DataArray
Notes
Zero-valued pixels are preserved and not affected by the equalization. The input images should have dtype uint8 or uint16.
- mascdb.utils_img.xri_zoom(da, x='x', y='y', squared=False)[source][source]#
Zoom into 2D images by cropping to non-zero regions and centering.
This function removes zero-valued borders from images, crops to the smallest bounding box containing all non-zero pixels, and centers the result. Optionally creates square images.
- Parameters:
da (xarray.DataArray) – Input DataArray containing 2D images.
x (str, optional) – Name of the width dimension. Default is “x”.
y (str, optional) – Name of the height dimension. Default is “y”.
squared (bool, optional) – If True, output images will be square (same height and width). If False, output images maintain their aspect ratio. Default is False.
- Returns:
DataArray with zoomed and centered images.
- Return type:
xarray.DataArray
- Raises:
TypeError – If da is not an xarray.DataArray or if x/y are not strings.
ValueError – If x or y are not dimensions of the DataArray.
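The crop-and-center step described for `xri_zoom` can be sketched for a single image held as nested lists. `crop_to_nonzero` is a hypothetical helper; padding the shorter side with zeros to obtain a square output is an assumption about how the `squared` option might behave.

```python
def crop_to_nonzero(img, squared=False):
    """Crop a 2D list to the bounding box of its non-zero pixels."""
    rows = [i for i, row in enumerate(img) if any(row)]
    cols = [j for j in range(len(img[0])) if any(row[j] for row in img)]
    if not rows:  # all-zero image: nothing to crop
        return [row[:] for row in img]
    cropped = [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]
    if not squared:
        return cropped
    # Pad the shorter side with zeros, splitting the padding to keep it centered.
    h, w = len(cropped), len(cropped[0])
    side = max(h, w)
    top, left = (side - h) // 2, (side - w) // 2
    out = [[0] * side for _ in range(side)]
    for i, row in enumerate(cropped):
        out[top + i][left:left + w] = row
    return out


image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 3, 0],
    [0, 0, 0, 0, 0],
]
cropped = crop_to_nonzero(image)
square = crop_to_nonzero(image, squared=True)
print(cropped)
print(square)
```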
Module contents#
MASCDB.