geochemistrypi.data_mining.plot package

Submodules

geochemistrypi.data_mining.plot.geochemistry_plot module

geochemistrypi.data_mining.plot.map_plot module

map_projected_by_basemap(col: Series, name_column: str, longitude: DataFrame, latitude: DataFrame) → None

Project the data of one element onto a world map using Basemap.

Parameters:
  • col (pd.Series) – One selected column from the data sheet.

  • longitude (pd.DataFrame) – Longitude values of the data items.

  • latitude (pd.DataFrame) – Latitude values of the data items.

map_projected_by_cartopy(col: Series, name_column: str, longitude: DataFrame, latitude: DataFrame) → None

Project the data of one element onto a world map using Cartopy.

Parameters:
  • col (pd.Series) – One selected column from the data sheet.

  • longitude (pd.DataFrame) – Longitude values of the data items.

  • latitude (pd.DataFrame) – Latitude values of the data items.

process_world_map(data: DataFrame, name_column: str) → None

Run the process of projecting the data onto the world map.

geochemistrypi.data_mining.plot.statistic_plot module

basic_statistic(data: DataFrame) → None

Basic statistical information about the given data set.

Parameters:

data (pd.DataFrame) – The data set.

check_missing_value(data: DataFrame) → bool

Check whether the data set contains any null values.

Parameters:

data (pd.DataFrame) – The data set.

Returns:

flag – True if the data set contains at least one null value.

Return type:

bool
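The check described above can be illustrated with plain pandas. This is a minimal sketch, not the package's own implementation: the helper name `has_missing_value` and the sample columns are hypothetical, and the only assumption is that "null" means what `DataFrame.isnull` reports (NaN or None).

```python
import numpy as np
import pandas as pd


def has_missing_value(data: pd.DataFrame) -> bool:
    """Return True if any cell of ``data`` is null (hypothetical helper)."""
    # isnull() marks each null cell; any() over the raw array collapses it to one flag.
    return bool(data.isnull().values.any())


# Hypothetical sample data, for illustration only.
complete = pd.DataFrame({"SiO2": [45.2, 50.1], "MgO": [8.3, 7.9]})
with_gap = pd.DataFrame({"SiO2": [45.2, np.nan], "MgO": [8.3, 7.9]})

print(has_missing_value(complete))
print(has_missing_value(with_gap))
```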

correlation_plot(col: Index, df: DataFrame, name_column: str) → None

Plot a heatmap of the correlations between the selected columns.

Parameters:
  • col (pd.Index) – The columns to plot.

  • df (pd.DataFrame) – The data set.
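A correlation heatmap is a picture of a correlation matrix, so the numbers behind such a plot can be sketched with pandas alone. The helper `correlation_matrix` and the sample values below are hypothetical, and the use of pandas' default Pearson correlation is an assumption; the source does not state which coefficient the function uses.

```python
import pandas as pd


def correlation_matrix(col: pd.Index, df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise correlation of the selected columns (hypothetical helper)."""
    # DataFrame.corr() defaults to Pearson correlation; this choice is an assumption.
    return df[col].corr()


# Hypothetical sample data: MgO falls linearly as SiO2 rises.
df = pd.DataFrame(
    {
        "SiO2": [45.0, 50.0, 55.0, 60.0],
        "MgO": [12.0, 10.0, 8.0, 6.0],
        "CaO": [9.0, 11.0, 10.0, 12.0],
    }
)

corr = correlation_matrix(pd.Index(["SiO2", "MgO"]), df)
print(corr)
```

Only the columns named in `col` appear in the result, which mirrors how the `col` and `df` parameters are described above.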

distribution_plot(col: Index, df: DataFrame, name_column: str) → None

Plot histograms showing the distribution of each selected column, one subplot per column.

Parameters:
  • col (pd.Index) – The columns to plot.

  • df (pd.DataFrame) – The data set.

is_null_value(data: DataFrame) → None

Check whether the data set contains any null values.

Parameters:

data (pd.DataFrame) – The data set.

log_distribution_plot(col: Index, df: DataFrame, name_column: str) → None

Plot histograms showing the distribution of each selected column after log transformation, one subplot per column.

Parameters:
  • col (pd.Index) – The columns to plot.

  • df (pd.DataFrame) – The data set.
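The transformation step that precedes such a histogram might look like the sketch below. The source does not say which logarithm is applied, so the use of `numpy.log1p` (natural log of 1 + x, which stays finite for zero values) is purely an assumption, and `log_transform` and the sample data are hypothetical.

```python
import numpy as np
import pandas as pd


def log_transform(col: pd.Index, df: pd.DataFrame) -> pd.DataFrame:
    """Log-transform the selected columns (hypothetical helper)."""
    # log1p(x) = ln(1 + x); an assumed choice that keeps zero concentrations finite.
    return np.log1p(df[col])


# Hypothetical sample data, for illustration only.
df = pd.DataFrame(
    {
        "Cr": [0.0, np.e - 1.0, 99.0],
        "Ni": [9.0, 99.0, 999.0],
    }
)

logged = log_transform(pd.Index(["Cr"]), df)
print(logged)
```

The transformed frame can then be passed to any histogram routine, for example `logged.hist()` when matplotlib is installed.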

probability_plot(col: Index, df_origin: DataFrame, df_impute: DataFrame, name_column: str) → None

Plot one large figure containing the probability plots (original vs. imputed) of each selected column.

Parameters:
  • col (pd.Index) – The columns to plot.

  • df_origin (pd.DataFrame (n_samples, n_components)) – The original data set, which contains missing values.

  • df_impute (pd.DataFrame (n_samples, n_components)) – The data set after imputation.
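To show what an original-versus-imputed probability comparison involves, the sketch below computes the points of a normal probability plot for one column before and after imputation. Everything here is an assumption made for illustration: the helper `normal_probability_points`, the Hazen plotting positions (i - 0.5) / n, the choice of a normal reference distribution, and mean imputation. The source does not specify any of these.

```python
from statistics import NormalDist
from typing import Tuple

import numpy as np
import pandas as pd


def normal_probability_points(values: pd.Series) -> Tuple[np.ndarray, np.ndarray]:
    """Return (theoretical normal quantiles, sorted sample values), skipping nulls."""
    ordered = np.sort(values.dropna().to_numpy(dtype=float))
    n = ordered.size
    # Hazen plotting positions (i - 0.5) / n: an assumed convention for this sketch.
    positions = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in positions])
    return theoretical, ordered


# Hypothetical data: one missing value, filled with the column mean (assumed method).
df_origin = pd.DataFrame({"SiO2": [45.0, np.nan, 55.0, 50.0]})
df_impute = df_origin.fillna(df_origin.mean())

q_origin, v_origin = normal_probability_points(df_origin["SiO2"])
q_impute, v_impute = normal_probability_points(df_impute["SiO2"])

print(v_origin, v_impute)
```

Plotting `v_origin` against `q_origin` and `v_impute` against `q_impute` side by side gives a visual check of how much the imputation changed the shape of the distribution.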

ratio_null_vs_filled(data: DataFrame) → None

The ratio of the null values in each column.

Parameters:

data (pd.DataFrame) – The data set.
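The per-column null ratio can be sketched with pandas as follows. This is an illustrative assumption rather than the package's own code: the helper `null_ratio` and the sample data are hypothetical, and the ratio is taken to mean null cells divided by total rows in each column.

```python
import numpy as np
import pandas as pd


def null_ratio(data: pd.DataFrame) -> pd.Series:
    """Fraction of null cells in each column (hypothetical helper)."""
    # The mean of a boolean mask equals the share of True values in each column.
    return data.isnull().mean()


# Hypothetical sample data, for illustration only.
data = pd.DataFrame(
    {
        "SiO2": [45.2, np.nan, 50.1, np.nan],
        "MgO": [8.3, 7.9, 7.5, 7.1],
    }
)

ratio = null_ratio(data)
print(ratio)
```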

Module contents