geochemistrypi.data_mining.utils package

Submodules

geochemistrypi.data_mining.utils.base module

check_package(package_name: str) → bool

Check whether the package is installed.

Parameters:

package_name (str) – The name of the package.

Returns:

Whether the package is installed.

Return type:

bool
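One common way to perform such a check with the standard library is sketched below. It is an illustration only, not necessarily how check_package is implemented; the helper name is_package_installed is hypothetical.

```python
import importlib.util


def is_package_installed(package_name: str) -> bool:
    """Return True if a top-level importable module with this name is found.

    Hypothetical helper for illustration; not part of geochemistrypi.
    """
    # find_spec returns None when no top-level module of that name can be located.
    return importlib.util.find_spec(package_name) is not None


found_json = is_package_installed("json")
found_missing = is_package_installed("a_package_that_should_not_exist_12345")
print(found_json, found_missing)
```

Note that this sketch tests the importable module name, which can differ from the name a distribution is published under.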

clear_output(text: str | None = None) → None

Clear the console output.

copy_files(GEOPI_OUTPUT_ARTIFACTS_PATH: str, GEOPI_OUTPUT_METRICS_PATH: str, GEOPI_OUTPUT_PARAMETERS_PATH: str, GEOPI_OUTPUT_SUMMARY_PATH: str) → None

Copy all files from the three source folders to the destination folder.

Parameters:
  • GEOPI_OUTPUT_ARTIFACTS_PATH (str) – Source folder path.

  • GEOPI_OUTPUT_METRICS_PATH (str) – Source folder path.

  • GEOPI_OUTPUT_PARAMETERS_PATH (str) – Source folder path.

  • GEOPI_OUTPUT_SUMMARY_PATH (str) – Destination folder path.

copy_files_from_source_dir_to_dest_dir(source_dir: str, dest_dir: str) → None

Copy all files from the source folder to the destination folder.

Parameters:
  • source_dir (str) – Source folder path.

  • dest_dir (str) – Destination folder path.
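A minimal sketch of this kind of folder-to-folder copy, using only the standard library, might look as follows. The helper name copy_all_files is hypothetical, and the choice to copy only top-level regular files (skipping subdirectories) is an assumption of the sketch, not documented behaviour of the function above.

```python
import os
import shutil
import tempfile


def copy_all_files(source_dir: str, dest_dir: str) -> None:
    """Copy every regular file directly inside source_dir into dest_dir.

    Hypothetical helper for illustration; not part of geochemistrypi.
    """
    os.makedirs(dest_dir, exist_ok=True)
    for name in os.listdir(source_dir):
        source_path = os.path.join(source_dir, name)
        # Assumption: subdirectories are skipped, only top-level files are copied.
        if os.path.isfile(source_path):
            shutil.copy2(source_path, os.path.join(dest_dir, name))


# Demonstration in a throwaway temporary directory.
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "src")
    dst = os.path.join(root, "dst")
    os.makedirs(src)
    with open(os.path.join(src, "example.txt"), "w") as handle:
        handle.write("example\n")
    copy_all_files(src, dst)
    copied = sorted(os.listdir(dst))
    print(copied)
```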

create_geopi_output_dir(output_path: str, experiment_name: str, run_name: str, sub_run_name: str | None = None) → None

Create the output directory for the current run and store the related paths as environment variables.

Parameters:
  • output_path (str) – The root path to store the output.

  • experiment_name (str) – The name of the experiment.

  • run_name (str) – The name of the run.

  • sub_run_name (str, default=None) – The name of the sub run.

get_os() → str

Get the operating system.

Returns:

The operating system.

Return type:

str

install_package(package_name: str) → None

Install the package.

Parameters:

package_name (str) – The name of the package.

list_excel_files(directory: str) → list

Recursively lists all Excel files (including .xlsx, .xls, and .csv) in the specified directory and its subdirectories.

Parameters:

directory (str) – The path to the directory to search for Excel files.

Returns:

excel_files – A list of file paths for all Excel files found.

Return type:

list

Notes

  1. The function uses os.walk to traverse the directory and its subdirectories.

  2. Only files with extensions .xlsx, .xls, and .csv are considered as Excel files.
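The traversal described in the notes can be sketched with os.walk as follows. The helper name find_excel_files is hypothetical, and the case-sensitive extension match is an assumption of this sketch rather than confirmed behaviour of list_excel_files.

```python
import os
import tempfile

# Extensions treated as "Excel files" in the notes above.
EXTENSIONS = (".xlsx", ".xls", ".csv")


def find_excel_files(directory: str) -> list:
    """Recursively collect paths of files whose names end in a listed extension.

    Hypothetical helper for illustration; not part of geochemistrypi.
    """
    excel_files = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # Assumption: the match is case-sensitive.
            if name.endswith(EXTENSIONS):
                excel_files.append(os.path.join(root, name))
    return excel_files


# Demonstration in a throwaway temporary directory.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "sub"))
    for relative in ("a.xlsx", os.path.join("sub", "b.csv"), "notes.txt"):
        with open(os.path.join(base, relative), "w") as handle:
            handle.write("")
    found_names = sorted(os.path.basename(p) for p in find_excel_files(base))
    print(found_names)
```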

log(log_path, log_name)

save_data(df: DataFrame, name_column: str, df_name: str, local_data_path: str, mlflow_artifact_data_path: str | None = None, index: bool = False) → None

Save the dataset to the local directory and to the MLflow artifact directory.

Parameters:
  • df (pd.DataFrame) – The dataset to store.

  • name_column (str) – The name of the data.

  • df_name (str) – The name of the data sheet.

  • local_data_path (str) – The path to store the data sheet.

  • mlflow_artifact_data_path (str, default=None) – The path to store the data sheet in mlflow.

  • index (bool, default=False) – Whether to write the index.

save_data_without_data_identifier(df: DataFrame, df_name: str, local_data_path: str, mlflow_artifact_data_path: str | None = None, index: bool = False) → None

Save the dataset to the local directory and to the MLflow artifact directory.

Parameters:
  • df (pd.DataFrame) – The dataset to store.

  • df_name (str) – The name of the data sheet.

  • local_data_path (str) – The path to store the data sheet.

  • mlflow_artifact_data_path (str, default=None) – The path to store the data sheet in mlflow.

  • index (bool, default=False) – Whether to write the index.

save_fig(fig_name: str, local_image_path: str, mlflow_artifact_image_path: str | None = None, tight_layout: bool = True) → None

Save the figure to the local directory and to the MLflow artifact directory.

Parameters:
  • fig_name (str) – Figure name.

  • local_image_path (str) – The path to store the image.

  • mlflow_artifact_image_path (str, default=None) – The path to store the image in mlflow.

  • tight_layout (bool, default=True) – Automatically adjust subplot parameters to give specified padding.

save_model(model: object, model_name: str, data_sample: DataFrame, local_model_path: str, mlflow_artifact_model_path: str | None = None) → None

Save the model to the local directory and to the MLflow artifact directory.

Parameters:
  • model (object) – The model to store.

  • model_name (str) – The name of the model.

  • data_sample (pd.DataFrame) – The sample of the dataset.

  • local_model_path (str) – The path to store the model.

  • mlflow_artifact_model_path (str, default=None) – The path to store the model in mlflow.

save_text(string: str, text_name: str, local_text_path: str, mlflow_artifact_text_path: str | None = None) → None

Save the text.

Parameters:
  • string (str) – The text to store.

  • text_name (str) – The name of the text.

  • local_text_path (str) – The path to store the text.

  • mlflow_artifact_text_path (str, default=None) – The path to store the text in mlflow.

show_warning(is_show: bool = True) → None

Override Python’s default warning filter to control whether warnings are displayed.
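As an illustration of overriding the warning filter, the sketch below toggles it with the standard warnings module. The helper name set_warning_display and the specific filter actions ("default" and "ignore") are assumptions of this sketch, not a description of how show_warning is implemented.

```python
import warnings


def set_warning_display(is_show: bool = True) -> None:
    """Turn the display of warnings on or off via the warnings filter.

    Hypothetical helper for illustration; not part of geochemistrypi.
    """
    # "default" reports each warning once per location; "ignore" suppresses all.
    warnings.simplefilter("default" if is_show else "ignore")


# catch_warnings restores the previous filter state when the block exits.
with warnings.catch_warnings(record=True) as caught:
    set_warning_display(False)
    warnings.warn("hidden")
    hidden_count = len(caught)

    set_warning_display(True)
    warnings.warn("visible")
    visible_count = len(caught) - hidden_count

print(hidden_count, visible_count)
```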

geochemistrypi.data_mining.utils.exceptions module

exception InvalidFileError(value)

Bases: Exception

geochemistrypi.data_mining.utils.mlflow_utils module

retrieve_previous_experiment_id(experiment_name: str) → str | None

Retrieve the previous experiment with the same name.

Parameters:

experiment_name (str) – The name of the experiment.

Returns:

experiment_id – The ID of the experiment.

Return type:

str | None

geochemistrypi.data_mining.utils.toggle_address_status module

toggle_data_source(data_source: DataSource | None = None) → list

Toggle the training data path and output path based on the provided status.

Parameters:
  • status (str, optional) – The status value, which can be “1” or “2”.

      - “1”: Use the input and output paths in command line mode.

      - “2”: Retrieve all Excel files from the “data” folder on the desktop as the training data path, and set the output path to the desktop.

  • training_data_path (str, optional) – The path to the training data. This parameter is used when status is “1”.

Returns:

paths – A list containing the training data path and the output path.

Return type:

list

Module contents