geochemistrypi.data_mining.utils package¶

Submodules¶

geochemistrypi.data_mining.utils.base module¶

check_package(package_name: str) → bool[source]¶

Check whether the package is installed.

Parameters: package_name (str) – The name of the package.
Returns: Whether the package is installed.
Return type: bool
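The check described above can be sketched with the standard library alone. This is a minimal illustration, not the implementation used by geochemistrypi: the helper name `is_package_installed` is hypothetical, and the sketch assumes that "installed" means "importable under that name".

```python
import importlib.util


def is_package_installed(package_name: str) -> bool:
    """Return True if the import system can locate `package_name`."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(package_name) is not None


print(is_package_installed("json"))  # standard-library module
print(is_package_installed("surely_not_a_real_package_name_xyz"))
```

Note that the import name can differ from the distribution name used at install time (for example, `sklearn` versus `scikit-learn`), so a check like this needs the import name.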

clear_output(text: str | None = None) → None[source]¶: Clear the console output.

copy_files(GEOPI_OUTPUT_ARTIFACTS_PATH: str, GEOPI_OUTPUT_METRICS_PATH: str, GEOPI_OUTPUT_PARAMETERS_PATH: str, GEOPI_OUTPUT_SUMMARY_PATH: str) → None[source]¶

Copy all files from the source folders to the destination folder.

Parameters:

GEOPI_OUTPUT_ARTIFACTS_PATH (str) – Source folder path.
GEOPI_OUTPUT_METRICS_PATH (str) – Source folder path.
GEOPI_OUTPUT_PARAMETERS_PATH (str) – Source folder path.
GEOPI_OUTPUT_SUMMARY_PATH (str) – Destination folder path.

copy_files_from_source_dir_to_dest_dir(source_dir: str, dest_dir: str) → None[source]¶

Copy all files from the source folder to the destination folder.

Parameters:

source_dir (str) – Source folder path.
dest_dir (str) – Destination folder path.
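A folder-to-folder copy of this kind could look like the following sketch. The helper name `copy_all_files` is hypothetical. Skipping subdirectories and creating the destination folder when it is missing are assumptions of this example, not documented behaviour of the library function.

```python
import os
import shutil
import tempfile


def copy_all_files(source_dir: str, dest_dir: str) -> None:
    """Copy every regular file directly inside source_dir into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)  # assumption: create the destination if needed
    for name in os.listdir(source_dir):
        source_path = os.path.join(source_dir, name)
        if os.path.isfile(source_path):  # subdirectories are skipped in this sketch
            shutil.copy2(source_path, os.path.join(dest_dir, name))


# Usage with throwaway folders so nothing on disk is left behind.
with tempfile.TemporaryDirectory() as workspace:
    source = os.path.join(workspace, "metrics")
    dest = os.path.join(workspace, "summary")
    os.makedirs(source)
    with open(os.path.join(source, "scores.txt"), "w") as handle:
        handle.write("example content\n")
    copy_all_files(source, dest)
    print(os.listdir(dest))  # ['scores.txt']
```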

create_geopi_output_dir(output_path: str, experiment_name: str, run_name: str, sub_run_name: str | None = None) → None[source]¶

Create the output directory for the current run and store the related paths as environment variables.

Parameters:

output_path (str) – The root path to store the output.
experiment_name (str) – The name of the experiment.
run_name (str) – The name of the run.
sub_run_name (str, default=None) – The name of the sub run.
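The general pattern of creating a run folder and recording its path in the environment can be sketched as follows. Everything specific here is an assumption made for illustration: the helper name `create_run_output_dir`, the nesting order `output_path/experiment_name/run_name[/sub_run_name]`, and the variable name `EXAMPLE_RUN_OUTPUT_PATH` are hypothetical and may not match what the library actually does.

```python
import os
import tempfile


def create_run_output_dir(output_path, experiment_name, run_name, sub_run_name=None):
    """Create a nested run folder and record its path in an environment variable."""
    parts = [output_path, experiment_name, run_name]
    if sub_run_name is not None:
        parts.append(sub_run_name)  # optional extra level for a sub run
    run_dir = os.path.join(*parts)
    os.makedirs(run_dir, exist_ok=True)
    # Hypothetical variable name; later steps could read the path back from it.
    os.environ["EXAMPLE_RUN_OUTPUT_PATH"] = run_dir
    return run_dir


with tempfile.TemporaryDirectory() as root:
    created = create_run_output_dir(root, "demo_experiment", "run_1")
    print(os.environ["EXAMPLE_RUN_OUTPUT_PATH"] == created)  # True
```

Unlike this sketch, the documented function returns None, so callers would rely on the stored environment variables rather than on a return value.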

get_os() → str[source]¶

Get the operating system.

Returns: The operating system.
Return type: str

install_package(package_name: str) → None[source]¶

Install the package.

Parameters: package_name (str) – The name of the package.

list_excel_files(directory: str) → list[source]¶

Recursively lists all Excel files (including .xlsx, .xls, and .csv) in the specified directory and its subdirectories.

Parameters: directory (str) – The path to the directory to search for Excel files.
Returns: excel_files – A list of file paths for all Excel files found.
Return type: list

Notes

The function uses os.walk to traverse the directory and its subdirectories.
Only files with extensions .xlsx, .xls, and .csv are considered as Excel files.
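The traversal described in the notes can be sketched as below. The helper name `find_excel_files` is hypothetical, and matching extensions case-insensitively is an assumption of this example rather than documented behaviour.

```python
import os
import tempfile

EXCEL_EXTENSIONS = (".xlsx", ".xls", ".csv")


def find_excel_files(directory: str) -> list:
    """Recursively collect paths of files whose extension is in EXCEL_EXTENSIONS."""
    excel_files = []
    # os.walk yields (folder, subfolders, files) for the directory and every subdirectory.
    for root, _dirs, files in os.walk(directory):
        for file_name in files:
            if file_name.lower().endswith(EXCEL_EXTENSIONS):
                excel_files.append(os.path.join(root, file_name))
    return excel_files


with tempfile.TemporaryDirectory() as data_dir:
    os.makedirs(os.path.join(data_dir, "nested"))
    for relative in ("a.xlsx", os.path.join("nested", "b.csv"), "notes.txt"):
        open(os.path.join(data_dir, relative), "w").close()
    found = sorted(os.path.basename(path) for path in find_excel_files(data_dir))
    print(found)  # ['a.xlsx', 'b.csv']
```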

log(log_path, log_name)[source]¶

save_data(df: DataFrame, name_column: str, df_name: str, local_data_path: str, mlflow_artifact_data_path: str | None = None, index: bool = False) → None[source]¶

Save the dataset in the local directory and in the dedicated MLflow directory.

Parameters:

df (pd.DataFrame) – The dataset to store.
name_column (str) – The name of the data.
df_name (str) – The name of the data sheet.
local_data_path (str) – The path to store the data sheet.
mlflow_artifact_data_path (str, default=None) – The path to store the data sheet in mlflow.
index (bool, default=False) – Whether to write the index.

save_data_without_data_identifier(df: DataFrame, df_name: str, local_data_path: str, mlflow_artifact_data_path: str | None = None, index: bool = False) → None[source]¶

Save the dataset in the local directory and in the dedicated MLflow directory.

Parameters:

df (pd.DataFrame) – The dataset to store.
df_name (str) – The name of the data sheet.
local_data_path (str) – The path to store the data sheet.
mlflow_artifact_data_path (str, default=None) – The path to store the data sheet in mlflow.
index (bool, default=False) – Whether to write the index.

save_fig(fig_name: str, local_image_path: str, mlflow_artifact_image_path: str | None = None, tight_layout: bool = True) → None[source]¶

Save the figure in the local directory and in the dedicated MLflow directory.

Parameters:

fig_name (str) – Figure name.
local_image_path (str) – The path to store the image.
mlflow_artifact_image_path (str, default=None) – The path to store the image in mlflow.
tight_layout (bool, default=True) – Automatically adjust subplot parameters to give specified padding.

save_model(model: object, model_name: str, data_sample: DataFrame, local_model_path: str, mlflow_artifact_model_path: str | None = None) → None[source]¶

Save the model in the local directory and in the dedicated MLflow directory.

Parameters:

model (object) – The model to store.
model_name (str) – The name of the model.
data_sample (pd.DataFrame) – The sample of the dataset.
local_model_path (str) – The path to store the model.
mlflow_artifact_model_path (str, default=None) – The path to store the model in mlflow.

save_text(string: str, text_name: str, local_text_path: str, mlflow_artifact_text_path: str | None = None) → None[source]¶

Save the text.

Parameters:

string (str) – The text to store.
text_name (str) – The name of the text.
local_text_path (str) – The path to store the text.
mlflow_artifact_text_path (str, default=None) – The path to store the text in mlflow.

show_warning(is_show: bool = True) → None[source]¶: Override Python’s default warning filter to control whether warnings are displayed.
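Toggling warning output with the standard `warnings` module might look like the sketch below. The helper name `set_warning_display` is hypothetical, and the choice of the `"default"` and `"ignore"` filter actions is an assumption of this example, not a description of the library's code.

```python
import warnings


def set_warning_display(is_show: bool = True) -> None:
    """Install a global filter that either shows or suppresses warnings."""
    # "default" prints the first occurrence of each warning per location;
    # "ignore" suppresses every warning.
    warnings.simplefilter("default" if is_show else "ignore")


# catch_warnings restores the previous filters when the block exits.
with warnings.catch_warnings():
    set_warning_display(False)
    warnings.warn("this message is suppressed")
```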

geochemistrypi.data_mining.utils.exceptions module¶

exception InvalidFileError(value)[source]¶: Bases: Exception

geochemistrypi.data_mining.utils.mlflow_utils module¶

retrieve_previous_experiment_id(experiment_name: str) → str | None[source]¶

Retrieve the previous experiment with the same name.

Parameters: experiment_name (str) – The name of the experiment.
Returns: experiment_id – The ID of the experiment.
Return type: str | None

geochemistrypi.data_mining.utils.toggle_address_status module¶

toggle_data_source(data_source: DataSource | None = None) → list[source]¶

Toggle the training data path and output path based on the provided status.

Parameters:

status (str, optional) – The status value, which can be “1” or “2”.
  - “1”: Use the input and output paths in command line mode.
  - “2”: Retrieve all Excel files from the “data” folder on the desktop as the training data path, and set the output path to the desktop.
training_data_path (str, optional) – The path to the training data. This parameter is used when status is “1”.

Returns:

paths – A list containing the training data path and the output path.

Return type:

list

Module contents¶