geochemistrypi.data_mining.utils package
Submodules
geochemistrypi.data_mining.utils.base module
- check_package(package_name: str) → bool
Check whether the package is installed.
- Parameters:
package_name (str) – The name of the package.
- Returns:
Whether the package is installed.
- Return type:
bool
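One common way to perform such a check is sketched below. This is not necessarily how check_package is implemented. The helper name is_package_installed is hypothetical, and the sketch assumes the given name is the importable module name, which can differ from the name used to install the distribution.

```python
import importlib.util


def is_package_installed(package_name: str) -> bool:
    """Return True if a top-level module named `package_name` is importable."""
    # find_spec returns None when no module of that name can be located.
    return importlib.util.find_spec(package_name) is not None


# "json" ships with the standard library; the second name is a made-up placeholder.
print(is_package_installed("json"))
print(is_package_installed("surely_not_a_real_package_xyz"))
```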
- copy_files(GEOPI_OUTPUT_ARTIFACTS_PATH: str, GEOPI_OUTPUT_METRICS_PATH: str, GEOPI_OUTPUT_PARAMETERS_PATH: str, GEOPI_OUTPUT_SUMMARY_PATH: str) → None
Copy all files from the three source folders (artifacts, metrics, parameters) to the summary destination folder.
- Parameters:
GEOPI_OUTPUT_ARTIFACTS_PATH (str) – Source folder path.
GEOPI_OUTPUT_METRICS_PATH (str) – Source folder path.
GEOPI_OUTPUT_PARAMETERS_PATH (str) – Source folder path.
GEOPI_OUTPUT_SUMMARY_PATH (str) – Destination folder path.
- copy_files_from_source_dir_to_dest_dir(source_dir: str, dest_dir: str) → None
Copy all files from the source folder to the destination folder.
- Parameters:
source_dir (str) – Source folder path.
dest_dir (str) – Destination folder path.
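A minimal sketch of a folder-to-folder copy using only the standard library is shown below. It illustrates the idea and is not the package's implementation. The helper name copy_all_files is hypothetical, and the sketch assumes that only files directly inside the source folder are copied and that the destination is created when missing.

```python
import os
import shutil
import tempfile


def copy_all_files(source_dir: str, dest_dir: str) -> None:
    """Copy every regular file directly inside source_dir into dest_dir."""
    # Assumption: create the destination folder if it does not exist yet.
    os.makedirs(dest_dir, exist_ok=True)
    for entry in os.listdir(source_dir):
        src = os.path.join(source_dir, entry)
        # Assumption: subdirectories are skipped; only top-level files are copied.
        if os.path.isfile(src):
            shutil.copy2(src, os.path.join(dest_dir, entry))


# Small demonstration with throwaway directories.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, "metrics")
    dst_dir = os.path.join(tmp, "summary")
    os.makedirs(src_dir)
    with open(os.path.join(src_dir, "scores.txt"), "w") as f:
        f.write("example\n")
    copy_all_files(src_dir, dst_dir)
    copied = sorted(os.listdir(dst_dir))
    print(copied)
```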
- create_geopi_output_dir(output_path: str, experiment_name: str, run_name: str, sub_run_name: str | None = None) → None
Create the output directory for the current run and store the related paths as environment variables.
- Parameters:
output_path (str) – The root path to store the output.
experiment_name (str) – The name of the experiment.
run_name (str) – The name of the run.
sub_run_name (str, default=None) – The name of the sub run.
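The general pattern of building a nested run directory and exposing its path through the environment can be sketched as follows. The directory layout, the helper name create_output_dir, and the variable name EXAMPLE_OUTPUT_PATH are all assumptions made for illustration; the actual function returns None and may use a different layout and different variable names.

```python
import os
import tempfile
from typing import Optional


def create_output_dir(
    output_path: str,
    experiment_name: str,
    run_name: str,
    sub_run_name: Optional[str] = None,
) -> str:
    """Create <output_path>/<experiment>/<run>[/<sub_run>] and record it."""
    parts = [output_path, experiment_name, run_name]
    if sub_run_name is not None:
        parts.append(sub_run_name)
    run_dir = os.path.join(*parts)
    os.makedirs(run_dir, exist_ok=True)
    # Hypothetical variable name; later steps could read the path back from it.
    os.environ["EXAMPLE_OUTPUT_PATH"] = run_dir
    return run_dir


# Demonstration in a throwaway root directory.
with tempfile.TemporaryDirectory() as root:
    created = create_output_dir(root, "demo_experiment", "run_1")
    created_exists = os.path.isdir(created)
    print(created_exists)
```

Returning the path is a convenience of this sketch only; it makes the example easy to inspect.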
- install_package(package_name: str) → None
Install the package.
- Parameters:
package_name (str) – The name of the package.
- list_excel_files(directory: str) → list
Recursively lists all Excel files (including .xlsx, .xls, and .csv) in the specified directory and its subdirectories.
- Parameters:
directory (str) – The path to the directory to search for Excel files.
- Returns:
excel_files – A list of file paths for all Excel files found.
- Return type:
list
Notes
The function uses os.walk to traverse the directory and its subdirectories.
Only files with extensions .xlsx, .xls, and .csv are considered as Excel files.
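The traversal described in the notes can be sketched with os.walk as shown below. This is an illustration under stated assumptions rather than the package's code: the helper name find_spreadsheet_files is hypothetical, and matching extensions case-insensitively is a choice made for this sketch.

```python
import os
import tempfile


def find_spreadsheet_files(directory: str) -> list:
    """Collect paths of .xlsx, .xls, and .csv files under `directory`."""
    extensions = (".xlsx", ".xls", ".csv")
    matches = []
    # os.walk yields every directory below `directory`, including itself.
    for root, _dirs, files in os.walk(directory):
        for name in files:
            # Assumption: compare in lower case so "DATA.CSV" also matches.
            if name.lower().endswith(extensions):
                matches.append(os.path.join(root, name))
    return matches


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.xlsx", os.path.join("sub", "b.csv"), os.path.join("sub", "c.txt")):
        open(os.path.join(tmp, rel), "w").close()
    found_names = sorted(os.path.basename(p) for p in find_spreadsheet_files(tmp))
    print(found_names)
```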
- save_data(df: DataFrame, name_column: str, df_name: str, local_data_path: str, mlflow_artifact_data_path: str | None = None, index: bool = False) → None
Save the dataset in the local directory and in the dedicated MLflow directory.
- Parameters:
df (pd.DataFrame) – The dataset to store.
name_column (str) – The name of the data.
df_name (str) – The name of the data sheet.
local_data_path (str) – The path to store the data sheet.
mlflow_artifact_data_path (str, default=None) – The path to store the data sheet in MLflow.
index (bool, default=False) – Whether to write the index.
- save_data_without_data_identifier(df: DataFrame, df_name: str, local_data_path: str, mlflow_artifact_data_path: str | None = None, index: bool = False) → None
Save the dataset in the local directory and in the dedicated MLflow directory.
- Parameters:
df (pd.DataFrame) – The dataset to store.
df_name (str) – The name of the data sheet.
local_data_path (str) – The path to store the data sheet.
mlflow_artifact_data_path (str, default=None) – The path to store the data sheet in MLflow.
index (bool, default=False) – Whether to write the index.
- save_fig(fig_name: str, local_image_path: str, mlflow_artifact_image_path: str | None = None, tight_layout: bool = True) → None
Save the figure in the local directory and in the dedicated MLflow directory.
- Parameters:
fig_name (str) – Figure name.
local_image_path (str) – The path to store the image.
mlflow_artifact_image_path (str, default=None) – The path to store the image in MLflow.
tight_layout (bool, default=True) – Whether to automatically adjust subplot parameters to give the specified padding.
- save_model(model: object, model_name: str, data_sample: DataFrame, local_model_path: str, mlflow_artifact_model_path: str | None = None) → None
Save the model in the local directory and in the dedicated MLflow directory.
- Parameters:
model (object) – The model to store.
model_name (str) – The name of the model.
data_sample (pd.DataFrame) – A sample of the dataset.
local_model_path (str) – The path to store the model.
mlflow_artifact_model_path (str, default=None) – The path to store the model in MLflow.
- save_text(string: str, text_name: str, local_text_path: str, mlflow_artifact_text_path: str | None = None) → None
Save the text in the local directory and in the dedicated MLflow directory.
- Parameters:
string (str) – The text to store.
text_name (str) – The name of the text.
local_text_path (str) – The path to store the text.
mlflow_artifact_text_path (str, default=None) – The path to store the text in MLflow.
geochemistrypi.data_mining.utils.exceptions module
geochemistrypi.data_mining.utils.mlflow_utils module
geochemistrypi.data_mining.utils.toggle_address_status module
- toggle_data_source(data_source: DataSource | None = None) → list
Toggle the training data path and output path based on the provided status.
- Parameters:
status (str, optional) – The status value, which can be “1” or “2”:
  “1”: Use the input and output paths given in command-line mode.
  “2”: Retrieve all Excel files from the “data” folder on the desktop as the training data path, and set the output path to the desktop.
training_data_path (str, optional) – The path to the training data. This parameter is used when status is “1”.
- Returns:
paths – A list containing the training data path and the output path.
- Return type:
list