ROM Generators

Model Generator Base

class rom.generators.model_generator_base.ModelGeneratorBase(analysis_id, random_seed=None, **kwargs)[source]

Bases: object

anova_plots(y_data, yhat, model_name)[source]
build(metamodel, **kwargs)[source]
evaluate(model, model_name, model_moniker, x_data, y_data, downsample, build_time, cv_time, covariates=None, scaler=None)[source]

Generic base function to evaluate the performance of the models.

Parameters
  • model

  • model_name

  • x_data

  • y_data

  • downsample

  • build_time

Returns

Ordered dict
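
The reference does not list which fields the returned ordered dict contains. As a rough illustration only, the sketch below shows how a function of this kind might collect performance figures into an OrderedDict. The helper name `evaluate_predictions`, the choice of RMSE and R² as metrics, and the key names are all assumptions, not the library's actual output.

```python
import math
from collections import OrderedDict


def evaluate_predictions(model_name, y_data, yhat, build_time, cv_time):
    """Hypothetical helper: summarize prediction quality in an OrderedDict."""
    n = len(y_data)
    mean_y = sum(y_data) / n
    # Residual sum of squares and total sum of squares.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_data, yhat))
    ss_tot = sum((y - mean_y) ** 2 for y in y_data)

    performance = OrderedDict()
    performance["name"] = model_name
    performance["n_samples"] = n
    performance["rmse"] = math.sqrt(ss_res / n)
    # R^2 is undefined when y_data is constant, so report NaN in that case.
    performance["r2"] = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    performance["build_time"] = build_time
    performance["cv_time"] = cv_time
    return performance


# Illustrative values only.
result = evaluate_predictions(
    "linear", [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0], build_time=0.5, cv_time=1.2
)
```

Keeping the timings next to the accuracy metrics in one ordered record makes it straightforward to write one row per model when comparing generators.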

inspect()[source]

Inspect the dataframe and return its summary statistics.

Returns

load_data(datafile)[source]

Load the data into a dataframe. Currently, the data must be a CSV file.

Parameters

datafile – str, path to the CSV file to load

Returns

None
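
To make the load-then-inspect flow concrete, here is a minimal, dependency-free sketch. It is a stand-in only: the library loads into a dataframe (presumably pandas, which is an assumption), whereas this example uses the standard `csv` and `statistics` modules. The helper names `load_csv_columns` and `summarize`, and the assumption that every column is numeric, are hypothetical.

```python
import csv
import os
import statistics
import tempfile


def load_csv_columns(datafile):
    """Read a CSV file into a dict mapping column name -> list of floats.

    Assumes a header row and purely numeric columns (an assumption made
    for this sketch, not a documented requirement).
    """
    with open(datafile, newline="") as f:
        reader = csv.DictReader(f)
        columns = {name: [] for name in reader.fieldnames}
        for row in reader:
            for name, value in row.items():
                columns[name].append(float(value))
    return columns


def summarize(columns):
    """Return count, mean, min, and max for each column."""
    return {
        name: {
            "count": len(values),
            "mean": statistics.mean(values),
            "min": min(values),
            "max": max(values),
        }
        for name, values in columns.items()
    }


# Write a small illustrative CSV file, then load and summarize it.
path = os.path.join(tempfile.mkdtemp(), "example.csv")
with open(path, "w", newline="") as f:
    f.write("x,y\n1,10\n2,20\n3,30\n")

columns = load_csv_columns(path)
stats = summarize(columns)
```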

save_dataframe(dataframe, path)[source]
train_test_validate_split(dataset, metamodel, downsample=None, scale=False)[source]

Use the built-in method to generate the train and test data, and add an additional set of data for validation. The validation dataset corresponds to a unique ID that is pulled out of the dataset before the train/test split is performed.
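
The hold-out-then-split idea can be sketched as follows. This is a standard-library stand-in, not the library's implementation: the helper name `split_with_holdout`, its signature, the `building_id` column, and the 20% test fraction are all assumptions chosen for illustration, and the "built-in method" is replaced here by a plain seeded shuffle.

```python
import random


def split_with_holdout(rows, id_column, validation_id, test_fraction=0.2, seed=None):
    """Hypothetical helper: hold out one ID for validation, then split the rest."""
    # Pull the validation rows out first, keyed on the unique ID.
    validate = [r for r in rows if r[id_column] == validation_id]
    remaining = [r for r in rows if r[id_column] != validation_id]

    # Shuffle reproducibly, then cut the remaining rows into test and train.
    rng = random.Random(seed)
    rng.shuffle(remaining)
    n_test = int(round(len(remaining) * test_fraction))
    test, train = remaining[:n_test], remaining[n_test:]
    return train, test, validate


# Illustrative data: 50 rows spread evenly over 5 IDs.
rows = [{"building_id": i % 5, "x": float(i), "y": 2.0 * i} for i in range(50)]
train, test, validate = split_with_holdout(rows, "building_id", validation_id=4, seed=42)
```

Because the validation rows are removed before shuffling, none of them can leak into the train or test sets, which is the point of holding out a whole ID rather than a random sample.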

yy_plots(y_data, yhat, model_name)[source]

Plot the yy-plots, comparing y_data against yhat.

Parameters
  • y_data

  • yhat

  • model_name

Returns

Linear Model

class rom.generators.linear_model.LinearModel(analysis_id, random_seed=None, **kwargs)[source]

Bases: rom.generators.model_generator_base.ModelGeneratorBase

build(metamodel, **kwargs)[source]
evaluate(model, model_name, model_type, x_data, y_data, downsample, build_time, cv_time, covariates=None, scaler=None)[source]

Evaluate the performance of the linear model based on known x_data and y_data. If the model was scaled, then the test data will already be scaled.

Random Forest Model

class rom.generators.random_forest.RandomForest(analysis_id, random_seed=None, **kwargs)[source]

Bases: rom.generators.model_generator_base.ModelGeneratorBase

build(metamodel, **kwargs)[source]
evaluate(model, model_name, model_type, x_data, y_data, downsample, build_time, cv_time, covariates=None, scaler=None)[source]

Evaluate the performance of the forest based on known x_data and y_data.

Parameters
  • model

  • model_name

  • model_type

  • x_data

  • y_data

  • downsample

  • build_time

  • cv_time

  • covariates

Returns

export_tree_png(tree, covariates, filename)[source]
save_cv_results(cv_results, response, downsample, filename)[source]

Save the cv_results to a CSV file.

The CV results are the results of the GridSearch k-fold cross-validation. The results take the following form:

{
    'param_kernel': masked_array(data=['poly', 'poly', 'rbf', 'rbf'],
                                 mask=[False False False False]...),
    'param_gamma': masked_array(data=[-- -- 0.1 0.2],
                                mask=[True  True False False]...),
    'param_degree': masked_array(data=[2.0 3.0 -- --],
                                 mask=[False False  True  True]...),
    'split0_test_score': [0.8, 0.7, 0.8, 0.9],
    'split1_test_score': [0.82, 0.5, 0.7, 0.78],
    'mean_test_score': [0.81, 0.60, 0.75, 0.82],
    'std_test_score': [0.02, 0.01, 0.03, 0.03],
    'rank_test_score': [2, 4, 3, 1],
    'split0_train_score': [0.8, 0.9, 0.7],
    'split1_train_score': [0.82, 0.5, 0.7],
    'mean_train_score': [0.81, 0.7, 0.7],
    'std_train_score': [0.03, 0.03, 0.04],
    'mean_fit_time': [0.73, 0.63, 0.43, 0.49],
    'std_fit_time': [0.01, 0.02, 0.01, 0.01],
    'mean_score_time': [0.007, 0.06, 0.04, 0.04],
    'std_score_time': [0.001, 0.002, 0.003, 0.005],
    'params': [{'kernel': 'poly', 'degree': 2}, ...],
}
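
A dict of equal-length columns like the one above maps naturally onto CSV rows, one row per parameter combination. The sketch below illustrates that flattening with the standard library only. The helper name `cv_results_to_csv` and its two-argument signature are hypothetical, plain lists stand in for the masked arrays, and the real method may write a different layout.

```python
import csv
import json
import os
import tempfile


def cv_results_to_csv(cv_results, filename):
    """Hypothetical helper: write one CSV row per parameter combination."""
    columns = list(cv_results.keys())
    n_rows = len(cv_results["params"])
    with open(filename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        for i in range(n_rows):
            row = []
            for name in columns:
                value = cv_results[name][i]
                # Serialize the per-candidate params dict as JSON text.
                row.append(json.dumps(value) if isinstance(value, dict) else value)
            writer.writerow(row)


# A reduced version of the structure shown above.
cv_results = {
    "mean_test_score": [0.81, 0.60, 0.75, 0.82],
    "rank_test_score": [2, 4, 3, 1],
    "params": [
        {"kernel": "poly", "degree": 2},
        {"kernel": "poly", "degree": 3},
        {"kernel": "rbf", "gamma": 0.1},
        {"kernel": "rbf", "gamma": 0.2},
    ],
}

path = os.path.join(tempfile.mkdtemp(), "cv_results.csv")
cv_results_to_csv(cv_results, path)

# Read the file back to check what was written.
with open(path, newline="") as f:
    written = list(csv.reader(f))
```
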
Parameters
  • cv_results

  • filename

Returns

Support Vector Regression

class rom.generators.svr.SVR(analysis_id, random_seed=None, **kwargs)[source]

Bases: rom.generators.model_generator_base.ModelGeneratorBase

build(metamodel, **kwargs)[source]
evaluate(model, model_name, model_moniker, x_data, y_data, downsample, build_time, cv_time, covariates=None, scaler=None)[source]

Evaluate the performance of the SVR model based on known x_data and y_data.

save_cv_results(cv_results, response, downsample, filename)[source]

Save the cv_results to a CSV file.

The CV results are the results of the GridSearch k-fold cross-validation. The results take the following form:

{
    'param_kernel': masked_array(data=['poly', 'poly', 'rbf', 'rbf'],
                                 mask=[False False False False]...),
    'param_gamma': masked_array(data=[-- -- 0.1 0.2],
                                mask=[True  True False False]...),
    'param_degree': masked_array(data=[2.0 3.0 -- --],
                                 mask=[False False  True  True]...),
    'split0_test_score': [0.8, 0.7, 0.8, 0.9],
    'split1_test_score': [0.82, 0.5, 0.7, 0.78],
    'mean_test_score': [0.81, 0.60, 0.75, 0.82],
    'std_test_score': [0.02, 0.01, 0.03, 0.03],
    'rank_test_score': [2, 4, 3, 1],
    'split0_train_score': [0.8, 0.9, 0.7],
    'split1_train_score': [0.82, 0.5, 0.7],
    'mean_train_score': [0.81, 0.7, 0.7],
    'std_train_score': [0.03, 0.03, 0.04],
    'mean_fit_time': [0.73, 0.63, 0.43, 0.49],
    'std_fit_time': [0.01, 0.02, 0.01, 0.01],
    'mean_score_time': [0.007, 0.06, 0.04, 0.04],
    'std_score_time': [0.001, 0.002, 0.003, 0.005],
    'params': [{'kernel': 'poly', 'degree': 2}, ...],
}
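
When reading results in this form, the winning parameter combination is the entry whose rank_test_score is 1. The following sketch shows one way to pull it out. The helper name `best_candidate` is hypothetical, plain lists stand in for the arrays, and the full `params` list is inferred from the param_kernel, param_gamma, and param_degree columns shown above.

```python
def best_candidate(cv_results):
    """Hypothetical helper: return the best-ranked parameter combination."""
    # rank_test_score is 1-based; rank 1 marks the best mean test score.
    best_index = list(cv_results["rank_test_score"]).index(1)
    return {
        "index": best_index,
        "params": cv_results["params"][best_index],
        "mean_test_score": cv_results["mean_test_score"][best_index],
        "std_test_score": cv_results["std_test_score"][best_index],
    }


cv_results = {
    "mean_test_score": [0.81, 0.60, 0.75, 0.82],
    "std_test_score": [0.02, 0.01, 0.03, 0.03],
    "rank_test_score": [2, 4, 3, 1],
    "params": [
        {"kernel": "poly", "degree": 2},
        {"kernel": "poly", "degree": 3},
        {"kernel": "rbf", "gamma": 0.1},
        {"kernel": "rbf", "gamma": 0.2},
    ],
}

best = best_candidate(cv_results)
```
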
Parameters
  • cv_results

  • filename

Returns