ROM Generators¶

Model Generator Base¶

class rom.generators.model_generator_base.ModelGeneratorBase(analysis_id, random_seed=None, **kwargs)[source]¶

Bases: object

anova_plots(y_data, yhat, model_name)[source]¶

build(metamodel, **kwargs)[source]¶

evaluate(model, model_name, model_moniker, x_data, y_data, downsample, build_time, cv_time, covariates=None, scaler=None)[source]¶

Generic base function to evaluate the performance of the models.

Parameters

model –
model_name –
model_moniker –
x_data –
y_data –
downsample –
build_time –
cv_time –
covariates –
scaler –

Returns

Ordered dict
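As a rough illustration of what an evaluation step of this kind might collect, the sketch below computes a few common error metrics and returns them in an OrderedDict. The function name `summarize_performance`, the metric keys, and the choice of metrics are assumptions made for illustration; they are not taken from the ROM source.

```python
import math
from collections import OrderedDict


def summarize_performance(model_name, y_data, yhat, build_time, cv_time):
    """Hypothetical helper: collect basic fit metrics in an ordered dict."""
    n = len(y_data)
    errors = [p - a for a, p in zip(y_data, yhat)]
    mean_y = sum(y_data) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((a - mean_y) ** 2 for a in y_data)   # total sum of squares

    results = OrderedDict()
    results["name"] = model_name
    results["n_samples"] = n
    results["rmse"] = math.sqrt(ss_res / n)
    results["mae"] = sum(abs(e) for e in errors) / n
    results["r2"] = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    results["build_time"] = build_time
    results["cv_time"] = cv_time
    return results


metrics = summarize_performance(
    "example",
    y_data=[1.0, 2.0, 3.0, 4.0],
    yhat=[1.0, 2.0, 3.0, 5.0],
    build_time=0.5,
    cv_time=1.2,
)
```

An ordered dict keeps the metrics in a stable column order if the results are later written out as a table row.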

inspect()[source]¶

Inspect the dataframe and return its summary statistics.

Returns

load_data(datafile)[source]¶

Load the data into a dataframe. The data needs to be a CSV file at the moment.

Parameters: datafile – str, path to the CSV file to load
Returns: None
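A minimal sketch of loading CSV data into a dataframe and summarizing it, assuming pandas is the dataframe library in use. The column names and values are invented for the example, and an in-memory buffer stands in for a file on disk.

```python
import io

import pandas as pd

# Stand-in for a CSV file on disk; the column names are made up.
csv_text = "id,temperature,load\n1,20.5,100\n2,21.0,110\n3,19.5,95\n"

# pd.read_csv accepts either a file path or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Summary statistics (count, mean, std, min, quartiles, max) per numeric column.
stats = df.describe()
```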

save_dataframe(dataframe, path)[source]¶

train_test_validate_split(dataset, metamodel, downsample=None, scale=False)[source]¶

Use the built-in method to generate the train and test data. This adds an additional set of data for validation. The validation dataset corresponds to a unique ID that is pulled out of the dataset before the train/test split method is called.
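The idea of holding out one ID for validation before splitting the remainder can be sketched as follows. This is a plain-Python illustration that uses a seeded shuffle in place of whichever built-in split routine the library calls. The helper name `split_with_validation`, the `id` key, and the 20% test fraction are all assumptions.

```python
import random


def split_with_validation(rows, validation_id, id_key="id",
                          test_fraction=0.2, seed=None):
    """Hypothetical helper: hold out one ID for validation, then split the rest."""
    # Pull every row with the validation ID out of the dataset first.
    validate = [r for r in rows if r[id_key] == validation_id]
    remaining = [r for r in rows if r[id_key] != validation_id]

    # Shuffle reproducibly, then cut the remainder into test and train.
    rng = random.Random(seed)
    rng.shuffle(remaining)
    n_test = int(round(len(remaining) * test_fraction))
    test = remaining[:n_test]
    train = remaining[n_test:]
    return train, test, validate


# Toy dataset: IDs 0..4 with five rows each.
rows = [{"id": i // 5, "x": i, "y": 2 * i} for i in range(25)]
train, test, validate = split_with_validation(rows, validation_id=4, seed=42)
```

Removing the validation rows before the shuffle means no row with that ID can end up in the train or test sets.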

yy_plots(y_data, yhat, model_name)[source]¶

Plot the yy-plots

Parameters

y_data –
yhat –
model_name –

Returns

Linear Model¶

class rom.generators.linear_model.LinearModel(analysis_id, random_seed=None, **kwargs)[source]¶

Bases: rom.generators.model_generator_base.ModelGeneratorBase

build(metamodel, **kwargs)[source]¶

evaluate(model, model_name, model_type, x_data, y_data, downsample, build_time, cv_time, covariates=None, scaler=None)[source]¶

Evaluate the performance of the linear model based on known x_data and y_data. If the model was built with scaled data, then the test data will already be scaled.

Random Forest Model¶

class rom.generators.random_forest.RandomForest(analysis_id, random_seed=None, **kwargs)[source]¶

Bases: rom.generators.model_generator_base.ModelGeneratorBase

build(metamodel, **kwargs)[source]¶

evaluate(model, model_name, model_type, x_data, y_data, downsample, build_time, cv_time, covariates=None, scaler=None)[source]¶

Evaluate the performance of the forest based on known x_data and y_data.

Parameters

model –
model_name –
model_type –
x_data –
y_data –
downsample –
build_time –
cv_time –
covariates –
scaler –

Returns

export_tree_png(tree, covariates, filename)[source]¶

save_cv_results(cv_results, response, downsample, filename)[source]¶

Save the cv_results to a CSV file.

The CV results are the output of the GridSearch k-fold cross-validation and take the following form:

{
    'param_kernel': masked_array(data=['poly', 'poly', 'rbf', 'rbf'],
                                 mask=[False False False False]...),
    'param_gamma': masked_array(data=[-- -- 0.1 0.2],
                                mask=[True  True False False]...),
    'param_degree': masked_array(data=[2.0 3.0 -- --],
                                 mask=[False False  True  True]...),
    'split0_test_score': [0.8, 0.7, 0.8, 0.9],
    'split1_test_score': [0.82, 0.5, 0.7, 0.78],
    'mean_test_score': [0.81, 0.60, 0.75, 0.82],
    'std_test_score': [0.02, 0.01, 0.03, 0.03],
    'rank_test_score': [2, 4, 3, 1],
    'split0_train_score': [0.8, 0.9, 0.7],
    'split1_train_score': [0.82, 0.5, 0.7],
    'mean_train_score': [0.81, 0.7, 0.7],
    'std_train_score': [0.03, 0.03, 0.04],
    'mean_fit_time': [0.73, 0.63, 0.43, 0.49],
    'std_fit_time': [0.01, 0.02, 0.01, 0.01],
    'mean_score_time': [0.007, 0.06, 0.04, 0.04],
    'std_score_time': [0.001, 0.002, 0.003, 0.005],
    'params': [{'kernel': 'poly', 'degree': 2}, ...],
}
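A dictionary of this shape can be written out with one CSV row per parameter combination. The sketch below uses only the standard library and a trimmed-down copy of the structure, with plain lists in place of masked arrays. The helper name `write_cv_results` and the decision to skip the nested `params` column are assumptions, not the library's implementation.

```python
import csv
import io


def write_cv_results(cv_results, out_file, skip_keys=("params",)):
    """Hypothetical helper: write one CSV row per parameter combination."""
    columns = [k for k in cv_results if k not in skip_keys]
    n_rows = len(cv_results[columns[0]])
    writer = csv.writer(out_file)
    writer.writerow(columns)  # header row
    for i in range(n_rows):
        writer.writerow([cv_results[k][i] for k in columns])


# Trimmed-down example input; values are illustrative only.
cv_results = {
    "param_kernel": ["poly", "poly", "rbf", "rbf"],
    "mean_test_score": [0.81, 0.60, 0.75, 0.82],
    "rank_test_score": [2, 4, 3, 1],
    "params": [{"kernel": "poly"}, {"kernel": "poly"},
               {"kernel": "rbf"}, {"kernel": "rbf"}],
}

buffer = io.StringIO()  # stands in for an open CSV file
write_cv_results(cv_results, buffer)
csv_text = buffer.getvalue()
```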

Parameters

cv_results –
response –
downsample –
filename –

Returns

Support Vector Regression¶

class rom.generators.svr.SVR(analysis_id, random_seed=None, **kwargs)[source]¶

Bases: rom.generators.model_generator_base.ModelGeneratorBase

build(metamodel, **kwargs)[source]¶

evaluate(model, model_name, model_moniker, x_data, y_data, downsample, build_time, cv_time, covariates=None, scaler=None)[source]¶

Evaluate the performance of the SVR model based on known x_data and y_data.

save_cv_results(cv_results, response, downsample, filename)[source]¶

Save the cv_results to a CSV file.

The CV results are the output of the GridSearch k-fold cross-validation and take the following form:

{
    'param_kernel': masked_array(data=['poly', 'poly', 'rbf', 'rbf'],
                                 mask=[False False False False]...),
    'param_gamma': masked_array(data=[-- -- 0.1 0.2],
                                mask=[True  True False False]...),
    'param_degree': masked_array(data=[2.0 3.0 -- --],
                                 mask=[False False  True  True]...),
    'split0_test_score': [0.8, 0.7, 0.8, 0.9],
    'split1_test_score': [0.82, 0.5, 0.7, 0.78],
    'mean_test_score': [0.81, 0.60, 0.75, 0.82],
    'std_test_score': [0.02, 0.01, 0.03, 0.03],
    'rank_test_score': [2, 4, 3, 1],
    'split0_train_score': [0.8, 0.9, 0.7],
    'split1_train_score': [0.82, 0.5, 0.7],
    'mean_train_score': [0.81, 0.7, 0.7],
    'std_train_score': [0.03, 0.03, 0.04],
    'mean_fit_time': [0.73, 0.63, 0.43, 0.49],
    'std_fit_time': [0.01, 0.02, 0.01, 0.01],
    'mean_score_time': [0.007, 0.06, 0.04, 0.04],
    'std_score_time': [0.001, 0.002, 0.003, 0.005],
    'params': [{'kernel': 'poly', 'degree': 2}, ...],
}
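To show how a structure like this is typically read, the sketch below picks out the best-ranked parameter combination. It uses plain lists rather than masked arrays. The scores and ranks come from the example above, while the full `params` list is filled in for illustration to match the kernel, degree, and gamma values shown.

```python
# Subset of the cv_results structure, with plain lists for simplicity.
cv_results = {
    "mean_test_score": [0.81, 0.60, 0.75, 0.82],
    "rank_test_score": [2, 4, 3, 1],
    "params": [
        {"kernel": "poly", "degree": 2},
        {"kernel": "poly", "degree": 3},
        {"kernel": "rbf", "gamma": 0.1},
        {"kernel": "rbf", "gamma": 0.2},
    ],
}

# rank_test_score assigns 1 to the best-scoring combination.
best_index = cv_results["rank_test_score"].index(1)
best_params = cv_results["params"][best_index]
best_score = cv_results["mean_test_score"][best_index]
```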

Parameters

cv_results –
response –
downsample –
filename –

Returns