PVPRO Post Processing
PVPRO Post Processing Module
This module contains a class that takes in the output dataframe of PVPRO and contains methods to process the dataset, perform signal decompositions to model degradation trends, analyze how well the models fit, and visualize the trends.
- class solardatatools.pvpro_post_processing.PVPROPostProcessor(file_name, period, index_col=0, dates=None, df_prep=True, include=None, exclude=None, verbose=False, bp=True)
Bases: object
This class processes a dataset, performs signal decomposition to model degradation trends, and analyzes and visualizes the resulting trends.
- Parameters:
file_name (str) – Name of the data file to be imported (must be .csv)
period (int) – How many data points in the file make up a full year of data (for instance, in 5-day interval data, the period is 73)
index_col (int, optional) – Column in the input data to be used as the index
dates (list, optional) – A list of integer column indices to be parsed as dates
df_prep (bool, optional) – Whether to perform the dataframe preparation steps
include (str, optional) – Key term used to include matching column names in the selection
exclude (str, optional) – Key term used to exclude matching column names from the selection
verbose (bool, optional) – Whether to show the percentage of data points on the boundaries
bp (bool, optional) – Whether to look for data points on the boundaries
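The `period` argument is the number of samples that make up one year. As a minimal sketch, assuming a 365-day year, it could be derived from the sampling interval as shown below. The helper name `points_per_year` is hypothetical and is not part of solardatatools.

```python
def points_per_year(interval_days, days_in_year=365):
    """Return the number of samples that span one year.

    Hypothetical helper for choosing `period`; not part of solardatatools.
    """
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    return round(days_in_year / interval_days)


# 5-day interval data, as in the example given for `period`
period = points_per_year(5)
```

The resulting value would then be passed as the `period` argument when constructing the class.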
- analyze(label, lambda2=0.001, lambda4=0.1, lambda5=1, model='smooth_monotonic', verbose=False, known=None, solver='Default')
Performs optimize() with default values. All parameters and outputs are the same as those in optimize().
- Parameters:
label (str) – Column name that indicates which system parameter is being optimized.
lambda2 (float, optional) – Weight on the Laplacian noise term
lambda4 (float, optional) – Weight which determines the strength of smoothing on the periodic component
lambda5 (float, optional) – Weight which determines the strength of smoothing on the degradation component
model (str, optional) – Names the model to use for the degradation component; can be ‘linear’, ‘monotonic’, ‘smooth_monotonic’, or ‘piecewise_linear’
verbose (bool, optional) – Whether cvxpy prints verbose output from the solve
known (bool mask, optional) – Boolean mask applied to the data passed to the solver
solver (str, optional) – Indicates which solver cvxpy should call to solve the optimization problem
- boundary_points(verbose=False)
Determines the indices of points that lie on the boundary, to within a tolerance set in param_dict. Creates a list of indices where data points are on the boundary in any of the system parameters.
- Parameters:
verbose (bool, optional) – Whether to show the percentage of data points on the boundaries
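The boundary check described above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the function name `find_boundary_indices`, the layout of `bounds`, the column names, and the tolerance value are all hypothetical.

```python
import math


def find_boundary_indices(columns, bounds, tol=1e-6):
    """Return sorted row indices where any column sits on one of its bounds.

    columns: dict mapping column name -> list of values
    bounds:  dict mapping column name -> (lower, upper); an assumed layout
    tol:     absolute tolerance for "on the boundary"; an assumed default
    """
    flagged = set()
    for name, values in columns.items():
        lower, upper = bounds[name]
        for i, v in enumerate(values):
            if math.isnan(v):
                continue  # missing values cannot be on a boundary
            if abs(v - lower) <= tol or abs(v - upper) <= tol:
                flagged.add(i)
    return sorted(flagged)


# Hypothetical column names and bounds, for illustration only
columns = {
    "photocurrent": [5.0, 5.2, 10.0, 5.1],
    "resistance_series": [0.1, 0.4, 0.4, 0.5],
}
bounds = {"photocurrent": (0.01, 10.0), "resistance_series": (0.1, 5.0)}

idx = find_boundary_indices(columns, bounds)
# Percentage of rows flagged, similar in spirit to the verbose output
share = 100 * len(idx) / len(columns["photocurrent"])
```

A row is flagged as soon as any one parameter is on a bound, which matches the "in any of the system parameters" wording above.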
- boundary_to_nan()
Sets all values in the dataframe at the boundary point indices to NaN.
- data_setup(include=None, exclude=None)
Adjusts the time index so that the intervals are equal and isolates the columns of interest. Creates the dataframe self.df_ds with the selected columns and the adjusted time index.
- Parameters:
include (str, optional) – Key term used to include matching column names in the selection
exclude (str, optional) – Key term used to exclude matching column names from the selection
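One plausible reading of the `include` and `exclude` arguments is a substring filter over column names. The sketch below assumes that reading. The function `select_columns` and the column names are hypothetical, and the actual matching rule in solardatatools may differ.

```python
def select_columns(names, include=None, exclude=None):
    """Keep names containing `include` and drop names containing `exclude`.

    Assumes plain substring matching; either filter is skipped when None.
    """
    selected = []
    for name in names:
        if include is not None and include not in name:
            continue
        if exclude is not None and exclude in name:
            continue
        selected.append(name)
    return selected


# Hypothetical column names, for illustration only
names = [
    "photocurrent_ref",
    "photocurrent_ref_fit",
    "resistance_series_ref",
    "residual",
]
picked = select_columns(names, include="ref", exclude="fit")
```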
- error_analysis(lambda4, lambda5, num_runs, lambda2=0.001, solver='Default')
Calculates the holdout error, looping over system parameters, models, cost function weights, and the number of repetitions. Creates one data frame with the error results for each iteration and another that averages over all runs for each unique set of inputs.
- Parameters:
lambda4 (list) – A list of periodic component weight values to loop over
lambda5 (list) – A list of degradation component weight values to loop over
num_runs (int) – Number of runs to perform; more runs yield more generalizable results at the cost of time
lambda2 (float, optional) – Laplacian noise weight term value
solver (str, optional) – Indicates which solver cvxpy should call to perform the optimization problem
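The looping and averaging structure described for error_analysis can be sketched as follows. This is not the library's code. The "model" is simply the mean of the training points, a stand-in for the signal decomposition, so it ignores the weights. The holdout fraction, the sample data, and the function `holdout_error` are assumptions. The sketch loops only over weights and runs, not over system parameters or models.

```python
import itertools
import random
import statistics


def holdout_error(values, holdout_fraction, rng):
    """Fit on a random subset and return the RMSE on the held-out points.

    The fit is just the training mean, a stand-in for the decomposition.
    """
    n = len(values)
    n_hold = max(1, int(n * holdout_fraction))
    held = set(rng.sample(range(n), n_hold))
    train = [v for i, v in enumerate(values) if i not in held]
    test = [v for i, v in enumerate(values) if i in held]
    fit = statistics.fmean(train)
    return (sum((v - fit) ** 2 for v in test) / len(test)) ** 0.5


# Made-up sample series and weight grids, for illustration only
values = [1.00, 0.99, 0.98, 0.98, 0.97, 0.96, 0.95, 0.95, 0.94, 0.93]
lambda4 = [0.1, 1.0]
lambda5 = [1.0, 10.0]
num_runs = 3
rng = random.Random(0)

# One row per (lambda4, lambda5, run) combination
runs = []
for l4, l5 in itertools.product(lambda4, lambda5):
    for run in range(num_runs):
        # The stand-in model ignores l4 and l5; a real solve would use them.
        err = holdout_error(values, 0.2, rng)
        runs.append({"lambda4": l4, "lambda5": l5, "run": run, "error": err})

# Average over runs for each unique pair of weights
averages = {}
for l4, l5 in itertools.product(lambda4, lambda5):
    errs = [r["error"] for r in runs
            if r["lambda4"] == l4 and r["lambda5"] == l5]
    averages[(l4, l5)] = statistics.fmean(errs)
```

Keeping the per-run rows separate from the averaged table mirrors the two data frames described above, so individual runs can still be inspected after averaging.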
- ln_df()
Takes the natural log of the scaled dataframe and makes all inf values nan.
- Returns:
df_l, the data frame scaled to a maximum of 1 and transformed to log space
- Return type:
Pandas DataFrame
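The transformation can be illustrated on a single column with the standard library. This is a sketch under assumptions, not the library's implementation: the function `scaled_log` is hypothetical, and scaling by the column maximum is assumed.

```python
import math


def scaled_log(values):
    """Scale values so the maximum is 1, then take the natural log.

    Non-positive or missing inputs become NaN. Array libraries return
    -inf for log(0), and ln_df is described as mapping inf values to NaN.
    """
    peak = max(v for v in values if not math.isnan(v))
    out = []
    for v in values:
        scaled = v / peak
        out.append(math.log(scaled) if scaled > 0 else float("nan"))
    return out


logged = scaled_log([2.0, 1.0, 0.0, 4.0])
```

After scaling, the largest value maps to log(1) = 0, so every finite entry in log space is zero or negative.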
- optimize(label, lambda4, lambda5, model, lambda2=0.001, verbose=False, known=None, solver='Default')
Runs an optimization problem to perform a 5-component signal decomposition using cvxpy on one parameter of the PV system. Creates two data frames of the resulting components and a composed signal of the noiseless components. One data frame is in the scaled log space and the other is in the original space. These resulting data frames can be accessed in the self.scaled_data and self.descaled_data dictionaries using the key (label + ‘_’ + model).
- Parameters:
label (str) – Column name that indicates which system parameter is being optimized.
lambda4 (float) – Weight which determines the strength of smoothing on the periodic component
lambda5 (float) – Weight which determines the strength of smoothing on the degradation component
model (str) – Names the model to use for the degradation component; can be ‘linear’, ‘monotonic’, ‘smooth_monotonic’, or ‘piecewise_linear’
lambda2 (float, optional) – Weight on the Laplacian noise term
verbose (bool, optional) – Whether cvxpy prints verbose output from the solve
known (bool mask, optional) – Boolean mask applied to the data passed to the solver
solver (str, optional) – Indicates which solver cvxpy should call to solve the optimization problem
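Results are stored under the key label + ‘_’ + model. A small sketch of building that key is given below. The helper `result_key` and the column name are hypothetical. The model names are the four listed above.

```python
# The four degradation models named in the parameter description
MODELS = ("linear", "monotonic", "smooth_monotonic", "piecewise_linear")


def result_key(label, model="smooth_monotonic"):
    """Build the dictionary key used to look up a decomposition result.

    Hypothetical helper; rejects model names outside the documented set.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return label + "_" + model


# "photocurrent_ref" is a made-up column name
key = result_key("photocurrent_ref", "linear")
```

With an instance of the class, the same key would index both the scaled_data and descaled_data dictionaries.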
- plot_original_space(label, model='smooth_monotonic')
Plots the signal decomposition (SD) of one system parameter in the original space.
- Parameters:
label (str) – Column name that indicates which system parameter is being optimized
model (str, optional) – Names the model to use for the degradation component, can be ‘linear’, ‘monotonic’, ‘smooth_monotonic’, or ‘piecewise_linear’
- Returns:
Plot of the signal decomposition in the original space
- Return type:
figure
- plot_sd_space(label, model='smooth_monotonic')
Plots the signal decomposition (SD) of one system parameter in the scaled log space.
- Parameters:
label (str) – Column name that indicates which system parameter is being optimized
model (str, optional) – Names the model to use for the degradation component, can be ‘linear’, ‘monotonic’, ‘smooth_monotonic’, or ‘piecewise_linear’
- Returns:
Plot of the signal decomposition in the scaled log space
- Return type:
figure
- retreive_result(label, model='smooth_monotonic')
- scale_max_1()
Scales a dataframe to have a maximum value of 1.
- Returns:
self.scaler, which saves all of the values involved in scaling the data frame
- Return type:
array
- sd_result_dfs(lambda2=0.001, lambda4=0.1, lambda5=1, model='smooth_monotonic', known=None, solver='Default')
Creates six new data frames containing the six signal decomposition components produced by performing optimize() with the indicated inputs over all system parameters. Each data frame holds one component for all system parameters.
- Parameters:
lambda2 (float, optional) – Weight on the Laplacian noise term
lambda4 (float, optional) – Weight which determines the strength of smoothing on the periodic component
lambda5 (float, optional) – Weight which determines the strength of smoothing on the degradation component
model (str, optional) – Names the model to use for the degradation component; can be ‘linear’, ‘monotonic’, ‘smooth_monotonic’, or ‘piecewise_linear’
known (bool mask, optional) – Boolean mask applied to the data passed to the solver
solver (str, optional) – Indicates which solver cvxpy should call to solve the optimization problem
- view_minmax(df)
Prints the minimum and maximum values for each column in the dataframe.