PVPRO Post Processing

PVPRO Post Processing Module

This module contains a class that takes in the output dataframe of PVPRO and contains methods to process the dataset, perform signal decompositions to model degradation trends, analyze how well the models fit, and visualize the trends.

class solardatatools.pvpro_post_processing.PVPROPostProcessor(file_name, period, index_col=0, dates=None, df_prep=True, include=None, exclude=None, verbose=False, bp=True)

Bases: object

This is a class to process a dataset, perform signal decomposition to model degradation trends, and analyze and visualize the resulting trends.

Parameters:
  • file_name (str) – Name of the data file to be imported (must be .csv)

  • period (int) – How many data points in the file make up a full year of data (for instance, in 5-day interval data, the period is 73)

  • index_col (int, optional) – column in the input data to be used as the index

  • dates (list, optional) – a list of integer column indices to be parsed as dates

  • df_prep (bool, optional) – A T/F switch to determine whether the dataframe preparation steps should be performed

  • include (str, optional) – Input to include a key term in selected column names

  • exclude (str, optional) – Input to exclude a key term from selected column names

  • verbose (bool, optional) – A T/F switch to show percentage of data points on the boundaries

  • bp (bool, optional) – A T/F switch to choose whether to look for data points on the boundaries
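The period parameter follows directly from the sampling interval. A minimal sketch of that arithmetic, assuming a 365-day year; the helper name `period_from_interval` is hypothetical and not part of solardatatools:

```python
def period_from_interval(interval_days: float, days_per_year: float = 365.0) -> int:
    """Number of samples that span one year for a fixed sampling interval."""
    if interval_days <= 0:
        raise ValueError("interval_days must be positive")
    # Round to the nearest whole number of samples per year.
    return round(days_per_year / interval_days)


# 5-day interval data, as in the example given for the period parameter.
period = period_from_interval(5)
print(period)
```

The resulting value could then be passed as the second argument of the constructor, for example `PVPROPostProcessor("pvpro_output.csv", period)`, where the file name is a placeholder.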

analyze(label, lambda2=0.001, lambda4=0.1, lambda5=1, model='smooth_monotonic', verbose=False, known=None, solver='Default')

Performs optimize() with default values. All parameters and outputs are the same as those in optimize().

Parameters:
  • label (str) – Column name that indicates which system parameter is being optimized.

  • lambda2 (float, optional) – Weight on the Laplacian noise term

  • lambda4 (float, optional) – Weight which determines the strength of smoothing on the periodic component

  • lambda5 (float, optional) – Weight which determines the strength of smoothing on the degradation component

  • model (str, optional) – Names the model to use for the degradation component, can be ‘linear’, ‘monotonic’, ‘smooth_monotonic’, or ‘piecewise_linear’

  • verbose (bool, optional) – T/F switch to determine whether cvxpy prints a verbose output of the solve

  • known (bool mask, optional) – Boolean mask applied to the data passed to the solver

  • solver (str, optional) – Indicates which solver cvxpy should call to perform the optimization problem

boundary_points(verbose=False)

Determines the indices of points that lie on a boundary, within a tolerance set in param_dict. Creates a list of indices where data points are on the boundary in any of the system parameters.

Parameters:

verbose (bool, optional) – A T/F switch to show percentage of data points on the boundaries

boundary_to_nan()

Sets all points in the dataframe at boundary point indices to NaN.
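The two boundary steps can be sketched outside the class as follows. This is an illustration under assumptions, not the library's implementation: the `bounds` dictionary, the `tol` value, and the column names are hypothetical stand-ins for whatever param_dict actually holds.

```python
import numpy as np
import pandas as pd


def find_boundary_rows(df, bounds, tol=1e-6):
    """Positional indices of rows where any listed column sits on a bound."""
    on_boundary = np.zeros(len(df), dtype=bool)
    for col, (lower, upper) in bounds.items():
        values = df[col].to_numpy(dtype=float)
        # A point counts as "on the boundary" if it is within tol of either bound.
        on_boundary |= np.isclose(values, lower, atol=tol, rtol=0.0)
        on_boundary |= np.isclose(values, upper, atol=tol, rtol=0.0)
    return np.flatnonzero(on_boundary).tolist()


def rows_to_nan(df, row_positions):
    """Return a float copy of df with the given rows set to NaN in every column."""
    out = df.astype(float)
    out.iloc[row_positions, :] = np.nan
    return out


# Hypothetical parameters and bounds, for illustration only.
df = pd.DataFrame(
    {"param_a": [1.0, 0.5, 2.0, 1.2], "param_b": [10.0, 12.0, 11.0, 20.0]}
)
bounds = {"param_a": (0.5, 2.0), "param_b": (5.0, 20.0)}

idx = find_boundary_rows(df, bounds)
cleaned = rows_to_nan(df, idx)
percent_on_boundary = 100.0 * len(idx) / len(df)
print(idx, percent_on_boundary)
```

A row is blanked in every column as soon as one parameter touches a bound, which matches the description of a single index list shared across all system parameters.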

data_setup(include=None, exclude=None)

Adjusts time index so that there are equal intervals and isolates columns of interest. Creates dataframe self.df_ds with the selected columns and adjusted time index.

Parameters:
  • include (str, optional) – Input to include a key term in selected column names

  • exclude (str, optional) – Input to exclude a key term from selected column names
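One way to read the include/exclude selection and the equal-interval index is sketched below. The helper names, the column names, and the choice of reindexing onto a regular date range are all assumptions made for illustration; the method itself may adjust the index differently.

```python
import pandas as pd


def select_columns(columns, include=None, exclude=None):
    """Keep names containing `include` (if given) and not containing `exclude` (if given)."""
    selected = []
    for name in columns:
        if include is not None and include not in name:
            continue
        if exclude is not None and exclude in name:
            continue
        selected.append(name)
    return selected


def regularize_index(df, freq):
    """Reindex onto an evenly spaced date range; missing timestamps become NaN rows."""
    full_index = pd.date_range(df.index.min(), df.index.max(), freq=freq)
    return df.reindex(full_index)


# Hypothetical data with one missing 5-day step (2020-01-11).
raw = pd.DataFrame(
    {"a_ref": [1.0, 2.0, 3.0], "a_err": [0.1, 0.2, 0.3], "b_ref": [4.0, 5.0, 6.0]},
    index=pd.to_datetime(["2020-01-01", "2020-01-06", "2020-01-16"]),
)

cols = select_columns(raw.columns, include="ref")
df_ds = regularize_index(raw[cols], freq="5D")
print(df_ds)
```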

error_analysis(lambda4, lambda5, num_runs, lambda2=0.001, solver='Default')

Calculates the holdout error, looping over system parameters, models, cost function weights, and number of repetitions. Creates a data frame with error results for each iteration and another which averages over all runs for each unique set of inputs.

Parameters:
  • lambda4 (list) – A list of periodic component weight values to loop over

  • lambda5 (list) – A list of degradation component weight values to loop over

  • num_runs (int) – Number of runs to perform; more runs yield more generalizable results at the cost of time

  • lambda2 (float, optional) – Laplacian noise weight term value

  • solver (str, optional) – Indicates which solver cvxpy should call to perform the optimization problem
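The looping structure described above can be sketched as a grid over the weight lists with repeated random holdouts. Everything here is an assumption made for illustration: the 20% holdout fraction, the RMSE metric, and in particular `placeholder_fit`, which stands in for the real signal decomposition and ignores the weights.

```python
import itertools

import numpy as np


def holdout_mask(n, holdout_fraction, rng):
    """Boolean mask: True = known (shown to the fit), False = held out."""
    mask = np.ones(n, dtype=bool)
    n_holdout = int(round(holdout_fraction * n))
    held = rng.choice(n, size=n_holdout, replace=False)
    mask[held] = False
    return mask


def placeholder_fit(signal, known, lambda4, lambda5):
    """Stand-in for the decomposition: predicts the mean of the known points."""
    return np.full(signal.shape, signal[known].mean())


def holdout_grid(signal, lambda4_values, lambda5_values, num_runs,
                 holdout_fraction=0.2, seed=0):
    """Average holdout RMSE over num_runs for each (lambda4, lambda5) pair."""
    rng = np.random.default_rng(seed)
    results = {}
    for l4, l5 in itertools.product(lambda4_values, lambda5_values):
        errors = []
        for _ in range(num_runs):
            known = holdout_mask(len(signal), holdout_fraction, rng)
            estimate = placeholder_fit(signal, known, l4, l5)
            residual = signal[~known] - estimate[~known]
            errors.append(float(np.sqrt(np.mean(residual ** 2))))
        results[(l4, l5)] = float(np.mean(errors))
    return results


# Synthetic, slowly declining signal used only to exercise the loop.
signal = np.linspace(1.0, 0.9, 50)
avg_errors = holdout_grid(signal, [0.1, 1.0], [1.0, 10.0], num_runs=3)
print(avg_errors)
```

The per-run errors correspond to the first data frame described above, and the averaged dictionary to the second; the method additionally loops over system parameters and models.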

ln_df()

Takes the natural log of the scaled dataframe and replaces all inf values with NaN.

Returns:

df_l, the data frame scaled to a maximum of 1, in log space

Return type:

Pandas DataFrame
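The scale-then-log transform, and its inverse, can be sketched with plain pandas and NumPy. The function names below are hypothetical stand-ins rather than the class methods, and scaling each column by its own maximum is an assumption.

```python
import numpy as np
import pandas as pd


def scale_to_max_one(df):
    """Divide each column by its maximum; also return the maxima for descaling."""
    col_max = df.max()
    return df / col_max, col_max


def log_transform(scaled):
    """Natural log of the scaled frame, with +/-inf (e.g. log(0)) replaced by NaN."""
    with np.errstate(divide="ignore", invalid="ignore"):
        logged = np.log(scaled)
    return logged.replace([np.inf, -np.inf], np.nan)


def descale(logged, col_max):
    """Invert the transform: exponentiate, then multiply by the column maxima."""
    return np.exp(logged) * col_max


# Hypothetical columns; the zero in "p" becomes NaN in log space.
df = pd.DataFrame({"p": [0.0, 2.0, 4.0], "q": [5.0, 10.0, 2.5]})
scaled, col_max = scale_to_max_one(df)
df_l = log_transform(scaled)
restored = descale(df_l, col_max)
print(df_l)
```

After scaling, every value in log space is at most 0, and the maximum of each column maps to exactly 0.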

optimize(label, lambda4, lambda5, model, lambda2=0.001, verbose=False, known=None, solver='Default')

Runs an optimization problem to perform a 5-component signal decomposition using cvxpy on one parameter of the PV system. Creates two data frames of the resulting components and a composed signal of the noiseless components. One data frame is in the scaled log space and the other is in the original space. These resulting data frames can be accessed in the self.scaled_data and self.descaled_data dictionaries using the key (label + ‘_’ + model).

Parameters:
  • label (str) – Column name that indicates which system parameter is being optimized.

  • lambda4 (float) – Weight which determines the strength of smoothing on the periodic component

  • lambda5 (float) – Weight which determines the strength of smoothing on the degradation component

  • model (str) – Names the model to use for the degradation component, can be ‘linear’, ‘monotonic’, ‘smooth_monotonic’, or ‘piecewise_linear’

  • lambda2 (float, optional) – Weight on the Laplacian noise term

  • verbose (bool, optional) – T/F switch to determine whether cvxpy prints a verbose output of the solve

  • known (bool mask, optional) – Boolean mask applied to the data passed to the solver

  • solver (str, optional) – Indicates which solver cvxpy should call to perform the optimization problem
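Results are stored under a key built from the label and the model name. A small sketch of that key construction, with a check against the four documented model names; the helper `result_key` and the label `"photocurrent_ref"` are hypothetical:

```python
VALID_MODELS = ("linear", "monotonic", "smooth_monotonic", "piecewise_linear")


def result_key(label, model="smooth_monotonic"):
    """Build the dictionary key (label + '_' + model) used to look up results."""
    if model not in VALID_MODELS:
        raise ValueError(f"model must be one of {VALID_MODELS}, got {model!r}")
    return label + "_" + model


key = result_key("photocurrent_ref", "linear")
print(key)
```

With a processor instance named `pp` (an assumed variable name), the two result frames would then be looked up as `pp.scaled_data[key]` and `pp.descaled_data[key]`.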

plot_original_space(label, model='smooth_monotonic')

Plots the signal decomposition (SD) of one system parameter in the original space.

Parameters:
  • label (str) – Column name that indicates which system parameter to plot

  • model (str, optional) – Names the model to use for the degradation component, can be ‘linear’, ‘monotonic’, ‘smooth_monotonic’, or ‘piecewise_linear’

Returns:

Plot of the signal decomposition in original space

Return type:

figure

plot_sd_space(label, model='smooth_monotonic')

Plots the signal decomposition (SD) of one system parameter in the scaled log space.

Parameters:
  • label (str) – Column name that indicates which system parameter to plot

  • model (str, optional) – Names the model to use for the degradation component, can be ‘linear’, ‘monotonic’, ‘smooth_monotonic’, or ‘piecewise_linear’

Returns:

Plot of the signal decomposition in scaled log space

Return type:

figure

retreive_result(label, model='smooth_monotonic')

scale_max_1()

Scales a dataframe to have max value 1.

Returns:

self.scaler, which saves all of the values involved in scaling the data frame

Return type:

array

sd_result_dfs(lambda2=0.001, lambda4=0.1, lambda5=1, model='smooth_monotonic', known=None, solver='Default')

Creates six new data frames containing the six signal decomposition components produced by performing optimize() with the indicated inputs over all system parameters. One data frame holds one component for all system parameters.

Parameters:
  • lambda2 (float, optional) – Weight on the Laplacian noise term

  • lambda4 (float, optional) – Weight which determines the strength of smoothing on the periodic component

  • lambda5 (float, optional) – Weight which determines the strength of smoothing on the degradation component

  • model (str, optional) – Names the model to use for the degradation component, can be ‘linear’, ‘monotonic’, ‘smooth_monotonic’, or ‘piecewise_linear’

  • known (bool mask, optional) – Boolean mask applied to the data passed to the solver

  • solver (str, optional) – Indicates which solver cvxpy should call to perform the optimization problem

view_minmax(df)

Prints the minimum and maximum values for each column in the dataframe.
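An equivalent summary can be produced directly in pandas. This is a sketch of the idea rather than the method's own code; the helper name `minmax_table` and the column names are hypothetical.

```python
import pandas as pd


def minmax_table(df):
    """Return a two-row frame with the min and max of every column."""
    return df.agg(["min", "max"])


df = pd.DataFrame({"p": [1.0, 3.0, 2.0], "q": [0.5, 0.1, 0.9]})
table = minmax_table(df)
print(table)
```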