API Reference
- class solardatatools.ClearDayDetection
Bases: object
- filter_for_sparsity(data, w1=6000.0, solver='OSQP')
- find_clear_days(data, smoothness_threshold=0.9, energy_threshold=0.8, boolean_out=True, solver='CLARABEL')
This function quickly finds clear days in a PV power data set. The input is a 2D array containing standardized time series power data, typically the output of solardatatools.data_transforms.make_2d. The filter relies on two estimates of daily “clearness”: the smoothness of each daily signal, measured by the l2-norm of its second-order difference, and the seasonally adjusted daily energy. The seasonal adjustment of the daily energy is obtained by solving a local quantile regression problem, a convex optimization problem that can be solved with cvxpy. The parameter th sets the relative weighting of daily smoothness and daily energy, which the final filter combines as a weighted geometric mean. A value of 0 relies entirely on daily energy and a value of 1 relies entirely on daily smoothness.
- Parameters:
data – A 2D numpy array containing a solar power time series signal.
th – A parameter that tunes the filter between relying on daily smoothness and relying on daily energy.
- Returns:
A 1D boolean array, with True values corresponding to clear days in the data set
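The two clearness metrics and their geometric-mean combination can be sketched in numpy as follows. This is a minimal illustration, not the library's implementation: it assumes a (measurements per day × days) matrix, and it replaces the cvxpy local quantile regression with a simple centered rolling 90th percentile of daily energy. The function name clear_day_scores and its window parameter are hypothetical.

```python
import numpy as np


def clear_day_scores(data, th=0.5, window=15):
    """Hypothetical sketch: score each day (column) of `data` for clearness.

    Returns a score in [0, 1] per day; th=1 uses only smoothness, th=0 only energy.
    """
    data = np.clip(np.nan_to_num(np.asarray(data, dtype=float)), 0, None)
    # Smoothness: l2-norm of the 2nd-order difference of each daily signal
    d2 = data[:-2] - 2 * data[1:-1] + data[2:]
    roughness = np.linalg.norm(d2, axis=0)
    smooth = 1.0 - roughness / max(roughness.max(), 1e-12)  # 1 = smoothest
    # Daily energy, seasonally adjusted by a rolling 90th percentile
    # (a stand-in for the local quantile regression solved with cvxpy)
    energy = data.sum(axis=0)
    half = window // 2
    baseline = np.array([
        np.percentile(energy[max(0, j - half): j + half + 1], 90)
        for j in range(energy.size)
    ])
    adj_energy = np.clip(energy / np.maximum(baseline, 1e-12), 0, 1)
    # Weighted geometric mean of the two metrics
    return smooth ** th * adj_energy ** (1 - th)
```

Thresholding the scores (for example, scores >= 0.8) yields a boolean mask of the kind described under Returns; that cutoff is illustrative only.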
- plot_analysis(figsize=None)
- class solardatatools.DataHandler(data_frame=None, raw_data_matrix=None, datetime_col=None, convert_to_ts=False, no_future_dates=True, aggregate=None, how=<function DataHandler.<lambda>>, gmt_offset=None)
Bases: object
- apply_time_dilation(nvals_dil=101)
- augment_data_frame(boolean_index, column_name)
Add a column to the data frame (tabular) representation of the data, containing True/False values at each time stamp. Boolean index is a 1-D or 2-D numpy array of True/False values. If 1-D, array should be of length N, where N is the number of days in the data set. If 2-D, the array should be of size M X N where M is the number of measurements each day and N is the number of days.
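Adding the column requires expanding the day-indexed booleans to one value per time stamp. A numpy sketch of that expansion, assuming the data frame rows are in chronological order and the M × N matrix stores one day per column (as make_2d does); the helper name expand_boolean_index is hypothetical.

```python
import numpy as np


def expand_boolean_index(boolean_index, m):
    """Hypothetical helper: turn a length-N (per day) or M x N (per sample)
    boolean array into one flag per time stamp, in chronological order."""
    b = np.asarray(boolean_index, dtype=bool)
    if b.ndim == 1:
        # Repeat each day's flag for all M measurements of that day
        b = np.tile(b, (m, 1))
    if b.shape[0] != m:
        raise ValueError("expected M rows (measurements per day)")
    # Columns are days, so read the matrix column by column (Fortran order)
    return b.ravel(order="F")
```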
- Parameters:
boolean_index – Length N or size M X N numpy arrays of booleans
column_name – Name for column
- Returns:
- auto_fix_time_shifts(round_shifts_to_hour=True, w1=None, w2=0.001, estimator='srss', threshold=0.005, periodic_detector=False, solver=None)
- calculate_scsf_performance_index()
- capacity_clustering(solver=None, plot=False, figsize=(8, 6), show_clusters=True)
- clipping_check(solver='OSQP')
- detect_clear_days(smoothness_threshold=0.9, energy_threshold=0.8, solver=None)
- detect_clear_sky(quantile_level=0.9, threshold_low=0.75, threshold_high=1.2, stickiness_high=0.1, stickiness_low=4, loss_correction=True, verbose=False, nvals_dil=101, num_harmonics=[16, 3], regularization=0.1, solver='CLARABEL')
- estimate_latitude()
Estimate the latitude of the system based on the current parameter estimation.
This method uses the parameter_estimation object to perform latitude estimation. The method will only proceed if the __help_param_est method indicates that the parameters are ready for estimation.
- Returns:
float: The estimated latitude of the system in decimal degrees.
- Raises:
AttributeError – If parameter_estimation or its estimate_latitude method is not available.
Exception – If the parameters are not ready for estimation.
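The docstring does not describe the estimator itself. One common approach, shown here as a sketch and not necessarily what parameter_estimation does internally, inverts the sunrise equation: with Cooper's declination δ and a measured day length H in hours, the sunset hour angle is ω_s = 15°·H/2, cos ω_s = −tan φ tan δ, so φ = arctan(−cos ω_s / tan δ). All function names are hypothetical.

```python
import math


def declination_deg(day_of_year):
    """Cooper's approximation of solar declination, in degrees."""
    return 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))


def latitude_from_day_length(day_length_hours, day_of_year):
    """Hypothetical sketch: invert the sunrise equation for latitude (degrees).

    Ill-conditioned near the equinoxes, where tan(declination) is close to 0.
    """
    omega_s = math.radians(15.0 * day_length_hours / 2.0)  # sunset hour angle
    tan_delta = math.tan(math.radians(declination_deg(day_of_year)))
    return math.degrees(math.atan(-math.cos(omega_s) / tan_delta))


def day_length_hours(latitude_deg, day_of_year):
    """Forward model (sunrise equation), used to check the inversion."""
    phi = math.radians(latitude_deg)
    delta = math.radians(declination_deg(day_of_year))
    return 2.0 / 15.0 * math.degrees(math.acos(-math.tan(phi) * math.tan(delta)))
```

In practice, day lengths from many days would be combined, and days near the equinoxes would carry little information about latitude.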
- estimate_location_and_orientation(day_interval=None, x1=0.9, x2=0.9)
Estimates the location (latitude and longitude) and orientation (tilt and azimuth) of the system.
This method utilizes the parameter_estimation object to estimate the system’s geographic location and orientation based on the provided parameters.
- Parameters:
day_interval (float or None, optional) – The interval of days used for estimation. If None, defaults to internal settings.
x1 (float, optional) – Weight factor for optimization. Defaults to 0.9.
x2 (float, optional) – Weight factor for optimization. Defaults to 0.9.
- Returns:
A tuple containing the estimated latitude, longitude, tilt, and azimuth of the system.
- Return type:
tuple of (float, float, float, float)
- Raises:
Exception – If the parameter estimation is not properly initialized.
- estimate_longitude(estimator='fit_l1', eot_calculation='duffie')
Estimates the longitude of the system’s location.
This method uses the configured parameter estimator to calculate the longitude based on the Equation of Time (EOT) calculation methods.
- Parameters:
estimator – The method to use for fitting the longitude estimation. Options include “fit_l1” for L1 norm fitting. Default is “fit_l1”.
eot_calculation – The method to use for calculating the Equation of Time. Options include “duffie” for Duffie method. Default is “duffie”.
- Returns:
The estimated longitude of the system’s location.
- Raises:
ValueError – If the parameter estimator is not ready for use.
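The idea behind estimator="fit_l1" with eot_calculation="duffie" can be sketched under stated assumptions: the equation of time follows the Duffie and Beckman (Spencer) series, longitudes are east-positive, solar noon is observed in local standard time (hours) for a GMT offset tz, and an L1 fit of a single constant reduces to the median of per-day estimates. This illustrates the relationship, not the library's code; all names are hypothetical.

```python
import math
import statistics


def eot_duffie_minutes(day_of_year):
    """Equation of time in minutes (Duffie & Beckman / Spencer series)."""
    b = math.radians((day_of_year - 1) * 360.0 / 365.0)
    return 229.2 * (0.000075 + 0.001868 * math.cos(b) - 0.032077 * math.sin(b)
                    - 0.014615 * math.cos(2 * b) - 0.04089 * math.sin(2 * b))


def longitude_from_solar_noon(noon_hours, day_of_year, tz):
    """Per-day longitude (degrees east) from solar noon in standard time.

    Solar time - standard time (min) = 4 * (lon - 15 * tz) + EOT; at solar
    noon the solar time is 12:00, which is solved here for lon.
    """
    return 15.0 * tz + 15.0 * (12.0 - noon_hours) - eot_duffie_minutes(day_of_year) / 4.0


def fit_l1_longitude(noons, days, tz):
    """The constant minimizing the sum of absolute residuals is the median."""
    return statistics.median(
        longitude_from_solar_noon(t, n, tz) for t, n in zip(noons, days)
    )
```

Because the median ignores a minority of large residuals, a few days with corrupted solar noon estimates do not move the fit.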
- estimate_orientation(latitude=None, longitude=None, tilt=None, azimuth=None, day_interval=None, x1=0.9, x2=0.9)
Estimate the orientation of the system, including tilt and azimuth angles. It relies on previously set up parameter estimation configurations.
- Parameters:
latitude – Optional latitude of the system. If not provided, will use the previously estimated value.
longitude – Optional longitude of the system. If not provided, will use the previously estimated value.
tilt – Optional tilt angle of the system in degrees. If not provided, will use the previously estimated value.
azimuth – Optional azimuth angle of the system in degrees. If not provided, will use the previously estimated value.
day_interval – Optional interval in days for the estimation. If not provided, the default value will be used.
x1 – Optional weight factor for the first parameter. Default is 0.9.
x2 – Optional weight factor for the second parameter. Default is 0.9.
- Returns:
A tuple containing the estimated tilt and azimuth angles. The values are in degrees.
- Return type:
tuple(float, float)
- Raises:
AttributeError – If parameter estimation has not been properly initialized.
- estimate_quantiles(nvals_dil=101, quantile_levels=[0.02, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.98], num_harmonics=[16, 3], regularization=0.1, solver='CLARABEL', verbose=False)
- find_clipped_times(solver_convex='CLARABEL')
- fit_statistical_clear_sky_model(quantile_level=0.9, nvals_dil=101, num_harmonics=[16, 3], regularization=0.1, solver='CLARABEL', verbose=False)
Fit a statistical model of the PV system clear sky response, using smooth, periodic quantile estimation. This uses the estimate_quantiles method to estimate a single quantile level (default 0.9).
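The quantile level is the τ of the quantile (pinball) loss. The stdlib sketch below shows why minimizing that loss lands on the empirical τ-quantile. The library fits smooth periodic functions with cvxpy rather than a single constant, so this only illustrates the loss; the function names are hypothetical.

```python
def pinball_loss(residuals, tau):
    """Quantile (pinball) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def best_constant_quantile(values, tau):
    """Brute-force the constant that minimizes the pinball loss over the data.

    For this loss an optimum is always attained at one of the data points.
    """
    return min(values, key=lambda c: pinball_loss([v - c for v in values], tau))
```

With tau=0.9, roughly 90% of the data lies at or below the fitted value, which is why a high quantile tracks the clear sky envelope.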
- Parameters:
quantile_level (float) – the quantile level to fit as the clear sky system response
nvals_dil (int) – the number of data points to use for daily dilation
num_harmonics (list) – the number of Fourier harmonics to use for the daily (first) and yearly (second) periods
regularization (float) – stiffness weight for quantile fits (larger is more stiff)
solver (string) – the cvxpy solver to invoke, default is CLARABEL
verbose (boolean) – print solver status
- Returns:
- fix_dst()
Helper function for fixing data sets with a known DST shift. This function works for data recorded anywhere in the United States. The choice of time zone (e.g. ‘US/Pacific’) does not matter, as long as the dates of the clock changes are the same.
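Since only the dates of the clock changes matter, they can be computed directly. Under the current US rule (in effect since 2007), DST starts on the second Sunday in March and ends on the first Sunday in November, both at 2:00 local time. The stdlib sketch below is illustrative; the function names are hypothetical and not part of solardatatools.

```python
import datetime


def nth_sunday(year, month, n):
    """Date of the n-th Sunday of the given month."""
    first = datetime.date(year, month, 1)
    # weekday(): Monday=0 ... Sunday=6
    offset = (6 - first.weekday()) % 7
    return first + datetime.timedelta(days=offset + 7 * (n - 1))


def us_dst_change_dates(year):
    """(start, end) dates of US daylight saving time under the post-2007 rule."""
    return nth_sunday(year, 3, 2), nth_sunday(year, 11, 1)
```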
- generate_extra_matrix(column, new_index=None, key=None)
- get_daily_flags(density_lower_threshold=0.6, density_upper_threshold=1.05, linearity_threshold=0.1)
- get_daily_scores(threshold=0.2, solver=None)
- get_density_scores(threshold=0.2, solver=None)
- get_linearity_scores()
- make_data_matrix(use_col=None, start_day_ix=None, end_day_ix=None)
- make_filled_data_matrix(zero_night=True, interp_day=True)
- plot_bundt(figsize=(12, 8), units='kW', inner_radius=1.0, slice_thickness=100, elev=45, azim=30, zoom=1.0, zscale=0.5, cmap='coolwarm', aggregate=True, skip=0)
- plot_capacity_change_analysis(figsize=(8, 6), show_clusters=True)
Plot the capacity change analysis results.
This function generates a plot that visualizes the results of the capacity change analysis. It uses the capacity_clustering method to create the plot and can optionally show clusters.
- Parameters:
figsize (tuple, optional) – The size of the figure to create. Default is (8, 6).
show_clusters (bool, optional) – Whether to show clusters in the plot. Default is True.
- Returns:
The figure containing the capacity change analysis plot.
- Return type:
matplotlib.figure.Figure
- plot_cdf_analysis(figsize=(12, 6))
Plot the CDF (Cumulative Distribution Function) analysis results.
This function generates a plot that visualizes the differences in the CDF analysis using the plot_diffs method from the clipping_analysis attribute.
- Parameters:
figsize (tuple, optional) – The size of the figure to create. Default is (12, 6).
- Returns:
The figure containing the CDF analysis plot.
- Return type:
matplotlib.figure.Figure
- plot_circ_dist(flag='good', num_bins=48, figsize=(8, 8))
Plot a circular distribution of days based on the specified flag.
This function generates a polar plot showing the distribution of days throughout the year based on the provided flag. The plot is divided into bins, and the number of days in each bin is represented as bars.
- Parameters:
flag (str, optional) – The type of days to plot. Options are “good”, “bad”, “clear”, “sunny”, or “cloudy”. Default is “good”.
num_bins (int, optional) – The number of bins to divide the year into. Default is 48 (12 * 4).
figsize (tuple, optional) – The size of the figure to create. Default is (8, 8).
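The binning step can be sketched as mapping each flagged day's day of year onto one of num_bins equal angular sectors. This assumes a 365-day cycle and equal-width bins, and the library's exact binning may differ. The function name circular_bins is hypothetical.

```python
import datetime
import math
from collections import Counter


def circular_bins(dates, num_bins=48):
    """Count days per angular bin; returns (bin_center_angles_rad, counts)."""
    width = 2 * math.pi / num_bins
    counts = Counter()
    for d in dates:
        doy = d.timetuple().tm_yday  # 1..366
        angle = 2 * math.pi * ((doy - 1) % 365) / 365  # Jan 1 maps to angle 0
        counts[int(angle // width) % num_bins] += 1
    centers = [(k + 0.5) * width for k in range(num_bins)]
    return centers, [counts[k] for k in range(num_bins)]
```

The returned centers and counts are the inputs a polar bar chart needs.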
- plot_clipping(figsize=(10, 8))
Plot the clipping analysis results.
This function generates a plot that visualizes the clipping scores and highlights the days with inverter clipping.
- Parameters:
figsize (tuple, optional) – The size of the figure to create. Default is (10, 8).
- Returns:
The figure of the clipping analysis results.
- Return type:
matplotlib.figure.Figure
- plot_daily_energy(flag=None, figsize=(8, 6), units='Wh')
Plot the daily energy production.
This function generates a plot that visualizes the daily energy production. It can optionally flag certain days based on the provided flag parameter.
- Parameters:
flag (str, optional) – A flag to apply to the data matrix. Options are “good”, “bad”, “clear”, “sunny”, or “cloudy”. Default is None.
figsize (tuple, optional) – The size of the figure to create. Default is (8, 6).
units (str, optional) – The units to use for the energy data. Default is “Wh”.
- Returns:
The figure showing the daily energy analysis.
- Return type:
matplotlib.figure.Figure
- plot_daily_max_cdf(figsize=(10, 6))
Plot the CDF (Cumulative Distribution Function) of the daily maximum values.
This function generates a plot that visualizes the CDF of the daily maximum values from the clipping analysis.
- Parameters:
figsize (tuple, optional) – The size of the figure to create. Default is (10, 6).
- Returns:
The figure containing the CDF of the daily maximum values.
- Return type:
matplotlib.figure.Figure
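The plotted quantity is the empirical CDF of each day's peak value. A numpy sketch follows, assuming the same matrix layout as make_2d (one column per day). The function name daily_max_ecdf is hypothetical. When many days share the same inverter-limited peak, one would expect a near-vertical step in this curve.

```python
import numpy as np


def daily_max_ecdf(data):
    """Empirical CDF of daily maxima from a (measurements per day x days) matrix.

    Returns the sorted daily maxima and the fraction of days at or below each.
    """
    daily_max = np.sort(np.nanmax(np.asarray(data, dtype=float), axis=0))
    fractions = np.arange(1, daily_max.size + 1) / daily_max.size
    return daily_max, fractions
```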
- plot_daily_max_cdf_and_pdf(figsize=(10, 6))
Plot both the CDF (Cumulative Distribution Function) and PDF (Probability Density Function) of the daily maximum values.
This function generates a plot that includes both the CDF and PDF of the daily maximum values from the clipping analysis.
- Parameters:
figsize (tuple, optional) – The size of the figure to create. Default is (10, 6).
- Returns:
The plot containing both the CDF and PDF of the daily maximum values.
- Return type:
matplotlib.figure.Figure
- plot_daily_max_pdf(figsize=(8, 6))
Plot the PDF (Probability Density Function) of the daily maximum values.
This function generates a plot that visualizes the PDF of the daily maximum values from the clipping analysis.
- Parameters:
figsize (tuple, optional) – The size of the figure to create. Default is (8, 6).
- Returns:
The figure containing the PDF of the daily maximum values.
- Return type:
matplotlib.figure.Figure
- plot_daily_signals(boolean_index=None, start_day=0, num_days=5, filled=True, ravel=True, figsize=(12, 6), color=None, alpha=None, label=None, boolean_mask=None, mask_label=None, show_clear_model=True, show_legend=False, marker=None)
Plot daily signals from the data matrix.
This function generates a plot of daily signals from the data matrix, with options to select specific days, apply boolean masks, and customize the appearance of the plot.
- Parameters:
boolean_index (List[bool] or None, optional) – Boolean index to select specific days. Default is None.
start_day (int or str, optional) – The starting day for the plot. Can be an integer or a date string. Default is 0.
num_days (int, optional) – The number of days to include in the plot. Default is 5.
filled (bool, optional) – Whether to use the filled data matrix. If False, uses the raw data matrix. Default is True.
ravel (bool, optional) – Whether to ravel the data matrix for plotting. Default is True.
figsize (tuple, optional) – The size of the figure to create. Default is (12, 6).
color (str or None, optional) – The color of the plot lines. Default is None.
alpha (float or None, optional) – The alpha transparency of the plot lines. Default is None.
label (str or None, optional) – The label for the plot lines. Default is None.
boolean_mask (List[bool] or None, optional) – A boolean mask to apply to the data. Default is None.
mask_label (str or None, optional) – Label for the boolean mask. Default is None.
show_clear_model (bool, optional) – Whether to show the clear sky model in the plot. Default is True.
show_legend (bool, optional) – Whether to show the legend in the plot. Default is False.
marker (str or None, optional) – Marker style for the plot. Default is None.
- plot_data_quality_scatter(figsize=(6, 5))
Plot a scatter plot of data quality scores.
This function generates a scatter plot that visualizes the density and linearity scores for each day in the data set. It uses the quality clustering labels to color-code the points and includes decision boundaries for density and linearity thresholds.
- Parameters:
figsize (tuple, optional) – The size of the figure to create. Default is (6, 5).
- Returns:
The figure containing the scatter plot of data quality scores.
- Return type:
matplotlib.figure.Figure
- plot_density_signal(flag=None, show_fit=False, figsize=(8, 6))
Plot the daily signal density.
This function generates a plot that visualizes the daily signal density. It can optionally flag certain days and show a seasonal density fit.
- Parameters:
flag (str or array-like, optional) – A flag to apply to the data matrix. Options are “density” (flag density outlier days), “good” (flag good days), “bad” (flag bad days), “clear” or “sunny” (flag clear days), “cloudy” (flag cloudy days), or a boolean array indicating which days to flag. Default is None.
show_fit (bool, optional) – Whether to show the seasonal density fit. Default is False.
figsize (tuple, optional) – The size of the figure. Default is (8, 6).
- Returns:
The figure containing the density signal analysis.
- Return type:
matplotlib.figure.Figure
- plot_heatmap(matrix='raw', flag=None, figsize=(12, 6), scale_to_kw=True, year_lines=True, units=None)
Plot a heatmap of the data matrix, with options to scale the data, add year lines, and customize the figure size and units.
- Parameters:
matrix (str, optional) – The data matrix to plot. Options are “raw”, “filled”, or any key in self.extra_matrices. Default is “raw”.
flag (str, optional) – A flag to apply to the data matrix. If None, no flag is applied. Default is None. Options are: ‘good’, ‘bad’, ‘sunny’, ‘cloudy’, and ‘clipping’.
figsize (tuple, optional) – The size of the figure to create. Default is (12, 6).
scale_to_kw (bool, optional) – Whether to scale the data to kilowatts if the power units are in watts. Default is True.
year_lines (bool, optional) – Whether to add lines indicating the start of each year. Default is True.
units (str, optional) – The units to use for the data. If None, the function will determine the units based on scale_to_kw and self.power_units. Default is None.
- Returns:
None
- plot_polar_transform(lat, lon, tz_offset, elevation_round=1, azimuth_round=2, alpha=1.0)
Plot the polar transformation of the data.
This function generates a polar plot of the data using the specified latitude, longitude, and time zone offset. It optionally rounds the elevation and azimuth angles and sets the transparency of the plot.
- Parameters:
lat (float) – Latitude of the location.
lon (float) – Longitude of the location.
tz_offset (int) – Time zone offset from GMT.
elevation_round (int, optional) – Rounding precision for elevation angles. Default is 1.
azimuth_round (int, optional) – Rounding precision for azimuth angles. Default is 2.
alpha (float, optional) – Transparency level of the plot. Default is 1.0.
- Returns:
The polar transformation plot.
- Return type:
matplotlib.figure.Figure
- plot_time_shift_analysis_results(figsize=(8, 6), show_filter=True)
Plot the results of the time shift analysis.
This function generates a plot that visualizes the results of the time shift analysis, including the daily solar noon metric, the shift detector, and the signal model. It also optionally highlights the filtered days.
- Parameters:
figsize (tuple, optional) – The size of the figure to create. Default is (8, 6).
show_filter (bool, optional) – If True, highlights the filtered days in the plot. Default is True.
- Returns:
The plot figure.
- Return type:
matplotlib.figure.Figure
- report(verbose=True, return_values=False)
Generate a report summarizing key metrics from the pipeline analysis.
The report includes metrics such as the data set length, data sampling rate, quality scores, inverter clipping, and any detected capacity changes, time shift or time zone corrections. The function can either print this report or return it as a dictionary.
- Parameters:
verbose (bool, optional) – If True, the report will be printed to the console. Defaults to True.
return_values (bool, optional) – If True, the report will be returned as a dictionary. If False, the function only prints the report. Defaults to False.
- Returns:
A dictionary containing the following keys:
“length”: The length of the data set in years.
“capacity”: The estimated capacity in kW.
“sampling”: The data sampling rate in minutes.
“quality score”: The overall data quality score.
“clearness score”: The clearness score of the data.
“inverter clipping”: Whether inverter clipping was detected.
“clipped fraction”: The fraction of the data that is clipped due to inverter limitations.
“capacity change”: Whether capacity changes were detected.
“data quality warning”: Warnings regarding data quality.
“time shift correction”: Whether any time shifts were corrected.
“time zone correction”: The amount of time zone correction applied.
The dictionary is returned only if return_values is True.
- Return type:
dict, optional
- Example usage:
report = my_data_handler.report(verbose=True, return_values=True)
print(report['capacity'])
- run_loss_factor_analysis(verbose=True, tau=0.9, num_harmonics=4, deg_type='linear', include_soiling=True, weight_seasonal=0.1, weight_soiling_stiffness=0.5, weight_soiling_sparsity=0.02, weight_deg_nonlinear=100000.0, deg_rate=None, use_capacity_change_labels=True)
Perform loss factor analysis on the processed data to estimate energy loss factors.
This method performs a series of calculations to analyze the loss factors in the solar power data, including energy loss due to degradation, soiling, and other factors.
The analysis is performed only if the pipeline has been successfully run. It uses the LossFactorAnalysis class to estimate degradation rates and energy losses based on various input parameters.
- Parameters:
verbose – If True, prints detailed information about the analysis and results. Default is True.
tau – Smoothing parameter for the analysis. Default is 0.9.
num_harmonics – Number of harmonics to include in the analysis. Default is 4.
deg_type – Type of degradation model to use (“linear” or “nonlinear”). Default is “linear”.
include_soiling – If True, includes soiling effects in the analysis. Default is True.
weight_seasonal – Weight for seasonal effects in the analysis. Default is 0.1.
weight_soiling_stiffness – Weight for soiling stiffness in the analysis. Default is 0.5.
weight_soiling_sparsity – Weight for soiling sparsity in the analysis. Default is 0.02.
weight_deg_nonlinear – Weight for nonlinear degradation in the analysis. Default is 1e5.
deg_rate – User-provided degradation rate. If None, the rate is estimated from the data. Default is None.
use_capacity_change_labels – If True, uses capacity change labels in the analysis. Default is True.
- Returns:
None. Prints the results of the loss factor analysis if verbose is True.
- Raises:
RuntimeError – If the pipeline has not been run before calling this method.
- run_pipeline(power_col=None, min_val=-5, max_val=None, zero_night=True, interp_day=True, fix_shifts=False, round_shifts_to_hour=True, density_lower_threshold=0.6, density_upper_threshold=1.05, linearity_threshold=0.1, clear_day_smoothness_param=0.9, clear_day_energy_param=0.8, verbose=True, start_day_ix=None, end_day_ix=None, time_shift_weight_change_detector=None, time_shift_weight_seasonal=0.001, periodic_detector=False, solar_noon_estimator='srss', correct_tz=True, extra_cols=None, sunrise_sunset_random_seed=42, daytime_threshold=0.005, units='W', solver='CLARABEL', solver_convex='CLARABEL', reset=True)
Runs the main Solar Data Tools pipeline.
Runs the main pipeline, which preprocesses, cleans, and performs quality control on PV power or irradiance time series data. The pipeline executes a series of tasks to prepare the data for further analysis.
- Parameters:
power_col (str, optional) – The column name containing the power data to process. If None, the first column in the data frame is used.
min_val (float, default=-5) – Minimum allowable value in the data. Values below this threshold are set to NaN.
max_val (float, optional) – Maximum allowable value in the data. Values above this threshold are set to NaN.
zero_night (bool, default=True) – Whether to set nighttime values to zero.
interp_day (bool, default=True) – Whether to interpolate missing daytime values.
fix_shifts (bool, default=False) – Whether to attempt to automatically fix time shifts in the data.
round_shifts_to_hour (bool, default=True) – Whether to round detected time shifts to the nearest hour.
density_lower_threshold (float, default=0.6) – Lower threshold for data density during the quality flagging process.
density_upper_threshold (float, default=1.05) – Upper threshold for data density during the quality flagging process.
linearity_threshold (float, default=0.1) – Threshold for linearity checks during quality flagging.
clear_day_smoothness_param (float, default=0.9) – Smoothness parameter for clear day detection.
clear_day_energy_param (float, default=0.8) – Energy parameter for clear day detection.
verbose (bool, default=True) – Whether to print detailed information about the pipeline process.
start_day_ix (int, optional) – The starting index for data processing. If None, processing starts from the first day in the dataset.
end_day_ix (int, optional) – The ending index for data processing. If None, processing continues until the last day in the dataset.
time_shift_weight_change_detector (float, optional) – Weight parameter for the time shift change detector.
time_shift_weight_seasonal (float, default=1e-3) – Weight parameter for the seasonal time shift detector.
periodic_detector (bool, default=False) – Whether to use a periodic detector for time shifts.
solar_noon_estimator (str, default="srss") – Method to estimate solar noon. Options include “srss” (sunrise/sunset).
correct_tz (bool, default=True) – Whether to correct for timezone offsets in the data.
extra_cols (str, tuple, or list, optional) – Additional columns to process, typically containing extra data features.
daytime_threshold (float, default=0.005) – Threshold for detecting daytime in the data.
units (str, default="W") – Units of the power data. Can be “W” (watts) or “kW” (kilowatts).
solver (str, default="CLARABEL") – Solver to use for optimization tasks in the pipeline.
solver_convex (str, default="CLARABEL") – Solver to use for convex optimization tasks.
reset (bool, default=True) – Whether to reset the pipeline’s internal state before running.
- Returns:
None
- Return type:
NoneType
- Note:
If using the MOSEK solver, ensure that a valid license is available.
- score_data_set()
- setup_location_and_orientation_estimation(gmt_offset, solar_noon_method='optimized_estimates', daylight_method='optimized_estimates', data_matrix='filled', daytime_threshold=0.001)
Sets up the location and orientation estimation for the system using the ConfigurationEstimator from the pvsystemprofiler package.
This method initializes an instance of ConfigurationEstimator with the provided parameters and assigns it to the parameter_estimation attribute.
- Parameters:
gmt_offset – The GMT offset for the location in hours. This value adjusts for the local time zone relative to GMT.
solar_noon_method – Method for estimating solar noon. Options include “optimized_estimates” (default) and other methods depending on the estimator’s implementation.
daylight_method – Method for estimating daylight hours. Options include “optimized_estimates” (default) and other methods depending on the estimator’s implementation.
data_matrix – The type of data matrix to use. Default is “filled”. This parameter specifies how missing data should be handled.
daytime_threshold – The threshold for determining daytime. Default is 0.001.
- Returns:
None. The method sets the parameter_estimation attribute of the DataHandler object to an instance of ConfigurationEstimator.
- class solardatatools.PolarTransform(series, latitude, longitude, tz_offset=-8, boolean_selection=None, normalize_data=False)
Bases:
object- normalize_days()
- plot_transformation(figsize=(10, 6), ax=None, alpha=1.0, cmap='plasma', cbar=True)
- transform(elevation_round=1, azimuth_round=2, agg_func='mean')
- solardatatools.fix_daylight_savings_with_known_tz(df, tz='America/Los_Angeles', inplace=False)
- solardatatools.make_2d(df, key='dc_power', trim_start=False, trim_end=False, return_day_axis=False)
This function constructs a 2D array (or matrix) from a time series signal with a standardized time axis. The data is chunked into days, and each consecutive day becomes a column of the matrix.
- Parameters:
df – A pandas data frame containing tabular data with a standardized time axis.
key – The key of the data frame column that contains the signal to make into a matrix.
- Returns:
A 2D numpy array with shape (measurements per day, days in data set)
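With a standardized time axis (a fixed number of samples per day, starting at midnight), chunking the series into days amounts to a reshape. The numpy sketch below assumes a plain 1-D array and drops any trailing partial day. It is not the library's implementation, and the function name to_day_matrix is hypothetical.

```python
import numpy as np


def to_day_matrix(values, samples_per_day):
    """Hypothetical sketch: reshape a 1-D series into (samples per day, days)."""
    values = np.asarray(values, dtype=float)
    n_days = values.size // samples_per_day
    # Drop any trailing partial day, then make each day a column
    return values[: n_days * samples_per_day].reshape(n_days, samples_per_day).T
```

Reading the matrix back column by column, for example M.ravel(order="F"), recovers the original chronological series.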
- solardatatools.make_time_series(df, return_keys=True, localize_time=-8, timestamp_key='ts', value_key='meas_val_f', name_key='meas_name', groupby_keys=['site', 'sensor'], filter_length=200)
Accepts a Pandas data frame extracted from a relational or Cassandra database. These queries often result in data with repeated timestamps, as you might have multiple columns stacked into rows in the database. Defaults are intended to work with GISMo’s VADER Cassandra database implementation.
Returns a data frame with a single timestamp index and the data from different systems split into columns.
- Parameters:
df – A pandas data frame generated from a query to the VADER Cassandra database
return_keys – If true, return the mapping from data column names to site and system ID
localize_time – If non-zero, localize the time stamps. Default is PST or UTC-8
filter_length – The number of non-null data values a single system must have to be included in the output
- Returns:
A time-series data frame
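The core reshaping is a pivot from long format (one row per timestamp and measurement name) to wide format (one column per measurement). The stdlib sketch below is illustrative. Its keyword defaults mirror timestamp_key, name_key, value_key and filter_length from the signature, but it omits the grouping by site and sensor and the time localization. The function name pivot_long_to_wide is hypothetical.

```python
from collections import defaultdict


def pivot_long_to_wide(rows, timestamp_key="ts", name_key="meas_name",
                       value_key="meas_val_f", filter_length=200):
    """Hypothetical sketch: pivot long-format records into {column: {ts: value}}.

    Columns with fewer than `filter_length` non-null values are dropped.
    """
    wide = defaultdict(dict)
    for row in rows:
        wide[row[name_key]][row[timestamp_key]] = row[value_key]
    return {
        name: series for name, series in wide.items()
        if sum(v is not None for v in series.values()) >= filter_length
    }
```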
- solardatatools.plot_2d(D, figsize=(12, 6), units='kW', clear_days=None, dates=None, year_lines=False, ax=None, color='red')
A function for plotting the power heat map for solar power data
- Parameters:
D – PV power data arranged as a matrix, typically the output of data_transforms.make_2d()
figsize – the size of the desired figure (passed to matplotlib)
units – the units of the power data
clear_days – a boolean array marking the location of clear days in the data set, typically the output of clear_day_detection.find_clear_days()
- Returns:
matplotlib figure
- solardatatools.standardize_time_axis(df, timeindex=True, power_col=None, datetimekey=None, correct_tz=True, verbose=True)
This function takes in a pandas data frame containing tabular time series data, likely generated with a call to pandas.read_csv(). Each row of the data frame is assumed to correspond to a unique date-time, though not necessarily on standard intervals. The function attempts to convert a user-specified column containing time stamps to Python datetime objects, assigns this column to the index of the data frame, and then standardizes the index over time. Standardizing means reconstructing the index at regular intervals, starting at midnight of the first day of the data set. This fixes two common errors in raw data: (1) missing data points from skipped scans in the data acquisition system, and (2) time stamps that fall at irregular times, including fractional seconds.
- Parameters:
df – A pandas data frame containing the tabular time series data
datetimekey – An optional key corresponding to the name of the column that contains the time stamps
- Returns:
A new data frame with a standardized time axis
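The standardization idea can be sketched with the standard library. The sketch infers the sampling interval as the most common gap between time stamps, builds a regular grid starting at midnight of the first day, and snaps each raw sample to its nearest grid slot, leaving empty slots as None. This is an illustration with hypothetical names, not the library's algorithm, which works on pandas indexes and exposes further options such as correct_tz.

```python
import datetime
from collections import Counter


def standardize(timestamps, values):
    """Hypothetical sketch: place irregular samples on a regular time grid.

    Returns (grid, grid_values); grid slots with no sample hold None.
    """
    pairs = sorted(zip(timestamps, values), key=lambda p: p[0])
    if len(pairs) < 2:
        raise ValueError("need at least two samples to infer the interval")
    # Most common gap, rounded to whole seconds to absorb timing jitter
    gaps = Counter(
        datetime.timedelta(seconds=round((b[0] - a[0]).total_seconds()))
        for a, b in zip(pairs, pairs[1:])
    )
    step = gaps.most_common(1)[0][0]
    # Regular grid starting at midnight of the first day
    start = datetime.datetime.combine(pairs[0][0].date(), datetime.time())
    n_slots = round((pairs[-1][0] - start) / step) + 1
    grid = [start + k * step for k in range(n_slots)]
    grid_values = [None] * n_slots
    for ts, val in pairs:
        k = round((ts - start) / step)  # snap to the nearest grid slot
        if 0 <= k < n_slots:
            grid_values[k] = val
    return grid, grid_values
```

Skipped scans become None slots, and a fractional-second time stamp such as 00:10:00.4 lands on the 00:10 slot.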
Algorithms
- Algorithms
  - CapacityChange
  - ClearSkyDetection
  - ClippingDetection
  - Dilation
  - LossFactorAnalysis
    - LossFactorAnalysis.estimate_degradation_rate()
    - LossFactorAnalysis.estimate_losses()
    - LossFactorAnalysis.holdout_validate()
    - LossFactorAnalysis.make_problem()
    - LossFactorAnalysis.plot_decomposition()
    - LossFactorAnalysis.plot_mc_by_tau()
    - LossFactorAnalysis.plot_mc_by_weight()
    - LossFactorAnalysis.plot_mc_histogram()
    - LossFactorAnalysis.plot_pie()
    - LossFactorAnalysis.plot_waterfall()
    - LossFactorAnalysis.report()
  - PVQuantiles
  - ShadeAnalysis
    - ShadeAnalysis.analyze_yearly_energy()
    - ShadeAnalysis.has_run
    - ShadeAnalysis.make_osd_problem()
    - ShadeAnalysis.plot_annotated_heatmap()
    - ShadeAnalysis.plot_annotated_polar()
    - ShadeAnalysis.plot_component()
    - ShadeAnalysis.plot_transformed_data()
    - ShadeAnalysis.plot_yearly_energy_analysis()
    - ShadeAnalysis.run()
    - ShadeAnalysis.transform_data()
  - SoilingAnalysis
  - SunriseSunset
  - TimeShift
  - soiling_seperation()
  - soiling_seperation_old()
- Submodules
Submodules
- Circular Statistics
- Clear Day Detection
- Data Filling
- Data Handler
  - BooleanMasks
  - DailyFlags
  - DailyScores
  - DailySignals
  - DataHandler
    - DataHandler.apply_time_dilation()
    - DataHandler.augment_data_frame()
    - DataHandler.auto_fix_time_shifts()
    - DataHandler.calculate_scsf_performance_index()
    - DataHandler.capacity_clustering()
    - DataHandler.clipping_check()
    - DataHandler.detect_clear_days()
    - DataHandler.detect_clear_sky()
    - DataHandler.estimate_latitude()
    - DataHandler.estimate_location_and_orientation()
    - DataHandler.estimate_longitude()
    - DataHandler.estimate_orientation()
    - DataHandler.estimate_quantiles()
    - DataHandler.find_clipped_times()
    - DataHandler.fit_statistical_clear_sky_model()
    - DataHandler.fix_dst()
    - DataHandler.generate_extra_matrix()
    - DataHandler.get_daily_flags()
    - DataHandler.get_daily_scores()
    - DataHandler.get_density_scores()
    - DataHandler.get_linearity_scores()
    - DataHandler.make_data_matrix()
    - DataHandler.make_filled_data_matrix()
    - DataHandler.plot_bundt()
    - DataHandler.plot_capacity_change_analysis()
    - DataHandler.plot_cdf_analysis()
    - DataHandler.plot_circ_dist()
    - DataHandler.plot_clipping()
    - DataHandler.plot_daily_energy()
    - DataHandler.plot_daily_max_cdf()
    - DataHandler.plot_daily_max_cdf_and_pdf()
    - DataHandler.plot_daily_max_pdf()
    - DataHandler.plot_daily_signals()
    - DataHandler.plot_data_quality_scatter()
    - DataHandler.plot_density_signal()
    - DataHandler.plot_heatmap()
    - DataHandler.plot_polar_transform()
    - DataHandler.plot_time_shift_analysis_results()
    - DataHandler.report()
    - DataHandler.run_loss_factor_analysis()
    - DataHandler.run_pipeline()
    - DataHandler.score_data_set()
    - DataHandler.setup_location_and_orientation_estimation()
- Data Quality
- Daytime
- Matrix Embedding
- Model Soiling
- Plotting
- Polar Transform
- PVPRO Post Processing
  - PVPROPostProcessor
    - PVPROPostProcessor.analyze()
    - PVPROPostProcessor.boundary_points()
    - PVPROPostProcessor.boundary_to_nan()
    - PVPROPostProcessor.data_setup()
    - PVPROPostProcessor.error_analysis()
    - PVPROPostProcessor.ln_df()
    - PVPROPostProcessor.optimize()
    - PVPROPostProcessor.plot_original_space()
    - PVPROPostProcessor.plot_sd_space()
    - PVPROPostProcessor.retreive_result()
    - PVPROPostProcessor.scale_max_1()
    - PVPROPostProcessor.sd_result_dfs()
    - PVPROPostProcessor.view_minmax()
- Sensor Identification
- Signal Decompositions
- OSD Signal Decompositions
- CVXPY Signal Decompositions
- Solar Noon
- Sunrise Sunset
- Time Axis Manipulation
- Utilities