Data Quality
Data Quality Checking Module
This module contains functions for identifying corrupt or low-quality data.
- solardatatools.data_quality.daily_missing_data_simple(data_matrix, threshold=0.2, return_density_signal=False)
This function takes a PV power data matrix and returns a boolean array identifying good days, that is, days that are not missing a significant amount of data. The assessment is based on the fraction of non-zero, non-NaN values in each day. In a typical “good” data set, around 40-60% of the measured values each day are non-zero. The default threshold of 20% is well below that typical range.
- Parameters:
data_matrix – numpy.array, a matrix containing PV power signals
threshold – float, the threshold to identify good days
- Returns:
a boolean array with True for each day that passes the test and False for each day that fails
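The density test described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the library's implementation: it assumes `data_matrix` has one column per day (rows are time-of-day samples) and that a day passes when its density strictly exceeds `threshold`; both are assumptions. The function name `daily_missing_data_sketch` is hypothetical, and `return_density_signal` is not modeled.

```python
import numpy as np


def daily_missing_data_sketch(data_matrix, threshold=0.2):
    """Hypothetical sketch of a simple daily missing-data test.

    Assumes rows are time-of-day samples and columns are days.
    Returns a boolean array with one entry per day.
    """
    data = np.asarray(data_matrix, dtype=float)
    # A sample counts as "present" only if it is neither NaN nor exactly zero.
    present = ~np.isnan(data) & (data != 0)
    # Daily density: fraction of present samples in each column (day).
    density = present.mean(axis=0)
    # Assumed comparison: a day is good if its density exceeds the threshold.
    return density > threshold


# Example: 10 samples per day, 3 days.
m = np.zeros((10, 3))
m[3:8, 0] = 1.0   # day 0: 5 of 10 samples non-zero -> density 0.5
m[:, 1] = np.nan  # day 1: entirely missing -> density 0.0
m[5, 2] = 2.0     # day 2: 1 of 10 samples non-zero -> density 0.1
print(daily_missing_data_sketch(m).tolist())  # [True, False, False]
```

With the default threshold of 0.2, only the first day passes; the third day has data but too little of it to count as good.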
- solardatatools.data_quality.dataset_quality_score(data_matrix, threshold=0.2, good_days=None, use_advanced=True)
This function scores a complete data set. The score is the fraction of days in the data set that pass the missing-data test. A score of 1 means every day in the data set passes the test, i.e., no day is missing a significant amount of data.
- Parameters:
data_matrix – numpy.array, a matrix containing PV power signals
threshold – float, the threshold to identify good days
- Returns:
the score, a float between 0 and 1
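The score is the mean of the per-day pass/fail array, which the following sketch illustrates. It is a hypothetical stand-in, not the library's code: it assumes one column per day, a strict `density > threshold` comparison, and that a precomputed `good_days` boolean array, when supplied, is scored directly. The `use_advanced` option is not modeled, and the name `dataset_quality_score_sketch` is hypothetical.

```python
import numpy as np


def dataset_quality_score_sketch(data_matrix, threshold=0.2, good_days=None):
    """Hypothetical sketch: fraction of days passing a simple density test."""
    if good_days is None:
        data = np.asarray(data_matrix, dtype=float)
        # Present samples are neither NaN nor exactly zero.
        present = ~np.isnan(data) & (data != 0)
        # Assumed layout: one column per day; assumed strict comparison.
        good_days = present.mean(axis=0) > threshold
    # Score = fraction of days that pass, always within [0, 1].
    return float(np.mean(good_days))


# A precomputed pass/fail array: 3 of 4 days pass.
print(dataset_quality_score_sketch(None, good_days=[True, True, False, True]))  # 0.75
```

Scoring a boolean array by its mean keeps the result in [0, 1] regardless of how many days the data set spans.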
- solardatatools.data_quality.make_density_scores(data_matrix, threshold=0.2, return_density_signal=False, return_fit=False, solver=None)
- solardatatools.data_quality.make_linearity_scores(data_matrix, capacity, density_baseline)
- solardatatools.data_quality.make_quality_flags(density_scores, linearity_scores, density_lower_threshold=0.6, density_upper_threshold=1.05, linearity_threshold=0.1)