Data Quality
Estimate the quality of your data using the MarkovML trust estimate.
Label Trust Estimate
Label Trust Estimate identifies potentially mislabeled records in the dataset. You can download the dataset with the estimates using the MarkovML SDK.
Make sure to register your datasets with MarkovML first. Currently, only text datasets are supported. You can find the overall label quality estimate in the top right.
Code
import markov
dataset = markov.dataset.get_by_name(dataset_name="Sentiment Analysis Tweets")
# Access the data quality information
data_quality = dataset.quality
# Access the data quality metrics as a DataFrame
data_quality.df
# Retrieve a direct download link for the data quality DataFrame
data_quality.url
Sample Result
is_label_issue label_quality ... text feeling
0 False 0.818080 ... im feeling rather rotten, so I'm not very ambitious... sadness
1 False 0.789854 ... im updating my blog because I feel shitty sadness
1998 False 0.052096 ... i keep feeling like someone is being unki... anger
1999 True 0.123182 ... i feel all weird when i have to meet w people ... fear
The data frame has the following columns:
- is_label_issue: A boolean indicating whether the record's label has been flagged as a potential issue.
- label_quality: A numerical score representing the quality of the record's label.
- Other columns are the original labels (target) and the feature the user selected during dataset registration.
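Once you have the data frame, a common next step is to pull out the flagged records for manual review. The snippet below is a minimal sketch using pandas only. It builds a small stand-in frame with the same column names as the sample result (the text values are placeholders, not real data), so in practice you would pass data_quality.df instead. The helper names flagged_records and issue_rate are hypothetical, and the sort order assumes that a lower label_quality score means a less trustworthy label.

```python
import pandas as pd

# Stand-in for data_quality.df. Column names follow the sample result above;
# the text values are placeholders for illustration only.
quality_df = pd.DataFrame(
    {
        "is_label_issue": [False, False, False, True],
        "label_quality": [0.818080, 0.789854, 0.052096, 0.123182],
        "text": ["placeholder a", "placeholder b", "placeholder c", "placeholder d"],
        "feeling": ["sadness", "sadness", "anger", "fear"],
    },
    index=[0, 1, 1998, 1999],
)


def flagged_records(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows flagged as potential label issues, lowest score first.

    Hypothetical helper; assumes lower label_quality means less trustworthy.
    """
    return df[df["is_label_issue"]].sort_values("label_quality")


def issue_rate(df: pd.DataFrame) -> float:
    """Return the fraction of records flagged as potential label issues."""
    return float(df["is_label_issue"].mean())


flagged = flagged_records(quality_df)
rate = issue_rate(quality_df)

print(flagged[["label_quality", "feeling"]])
print(f"{rate:.0%} of records flagged")
```

Note that a low label_quality score does not by itself mean the record is flagged: in the sample result, row 1998 has a low score but is_label_issue is False, so filter on is_label_issue when you want only the flagged records.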