Markov Quality Score
Estimate your data quality using the MarkovML trust estimate.
Markov Quality Score helps you identify potentially mislabeled records in a dataset. You can download the dataset together with these estimates using the Markov SDK.
Make sure to Register Datasets with MarkovML first. Currently, only text and numeric datasets are supported. You can find the overall Markov Quality Score in the top-right corner of the Dataset Details page.
Getting Markov Quality Score with Markov SDK
Use the Markov SDK to fetch the Markov Quality Score for a dataset. First, fetch the dataset you are interested in by its name. Next, get the dataset's quality metrics as a DataFrame. Finally, obtain a direct download link for the data quality DataFrame using the url attribute of the quality object (data_quality.url in the sample code).
The DataFrame includes the following columns:
- is_label_issue: Shows if there are problems with dataset labels (True/False).
- label_quality: Rates the quality of the dataset labels (numerical score).
- Other columns are the original labels (target) and features the user selected during dataset registration.
Sample Code
import markov
# Fetch the dataset by name
dataset = markov.dataset.get_by_name(dataset_name="Sentiment Analysis Tweets")
# Access the data quality information
data_quality = dataset.quality
# Access the data quality metrics as a DataFrame
data_quality.df
# Retrieve a direct download link for the data quality DataFrame
data_quality.url
Sample Result
      is_label_issue  label_quality  ...  text                                                    feeling
0     False           0.818080       ...  im feeling rather rotten, so I'm not very ambitious...  sadness
1     False           0.789854       ...  im updating my blog because I feel shitty               sadness
1998  False           0.052096       ...  i keep feeling like someone is being unki...            anger
1999  True            0.123182       ...  i feel all weird when i have to meet w people ...       fear
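Once you have the quality DataFrame, a common next step is to pull out the records flagged in is_label_issue for manual review. The following is a minimal sketch of that step using pandas only. It assumes data_quality.df behaves like a regular pandas DataFrame with the columns described above; the small DataFrame here is a made-up stand-in rather than real MarkovML output, and the names quality_df, flagged, and flagged_share are illustrative.

```python
import pandas as pd

# Illustrative stand-in for `data_quality.df`. The values are invented for
# this sketch and are not real MarkovML output.
quality_df = pd.DataFrame(
    {
        "is_label_issue": [False, False, True, True],
        "label_quality": [0.82, 0.79, 0.12, 0.05],
        "text": ["example a", "example b", "example c", "example d"],
        "feeling": ["sadness", "sadness", "fear", "anger"],
    }
)

# Keep only the records flagged as having a label issue,
# ordered by label_quality from lowest to highest.
flagged = quality_df[quality_df["is_label_issue"]].sort_values("label_quality")

# Fraction of all records that are flagged.
flagged_share = quality_df["is_label_issue"].mean()

print(flagged[["label_quality", "text", "feeling"]])
print(f"Flagged records: {flagged_share:.0%}")
```

Sorting by label_quality is only a convenience for review order. As the sample result shows, a low score does not necessarily mean a record is flagged, so treat is_label_issue as the filter and label_quality as a secondary signal.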