Read Datasets
Access registered datasets to read their metadata, load them as DataFrames, or download them as CSV files.
List Datasets
This section describes how to list the datasets registered with MarkovML in the current workspace.
import markov
# Fetches all the datasets registered within the logged-in workspace
for dataset in markov.dataset.get_datasets():
    print(dataset)  # prints the metadata
The result looks something like this:
{
"ds_prop": {
"name": "Resume Dataset Reduced",
"notes": "Contains the Resume dataset filtered from the original dataset in a 1/10 ratio",
"data_category": "text",
"delimiter": ",",
"df_id": "dYFoqzhBBCxR74uh",
"storage_type": "s3",
"x_indexes": [],
"y_index": -1,
"x_col_names": [
"Resume_str"
],
"y_name": "Category",
"storage_format": "csv",
"info": {},
"source": ""
},
"ds_paths": [
{
"segment_type": "train",
"path": "s3://XXXXXXXv/wsp-XXXXXXX/uido1o8s5sra7/Resume Dataset Reduced/reduced_resume_train.csv",
"multi_file": false
},
{
"segment_type": "test",
"path": "s3://XXXXXXX/wsp-XXXXXXXX/uido1o8s5sra7/Resume Dataset Reduced/reduced_resume_test.csv",
"multi_file": false
}
],
"ds_id": "3b64AfvqRsPaBVrmP",
"analysis_status": "RESULTS_AVAILABLE",
"df": null,
"cred_id": "XXXXXXXX",
"_credentials": null
},
{
...
}
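As a rough illustration of working with this metadata, the sketch below pulls the storage path of each segment out of the `ds_paths` list. It assumes the metadata is available as JSON text shaped like the output above; whether the SDK object can be converted to a dict directly is not covered here. The helper name `segment_paths` is hypothetical.

```python
import json

# Trimmed copy of the metadata shown above; only the keys used here are kept.
metadata_json = """
{
  "ds_prop": {
    "name": "Resume Dataset Reduced",
    "x_col_names": ["Resume_str"],
    "y_name": "Category"
  },
  "ds_paths": [
    {
      "segment_type": "train",
      "path": "s3://XXXXXXXv/wsp-XXXXXXX/uido1o8s5sra7/Resume Dataset Reduced/reduced_resume_train.csv",
      "multi_file": false
    },
    {
      "segment_type": "test",
      "path": "s3://XXXXXXX/wsp-XXXXXXXX/uido1o8s5sra7/Resume Dataset Reduced/reduced_resume_test.csv",
      "multi_file": false
    }
  ],
  "ds_id": "3b64AfvqRsPaBVrmP",
  "analysis_status": "RESULTS_AVAILABLE"
}
"""


def segment_paths(metadata: dict) -> dict:
    """Map each segment type (for example "train") to its storage path.

    Hypothetical helper; it relies only on the "ds_paths" layout shown above.
    """
    return {entry["segment_type"]: entry["path"] for entry in metadata["ds_paths"]}


metadata = json.loads(metadata_json)
paths = segment_paths(metadata)
print(metadata["ds_prop"]["name"], sorted(paths))
```

Keying the result by `segment_type` makes it easy to check which segments a dataset has before fetching it.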
Fetch a Dataset
Markov lets you fetch registered datasets using the following APIs. To learn more about datasets in MarkovML, see Datasets & Data Families.
If you don't have any datasets registered with Markov, follow Register Datasets with MarkovML.
You can use the datasets to do the following:
- Get the feature column
- Get the target column
- Use the dataset segments (train/test/validate) as dataframes
- Get the number of columns in the dataset / segment
- Get the number of rows in the dataset / segment
- Download the dataset as a CSV file
Fetch registered dataset using dataset ID
import markov
dataset = markov.dataset.get_by_id(dataset_id="paste_dataset_id_here")
# get the feature columns
features = dataset.features
# get the target column
target = dataset.target
# get the segments
segments = dataset.segments
# get the train segment's dataframe
dataset_train_dataframe = dataset.train.as_df()
# get the number of rows of the train segment
train_num_rows = dataset.train.num_rows
# get the number of columns of the test segment
test_num_cols = dataset.test.num_cols
# download the test segment as csv
dataset.test.download_as_csv(filepath="test.csv")
Fetch registered dataset using dataset name
import markov
dataset = markov.dataset.get_by_name(dataset_name="paste_dataset_name_here")
# get the feature columns
features = dataset.features
# get the target column
target = dataset.target
# get the segments
segments = dataset.segments
# get the train segment's dataframe
dataset_train_dataframe = dataset.train.as_df()
# get the number of rows of the train segment
train_num_rows = dataset.train.num_rows
# get the number of columns of the test segment
test_num_cols = dataset.test.num_cols
# download the test segment as csv
dataset.test.download_as_csv(filepath="test.csv")
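The attributes above can be combined into a quick size report. The following is a minimal sketch, not part of the SDK: `summarize_segments` is a hypothetical helper that reads only `num_rows` and `num_cols` from whichever segments exist on the object. It runs here against a stand-in object with made-up sizes, so no MarkovML login is needed to try it.

```python
from types import SimpleNamespace


def summarize_segments(dataset, names=("train", "test")):
    """Return {segment_name: (num_rows, num_cols)} for each segment present.

    Hypothetical helper; it uses only the num_rows and num_cols attributes
    shown in the examples above and skips segments the object does not have.
    """
    summary = {}
    for name in names:
        segment = getattr(dataset, name, None)
        if segment is not None:
            summary[name] = (segment.num_rows, segment.num_cols)
    return summary


# Stand-in object with made-up sizes so the sketch runs on its own.
# With the SDK, pass the object returned by markov.dataset.get_by_id(...)
# or markov.dataset.get_by_name(...) instead.
fake_dataset = SimpleNamespace(
    train=SimpleNamespace(num_rows=90, num_cols=2),
    test=SimpleNamespace(num_rows=10, num_cols=2),
)

print(summarize_segments(fake_dataset))
```

If your dataset has a validation segment, add its attribute name to `names`; the exact attribute name is not shown on this page, so check it against your SDK version.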
Download Dataset Segment
You can use the dataset object to download one or more segments of the dataset.
import markov
# Fetch registered dataset by id
dataset = markov.dataset.get_by_id("<paste_dataset_id_here>")
# download the train segment of the dataset
dataset.train.download_as_csv(filepath="train.csv")
# download all segments of the dataset
dataset.download_as_csv()
View Dataset in the Web UI
import markov
dataset = markov.dataset.get_by_id("<paste_dataset_id_here>")
# get url of the dataset
url = dataset.get_url()
# open the dataset's details page in the browser
dataset.view_details()
This opens your browser and prompts you to log in if you haven't already.
Get Dataset Preview
You can retrieve a preview of a dataset's data by dataset ID.
import markov
# Fetch registered dataset
dataset = markov.dataset.get_by_id("<paste_dataset_id_here>")
# Get preview
dataset.get_preview()
The dataset preview looks like this:
{
"segments": [
"train",
"test"
],
"preview": {
"train": {
"data": [
"Unnamed: 0,order,sentiment,tweet_id,date,Query,handle,tweet",
"179491,379027,0,2052245373,Sat Jun 06 00:06:05 PDT 2009,NO_QUERY,torilovesbradie,@jess_0000 being stood up is the worst thing in the world ",
"211679,448154,0,2068887341,Sun Jun 07 14:52:55 PDT 2009,NO_QUERY,blackkitty,Splitting headache... About to pass out. Really sad because the Tony's are tonight! ",
"268518,1337563,4,2017684084,Wed Jun 03 08:47:19 PDT 2009,NO_QUERY,moodeey,I tweeted asking how to cancel a domain on godaddy yesterday and I got reply from @GoDaddyGuy with the instructions .. it's very nice ",
"132189,188370,0,1968867310,Fri May 29 22:24:53 PDT 2009,NO_QUERY,Kelvin_Anethema,Just burned my foot ",
"20072,717414,0,2259995617,Sat Jun 20 18:30:05 PDT 2009,NO_QUERY,lauraa15,@Jonasbrothers likee miley? LOL i wish i was there when are you coming back to chile?",
"298358,971255,4,1831132511,Sun May 17 18:16:37 PDT 2009,NO_QUERY,abeckb,\"Love love making random, last minute Sconnie plans for @courtneyfaile Hollerrr for double datin'!!!\"",
"141710,683485,0,2250287314,Sat Jun 20 00:08:54 PDT 2009,NO_QUERY,KHolwick,I lost my phone ",
"107872,458194,0,2071816448,Sun Jun 07 19:58:08 PDT 2009,NO_QUERY,artiseverything,I over fed myself man man man",
"203261,538147,0,2199000171,Tue Jun 16 16:52:41 PDT 2009,NO_QUERY,kebridgeman,\"Looks like someone cut my phone! Ugh... No phone, no internet...this sucks! Text me...it's all I got right now. \"",
"234716,1111240,4,1972194333,Sat May 30 08:45:14 PDT 2009,NO_QUERY,imhannahh,@AlanCarr Alphabeat-Fascnation ? Cant get more upbeat and happier than that "
],
"metadata": {
"line_separator": "\n"
}
},
"test": {
"data": [
"Unnamed: 0,order,sentiment,tweet_id,date,Query,handle,tweet",
"1,1295011,4,2003602665,Tue Jun 02 06:49:50 PDT 2009,NO_QUERY,jessiii_babiii,time to play mind games ",
"9,1505010,4,2072281940,Sun Jun 07 20:43:46 PDT 2009,NO_QUERY,brianjshoopman,@raingraves Indeed it was. I'll be seeing Mo Broaddus tomorrow so I'm teasing him about the upcoming \"slumber party\" with you & Wrath. ",
"10,617001,0,2226879429,Thu Jun 18 12:30:08 PDT 2009,NO_QUERY,franmesquish,hates it when i lose my train of thought and forget what i was going to look at on the tinternet ",
"13,1564370,4,2187296656,Mon Jun 15 20:04:36 PDT 2009,NO_QUERY,ladydollparts,@Buccah Disney cruise.. comes with complimentary Prince Charming ",
"29,986626,4,1834587595,Mon May 18 03:30:13 PDT 2009,NO_QUERY,voguex,@magicmillie maybe low fat snickers? we are making it with Mars bars as well ",
"32,806225,4,1468786164,Tue Apr 07 03:43:43 PDT 2009,NO_QUERY,SomersetBob,\"@John1954Moi No, not yet - this is the first time I've really disclosed anything about him \"",
"34,1310693,4,2013356851,Tue Jun 02 22:29:03 PDT 2009,NO_QUERY,AthenaATL,Happy birthday @SupBritt! now it's bedtimeee ",
"36,67425,0,1692330758,Sun May 03 19:45:55 PDT 2009,NO_QUERY,Pawel_Sarkowicz,\"Shit, I'm so tired I'm like falling asleep! I still gotta finish my project though Gotta stay awake O_O\"",
"40,331338,0,2012640890,Tue Jun 02 21:01:16 PDT 2009,NO_QUERY,ohsaby,@Usedink I smelt it ",
"42,336215,0,2013950296,Wed Jun 03 00:01:57 PDT 2009,NO_QUERY,mandu86,@JONESmichael can i not register and just have some tix plz? mine fell through "
],
"metadata": {
"line_separator": "\n"
}
}
},
"delimiter": ","
}
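Each entry in a segment's `data` list is one raw delimited line, with the header first. The sketch below shows one way to turn those lines into row dictionaries using Python's standard `csv` module and the top-level `delimiter` value. It assumes the preview is available as a Python dict shaped like the output above; the return type of `get_preview()` is not stated on this page, and `parse_preview_segment` is a hypothetical helper.

```python
import csv


def parse_preview_segment(preview: dict, segment: str) -> list:
    """Turn one segment's preview lines into a list of row dicts keyed by header.

    Hypothetical helper; it relies only on the preview layout shown above.
    """
    lines = preview["preview"][segment]["data"]
    # csv.reader handles quoted fields, so commas inside a tweet stay in one column.
    reader = csv.reader(lines, delimiter=preview["delimiter"])
    header = next(reader)
    return [dict(zip(header, row)) for row in reader]


# Trimmed copy of the preview shown above (header plus two train rows).
sample_preview = {
    "segments": ["train"],
    "preview": {
        "train": {
            "data": [
                "Unnamed: 0,order,sentiment,tweet_id,date,Query,handle,tweet",
                "132189,188370,0,1968867310,Fri May 29 22:24:53 PDT 2009,NO_QUERY,Kelvin_Anethema,Just burned my foot ",
                "298358,971255,4,1831132511,Sun May 17 18:16:37 PDT 2009,NO_QUERY,abeckb,\"Love love making random, last minute Sconnie plans for @courtneyfaile Hollerrr for double datin'!!!\"",
            ],
            "metadata": {"line_separator": "\n"},
        }
    },
    "delimiter": ",",
}

rows = parse_preview_segment(sample_preview, "train")
print(rows[0]["handle"], rows[0]["sentiment"])
```

Using `csv.reader` rather than `str.split(",")` matters for rows like the second one, where the quoted tweet text itself contains a comma.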