Read Dataset

Access registered datasets to get their metadata and DataFrames, or download them as CSV files.

Whether you want to review a dataset you have previously registered or use it for machine learning tasks, the methods below let you fetch and read your MarkovML registered datasets in several formats.

1. List Datasets

Use the get_datasets() method to view all datasets registered within your logged-in MarkovML workspace. The results are returned in JSON format and show the metadata of each dataset.

Sample Code

import markov

# Fetches all the datasets registered within the logged-in workspace

for dataset in markov.dataset.get_datasets():  
    print(dataset) # prints the metadata  

JSON Results

{  
 "ds_prop": {  
  "name": "Resume Dataset  Reduced",  
  "notes": "Contains the Resume dataset filtered from the original dataset in a 1/10 ratio",  
  "data_category": "text",  
  "delimiter": ",",  
  "df_id": "dYFoqzhBBCxR74uh",  
  "storage_type": "s3",  
  "x_indexes": [],  
  "y_index": -1,  
  "x_col_names": [  
  	"Resume_str"  
  ],  
  "y_name": "Category",  
  "storage_format": "csv",  
  "info": {},  
  "source": ""  
 },  
 "ds_paths": [  
  {  
   "segment_type": "train",  
   "path": "s3://XXXXXXXv/wsp-XXXXXXX/uido1o8s5sra7/Resume Dataset  Reduced/reduced_resume_train.csv",  
   "multi_file": false  
  },  
  {  
   "segment_type": "test",  
   "path": "s3://XXXXXXX/wsp-XXXXXXXX/uido1o8s5sra7/Resume Dataset  Reduced/reduced_resume_test.csv",  
   "multi_file": false  
  }  
 ],
 "ds_id": "3b64AfvqRsPaBVrmP",
 "analysis_status": "RESULTS_AVAILABLE",
 "df": null,
 "cred_id": "XXXXXXXX",
 "_credentials": null
},  
{  
 ...  
}
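
The metadata records above are nested, so it can help to flatten the fields you use most. The following is a minimal sketch, not part of the SDK: summarize_dataset is a hypothetical helper, and it assumes each record can be treated as a Python dict with the same keys as the JSON result shown above. The sample record is trimmed from that result, with the storage paths shortened to placeholders.

```python
import json

# Sample record trimmed from the JSON result above. With the SDK, each item
# yielded by markov.dataset.get_datasets() would take its place (assumption:
# the item can be converted to a dict of this shape).
sample_record = json.loads("""
{
  "ds_prop": {
    "name": "Resume Dataset  Reduced",
    "data_category": "text",
    "x_col_names": ["Resume_str"],
    "y_name": "Category",
    "storage_format": "csv"
  },
  "ds_paths": [
    {"segment_type": "train", "path": "<train path placeholder>", "multi_file": false},
    {"segment_type": "test", "path": "<test path placeholder>", "multi_file": false}
  ],
  "ds_id": "3b64AfvqRsPaBVrmP",
  "analysis_status": "RESULTS_AVAILABLE"
}
""")


def summarize_dataset(record):
    """Pick out the commonly needed fields from one metadata record."""
    props = record.get("ds_prop", {})
    return {
        "id": record.get("ds_id"),
        "name": props.get("name"),
        "features": props.get("x_col_names", []),
        "target": props.get("y_name"),
        "segments": [p["segment_type"] for p in record.get("ds_paths", [])],
        "analysis_status": record.get("analysis_status"),
    }


summary = summarize_dataset(sample_record)
print(summary["id"], summary["segments"], summary["analysis_status"])
```

The ds_id value extracted this way is the ID that get_by_id() expects in the next section.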

2. Fetch Registered Dataset

With the Markov SDK, you can fetch a registered dataset by its ID using dataset.get_by_id() or by its name using dataset.get_by_name().

With the fetched dataset object, you can:

  • Get the feature column.
  • Get the target column.
  • Use the dataset segments (train/test/validate) as dataframes.
  • Get the number of columns in the dataset / segment.
  • Get the number of rows in the dataset / segment.
  • Download the dataset as CSV.

To find the dataset ID from the MarkovML UI, go to Dataset in the navigation bar and select your dataset. On the right side of the page, you will see details like dataset name, ID, data family, and more. You can also find the dataset ID in the URL.

1. Fetch Registered Dataset using Dataset ID

Sample Code

import markov

# fetch dataset using dataset ID
dataset = markov.dataset.get_by_id(dataset_id="paste_dataset_id_here")

# Get the following information 
# get the feature columns
features = dataset.features

# get the target column
target = dataset.target

# get the segments
segments = dataset.segments

# get the train segment's dataframe
dataset_train_dataframe = dataset.train.as_df()

# get the number of rows of the train segment
train_num_rows = dataset.train.num_rows

# get the number of columns of the test segment
test_num_cols = dataset.test.num_cols

# download the test segment as csv
dataset.test.download_as_csv(filepath="test.csv")

2. Fetch Registered Dataset using Dataset Name

Sample Code

import markov

# fetch dataset using dataset Name
dataset = markov.dataset.get_by_name(dataset_name="paste_dataset_name_here")

# Get the following information 
# get the feature columns
features = dataset.features

# get the target column
target = dataset.target

# get the segments
segments = dataset.segments

# get the train segment's dataframe
dataset_train_dataframe = dataset.train.as_df()

# get the number of rows of the train segment
train_num_rows = dataset.train.num_rows

# get the number of columns of the test segment
test_num_cols = dataset.test.num_cols

# download the test segment as csv
dataset.test.download_as_csv(filepath="test.csv")
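
Because both fetch methods return the same kind of dataset object, a script that accepts either an ID or a name can route to the right call in one place. The sketch below is illustrative only: fetch_dataset is a hypothetical helper, and fake_api is a stand-in for markov.dataset so the example runs without the SDK installed. It uses only the get_by_id(dataset_id=...) and get_by_name(dataset_name=...) calls shown above.

```python
from types import SimpleNamespace


def fetch_dataset(dataset_api, dataset_id=None, dataset_name=None):
    """Fetch by ID when one is given, otherwise fall back to the name."""
    if dataset_id:
        return dataset_api.get_by_id(dataset_id=dataset_id)
    if dataset_name:
        return dataset_api.get_by_name(dataset_name=dataset_name)
    raise ValueError("Provide either dataset_id or dataset_name")


# Stand-in for markov.dataset (hypothetical, for demonstration only).
fake_api = SimpleNamespace(
    get_by_id=lambda dataset_id: f"fetched by id: {dataset_id}",
    get_by_name=lambda dataset_name: f"fetched by name: {dataset_name}",
)

print(fetch_dataset(fake_api, dataset_id="abc123"))
print(fetch_dataset(fake_api, dataset_name="my dataset"))
```

With the SDK installed, the same helper would be called as fetch_dataset(markov.dataset, dataset_id="paste_dataset_id_here").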

Use the Fetched Dataset to Perform Various Tasks

1. Download Dataset Segment

Use the dataset object to download a single segment, such as the train segment, or all segments of the dataset at once:

Sample Code

import markov

# Fetch registered dataset by id
dataset = markov.dataset.get_by_id("<paste_dataset_id_here>")

# download the train segment of the dataset
dataset.train.download_as_csv(filepath="train.csv")

# download all segments of the dataset
dataset.download_as_csv()

2. View Dataset in the Web UI

After fetching a dataset, call the get_url() method to get its URL, or call the view_details() method to open its details page in your browser. You may be asked to log in if you haven't already.

Sample Code

import markov

# Fetch registered dataset by id
dataset = markov.dataset.get_by_id("<paste_dataset_id_here>")

# get url of the dataset
url = dataset.get_url()

# open the dataset's details page in the browser
dataset.view_details()

Browser View

3. Get Dataset Preview

To preview a fetched dataset, use the get_preview() method, which returns the first rows of each segment in JSON format. In short: get_datasets() lists all datasets in your logged-in workspace, get_by_id() fetches a specific dataset, and get_preview() returns a JSON preview of that dataset.

Sample Code

import markov

# Fetch registered dataset
dataset = markov.dataset.get_by_id("<paste_dataset_id_here>")

# Get preview
dataset.get_preview()

JSON Result

{
 "segments": [
  "train",
  "test"
 ],
 "preview": {
  "train": {
   "data": [
    "Unnamed: 0,order,sentiment,tweet_id,date,Query,handle,tweet",
    "179491,379027,0,2052245373,Sat Jun 06 00:06:05 PDT 2009,NO_QUERY,torilovesbradie,@jess_0000 being stood up is the worst thing in the world ",
    "211679,448154,0,2068887341,Sun Jun 07 14:52:55 PDT 2009,NO_QUERY,blackkitty,Splitting headache... About to pass out. Really sad because the Tony's are tonight! ",
    "268518,1337563,4,2017684084,Wed Jun 03 08:47:19 PDT 2009,NO_QUERY,moodeey,I tweeted asking how to cancel a domain on godaddy yesterday and I got reply from @GoDaddyGuy with the instructions .. it's very nice ",
    "132189,188370,0,1968867310,Fri May 29 22:24:53 PDT 2009,NO_QUERY,Kelvin_Anethema,Just burned my foot ",
    "20072,717414,0,2259995617,Sat Jun 20 18:30:05 PDT 2009,NO_QUERY,lauraa15,@Jonasbrothers likee miley? LOL i wish i was there  when are you coming back to chile?",
    "298358,971255,4,1831132511,Sun May 17 18:16:37 PDT 2009,NO_QUERY,abeckb,\"Love love making random, last minute Sconnie plans for @courtneyfaile   Hollerrr for double datin'!!!\"",
    "141710,683485,0,2250287314,Sat Jun 20 00:08:54 PDT 2009,NO_QUERY,KHolwick,I lost my phone ",
    "107872,458194,0,2071816448,Sun Jun 07 19:58:08 PDT 2009,NO_QUERY,artiseverything,I over fed myself  man man man",
    "203261,538147,0,2199000171,Tue Jun 16 16:52:41 PDT 2009,NO_QUERY,kebridgeman,\"Looks like someone cut my phone! Ugh... No phone, no internet...this sucks! Text me...it's all I got right now. \"",
    "234716,1111240,4,1972194333,Sat May 30 08:45:14 PDT 2009,NO_QUERY,imhannahh,@AlanCarr Alphabeat-Fascnation  ? Cant get more upbeat and happier than that "
   ],
   "metadata": {
    "line_separator": "\n"
   }
  },
  "test": {
   "data": [
    "Unnamed: 0,order,sentiment,tweet_id,date,Query,handle,tweet",
    "1,1295011,4,2003602665,Tue Jun 02 06:49:50 PDT 2009,NO_QUERY,jessiii_babiii,time to play mind games ",
    "9,1505010,4,2072281940,Sun Jun 07 20:43:46 PDT 2009,NO_QUERY,brianjshoopman,@raingraves Indeed it was. I'll be seeing Mo Broaddus tomorrow so I'm teasing him about the upcoming &quot;slumber party&quot; with you &amp; Wrath. ",
    "10,617001,0,2226879429,Thu Jun 18 12:30:08 PDT 2009,NO_QUERY,franmesquish,hates it when i lose my train of thought and forget what i was going to look at on the tinternet ",
    "13,1564370,4,2187296656,Mon Jun 15 20:04:36 PDT 2009,NO_QUERY,ladydollparts,@Buccah Disney cruise.. comes with complimentary Prince Charming ",
    "29,986626,4,1834587595,Mon May 18 03:30:13 PDT 2009,NO_QUERY,voguex,@magicmillie maybe low fat snickers? we are making it with Mars bars as well ",
    "32,806225,4,1468786164,Tue Apr 07 03:43:43 PDT 2009,NO_QUERY,SomersetBob,\"@John1954Moi No, not yet - this is the first time I've really disclosed anything about him \"",
    "34,1310693,4,2013356851,Tue Jun 02 22:29:03 PDT 2009,NO_QUERY,AthenaATL,Happy birthday @SupBritt! now it's bedtimeee ",
    "36,67425,0,1692330758,Sun May 03 19:45:55 PDT 2009,NO_QUERY,Pawel_Sarkowicz,\"Shit, I'm so tired I'm like falling asleep! I still gotta finish my project though  Gotta stay awake O_O\"",
    "40,331338,0,2012640890,Tue Jun 02 21:01:16 PDT 2009,NO_QUERY,ohsaby,@Usedink I smelt it ",
    "42,336215,0,2013950296,Wed Jun 03 00:01:57 PDT 2009,NO_QUERY,mandu86,@JONESmichael  can i not register and just have some tix plz? mine fell through "
   ],
   "metadata": {
    "line_separator": "\n"
   }
  }
 },
 "delimiter": ","
}
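
Each preview line is a raw delimited string, and some tweets contain quoted commas, so splitting on "," by hand would break those rows. The sketch below parses the lines with Python's csv module instead. It is illustrative only: preview_rows is a hypothetical helper, and it assumes get_preview() returns (or can be converted to) a dict shaped like the JSON result above. The sample is trimmed from that result.

```python
import csv

# Trimmed copy of the JSON result above (train segment, two data rows).
sample_preview = {
    "segments": ["train"],
    "preview": {
        "train": {
            "data": [
                "Unnamed: 0,order,sentiment,tweet_id,date,Query,handle,tweet",
                "132189,188370,0,1968867310,Fri May 29 22:24:53 PDT 2009,NO_QUERY,Kelvin_Anethema,Just burned my foot ",
                "203261,538147,0,2199000171,Tue Jun 16 16:52:41 PDT 2009,NO_QUERY,kebridgeman,\"Looks like someone cut my phone! Ugh... No phone, no internet...this sucks! Text me...it's all I got right now. \"",
            ],
            "metadata": {"line_separator": "\n"},
        }
    },
    "delimiter": ",",
}


def preview_rows(preview, segment):
    """Parse one segment's preview lines into a list of row dicts."""
    lines = preview["preview"][segment]["data"]
    # The first line is the header; csv handles quoted fields with commas.
    reader = csv.DictReader(lines, delimiter=preview["delimiter"])
    return list(reader)


rows = preview_rows(sample_preview, "train")
for row in rows:
    print(row["handle"], "|", row["sentiment"], "|", row["tweet"].strip())
```

Using the "delimiter" value from the preview itself, rather than hard-coding a comma, keeps the parsing consistent with datasets registered with a different delimiter.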
