Datasets & Data Families
A Dataset represents a collection of examples, where each example consists of one or more variables along with a label or target.
Registering a dataset with MarkovML produces an analysis of its key characteristics, such as value distributions, correlations between columns, and the frequency of empty values. This analysis helps you understand your data and make informed decisions during model training and evaluation.
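To make these characteristics concrete, here is a minimal sketch in plain Python of two of them: the frequency of empty values in a column and the correlation between two columns. It does not use the MarkovML SDK and is not how MarkovML computes its analysis; the toy rows, column names, and helper functions (`empty_fraction`, `pearson`) are all hypothetical, chosen only for illustration.

```python
from math import sqrt

# Hypothetical toy dataset: each row is one example; None marks an empty value.
rows = [
    {"age": 25, "income": 40.0, "label": 0},
    {"age": 35, "income": 60.0, "label": 1},
    {"age": 45, "income": None, "label": 1},
    {"age": 55, "income": 100.0, "label": 1},
]


def empty_fraction(rows, column):
    """Fraction of examples whose value in `column` is missing."""
    return sum(r[column] is None for r in rows) / len(rows)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def column_correlation(rows, a, b):
    """Correlate columns `a` and `b`, skipping rows where either is empty."""
    pairs = [(r[a], r[b]) for r in rows if r[a] is not None and r[b] is not None]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


missing_income = empty_fraction(rows, "income")
age_income_corr = column_correlation(rows, "age", "income")
```

In this toy data one of four `income` values is empty, and the remaining `age`/`income` pairs lie on a straight line, so the correlation is close to 1.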
A single MarkovML Dataset can be segmented or unsegmented. ML engineers frequently divide a dataset into segments used to train, test, and validate a model. MarkovML lets you specify these segments and provides insights into how your train, test, and validate segments compare.
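One simple way to compare segments is to check whether the label distribution is similar in each of them. The sketch below is an illustration in plain Python under that assumption, not the comparison MarkovML performs; the segment contents and the helpers `label_distribution` and `max_share_gap` are hypothetical.

```python
from collections import Counter

# Hypothetical segmented dataset: segment name -> labels of its examples.
segments = {
    "train": ["spam", "ham", "ham", "ham", "spam", "ham"],
    "test": ["ham", "ham", "spam", "ham"],
    "validate": ["spam", "spam", "ham", "spam"],
}


def label_distribution(labels):
    """Share of each label within one segment."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}


def max_share_gap(dist_a, dist_b):
    """Largest absolute difference in label share between two segments."""
    labels = set(dist_a) | set(dist_b)
    return max(abs(dist_a.get(l, 0.0) - dist_b.get(l, 0.0)) for l in labels)


distributions = {name: label_distribution(labels) for name, labels in segments.items()}
train_vs_test = max_share_gap(distributions["train"], distributions["test"])
train_vs_validate = max_share_gap(distributions["train"], distributions["validate"])
```

Here the train and test segments have similar label shares, while the validate segment is dominated by `spam`, which gives a much larger gap. A mismatch like this is the kind of difference a segment comparison is meant to surface.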
Data Family
To help keep your datasets organized, each dataset you register with MarkovML is associated with a data family. A data family is a set of one or more Datasets that share a similar schema. Organizing your datasets into data families makes it much easier to locate a particular dataset when needed.
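The idea of grouping datasets by schema can be sketched as follows. This is an illustration only: it treats "similar schema" as an exact match on column names and types, which is a simplifying assumption and not necessarily the rule MarkovML applies. The dataset names, schemas, and the helpers `schema_key` and `group_into_families` are hypothetical.

```python
from collections import defaultdict

# Hypothetical registry: dataset name -> schema (column name -> type name).
datasets = {
    "reviews_2023": {"text": "str", "rating": "int"},
    "reviews_2024": {"rating": "int", "text": "str"},
    "clicks_q1": {"user_id": "int", "clicked": "bool"},
}


def schema_key(schema):
    """Order-independent key, so column order does not affect grouping."""
    return tuple(sorted(schema.items()))


def group_into_families(datasets):
    """Group dataset names whose schemas match exactly (assumed rule)."""
    families = defaultdict(list)
    for name, schema in datasets.items():
        families[schema_key(schema)].append(name)
    return list(families.values())


families = group_into_families(datasets)
```

With this data, the two review datasets end up in one family because they have the same columns and types, and `clicks_q1` forms a family of its own.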