Add Embedding Recorder

To add individual embedding records to a MarkovML evaluation recording, you need both the individual dataset record and its corresponding embedding. Each of these is provided as a list of values.

Use embedding_recorder.ds_columns as a reference for the columns whose values must be populated in each individual dataset record.
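As a rough illustration, the sketch below arranges one raw record into a list that follows a column list. The column names, the `build_record` helper, and the sample record are all hypothetical; in practice the column list would come from embedding_recorder.ds_columns. It assumes values are passed in the same order as the columns, which is what the sample code does when it selects those columns before converting each row to a list.

```python
from typing import Any, Dict, List

# Hypothetical column list; in practice, read it from embedding_recorder.ds_columns.
ds_columns: List[str] = ["text", "label"]


def build_record(raw: Dict[str, Any], columns: List[str]) -> List[Any]:
    """Return the values of `raw` in the order given by `columns` (illustrative helper)."""
    missing = [column for column in columns if column not in raw]
    if missing:
        raise KeyError(f"record is missing columns: {missing}")
    # Keys that are not listed in `columns` are ignored.
    return [raw[column] for column in columns]


record = build_record({"label": "spam", "text": "win a prize", "extra": 1}, ds_columns)
```

Failing early on a missing column is a design choice for this sketch; it surfaces a malformed record before anything is uploaded.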

Sample Code

from typing import List

import markov
from markov import EmbeddingRecorder

# ID of the dataset to attach embeddings to (placeholder, replace with your own)
dataset_id = "my_dataset_id"

# Get the dataset by ID
dataset = markov.dataset.get_by_id(dataset_id)

# You can also get the dataset by name
# dataset = markov.dataset.get_by_name("my_dataset_name")

# Custom embeddings can be added for any segment; the train segment is used here as an example
train_df = dataset.train.as_df()

embedding_recorder = EmbeddingRecorder(
    name="Custom embedding name",
    dataset_id=dataset_id,
    notes="Optional description for this custom embedding",
)
embedding_recorder.register()

for _, row in train_df[embedding_recorder.ds_columns].iterrows():
    # get_embeddings() is a user-defined function, usually backed by a trained model.
    # It is expected to return the embedding for this record as a list of values.
    embedding: List[float] = get_embeddings(row)
    embedding_recorder.add_embedding_record(row.tolist(), embedding)

# It's important to call finish() to signal to MarkovML that all embeddings have been uploaded
embedding_recorder.finish()

📘

Note

get_embeddings() is a pseudo-function. You write it yourself to obtain embeddings from your model.
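For experimenting with the upload flow before a model is wired in, a toy stand-in along the following lines could be used. It is not part of MarkovML and does not produce meaningful embeddings: it hashes the values of a row into a fixed-length list of floats. The `dim` parameter and the hashing approach are assumptions made for this sketch; a real implementation would call your trained model instead.

```python
import hashlib
from typing import Iterable, List


def get_embeddings(row: Iterable[object], dim: int = 8) -> List[float]:
    """Toy placeholder: map a row's values to a deterministic list of floats.

    Replace the body with a call to your own trained model.
    """
    if not 1 <= dim <= 32:
        # A SHA-256 digest has 32 bytes, so this sketch supports at most 32 dimensions.
        raise ValueError("dim must be between 1 and 32")
    # Join the values into one string so the same record always yields the same vector.
    text = "|".join(str(value) for value in row)
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Scale the first `dim` bytes into the range [0.0, 1.0].
    return [byte / 255.0 for byte in digest[:dim]]


vector = get_embeddings(["some text", 3, 0.5])
```

Because it only iterates over the values it receives, the same function accepts a plain list or a pandas row.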