Add Embedding Recorder
To add an individual embedding record to a MarkovML embedding recording, you need both the dataset record and its corresponding embedding. Each of these is passed as a list.
embedding_recorder.ds_columns can be used as a reference for the columns whose values must be populated in each dataset record.
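If your records do not come straight from a DataFrame, the column list can be used to put each record's values in the right order. The following is a minimal sketch in plain Python. The helper build_record and the column names are hypothetical and are not part of the MarkovML SDK; in practice the column list would be read from embedding_recorder.ds_columns.

```python
from typing import Any, Dict, List, Sequence


def build_record(values: Dict[str, Any], ds_columns: Sequence[str]) -> List[Any]:
    """Return the values of one dataset record, ordered like ds_columns."""
    missing = [col for col in ds_columns if col not in values]
    if missing:
        raise KeyError(f"record is missing values for columns: {missing}")
    # Extra keys in `values` are ignored; only the listed columns are kept.
    return [values[col] for col in ds_columns]


# Hypothetical column names, used here only for illustration.
example_columns = ["text", "label"]
example_record = build_record(
    {"label": "positive", "text": "great product", "id": 7},
    example_columns,
)
print(example_record)
```

The resulting list could then be passed as the dataset record, in place of row.tolist() in the sample code.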
Sample Code
from typing import List

import markov
from markov import EmbeddingRecorder

# Placeholder: replace with the id of your registered dataset.
dataset_id = "my_dataset_id"

# Get the dataset by name
dataset = markov.dataset.get_by_name("my_dataset_name")
# You can also get the dataset by id
dataset = markov.dataset.get_by_id(dataset_id)

# Custom embeddings can be added for any segment; the train segment is used here as an example.
train_df = dataset.train.as_df()

embedding_recorder = EmbeddingRecorder(
    name="Custom embedding name",
    dataset_id=dataset_id,
    notes="Optional description for this custom embedding",
)
embedding_recorder.register()

for _, row in train_df[embedding_recorder.ds_columns].iterrows():
    # Get the embedding for this row; a list of values is expected for each record.
    # get_embeddings() is a user-defined function; embeddings usually come from a trained model.
    embedding: List[float] = get_embeddings(row)
    embedding_recorder.add_embedding_record(row.tolist(), embedding)

# It's important to call finish() to signal to MarkovML that all embeddings have been uploaded.
embedding_recorder.finish()
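The sample code only states that a list of values is expected for each embedding. If you want to catch malformed embeddings before uploading them, a check along these lines may help. This is a sketch under assumptions: validate_embedding is a hypothetical helper, not an SDK function, and the fixed-dimension and finite-value rules are not requirements documented by MarkovML.

```python
import math
from typing import Iterable, List


def validate_embedding(embedding: Iterable[float], expected_dim: int) -> List[float]:
    """Return the embedding as a list of floats.

    Raises ValueError on a wrong length or a NaN/infinite value.
    """
    values = [float(v) for v in embedding]
    if len(values) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinite values")
    return values


# Example: a 3-dimensional embedding passes the check unchanged.
checked = validate_embedding([0.1, 0.2, 0.3], expected_dim=3)
```

The validated list could then be passed as the embedding when adding the record.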
Note
get_embeddings() is a pseudo-function; you implement it yourself to return your model's embeddings.
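To try out the upload loop before a model is available, a toy stand-in can be used. The sketch below is an illustration only: it hashes the record's values into a small deterministic vector, which carries no semantic meaning. The dimension of 8 is an arbitrary assumption, and a real get_embeddings() would call your trained model instead.

```python
import hashlib
from typing import Any, Iterable, List

EMBEDDING_DIM = 8  # hypothetical size; a real model determines this


def get_embeddings(row: Iterable[Any]) -> List[float]:
    """Toy stand-in: map a record's values to a deterministic vector in [0, 1]."""
    text = " ".join(str(value) for value in row)
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # One byte per dimension, scaled to the range [0, 1].
    return [byte / 255.0 for byte in digest[:EMBEDDING_DIM]]


vector = get_embeddings(["great product", "positive"])
print(len(vector))  # 8
```

The function accepts any iterable of values, so it should also work with the pandas row used in the sample code, since iterating over a row yields its values.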