Build your Custom Model
Build your custom model using the Markov SDK.
As mentioned in the quick intro, building custom models involves specifying inference stages and actions using MarkovML operators. These stages are executed in sequence, and the results are packaged into a single entity known as a custom model.
Define your Inference Model
The inference pipeline requires the following inputs from you:
- Name: A name for your inference pipeline, for your own reference.
- Sample: Sample inputs of the kind your users will provide.
- Schema (mandatory): The schema describes the target columns, their features, and their data types (e.g., string or number) in the dataset that the model uses to make predictions. It is mandatory because it enables the model to understand user input data and generate accurate predictions.
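To make the schema idea concrete, the sketch below derives column names and coarse data types from a small pandas DataFrame. It only illustrates the kind of information a schema carries. `describe_columns` is a hypothetical helper, not part of the Markov SDK, and the format the Markov backend actually expects may differ.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def describe_columns(df: pd.DataFrame) -> dict:
    """Map each column name to a coarse type label ("number" or "string").

    Hypothetical helper for illustration only; not a Markov SDK function.
    """
    return {
        column: "number" if is_numeric_dtype(df[column]) else "string"
        for column in df.columns
    }


# A tiny sample with one text column and one numeric column
sample_input = pd.DataFrame(
    {
        "content": ["Generative AI is changing industry trends."],
        "word_count": [6],
    }
)
column_types = describe_columns(sample_input)
print(column_types)
```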
Convert the schema and samples to the format accepted by the Markov backend
Use the utility infer_schema_and_samples_from_dataframe to convert your input DataFrame into a schema and samples in the format accepted by the Markov backend.
schema, samples = infer_schema_and_samples_from_dataframe(sample_input)
Sample Code
import os.path
import markov
import pandas as pd
# markov imports
from markov.api.models.artifacts.base import (
    MarkovPredictor,
    MarkovPyfunc,
    infer_schema_and_samples_from_dataframe,
)
from markov.api.models.artifacts.inference_pipeline import InferencePipeline
# To create a model app you will need to provide samples from your test/train set.
# You can sample some rows from your test/train dataframe to register with MarkovML.
samples = ["Generative AI has been impacting the industry trends at a very fast pace."]
# Note the content here is the feature column in AG News Dataset,
# You just need to provide a dataframe with a few examples.
sample_input = pd.DataFrame([{"content": samples}])
# Use the utility `infer_schema_and_samples_from_dataframe` to convert your input dataframe
# into schema and samples in the format accepted by Markov backend
schema, samples = infer_schema_and_samples_from_dataframe(sample_input)
# Define your inference model
my_inference_model = InferencePipeline(
    name="pytorch-text-classifier-demo",  # 1: Add the inference model name
    schema=schema,  # 2: Give the schema (mandatory)
    samples=samples,  # 3: User sample inputs
)
Note
Currently, we only support the addition of samples through pandas DataFrames.
Inference Pipeline Operators
Every step in the pipeline is known as a “stage.” The add_pipeline_stage() method allows you to add stages to the pipeline.
MarkovML provides three operators to specify the action performed during each stage:
- MarkovPyfunc: This operator specifies that the stage runs a function. It can be any Python function you want to use for that stage.
- MarkovPredictor: This operator specifies that the stage uses the trained model to perform prediction-related tasks. It also specifies the framework used to train the model, such as sklearn or PyTorch.
  Note: MarkovML supports the LightGBM, PyTorch, sklearn, and XGBoost frameworks, which are also referred to as MarkovML supported flavours. Use MarkovSupportedFlavours to specify the framework.
- MarkovTransformer: This operator specifies that the stage uses a trained model through its transform method instead of predict in your ML library. For example, the sklearn library has a transformer called TruncatedSVD. If you need to call TruncatedSVD.transform() rather than TruncatedSVD.predict(), use MarkovTransformer.
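The distinction between the last two operators comes down to which method the stage calls on the wrapped object. The sketch below uses two toy classes, `ToyScaler` and `ToyClassifier`, plus a hypothetical `run_stage` helper to show the idea. None of these names come from the Markov SDK, and the SDK's internal dispatch may work differently.

```python
class ToyScaler:
    """Stand-in for a transformer such as TruncatedSVD: it exposes transform()."""

    def transform(self, values):
        peak = max(values)
        return [v / peak for v in values]


class ToyClassifier:
    """Stand-in for a trained model: it exposes predict()."""

    def predict(self, values):
        return [1 if v >= 0.5 else 0 for v in values]


def run_stage(obj, method_name, data):
    """Call the named method ('transform' or 'predict') on the wrapped object."""
    return getattr(obj, method_name)(data)


# A transformer-style stage feeds a predictor-style stage
scaled = run_stage(ToyScaler(), "transform", [2.0, 4.0, 8.0])
labels = run_stage(ToyClassifier(), "predict", scaled)
print(scaled, labels)
```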
Stages of Inference Pipeline
An inference pipeline is composed of "stages," each of which executes one of the defined tasks. There are three types of stages, as shown below:
Stage 1: Pre-processing
Add your pre-processing function, which contains all the pre-processing steps needed for the dataset.
You can use the MarkovPyfunc operator to add the pre-processing function to this stage and name the stage. For example, name the stage “preprocess.”
Sample Code
...
# Model Preprocessing stage
stage=MarkovPyfunc(
    name="preprocess", pyfunc=model.dataset_handler.process_text
)
...
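A pre-processing function can be any Python callable that takes the raw input and returns what the next stage expects. As a minimal sketch, assuming the input is a list of strings, a hypothetical `clean_text` function might normalize whitespace and case. It is not the `process_text` method referenced above, whose behavior is not shown in this guide.

```python
import re


def clean_text(texts):
    """Lowercase each string and collapse runs of whitespace into single spaces."""
    return [re.sub(r"\s+", " ", text).strip().lower() for text in texts]


cleaned = clean_text(["  Generative AI\thas been   impacting the industry. "])
print(cleaned)
```

A function like this could then be passed as the pyfunc argument of MarkovPyfunc, in the same way as in the sample code.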
Stage 2: Trained Model
Use the MarkovPredictor operator to add the model with its MarkovML-supported flavour (framework) and name the stage. For example, name the stage “pytorch_predictor,” as shown in the sample code below.
Sample Code
# Your trained model
# Get your model and assign it to model
model = get_trained_model()
...
# Model Prediction stage
stage=MarkovPredictor(
    name="pytorch_predictor", model=model, flavour=MarkovSupportedFlavours.PYTORCH
)
...
Stage 3: Post-Processing
Once you have the model predictions, you might want to convert them to your desired format. For example, if your model returns 0 and 1, you may want to map these values to Negative and Positive.
You can use the MarkovPyfunc operator to add your post_process function to this stage.
Sample Code
# Sample post-processing that maps the model predictions to labels.
# The dataset used is the AG News dataset:
# 1 is World news, 2 is Sports news, 3 is Business news,
# and 4 is Sci/Tech news.
def post_process(prediction):
    ag_news_label = {1: "World", 2: "Sports", 3: "Business", 4: "Sci/Tech"}
    # argmax(1) gives a zero-based class index; add 1 to match the label keys
    prediction_int = prediction.argmax(1).item() + 1
    return ag_news_label[prediction_int]
...
# Post processing stage
stage=MarkovPyfunc(name="post_process", pyfunc=post_process)
...
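The sample above assumes a PyTorch tensor as input. The same idea can be tried without any framework: the sketch below takes a plain list of class scores, picks the index of the highest score, and maps it to a label. `scores_to_label` and the `LABELS` mapping for the binary case are illustrative names, not part of the Markov SDK.

```python
# Illustrative mapping for a binary classifier that outputs class 0 or 1
LABELS = {0: "Negative", 1: "Positive"}


def scores_to_label(scores):
    """Return the label whose class score is highest (a pure-Python argmax)."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return LABELS[best_index]


print(scores_to_label([0.2, 0.8]))
print(scores_to_label([0.9, 0.1]))
```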
Your Complete Custom Model Inference Pipeline
Use the add_pipeline_stage() method to add your stages to the pipeline. You can also add environment requirements to the inference pipeline using the add_pip_requirements() method, and add other dependent code file paths, such as the Python files used to train your custom model, using the add_dependent_code() method.
Note
Stages must be in the correct order of execution.
import os.path
import markov
import pandas as pd
# markov imports
from markov.api.models.artifacts.base import (
    MarkovPredictor,
    MarkovPyfunc,
    infer_schema_and_samples_from_dataframe,
)
from markov.api.models.artifacts.inference_pipeline import InferencePipeline
from markov.library.dependencies_helper import pytorch_pip_requirements
from markov.library.mlflow_helper import MarkovSupportedFlavours
from train_model import get_trained_model
# Sample and Schema
samples = ["Generative AI has been impacting the industry trends at a very fast pace."]
sample_input = pd.DataFrame([{"content": samples}])
schema, samples = infer_schema_and_samples_from_dataframe(sample_input)
# Define inference model
my_inference_model = InferencePipeline(
    name="pytorch-text-classifier-demo",
    schema=schema,
    samples=samples,
)
...
# `model` and `post_process` are assumed to be defined as in the stage examples
# above, e.g. model = get_trained_model()
# Build your inference pipeline
# Add stages to the Inference Pipeline
my_inference_model.add_pipeline_stage(
    # Pre-processing
    stage=MarkovPyfunc(
        name="preprocess", pyfunc=model.dataset_handler.process_text
    )
).add_pipeline_stage(
    # Model predictions
    stage=MarkovPredictor(
        name="pytorch_predictor", model=model, flavour=MarkovSupportedFlavours.PYTORCH
    )
).add_pipeline_stage(
    # Post-processing
    stage=MarkovPyfunc(name="post_process", pyfunc=post_process)
).add_pip_requirements(
    # Add requirements if missing
    pytorch_pip_requirements()
).add_dependent_code(
    # Add any dependent code.
    # get_current_directory_path() is not defined in this snippet; it is assumed to be
    # a helper that returns the directory containing train_model.py.
    code_paths=[os.path.join(get_current_directory_path(), "train_model.py")]
)
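The chained calls above rely on two ideas: each method hands back the pipeline so that another call can follow, and stages run in the order they were added. The toy class below sketches both, under the assumption that this is how the chaining works. `ToyPipeline`, `add_stage`, and `run` are hypothetical names and do not reflect the actual InferencePipeline implementation.

```python
class ToyPipeline:
    """Minimal fluent pipeline: stages are stored and executed in insertion order."""

    def __init__(self, name):
        self.name = name
        self.stages = []

    def add_stage(self, name, func):
        self.stages.append((name, func))
        return self  # returning self is what makes method chaining possible

    def run(self, data):
        for _name, func in self.stages:
            data = func(data)  # each stage receives the previous stage's output
        return data


pipeline = (
    ToyPipeline("demo")
    .add_stage("preprocess", lambda text: text.strip().lower())
    .add_stage("predict", lambda text: 1 if "ai" in text.split() else 0)
    .add_stage("post_process", lambda label: {0: "Other", 1: "Sci/Tech"}[label])
)
result = pipeline.run("  Generative AI is moving fast  ")
print(result)
```

If the first two stages were swapped, the toy predictor would receive the raw, mixed-case text and the next stage would receive a label instead of a string, which is why the order of stages matters.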