Model Configuration YAML

Full example of a model configuration YAML file

The YAML example below shows every supported configuration option for creating a model card.

workspace: "./src"
entry_point: ""
bootstrap: |
  echo "Bootstrap start..."
  sh ./config/
  echo "Bootstrap finished"
inference_image_name: "fedml/fedml-default-inference-backend"
enable_custom_image: false
image_pull_policy: "IfNotPresent"
docker_registry_user_name: fedml
docker_registry_user_password: passwd
docker_registry: fedml-official
entry_cmd: tritonserver --model-repository=/model
port_inside_docker: 8000
server_external_port: 2204
server_internal_port: 2203
use_gpu: true
use_triton: true
request_input_example: '{"text": "Hello"}'
authentication_token: "myPrivateToken"
data_cache_dir: "~/data_cache"
environment_variables:
  TOP_K: "5"
  PROMPT_STYLE: "llama_orca"
deploy_timeout: 600
auto_detect_public_ip: true
computing:
  minimum_num_gpus: 1 # minimum # of GPUs to provision
  maximum_cost_per_hour: $3000 # max cost per hour for your job per gpu card
  resource_type: A100-80G # e.g., A100-80G,
  #allow_cross_cloud_resources: true # true, false
  #device_type: CPU # options: GPU, CPU, hybrid
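
Most of the options above are optional. As a minimal sketch, assuming that only the two fields marked [required] in the specification below must be present and that every other option falls back to its documented default, a configuration could be as short as the following. The file name `main_entry.py` is a hypothetical placeholder and is not taken from this page.

```yaml
# Minimal sketch: only the fields marked [required] in the specification.
# Every other option is assumed to fall back to its documented default.
workspace: "./src"            # directory where your source code is located
entry_point: "main_entry.py"  # hypothetical entry point file name
```

Starting from a small file like this and adding options one at a time makes it easier to see which setting is responsible for a given deployment behavior.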

Detailed specification

| Option | Default | Description |
| --- | --- | --- |
| workspace | | Directory where your source code is located. [required] |
| entry_point | | Entry point file name. [required] |
| bootstrap | "" | Shell commands that install dependencies during the setup stage. |
| inference_image_name | "fedml/fedml-default-inference-backend" | Base image for the inference container. |
| enable_custom_image | false | Set to true if you use an image other than the official FedML images listed in Advanced Features. |
| image_pull_policy | "IfNotPresent" | Whether to pull the image (name:tag) again when deploying or updating an endpoint. Either "IfNotPresent" or "Always". |
| docker_registry_user_name / docker_registry_user_password | None | Username and password for your Docker registry. |
| entry_cmd | None | If you use your own image, the entry command(s) for that container. |
| port_inside_docker | 2345 | Port inside the container that is mapped to the host machine port. Set this if you want to use a port other than the default 2345 inside the container. |
| worker_port | random | By default, a random port is opened on the host machine and mapped to the port inside the container. This may conflict with your firewall policy, so you can specify an accessible port here. |
| use_gpu | true | Enable GPUs for inference. Only applies to local, on-premise mode; for GPU cloud mode, specify the resources under computing. |
| use_triton | false | Set to true if your image purely provides the Triton server service. |
| request_input_example | "" | Example input for the inference endpoint. Shown in the UI for reference. |
| authentication_token | randomly generated by the MLOps backend | Token passed as a parameter in the inference curl command. |
| data_cache_dir | "" | For on-premise mode, a folder that is not packaged into the model card. The worker reads it from the host machine instead. |
| environment_variables | None | Environment variables that can be read in the entry_point file. |
| deploy_timeout | 900 | Maximum time to wait for the endpoint to be established. |
| auto_detect_public_ip | false | For on-premise mode, automatically detect the public IPs of the master and the workers. |
| computing | None | For GPU cloud mode, the resources you need for inference. You can visit URL and check: |
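
Several of the image-related options work together. According to the specification, enable_custom_image must be true when the image is not an official FedML one, entry_cmd then gives the container's start command, and port_inside_docker overrides the default of 2345. The fragment below is a sketch of how these might be combined for a Triton-only image. The image name, the registry credentials, and the file name `main_entry.py` are hypothetical placeholders; the entry_cmd value is the one used in the full example above.

```yaml
# Sketch of a custom-image deployment; values marked hypothetical are
# placeholders for illustration, not names defined by this page.
workspace: "./src"
entry_point: "main_entry.py"                     # hypothetical file name
inference_image_name: "my-org/my-triton-image"   # hypothetical custom image
enable_custom_image: true                        # needed for non-official images
image_pull_policy: "Always"                      # pull the image again on deploy / update
docker_registry_user_name: my-user               # hypothetical credentials
docker_registry_user_password: my-password       # hypothetical credentials
entry_cmd: tritonserver --model-repository=/model
port_inside_docker: 8000                         # instead of the default 2345
use_triton: true                                 # image only provides the Triton server
```

Keeping the custom-image settings grouped like this makes it easier to switch back to the default image by setting enable_custom_image to false and removing the overrides.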