
Model Configuration YAML

Full example of a model configuration YAML file

The YAML example below shows every supported configuration option for creating a model card.

workspace: "./src"
entry_point: "main_entry.py"
bootstrap: |
  echo "Bootstrap start..."
  sh ./config/bootstrap.sh
  echo "Bootstrap finished"
inference_image_name: "fedml/fedml-default-inference-backend"
use_gpu: true
request_input_example: '{"text": "Hello"}'
authentication_token: "myPrivateToken"
data_cache_dir: "~/data_cache"
environment_variables:
  TOP_K: "5"
  PROMPT_STYLE: "llama_orca"
deploy_timeout: 600
auto_detect_public_ip: true
computing:
  minimum_num_gpus: 1          # minimum number of GPUs to provision
  maximum_cost_per_hour: $3000 # max cost per hour per GPU card for your job
  resource_type: A100-80G      # e.g., A100-80G
  # allow_cross_cloud_resources: true # true, false
  # device_type: CPU                  # options: GPU, CPU, hybrid
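Only workspace and entry_point are required; every other field falls back to the defaults listed in the table below. As an illustration, a minimal model card could look like the following sketch (the ./src directory, serve.py entry file, and requirements.txt are placeholder names, not part of the FedML specification):

workspace: "./src"       # directory that contains your serving code
entry_point: "serve.py"  # script that starts the inference service
bootstrap: |
  # install Python dependencies before the endpoint is created
  pip install -r ./requirements.txt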

Detailed specification

Name | Default | Description
workspace | (none) | Directory where your source code is located. [required]
entry_point | (none) | Entry point file name. [required]
bootstrap | "" | Shell commands that install dependencies during the setup stage.
inference_image_name | "fedml/fedml-default-inference-backend" | Base image for the inference container.
use_gpu | true | Enable GPUs for inference. Only applies to local and on-premise modes; for GPU cloud mode, specify resources under computing.
request_input_example | "" | Input example for the inference endpoint, shown on the UI for reference.
authentication_token | randomly generated by the MLOps backend | Authentication token passed as a parameter in the inference curl command.
data_cache_dir | "" | For on-premise mode, a folder that is not packaged into the model card; the worker reads it from the host machine instead.
environment_variables | None | Environment variables that can be read in the entry_point file (see the sketch after this table).
deploy_timeout | 100 | Maximum waiting time for the endpoint to be established.
auto_detect_public_ip | false | For on-premise mode, automatically detect the public IPs of the master and worker nodes.
computing | None | For GPU cloud mode, the resources needed for inference. Available resource types are listed at https://open-dev.fedml.ai/compute/distributed.
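The environment_variables from the full example above (TOP_K and PROMPT_STYLE) are exposed to the serving process as ordinary environment variables. A minimal sketch, assuming the entry point is a plain Python script; the variable names and fallback values mirror the example and do not rely on any FedML-specific API:

import os

# Values are injected from the environment_variables section of the model card;
# the second argument is only a fallback for local testing.
top_k = int(os.environ.get("TOP_K", "5"))
prompt_style = os.environ.get("PROMPT_STYLE", "llama_orca")

print(f"Serving with top_k={top_k}, prompt_style={prompt_style}")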