# Model Configuration YAML

## Full example of a model configuration YAML file
The YAML example below shows every supported configuration option for creating a model card.
```yaml
workspace: "./src"
entry_point: "main_entry.py"
bootstrap: |
  echo "Bootstrap start..."
  sh ./config/bootstrap.sh
  echo "Bootstrap finished"
inference_image_name: "fedml/fedml-default-inference-backend"
use_gpu: true
request_input_example: '{"text": "Hello"}'
authentication_token: "myPrivateToken"
data_cache_dir: "~/data_cache"
environment_variables:
  TOP_K: "5"
  PROMPT_STYLE: "llama_orca"
deploy_timeout: 600
auto_detect_public_ip: true
computing:
  minimum_num_gpus: 1           # minimum number of GPUs to provision
  maximum_cost_per_hour: $3000  # max cost per hour per GPU card for your job
  resource_type: A100-80G       # e.g., A100-80G
  # allow_cross_cloud_resources: true  # true, false
  # device_type: CPU                   # options: GPU, CPU, hybrid
```
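For reference, here is a minimal sketch of what `main_entry.py` could look like. It assumes FedML's `FedMLPredictor` / `FedMLInferenceRunner` serving interface and uses a hypothetical echo model; the point is to show how the `environment_variables` defined above become visible to the entry point via `os.environ`.

```python
import os

# Assumes FedML's model-serving interface; adjust if your FedML version differs.
from fedml.serving import FedMLPredictor, FedMLInferenceRunner


class EchoPredictor(FedMLPredictor):
    def __init__(self):
        super().__init__()
        # environment_variables from the model config are exposed to the entry point
        self.top_k = int(os.environ.get("TOP_K", "1"))
        self.prompt_style = os.environ.get("PROMPT_STYLE", "default")

    def predict(self, request: dict) -> dict:
        # `request` follows request_input_example, e.g. {"text": "Hello"}
        return {"echo": request.get("text", ""), "top_k": self.top_k}


if __name__ == "__main__":
    FedMLInferenceRunner(EchoPredictor()).run()
```

The predictor is constructed once at startup, so reading configuration in `__init__` avoids re-parsing environment variables on every request.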
## Detailed specification
| Name | Default | Description |
|---|---|---|
| workspace | - | Directory where your source code is located. [required] |
| entry_point | - | Entry point file name. [required] |
| bootstrap | "" | Shell commands to install dependencies during the setup stage. |
| inference_image_name | "fedml/fedml-default-inference-backend" | Base image for the inference container. |
| use_gpu | true | Enable GPUs for inference. Only applies to local and on-premise modes; for GPU cloud mode, specify resources under computing. |
| request_input_example | "" | Example input for the inference endpoint, shown on the UI for reference. |
| authentication_token | Randomly generated by the MLOps backend | Authentication token passed as a parameter in the inference curl command. |
| data_cache_dir | "" | For on-premise mode, a folder that is not packaged into the model card; workers read it from the host machine instead. |
| environment_variables | None | Environment variables that can be read in the entry_point file. |
| deploy_timeout | 100 | Maximum waiting time for the endpoint to be established. |
| auto_detect_public_ip | false | For on-premise mode, automatically detect the public IPs of the master and worker nodes. |
| computing | None | For GPU cloud mode, the resources required for inference. See https://open-dev.fedml.ai/compute/distributed for available resource types. |
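Once the endpoint is up, the authentication_token is supplied with each request. The table above refers to a curl command; below is a hedged programmatic equivalent using Python's `requests` library. The endpoint URL and the `Authorization` header format are placeholders; copy the exact invocation shown for your endpoint in the FedML UI.

```python
import requests

# Placeholder URL; use the real endpoint address from the FedML UI.
ENDPOINT_URL = "https://<your-endpoint-host>/predict"  # hypothetical path

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": "Bearer myPrivateToken"},  # authentication_token from the YAML
    json={"text": "Hello"},  # matches request_input_example
    timeout=30,
)
print(response.json())
```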