Inference API
The Inference API listens on port 8080 and is only accessible from localhost by default. To change the default setting, see TorchServe Configuration.
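For example, to make the Inference API reachable from other hosts, you can point TorchServe at a config.properties file when starting it. This is a minimal sketch assuming the standard inference_address property and a model store directory named model_store; adjust the bind address, port, and paths to your deployment:

# config.properties: bind the Inference API to all network interfaces instead of localhost
inference_address=http://0.0.0.0:8080

# Start TorchServe with this configuration
torchserve --start --model-store model_store --ts-config config.properties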
There are three types of APIs:
API description - Describes TorchServe's inference APIs with an OpenAPI 3.0 specification
Health check API - Checks the health status of the running TorchServe server
Predictions API - Makes prediction calls against a model served by TorchServe
API Description
To view a full list of inference APIs, you can use the following command:
curl -X OPTIONS http://localhost:8080
The output is in the OpenAPI 3.0.1 JSON format. You can use it to generate client code; see swagger codegen for details.
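For example, you can save the specification to a file and run it through swagger codegen to produce a client library. This is a sketch; the exact generator command and supported flags depend on the codegen version you use:

# Save the OpenAPI specification returned by TorchServe
curl -X OPTIONS http://localhost:8080 > torchserve-openapi.json

# Generate a Python client from the specification (swagger-codegen 3.x style invocation)
swagger-codegen generate -i torchserve-openapi.json -l python -o ./torchserve-client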
Health check API
TorchServe supports a ping API that you can use to check the health status of a running TorchServe server:
curl http://localhost:8080/ping
If the server is running, the response should be:
{
"health": "healthy!"
}
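Because ping returns quickly and requires no payload, it is handy for readiness checks. The following shell sketch polls the endpoint until the server reports healthy (the retry interval is an arbitrary choice):

until curl -s http://localhost:8080/ping | grep -q healthy; do
  echo "Waiting for TorchServe to become healthy..."
  sleep 2
done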
Predictions API
To run inference on the default version of each loaded model, make a REST call to the URI /predictions/{model_name}.
POST /predictions/{model_name}
curl Example
curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg
curl -X POST http://localhost:8080/predictions/resnet-18 -T kitten.jpg
or:
curl -X POST http://localhost:8080/predictions/resnet-18 -F "data=@kitten.jpg"
To run inference on a specific version of a loaded model, make a REST call to the URI /predictions/{model_name}/{version}.
POST /predictions/{model_name}/{version}
curl Example
curl -O https://s3.amazonaws.com/model-server/inputs/kitten.jpg
curl -X POST http://localhost:8080/predictions/resnet-18/2.0 -T kitten.jpg
or:
curl -X POST http://localhost:8080/predictions/resnet-18/2.0 -F "data=@kitten.jpg"
The result is JSON that tells us the image most likely contains a tabby cat. The highest prediction is:
{
"class": "n02123045 tabby, tabby cat",
"probability": 0.42514491081237793
}