0 votes
0 answers
44 views

BrokenPipeError: [Errno 32] Broken pipe - Calculating throughput

I am trying to calculate throughput for my NVIDIA Triton server. I want to send 10k requests from my client and pile them up on the server. Only after all the 10k requests are sent by the ...
Tanay Joshi
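For measuring throughput, Triton's own perf_analyzer is the usual tool, but if the goal is to fire many requests from a custom client without blocking, a minimal sketch with the tritonclient HTTP API might look as follows (model name, input name, shape and datatype are assumptions; take the real ones from the model's config.pbtxt):

```python
import numpy as np
import tritonclient.http as httpclient

# A connection pool lets many requests be in flight at once.
client = httpclient.InferenceServerClient(url="localhost:8000", concurrency=32)

data = np.random.rand(1, 3, 224, 224).astype(np.float32)   # assumed input shape
inp = httpclient.InferInput("INPUT0", data.shape, "FP32")   # assumed input name
inp.set_data_from_numpy(data)

# Fire-and-collect: async_infer returns immediately; get_result() blocks.
futures = [client.async_infer("my_model", inputs=[inp]) for _ in range(10_000)]
results = [f.get_result() for f in futures]
print(f"completed {len(results)} inferences")
```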
0 votes
0 answers
95 views

Issues Using Essentia Models For Music Tagging

BACKGROUND: I was using some models to generate tags for music, such as genre, mood, and instruments in the music (audio file). The original models had the .pb extension. The models are available on ...
Moeez.ktk
0 votes
0 answers
45 views

What makes Triton return a 503 error sometimes?

I deployed 14 models on a Triton server and called them with 100 HTTP REST API requests at once; after they finished, I called them again, over and over. The first time after deploying, it looks fine. But after ...
semenbari • 805
1 vote
1 answer
718 views

Cannot get CUDA device count, GPU metrics will not be available: NVIDIA Triton server issue in Docker

I am trying to run the NVIDIA inference server through Docker. I got the correct Triton server image from Docker, but when I run docker logs sample-tis-22.04 --tail 40 it shows this: I0610 15:59:37.597914 1 ...
MFaiqKhan • 113
0 votes
1 answer
256 views

Customizing deployment with Model Analyzer in NVIDIA Triton Server

I am following the tutorial from NVIDIA Triton Server and am currently on the 3rd step, getting to know deployment of ML models. The step involves installing the Model Analyzer module and there is ...
Atharav Jadhav
0 votes
0 answers
490 views

TensorRT inference with Triton Server Docker

I'm studying how to use the combination of TensorRT and Triton. I'm working on this server: NVIDIA-SMI 535.161.08, Driver Version: 535.161.08, CUDA Version: 12.2, Ubuntu 22.04, and I've ...
Simone Grassi
0 votes
1 answer
933 views

Triton inference server does not have onnx backend

The image nvcr.io/nvidia/tritonserver:24.02-py3 doesn't have the ONNX backend. I have been following this tutorial: "https://github.com/triton-inference-server/tutorials/tree/main/Conceptual_Guide/...
Nirmesh • 305
0 votes
1 answer
1k views

CUDA error: device-side assert triggered on tensor.to(device='cuda')

An ML model is running under Triton Inference Server on a GPU instance group and, after a certain number of successful inferences, starts throwing the exception: CUDA error: device-side assert triggered ...
Dan M • 1,282
1 vote
2 answers
4k views

ONNX Runtime: io_binding.bind_input causing "no data transfer from DeviceType:1 to DeviceType:0"

I am using NVIDIA Triton Inference Server and an ONNX model for inference on a GPU instance. The Dockerfile, containing the environment, inference server and models, contains the following FROM/pip lines: ...
Dan M • 1,282
0 votes
1 answer
292 views

How to configure AWS API Gateway for NVIDIA Triton's Binary Data Protocol with AWS SageMaker?

I've deployed a model using the NVIDIA Triton Inference Server on AWS SageMaker and am attempting to expose it through a REST API using AWS API Gateway. This would make it accessible to clients. ...
lucidyan • 3,913
2 votes
1 answer
752 views

Fail to convert tensorflow model to onnx in nvidia NGC tensorflow container

I followed the instructions in triton-inference-server/tutorials to convert a TensorFlow model to ONNX in order to test the Triton Inference Server. However, the conversion fails inside the NGC ...
shijie xu • 2,107
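On the TensorFlow-to-ONNX step itself, the usual route is the tf2onnx CLI (python -m tf2onnx.convert --saved-model <dir> --output model.onnx). A rough Python equivalent for a Keras model is sketched below; the stand-in model and opset are assumptions, not the tutorial's actual values:

```python
import tensorflow as tf
import tf2onnx

# Stand-in Keras model; replace with the model from the tutorial.
model = tf.keras.applications.MobileNetV2(weights=None)
spec = (tf.TensorSpec((None, 224, 224, 3), tf.float32, name="input"),)

# Writes model.onnx and also returns the ONNX ModelProto.
model_proto, _ = tf2onnx.convert.from_keras(
    model, input_signature=spec, opset=13, output_path="model.onnx"
)
print([o.name for o in model_proto.graph.output])
```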
1 vote
1 answer
515 views

Loader Constraint Violation for class io.grpc.Channel when trying to create ManagedChannel for GRPC Request

I'm trying to set up a gRPC client to make inference requests to an NVIDIA Triton Inference Server (version: 23.06-py3) in Kotlin for my project. I've set up protoc code generation using Gradle (attached ...
Ayush Vachaspati
0 votes
1 answer
468 views

Converting triton container to work with sagemaker MME

I have a custom Triton Docker container that uses a Python backend. This container works perfectly locally. Here is the container's Dockerfile (I have omitted irrelevant parts). ARG ...
toing_toing • 2,462
0 votes
1 answer
1k views

How to set up configuration file for sagemaker triton inference?

I have been looking at examples and ran into this one from AWS: https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-triton/ensemble/sentence-transformer-trt/examples/ensemble_hf/bert-trt/...
luwa • 39
0 votes
1 answer
2k views

Error Code 1: Serialization (Serialization assertion magicTagRead == kMAGIC_TAG failed. Magic tag does not match) Triton Inference Server

I run the nvcr.io/nvidia/tritonserver:23.01-py3 Docker image with the following command: docker run --gpus=0 --rm -it --net=host -v ${PWD}/models:/models nvcr.io/nvidia/tritonserver:23.01-py3 ...
Long Vu
1 vote
1 answer
554 views

How to create 4d array with random data using numpy random

My model accepts data in the shape (1, 32, 32, 3); I am looking for a way to create and pass the data using np.array from numpy (see the sketch below). Any help on this will be appreciated.
Mahesh • 51
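A minimal sketch for the shape above (the float32 dtype is an assumption; match whatever the model's config declares):

```python
import numpy as np

# Random values in [0, 1) with the batch dimension included: (1, 32, 32, 3).
data = np.random.rand(1, 32, 32, 3).astype(np.float32)
print(data.shape, data.dtype)  # (1, 32, 32, 3) float32
```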
0 votes
1 answer
467 views

How to pass inputs for my triton model using the tritonclient python package?

My Triton model's config.pbtxt file looks like the one below. How can I pass inputs and outputs using tritonclient and perform an infer request (see the sketch below)? name: "cifar10" platform: "tensorflow_savedmodel" max_batch_size: ...
Mahesh • 51
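A minimal sketch with the tritonclient HTTP API, assuming the server runs on localhost:8000 and using placeholder input/output names (the real ones come from the config.pbtxt quoted above):

```python
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

image = np.random.rand(1, 32, 32, 3).astype(np.float32)
inp = httpclient.InferInput("input_1", image.shape, "FP32")   # assumed input name
inp.set_data_from_numpy(image)
out = httpclient.InferRequestedOutput("output_1")             # assumed output name

result = client.infer(model_name="cifar10", inputs=[inp], outputs=[out])
print(result.as_numpy("output_1"))
```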
2 votes
0 answers
389 views

Can I deploy kserve inference service using XGBoost model on kserve-tritonserver?

I want to deploy an XGBoost model on KServe. I deployed it on the default serving runtime, but I want to try it on kserve-tritonserver. I know KServe says kserve-tritonserver supports TensorFlow, ONNX, ...
HoonCheol Shin
1 vote
1 answer
2k views

how to host/invoke multiple models in nvidia triton server for inference?

Based on the documentation here, https://github.com/aws/amazon-sagemaker-examples/blob/main/inference/nlp/realtime/triton/multi-model/bert_trition-backend/bert_pytorch_trt_backend_MME.ipynb, I have set up ...
haju • 271
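For a SageMaker multi-model endpoint backed by Triton, the model to run is usually selected with the TargetModel parameter of invoke_endpoint. A hedged boto3 sketch follows; the endpoint name, artifact name, tensor names/shapes and content type are all assumptions to be checked against the notebook:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# KServe v2 style request body; input name/shape/datatype are placeholders.
payload = {
    "inputs": [
        {"name": "INPUT0", "shape": [1, 128], "datatype": "INT32",
         "data": [0] * 128}
    ]
}

response = runtime.invoke_endpoint(
    EndpointName="my-triton-mme-endpoint",   # assumed endpoint name
    TargetModel="bert_pytorch_trt.tar.gz",   # assumed model artifact in the MME bucket
    ContentType="application/octet-stream",  # assumed content type; verify
    Body=json.dumps(payload),
)
print(response["Body"].read())
```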
2 votes
0 answers
772 views

Serve concurrent requests with NVIDIA Triton on a GPU

I currently have a triton server with a python backend that serves a model. The machine I am running the inference on is a g4dn.xlarge machine. The instance count provided for the GPU in the config....
Ajayv • 406
0 votes
0 answers
668 views

AttributeError: 'NoneType' object has no attribute 'encode' and AttributeError: 'InferenceServerClient' object has no attribute '_stream'

I had two Docker containers on the server. One is the Triton server, whose gRPC port I set to 1747; it had a TorchScript model running on it. The other container is where I ...
Văn Tuấn Nguyễn
1 vote
1 answer
963 views

Starting triton inference server docker container on kube cluster

Description: Trying to deploy the Triton Docker image as a container on a Kubernetes cluster. Triton information: What version of Triton are you using? -> 22.10. Are you using the Triton container or did ...
Transwert
0 votes
0 answers
530 views

How to start triton server after building the tritonserver Image for custom windows server 2019?

I am building the Windows-based Triton server image. Building the Dockerfile.win10.min for Triton server version 22.11 was not working, as the base image required for building the server image was not available ...
Gp01 • 11
0 votes
1 answer
661 views

How to start triton server after building the Windows 10 "Min" Image?

I have followed the steps mentioned here. I am able to build the win10-py3-min image. After that, I am trying to build the Triton server as mentioned here. Command: python build.py -v --no-container-...
Gp01 • 11
1 vote
1 answer
404 views

Running Triton Server Inference on AWS GPU Graviton instance

I am currently running a Triton server in production on AWS using a standard GPU-enabled EC2 instance (very expensive). I have seen that these new GPU-enabled Graviton instances can be 40% cheaper to run. ...
jtm123 • 11
0 votes
1 answer
889 views

triton inference server: deploy model with input shape BxN config.pbtxt

I have installed the Triton Inference Server with Docker: docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v /mnt/data/nabil/triton_server/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 ...
Zabir Al Nazi Nabil
8 votes
3 answers
7k views

NVIDIA Triton vs TorchServe for SageMaker Inference

NVIDIA Triton vs TorchServe for SageMaker inference: when should each be recommended? Both are modern, production-grade inference servers. TorchServe is the default DLC inference server for PyTorch models. ...
juvchan • 6,263
0 votes
1 answer
90 views

Cannot find the definition of a constant

I am trying to add a new accelerator to the NVIDIA Triton inference server. One of the last things I need to do is add a new constant like this one (kOpenVINOExecutionAccelerator), but for some reason I ...
Francois • 954
6 votes
2 answers
5k views

Is there a way to get the config.pbtxt file from the Triton inference server

Recently, I came across the Triton serving config disable flag "--strict-model-config=false" used when running the inference server. This enables the server to create its own ...
Rajesh Somasundaram
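When the server is started with --strict-model-config=false, the generated configuration can be read back from the server itself, either via GET /v2/models/&lt;name&gt;/config or through the Python client. A minimal sketch (model name assumed):

```python
import json
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Returns the server-side (possibly auto-generated) model configuration as JSON.
config = client.get_model_config("my_model")   # assumed model name
print(json.dumps(config, indent=2))
```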
0 votes
1 answer
4k views

Triton Inference Server - tritonserver: not found

I am trying to run NVIDIA's Triton Inference Server. I pulled the pre-built container nvcr.io/nvidia/pytorch:22.06-py3 and then ran it with the command run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -...
Antonina • 604
3 votes
0 answers
225 views

Cog vs Triton Inference Server

I'm considering Cog and Triton Inference Server for inference in production. Does someone know what is the difference in capabilities as well as in run times between the two, especially on AWS?
Dolev Shapira
4 votes
2 answers
8k views

Using String parameter for nvidia triton

I'm trying to deploy a simple model on the Triton Inference Server. It is loaded well but I'm having trouble formatting the input to do a proper inference request. My model has a config.pbtxt set up ...
Regalia • 179
2 votes
0 answers
481 views

nvidia dali video decode from external_source buffer (instead of file)

This article explains how to do image decoding and preprocessing on the server side with DALI while using triton-inference-server. I am trying to find something similar for doing video decoding from an h.264 ...
dumbPy • 1,598
1 vote
1 answer
2k views

Streaming responses from the Triton Inference Server with Python backend

I am using the Triton Inference Server with the Python backend, and at the moment I send gRPC requests. Does anybody know how we can use the Python backend with streaming (e.g. streaming model responses), because I didn't ...
Rizwan Ishaq
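Streaming several responses per request requires the model to be marked decoupled (model_transaction_policy { decoupled: true }) so the Python backend can use its response sender; on the client side, the gRPC streaming API collects the responses via a callback. A rough sketch, with model and tensor names assumed:

```python
import queue
import numpy as np
import tritonclient.grpc as grpcclient

responses = queue.Queue()

def callback(result, error):
    # Called once per streamed response (or once with an error).
    responses.put(error if error is not None else result)

client = grpcclient.InferenceServerClient(url="localhost:8001")

data = np.random.rand(1, 16).astype(np.float32)            # assumed shape
inp = grpcclient.InferInput("INPUT0", data.shape, "FP32")   # assumed input name
inp.set_data_from_numpy(data)

client.start_stream(callback=callback)
client.async_stream_infer(model_name="my_decoupled_model", inputs=[inp])

# Drain however many responses the decoupled model chooses to send.
first = responses.get(timeout=30)
client.stop_stream()
print(first)
```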
0 votes
1 answer
437 views

pose estimation on Triton inference server

I am struggling with running pose models in the NVIDIA Triton inference server. The models (OpenPose, AlphaPose, HRNet, etc.) load normally, but the post-processing is the problem
younes • 1
1 vote
1 answer
654 views

faster_rcnn_r50 pretrained converted to ONNX hosted in Triton model server

I went through the mmdetection documentation to convert a PyTorch model to ONNX (link here). All installations are correct and I'm using onnxruntime==1.8.1, custom operators for ONNX Runtime ...
Nrepesh Joshi
1 vote
0 answers
1k views

Triton inference server: Explicit model control

I need a little advice with deploying the Triton inference server with explicit model control. From the looks of it, this mode gives the user the most control over which models go live. But the problem I'm ...
Buddhi De Seram
0 votes
0 answers
296 views

CMake on a CentOS/RHEL system installs to .../lib64 while on Ubuntu it installs to .../lib

I'm trying to compile the Triton inference server on CentOS/RHEL instead of Ubuntu. One problem I encounter is that I get the following error for some packages (e.g. protobuf, prometheus-cpp): Could ...
MaGi • 171
1 vote
1 answer
2k views

Is it possible to use another model within Nvidia Triton Inference Server model repository with a custom Python model?

I want to use a model in my Triton Inference Server model repository in another custom Python model that I have in the same repository. Is it possible? If yes, how to do that? I guess it could be done ...
Kıvanç Yüksel
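Yes: this is what Triton calls Business Logic Scripting (BLS); from inside a Python-backend model you can issue an inference request to any other model in the same repository. A trimmed sketch of such a model.py, with model and tensor names assumed:

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            in_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")

            # BLS call to another model in the same repository.
            bls_request = pb_utils.InferenceRequest(
                model_name="other_model",             # assumed model name
                requested_output_names=["OUTPUT0"],   # assumed output name
                inputs=[pb_utils.Tensor("INPUT0", in_tensor.as_numpy())],
            )
            bls_response = bls_request.exec()
            if bls_response.has_error():
                raise pb_utils.TritonModelException(bls_response.error().message())

            out = pb_utils.get_output_tensor_by_name(bls_response, "OUTPUT0")
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```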
0 votes
0 answers
344 views

Triton into GitLab CI

I'm having problems integrating a Triton service into GitLab CI. As I noticed in the Triton GitHub repo, https://github.com/triton-inference-server/server, they don't have any exposed port by default in ...
Leemosh • 905
2 votes
0 answers
1k views

Use real image data with perf_analyzer - Triton Inference Server

I'm currently trying to use perf_analyzer of the NVIDIA Triton Inference Server with a deep learning model which takes a numpy array (an image) as input. I followed the steps to use real data from the ...
A.BURIE • 31
8 votes
1 answer
2k views

How to use Triton server "ensemble model" with 1:N input/output to create patches from large image?

I am trying to feed a very large image into Triton server. I need to divide the input image into patches and feed the patches one by one into a tensorflow model. The image has a variable size, so the ...
Stiefel • 2,803
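The 1:N fan-out itself is awkward to express with the standard ensemble scheduler, so the patching is often done either on the client or in a Python-backend step placed in front of the model. A client-side numpy sketch of the patch extraction (patch size and zero-padding are assumptions):

```python
import numpy as np

def extract_patches(image: np.ndarray, patch: int = 256) -> np.ndarray:
    """Split an HxWxC image into non-overlapping patch x patch tiles,
    zero-padding the borders so every tile has the full size."""
    h, w, c = image.shape
    padded = np.pad(image, ((0, (-h) % patch), (0, (-w) % patch), (0, 0)))
    ph, pw = padded.shape[0] // patch, padded.shape[1] // patch
    tiles = padded.reshape(ph, patch, pw, patch, c)
    return tiles.transpose(0, 2, 1, 3, 4).reshape(-1, patch, patch, c)

patches = extract_patches(np.zeros((1000, 1300, 3), dtype=np.float32))
print(patches.shape)  # (24, 256, 256, 3) -> feed these one by one to the model
```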