The config seems to point to type "nim" (NVIDIA's inference microservices). Am I right in guessing you'd recommend an OpenAI-compatible inference server such as vLLM for local hosting on the DGX Spark? It would also be nice if you appended instructions for running the same models with Triton Server via the TensorRT-LLM backend.
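
For anyone else trying this, here's a minimal sketch of what I have in mind: point the standard `openai` Python client at a locally hosted vLLM server instead of a NIM endpoint. The model name, port, and prompt here are assumptions on my part and would need to match whatever you actually launch with `vllm serve`:

```python
# Hypothetical sketch: talking to a local vLLM OpenAI-compatible server
# instead of a NIM endpoint. Assumes the server was started with e.g.:
#   vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
from openai import OpenAI

# vLLM ignores the API key by default, but the client requires one to be set.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
    messages=[{"role": "user", "content": "Hello from the DGX Spark!"}],
)
print(response.choices[0].message.content)
```

Since vLLM exposes the same `/v1` API surface as hosted OpenAI-style services, swapping it in should mostly be a matter of changing the `base_url` in the config, which is why a Triton/TensorRT-LLM appendix covering the non-OpenAI path would be a useful addition.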