Peter's Mind Vault

kserve

May 06, 2024

A highly scalable, standards-based model inference platform on Kubernetes for Trusted AI.

KServe provides a Kubernetes Custom Resource Definition for serving predictive and generative machine learning (ML) models. It aims to solve production model serving use cases by providing high-level abstraction interfaces for TensorFlow, XGBoost, scikit-learn, PyTorch, and Hugging Face Transformer/LLM models, using standardized data plane protocols.
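As a rough illustration of what a standardized data plane protocol looks like, the sketch below builds a predict request in the shape of KServe's V1 inference protocol: a POST to `/v1/models/<name>:predict` with an `instances` list in the JSON body. The host name, model name, and input values are hypothetical placeholders, and the helper `v1_predict_request` is made up for this note; it only builds the request and does not send it.

```python
import json


def v1_predict_request(host, model_name, instances):
    """Build the URL and JSON body for a V1-protocol predict call.

    host and model_name are placeholders for illustration; nothing is sent.
    """
    url = f"http://{host}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances})
    return url, body


# Hypothetical host, model name, and feature vector.
url, body = v1_predict_request(
    "sklearn-iris.example.com", "sklearn-iris", [[6.8, 2.8, 4.8, 1.4]]
)
print(url)
print(body)
```

The same body shape works regardless of which framework trained the model, which is the point of standardizing the protocol.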

It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting-edge serving features such as GPU autoscaling, scale to zero, and canary rollouts to your ML deployments. It enables a simple, pluggable, and complete story for production ML serving, including prediction, pre-processing, post-processing, and explainability.
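To make scale to zero and canary rollouts a bit more concrete, here is a minimal sketch that sets the two predictor fields involved, `minReplicas` and `canaryTrafficPercent`, on an InferenceService manifest held as a Python dict. The base manifest, its names, and the helper `with_rollout_settings` are hypothetical, and whether scale to zero actually takes effect depends on how KServe is deployed in the cluster.

```python
import copy


def with_rollout_settings(manifest, min_replicas=0, canary_percent=10):
    """Return a copy of the manifest with scale-to-zero and canary fields set.

    min_replicas=0 asks the autoscaler to scale the predictor down to zero pods
    when idle; canary_percent routes that share of traffic to the new revision.
    """
    updated = copy.deepcopy(manifest)
    predictor = updated["spec"]["predictor"]
    predictor["minReplicas"] = min_replicas
    predictor["canaryTrafficPercent"] = canary_percent
    return updated


# Hypothetical base manifest; the bucket path is a placeholder.
base = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-iris"},
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storageUri": "gs://example-bucket/models/sklearn/iris",
            }
        }
    },
}

canary = with_rollout_settings(base, min_replicas=0, canary_percent=10)
print(canary["spec"]["predictor"])
```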

usage

Store the model in Google Cloud Storage and note its gs:// URL
Create an InferenceService that references that URL
Create an Ingress to expose the service
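The first two steps above can be sketched as a small script that builds an InferenceService manifest pointing at a gs:// URL and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The service name, namespace, model format, and bucket path are all hypothetical placeholders, and `inference_service` is a helper made up for this note. The Ingress step is left out because it depends on the networking setup of the cluster.

```python
import json


def inference_service(name, model_format, storage_uri, namespace="default"):
    """Build a minimal v1beta1 InferenceService manifest as a dict."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "model": {
                    "modelFormat": {"name": model_format},
                    # Where KServe pulls the model artifacts from.
                    "storageUri": storage_uri,
                }
            }
        },
    }


# Hypothetical name and bucket path; replace with your own.
manifest = inference_service(
    "sklearn-iris", "sklearn", "gs://example-bucket/models/sklearn/iris"
)
print(json.dumps(manifest, indent=2))
```

Saving the printed output to a file (for example `isvc.json`) and running `kubectl apply -f isvc.json` would create the resource, assuming KServe is installed in the cluster.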

Peter's Mind Vault by Peter Peerdeman is licensed under CC BY-NC-SA 4.0
