challenges with building LLM applications

source: anyscale, infra for openai & uber

scale

  • computing embeddings
  • scale up the data pipeline
  • parallelize evaluations
  • solutions
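a minimal sketch of the scale point: fan embedding computation out over a pool of workers. the `embed` function here is a hypothetical stand-in for a real embedding model call (in the talk's context this is what a data pipeline on ray data would parallelize across a cluster):

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

def embed(text: str) -> list[float]:
    # hypothetical placeholder for a real embedding model call;
    # hashes the text into a tiny fake vector
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:8]]

def embed_corpus(texts: list[str], workers: int = 4) -> list[list[float]]:
    # fan the corpus out across workers; a real pipeline would batch
    # requests and use processes (or a cluster) for CPU/GPU-bound work
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(embed, texts))
```

threads suffice for I/O-bound model calls; CPU-bound embedding would want processes or a distributed runtime instead.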

deployment

  • combine different components
  • operational challenges
    • composing multiple models
    • low latency is crucial, streaming responses work best
  • mlops challenges
    • zero downtime upgrades
    • canary rollouts
    • observability
    • autoscaling / flexible reduction to reduce cost
  • solutions
    • use platform like ray serve to do all the above
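a rough sketch of the composition + streaming points above, with hypothetical `retrieve` and `generate` placeholders standing in for real model deployments (ray serve would wrap each stage as a deployment and wire them together via handles):

```python
from typing import Iterator

def retrieve(query: str) -> list[str]:
    # hypothetical retriever stage; a real app would query a vector store
    return [f"doc about {query}"]

def generate(prompt: str) -> Iterator[str]:
    # hypothetical LLM stage that streams tokens as they are produced
    for token in f"answer to: {prompt}".split():
        yield token + " "

def pipeline(query: str) -> Iterator[str]:
    # compose retriever + generator; streaming tokens back as a
    # generator cuts perceived latency versus waiting for the full reply
    docs = retrieve(query)
    prompt = f"{query}\ncontext: {' '.join(docs)}"
    yield from generate(prompt)
```

the mlops concerns in the list (zero-downtime upgrades, canaries, autoscaling) sit around this pipeline, which is what delegating to a serving platform buys you.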

improvement

  • finetune embeddings
  • train ancillary models
  • solutions
    • ray training
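for the "train ancillary models" point, a toy training loop: a tiny logistic-regression classifier in pure python, standing in for the kind of small helper model (e.g. a relevance or routing classifier) that would be trained at scale with ray train. everything here is a hypothetical sketch, not the talk's actual code:

```python
import math

def train_classifier(X, y, epochs=200, lr=0.5):
    # plain SGD on logistic loss; a stand-in for a real training job
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1 / (1 + math.exp(-z))      # sigmoid
            g = p - yi                       # gradient of the loss wrt z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z > 0 else 0
```

the same loop shape, swapped to a real framework and distributed across workers, is what a training platform handles for you.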

cost

  • LLM routing with Ray Serve
    • hybrid model composition
      • use OSS LLMs in combination with for instance ChatGPT
  • Anyscale Endpoints
    • drop in replacement for openai api
    • cheapest way to use llama 2 70b in production
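the routing idea above can be sketched as a small dispatcher: send cheap traffic to an OSS model and escalate the rest to a stronger paid model. the length heuristic here is a hypothetical placeholder; production routers typically use a trained classifier to decide:

```python
from typing import Callable

def route(prompt: str,
          oss_model: Callable[[str], str],
          strong_model: Callable[[str], str],
          max_oss_len: int = 200) -> str:
    # naive heuristic: short prompts go to the cheap OSS model,
    # longer ones to the stronger, more expensive model
    if len(prompt) <= max_oss_len:
        return oss_model(prompt)
    return strong_model(prompt)
```

because both models share a callable interface, swapping one endpoint for another (the "drop-in replacement" point) costs nothing at the call site.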