challenges with building LLM applications
source: Anyscale (infra used by OpenAI & Uber)
- https://www.linkedin.com/video/live/urn:li:ugcPost:7114994414636126212/
- https://www.anyscale.com/blog/tackling-the-cost-and-complexity-of-serving-ai-in-production-ray-serve
scale
- computing embeddings
- scale up the data pipeline
- parallelize evaluations
- solutions
  - use a platform like Ray Data
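The scaling idea above (batch the data, fan the batches out to workers) can be sketched without Ray; this is a minimal stand-in where `embed_batch` is a hypothetical placeholder for a real embedding-model call, and the thread pool plays the role that Ray Data's distributed execution plays at cluster scale:

```python
from concurrent.futures import ThreadPoolExecutor

def embed_batch(batch):
    # Stand-in for a real embedding call (e.g. an embedding model API);
    # here a toy character-code vector so the sketch runs anywhere.
    return [[float(ord(c)) for c in text[:4]] for text in batch]

def compute_embeddings(texts, batch_size=2, workers=4):
    # Split the corpus into batches and embed them in parallel.
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(embed_batch, batches)
    # Flatten the per-batch results back into one list of vectors.
    return [vec for batch in results for vec in batch]
```

With Ray Data the same shape of pipeline would run across a cluster instead of local threads.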
deployment
- combine different components
- operational challenges
- composing multiple models
  - low latency is crucial; streaming responses work best
- mlops challenges
- zero downtime upgrades
- canary rollouts
- observability
  - autoscaling / scaling down flexibly to reduce cost
- solutions
  - use a platform like Ray Serve to do all of the above
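Of the mlops items above, the canary rollout is easy to sketch framework-agnostically; this hypothetical router (not a Ray Serve API) shows the core mechanic that a serving platform automates — send a small weighted slice of traffic to the new model version, roll forward by raising the weight, roll back by dropping it to zero:

```python
import random

class CanaryRouter:
    """Split traffic between a stable and a canary model version.

    canary_weight is the fraction of requests served by the new version;
    a serving platform like Ray Serve manages this shift automatically.
    """

    def __init__(self, stable, canary, canary_weight=0.05, seed=None):
        self.stable = stable
        self.canary = canary
        self.canary_weight = canary_weight
        self._rng = random.Random(seed)

    def route(self, request):
        # Draw in [0, 1): below the weight -> canary, otherwise stable.
        handler = self.canary if self._rng.random() < self.canary_weight else self.stable
        return handler(request)
```

Observability then hangs off the same point: comparing error rates and latency between the two handlers decides whether the weight goes up or back to zero.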
improvement
- finetune embeddings
- train ancillary models
- solutions
  - Ray Train
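Fine-tuning embeddings usually means pulling matching (anchor, positive) pairs together and pushing mismatched (anchor, negative) pairs apart; a minimal triplet-loss sketch of that objective, in plain Python under the assumption of similarity-scored (dot-product) embeddings, not any specific training library:

```python
def dot(u, v):
    # Dot-product similarity between two embedding vectors.
    return sum(a * b for a, b in zip(u, v))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Loss is zero once the positive pair is at least `margin`
    # more similar to the anchor than the negative pair.
    return max(0.0, dot(anchor, negative) - dot(anchor, positive) + margin)
```

A training loop (what Ray Train would distribute) minimizes this loss over many such triplets by adjusting the embedding model's weights.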
cost
- LLM routing with Ray Serve
- hybrid model composition
  - use OSS LLMs in combination with, for instance, ChatGPT
- Anyscale Endpoints
    - drop-in replacement for the OpenAI API
    - cheapest way to run Llama 2 70B in production
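The routing idea above — serve cheap requests with an OSS model and escalate the rest to a premium API — can be sketched in a few lines; the length heuristic and both model callables here are hypothetical stand-ins (a real router, e.g. one built on Ray Serve, might use a learned difficulty classifier instead):

```python
def route_query(query, cheap_llm, premium_llm, max_cheap_len=200):
    """Hybrid model composition: pick a model per request to cut cost.

    Toy heuristic: short queries go to the cheaper OSS model,
    long ones to the stronger (and pricier) hosted model.
    """
    if len(query) <= max_cheap_len:
        return cheap_llm(query)
    return premium_llm(query)
```

The cost win comes from the traffic mix: if most queries are "easy" enough for the OSS model, only the residual fraction pays premium per-token prices.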