challenges with building LLM applications

source: anyscale, infra for openai & uber

scale

  • computing embeddings
  • scale up the data pipeline
  • parallelize evaluations
  • solutions
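a minimal sketch of the scale point: fan embedding computation out over a pool of workers. the `embed` function here is a hypothetical stand-in for a real embedding model call (in the talk's context this is what a data pipeline on ray data would parallelize across a cluster):

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

def embed(text: str) -> list[float]:
    # hypothetical placeholder for a real embedding model call;
    # hashes the text into a tiny fake vector
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:8]]

def embed_corpus(texts: list[str], workers: int = 4) -> list[list[float]]:
    # fan the corpus out across workers; a real pipeline would batch
    # requests and use processes (or a cluster) for CPU/GPU-bound work
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(embed, texts))
```

threads suffice for I/O-bound model calls; CPU-bound embedding would want processes or a distributed runtime instead.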

deployment

  • combine different components
  • operational challenges
    • composing multiple models
    • low latency is crucial, streaming responses work best
  • mlops challenges
    • zero downtime upgrades
    • canary rollouts
    • observability
    • autoscaling / flexible reduction to reduce cost
  • solutions
    • use platform like ray serve to do all the above
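a rough sketch of the composition + streaming points above, with hypothetical `retrieve` and `generate` placeholders standing in for real model deployments (ray serve would wrap each stage as a deployment and wire them together via handles):

```python
from typing import Iterator

def retrieve(query: str) -> list[str]:
    # hypothetical retriever stage; a real app would query a vector store
    return [f"doc about {query}"]

def generate(prompt: str) -> Iterator[str]:
    # hypothetical LLM stage that streams tokens as they are produced
    for token in f"answer to: {prompt}".split():
        yield token + " "

def pipeline(query: str) -> Iterator[str]:
    # compose retriever + generator; streaming tokens back as a
    # generator cuts perceived latency versus waiting for the full reply
    docs = retrieve(query)
    prompt = f"{query}\ncontext: {' '.join(docs)}"
    yield from generate(prompt)
```

the mlops concerns in the list (zero-downtime upgrades, canaries, autoscaling) sit around this pipeline, which is what delegating to a serving platform buys you.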

improvement

  • finetune embeddings
  • train ancillary models
  • solutions
    • ray training
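for the "train ancillary models" point, a toy training loop: a tiny logistic-regression classifier in pure python, standing in for the kind of small helper model (e.g. a relevance or routing classifier) that would be trained at scale with ray train. everything here is a hypothetical sketch, not the talk's actual code:

```python
import math

def train_classifier(X, y, epochs=200, lr=0.5):
    # plain SGD on logistic loss; a stand-in for a real training job
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1 / (1 + math.exp(-z))      # sigmoid
            g = p - yi                       # gradient of the loss wrt z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if z > 0 else 0
```

the same loop shape, swapped to a real framework and distributed across workers, is what a training platform handles for you.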

cost

  • LLM routing with Ray Serve
    • hybrid model composition
      • use OSS LLMs in combination with for instance ChatGPT
  • Anyscale Endpoints
    • drop in replacement for openai api
    • cheapest way to use llama 2 70b in production
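the routing idea above can be sketched as a small dispatcher: send cheap traffic to an OSS model and escalate the rest to a stronger paid model. the length heuristic here is a hypothetical placeholder; production routers typically use a trained classifier to decide:

```python
from typing import Callable

def route(prompt: str,
          oss_model: Callable[[str], str],
          strong_model: Callable[[str], str],
          max_oss_len: int = 200) -> str:
    # naive heuristic: short prompts go to the cheap OSS model,
    # longer ones to the stronger, more expensive model
    if len(prompt) <= max_oss_len:
        return oss_model(prompt)
    return strong_model(prompt)
```

because both models share a callable interface, swapping one endpoint for another (the "drop-in replacement" point) costs nothing at the call site.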