
Episode metadata

Show notes > We are excited to be joined by J.D. Zamfirescu-Pereira, a Ph.D. student at UC Berkeley. He focuses on the intersection of human-computer interaction (HCI) and artificial intelligence (AI). He joins us to share his work in his paper, "Why Johnny can’t prompt: how non-AI experts try (and fail) to design LLM prompts." The discussion also explores lessons learned and achievements related to BotDesigner, a tool for creating chatbots.

Episode AI notes

  1. Beware of skewed test sets in machine learning, as having a high accuracy metric does not necessarily mean that the model is performing well. It is important to have a balanced test set with equal representation across different classes.
  2. Evaluation of response effectiveness is often biased towards longer, more complete answers, but in rapid-fire voice interactions, shorter responses with more frequent check-ins are preferred. Models that prioritize longer, more complete answers may not work well in such contexts. Separately, OpenAI may not keep older models around indefinitely, which poses a challenge if relied-upon functionality disappears.
  3. The future of bot development lies in more people engaging in programming for their personal use. Previous attempts at developing such tools have often been fragile, because the capability that produced the desired behavior can cease to exist. Despite that concern, the expectation is that more individuals will program for personal use.
  4. LLMs dramatically reduce the amount of work needed to get a computer to perform desired tasks. They provide an outlet for non-programmers to achieve their computing goals and can interactively guide users toward better prompting and a better understanding of code. Whether LLMs could fundamentally restructure the way people engage in computing is an interesting open question.

Snips

[21:18] Beware of skewed test sets in machine learning

🎧 Play snip - 1min️ (20:41 - 21:31)

✨ Key takeaways

  1. Having a high accuracy metric does not necessarily mean that the model is performing well.
  2. It is important to have a balanced test set with equal representation across different classes.
  3. A model that always predicts the majority class can still achieve high accuracy but provides no meaningful insights.

📚 Transcript

Speaker 1

So like one super common failure mode that happens very often with people who are learning this sort of machine learning world is that they'll have some like accuracy metric defined for themselves, right, they'll have a test set, they'll be like how many of the items in the test set are correctly classified, but they'll have a test set where 90% of the items are all in one bucket, right, they're all classified to one thing. And that's not ideal because then you sort of build your model and your model just shunts everything, says everything is in that bucket. And suddenly it's got like 90% accuracy, even though the model is not really doing anything, right, like you've got two classifications, right, maybe A or B, and in your test set 90% of everything is in class A. And so the model that just says A all the time is going to be 90% accurate. And you'll be like, Oh, 90% accuracy. That's great. Like I'm close, right? But actually you've got nothing,
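
The A/B scenario described above can be reproduced in a few lines. This is a minimal sketch, not something shown in the episode: the 100-item test set, the helper names (`accuracy`, `recall_per_class`, `balanced_accuracy`), and the choice of balanced accuracy as the comparison metric are all assumptions made for illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall_per_class(y_true, y_pred):
    """For each true class, the fraction of its items predicted correctly."""
    recalls = {}
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        recalls[label] = hits / total
    return recalls


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


# Hypothetical skewed test set: 90% class A, 10% class B.
y_true = ["A"] * 90 + ["B"] * 10

# A "model" that ignores its input and always predicts the majority class.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

print(accuracy(y_true, y_pred))           # 0.9
print(recall_per_class(y_true, y_pred))   # {'A': 1.0, 'B': 0.0}
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Plain accuracy rewards the always-A model with 90%, while the per-class view shows it never identifies a single B item. Averaging the per-class recalls puts it at 0.5, which is what you would expect from a two-class model that has learned nothing.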

[40:08] OpenAI’s model update schedule and its implications

🎧 Play snip - 24sec️ (39:46 - 40:11)

✨ Key takeaways

  1. OpenAI has not committed to keeping older models around indefinitely.
  2. According to the speaker, OpenAI releases a new model every three months and keeps the old one around for another three months.
  3. If functionality you rely on disappears within that six-month window, you may be out of luck.
  4. It is unclear what actions to take if a capability no longer exists.

📚 Transcript

Speaker 1

OpenAI has sort of, has not really committed, I think, to keeping older models around indefinitely, right? Like, as you're going through this process every three months, they're going to release a new model and then they'll keep the old one around for three months. And so you've got this like six-month increment. But if a bunch of functionality that you're relying on disappears, sort of in that six-month increment, you're kind of, you know, you're a little SOL, right? Like, you're a little bit out of luck in the sense that, you know, what are you going to do if that capability just doesn't exist anymore,

[41:24] The Rise of Personal Bot Development: Programming for Everyone

🎧 Play snip - 1min️ (40:38 - 41:17)

✨ Key takeaways

  1. More people will start using computing and engaging in bot development, which the speaker counts as programming, for their own personal use.
  2. Senior programmers often create little tools for themselves to make their lives easier.
  3. Using computing and programming tools can make work and daily life more efficient.

📚 Transcript

Speaker 1

One of the things that we're going to see is more people sort of using computing and I'm going to go out on a limb here and call bot development programming, even if it's like just the sort of prompted style of programming. But I think we're going to see more people engaging in that kind of programming for their own personal use. You know, one of the things that you often see from like senior programmers is they'll just like make little tools for themselves. Much in the same way that you imagine like a woodworker just like builds themselves jigs, those sort of little helpful things that in their day to day that just kind of like makes their lives a little easier, makes it a little better, sort of helps them with their work, whatever it is.

[43:23] Excitement for non-programmers to have an outlet for computing

🎧 Play snip - 1min️ (42:30 - 43:00)

✨ Key takeaways

  1. Excitement for non-programmers to have an outlet for computing
  2. Interactive way for the bot to help users with coding
  3. Ability for the bot to generate code and explain it to the user

📚 Transcript

Speaker 1

I'm excited for people who aren't programmers, who, you know, haven't spent years sort of dedicated to figuring out how to write code and like integrate libraries and all this stuff. I'm excited for all of those folks to have this like output, to have this outlet to get computing to do what it is they want. And the interactive way it can do it, where the bot can even explain to the user how to better prompt itself, especially when it comes to coding. Exactly. Or, you know, sometimes code is the right thing and it will just write some code for you and explain it to you if that's what's needed.