A large language model (LLM) is a computational model notable for its ability to achieve general-purpose language generation and other natural language processing tasks such as classification.

Even though I am a forward thinking optimistic technologist, I have been blindsided by the rapid advancement of large language model capabilities. Natural Language Processing techniques have been around for a long time, but now that we have much bigger computing power available it has been very remarkable what is possible when we compact the wide range of textual knowledge that we have been accumulating in the age of desktop computing.

An interesting development in this space is that this type of manipulation of language works so well, engineers are now trying to leverage this language based reasoning in other fields as well. By transforming for instance an image into words we can then ask the model a question about the image, even though the language model just has a textual representation of the image.

This is where we find out that large language models work so well because language is limited: there are only a finite number of words we can use, and thus can have the model predict what the most sensible next word is. According to Meta’s AI lead Yann Lecun, transforming data into language models will lead to a plateau in capabilities. If we find other ways of representing real world concepts into vector models, such as JEPA (Joint Embedding Predictive Architecture) we might reach a real next phase of generative capabilities.

resources

learning

topics

project ideas