LLMs: Overview & Recommended Reading List

Ben Fried recently joined the Rally Ventures team as a Venture Partner. In this role, Ben is responsible for driving new investments, supporting existing portfolio companies and deepening our expertise in systems engineering and software development.

Ben spent the previous 14 years as the CIO of Google, where he oversaw the creation and deployment of technologies and systems that power Google’s enterprise. Prior to joining Google, Ben led development of mission scheduling software for NASA at a Bay Area startup and spent over a decade at Morgan Stanley.

Ben is a world-class technologist and executive, and we’re thrilled to have him on the Rally team. He recently put together an overview of large language models (LLMs; ChatGPT is probably the best-known example), along with a recommended reading list, for those of us interested in learning more about how they work, what they’re good at (and not good at) and where the technology is going.

We’ve never seen a technology move this fast, and we’re both excited about the opportunities it will bring and cautious about implementing LLMs responsibly. We look forward to sharing more about this topic with you, and hope you enjoy this initial dive into the world of LLMs!

An Introduction to LLMs by Ben Fried

LLMs are a type of machine learning model used for natural language processing (NLP) tasks. They use deep learning techniques to understand language and generate output (usually text) in response to their input.

ChatGPT and other LLMs are built on a type of neural network called a transformer. A neural network is composed of many “neurons”, each of which performs a simple calculation on its inputs and then passes the result on to the neurons it is connected to. Much of machine learning research is about how to shape and configure the connections between neurons; the resulting design is called the model’s architecture.

The art is in deciding which neurons are connected to which other neurons, what calculation each neuron performs and how training is done to “teach” the network what its weights (some of the numbers used in each neuron’s calculation) should be.
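To make “neurons” and “weights” concrete, here is a minimal Python sketch of one common kind of artificial neuron: it multiplies each input by a weight, adds a bias, and passes the sum through an activation function. The specific numbers, the bias, and the choice of ReLU as the activation are illustrative assumptions, not values from any real model; an actual LLM has billions of such weights, and training is what sets them.

```python
from typing import List


def relu(x: float) -> float:
    """A common activation function: passes positive values through and zeroes out negatives."""
    return max(0.0, x)


def neuron(inputs: List[float], weights: List[float], bias: float) -> float:
    """One neuron: a weighted sum of its inputs plus a bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("each input needs exactly one weight")
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(total)


# Illustrative numbers only; in a trained model these weights would be learned.
inputs = [1.0, 2.0, 3.0]

# Two neurons read the same inputs...
h1 = neuron(inputs, [0.5, -0.25, 0.1], 0.2)
h2 = neuron(inputs, [-0.3, 0.4, 0.2], 0.0)

# ...and pass their results on to a third neuron they are connected to.
out = neuron([h1, h2], [1.0, 0.5], -0.1)
print(h1, h2, out)
```

Stacking many layers of neurons like these, wired together according to the model’s architecture, is what makes up a neural network.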

At the end of the day, a model’s architecture and the data it was trained on together determine almost all of its capabilities. For GPT and other LLMs, the architecture can determine, for example, how much of the prompt and the earlier conversation the model can take into account at once (its context window), or how much it understands about a word.
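One architectural limit described above is how much of a conversation the model can take into account at once, usually called its context window. The sketch below shows one simple way a chat application might cope with that limit: keep only the most recent messages that fit and drop the rest, so the model never sees them. The helper name `fit_to_context` and the word-based budget are hypothetical; real models count tokens rather than words, have far larger limits, and products may manage history differently.

```python
from typing import List


def fit_to_context(messages: List[str], max_words: int) -> List[str]:
    """Keep the most recent messages whose combined word count fits within max_words.

    Hypothetical helper: whitespace-separated words stand in for the tokens a real model counts.
    """
    kept: List[str] = []
    used = 0
    # Walk backwards from the newest message so the most recent context is kept first.
    for message in reversed(messages):
        cost = len(message.split())
        if used + cost > max_words:
            break  # older messages no longer fit and are dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


conversation = [
    "User: Tell me about neural networks.",
    "Assistant: They are layers of simple connected units.",
    "User: And what is a transformer?",
    "Assistant: A kind of neural network.",
]
print(fit_to_context(conversation, max_words=12))  # keeps only the last two messages
```

With a small enough budget, the model effectively “forgets” the start of the conversation, which is one reason long chats can lose track of earlier details.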

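ChatGPT and the class of neural networks like it work by predicting the next word in a sequence of words (more precisely, the next token, which may be a whole word or part of one). The initial sequence that the model is asked to complete is called the prompt. At each step, the model produces a list of likely next words, one of them is chosen and appended to the sequence, and the model then predicts again. That choice involves some randomness, and a setting called the temperature controls how much: a low temperature almost always picks the likeliest word, while a higher one makes less likely words more common. This randomness is why you get different results from the same prompt.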
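The loop below sketches this next-word process with temperature. It assumes a hypothetical, hard-coded table of scores for which word might follow the previous one; a real LLM computes these scores with its neural network, looks at the whole sequence rather than just the last word, and works with tokens rather than whole words. Dividing the scores by the temperature before turning them into probabilities (a softmax) is the usual way temperature is applied, and a temperature of 0 is treated here as always picking the top-scoring word.

```python
import math
import random
from typing import Dict, Optional

# Hypothetical next-word scores ("logits") keyed by the previous word.
# A real LLM computes such scores from the whole sequence; it does not use a lookup table.
NEXT_WORD_SCORES: Dict[str, Dict[str, float]] = {
    "the": {"cat": 2.0, "dog": 1.5, "idea": 0.5},
    "cat": {"sat": 2.5, "ran": 1.0},
    "dog": {"ran": 2.0, "sat": 1.0},
    "idea": {"spread": 1.0},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}


def softmax_with_temperature(scores: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Convert raw scores to probabilities; temperature above 1 flattens them, below 1 sharpens them."""
    scaled = {word: s / temperature for word, s in scores.items()}
    top = max(scaled.values())  # subtracting the max avoids overflow in exp()
    exps = {word: math.exp(s - top) for word, s in scaled.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def next_word(current: str, temperature: float, rng: random.Random) -> Optional[str]:
    """Choose the word that follows `current`, or return None if the table has no continuation."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    scores = NEXT_WORD_SCORES.get(current)
    if not scores:
        return None
    if temperature == 0:
        return max(scores, key=lambda w: scores[w])  # no randomness: always the likeliest word
    probs = softmax_with_temperature(scores, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


def generate(prompt: str, temperature: float, max_new_words: int = 5, seed: Optional[int] = None) -> str:
    """Starting from the prompt, repeatedly append a chosen next word to the sequence."""
    rng = random.Random(seed)
    words = prompt.split()
    for _ in range(max_new_words):
        word = next_word(words[-1], temperature, rng)
        if word is None:
            break  # nothing known to follow: stop generating
        words.append(word)
    return " ".join(words)


print(generate("the", temperature=0))    # prints "the cat sat down" on every run
print(generate("the", temperature=1.0))  # can vary from run to run
```

Running the second call several times will sometimes produce “the dog ran away” or “the idea spread” instead, which is the same-prompt, different-answer behavior described above.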

Once trained, LLMs can be used for a variety of NLP tasks, such as text classification, sentiment analysis, machine translation and question-answering. What’s even more impressive is that when the models are large enough, and trained with enough data, they demonstrate emergent behavior — skills and abilities not planned for by their designers.

Additional Recommended Reading:

What Will Transformers Transform by Rodney Brooks. This article is my favorite so far in making sense of what GPTs can do now, what they’ll soon be able to do and where they may be going. The links he provides in his article are all worth a scan.

This Week in AI Doublespeak by Gary Marcus. Gary is an AI expert and GPT skeptic. His view on the technology is somewhat pessimistic, but his is a highly informed opinion.

Does a GPT Future Need Software Engineers by Oxide. Rally portfolio company Oxide just recorded a podcast on how GPT will affect software engineering. Oxide co-founder and CTO Bryan Cantrill is an exceptionally skilled and creative technologist, and he remains fairly optimistic about the future of human software engineers.

Here are three articles, at three different levels of depth, on how ChatGPT works. They range from least mathy to most mathy, and shortest to longest: