Once again, it was very enjoyable to connect with the vibrant Brazilian venture community (founders and VC managers) during Brazil at Silicon Valley (April 7-9), held this time at the Google Cloud venue.
One of the most thought-provoking speeches was delivered by Thomas Siebel, founder of the CRM company Siebel Systems (acquired by Oracle in 2005 for $5.85B) and founder and CEO of the enterprise AI software platform C3.ai (NYSE: AI, $2.8B market cap).
Needless to say, everything at the event revolved around GenAI.
At one point, Siebel asked the audience if we knew how ChatGPT works. Because most people don’t, he had bought thousands of copies of Stephen Wolfram’s “What is ChatGPT Doing … and Why Does It Work?” to hand out. He also encouraged everyone to buy the book (only 76 pages), offering to buy it back from those who didn’t like it.
That ChatGPT, and other LLMs, can automatically generate human-like text is remarkable.
“What ChatGPT is always fundamentally trying to do is to produce a reasonable continuation of whatever text it has got so far, where by ‘reasonable’ we mean what one might expect someone to write after seeing what people have written on billions of web pages, etc.”
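To make that “reasonable continuation” idea concrete, here is a deliberately crude sketch of my own (not from the book): a bigram model that counts which word follows which in a toy corpus, then repeatedly continues with the most frequent successor. ChatGPT works with probabilities over tokens learned by a huge neural net rather than raw counts, but the continue-the-text loop has the same shape.

```python
from collections import Counter, defaultdict

# Toy "training corpus": the model only learns which word tends to follow which.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Count next-word frequencies for each word (a bigram model).
follow = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    follow[cur][nxt] += 1

def continuation(word, steps):
    """Greedily append the most frequent next word, one step at a time."""
    out = [word]
    for _ in range(steps):
        out.append(follow[out[-1]].most_common(1)[0][0])
    return " ".join(out)

print(continuation("sat", 2))  # "sat on the"
```

Even this toy version shows why the output sounds vaguely like its training text: every word it emits is a word that actually followed the previous one somewhere in the corpus.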
“Any model you use has some particular underlying structure – then a certain set of knobs you can turn (i.e. parameters that you can set) to fit your data. And in the case of ChatGPT (3), lots of such knobs are used – actually, 175 billion of them”.
ChatGPT is based on the concept of neural nets. “… what makes neural nets so useful is that not only can they in principle do all sorts of tasks, but they can be incrementally trained from examples to do those tasks. When we make a neural net to distinguish cats from dogs, we don’t effectively have to write a program that (say) explicitly finds whiskers; instead, we just show lots of examples of what’s a cat and what’s a dog, and then have the network machine learn from these how to distinguish them”.
“… so what about the actual learning process in a neural net? In the end it’s all about determining what weights will best capture the training examples that have been given”.
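As a tiny, concrete illustration of “determining what weights will best capture the training examples” — my own sketch, not code from the book — here is gradient descent fitting the two “knobs” of a single linear neuron to examples generated by y = 2x + 1:

```python
# Training examples: inputs x and targets y generated by y = 2*x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

# Start with arbitrary weight and bias (the "knobs" to turn).
w, b = 0.0, 0.0
lr = 0.01  # learning rate

for _ in range(5000):
    # Gradients of the mean squared error with respect to w and b.
    gw = sum(2 * (w * x + b - y) * x for x, y in examples) / len(examples)
    gb = sum(2 * (w * x + b - y) for x, y in examples) / len(examples)
    # Nudge each knob a little in the direction that reduces the error.
    w -= lr * gw
    b -= lr * gb

print(round(w, 2), round(b, 2))  # converges toward w ≈ 2, b ≈ 1
```

Training ChatGPT is this same nudge-the-knobs loop, just with 175 billion knobs and a corpus of text instead of four points.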
“Ultimately a neural net is a connected collection of neurons – usually arranged in layers. Each neuron is effectively set up to evaluate a simple numerical function. And to use the network, we simply feed numbers in at the top, then have neurons on each layer ‘evaluate their functions’ and feed the results forward through the network – eventually producing the final result at the bottom”.
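That layered “feed numbers in, evaluate, pass forward” picture fits in a few lines of code. The sketch below is my own illustration with hand-picked weights (not a trained model); as it happens, these particular weights make the little network compute something close to XOR:

```python
import math

def neuron(inputs, weights, bias):
    # Each neuron computes a weighted sum of its inputs, then applies
    # a simple nonlinear function (here the classic sigmoid).
    s = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-s))

def network(inputs, layers):
    # Feed the numbers in at the top, have each layer of neurons
    # evaluate its functions, and pass the results forward.
    for lyr in layers:
        inputs = [neuron(inputs, w, b) for w, b in lyr]
    return inputs

# Two inputs, a hidden layer of two neurons, and one output neuron.
layers = [
    [([4.0, 4.0], -2.0), ([-4.0, -4.0], 6.0)],  # hidden layer: (weights, bias)
    [([5.0, 5.0], -7.5)],                        # output layer
]
print(network([1.0, 0.0], layers))  # output near 1 (inputs differ)
print(network([1.0, 1.0], layers))  # output near 0 (inputs match)
```

The point of training, per the earlier quote, is that nobody picks those weights by hand at scale: they are found automatically from examples.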
“… there’ve been many advances in the art of training neural nets. … But mostly things have been discovered by trial and error, adding ideas and tricks that have progressively built a significant lore about how to work with neural nets”.
The book goes on to explain many engineering refinements, like attractors, loss functions, embeddings, transformers, attention, positional encoding, and connection dropping. For instance, “but instead of just defining a fixed region in the sequence over which there can be connections, transformers instead introduce the notion of attention – and the idea of paying attention more to some parts of the sequence than others”.
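The “pay attention more to some parts of the sequence than others” idea boils down to computing a weight for every position and taking a weighted average. Here is a bare-bones sketch of my own (single-query scaled dot-product attention with made-up numbers), not code from the book:

```python
import math

def softmax(xs):
    # Turn arbitrary scores into positive weights that sum to 1.
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    # Score each position by how well its key matches the query,
    # then average the values using those scores as weights.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights

# Three positions in a sequence; the query resembles the second key most,
# so the second position should get the largest attention weight.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
out, weights = attention([-1.0, 2.0], keys, values)
print(weights)
```

In a real transformer the queries, keys, and values are themselves produced by learned weights, and many such attention heads run in parallel, but the weighted-average core is this.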
“How were all those 175 billion weights in ChatGPT’s neural net determined? Basically, they are the result of very large-scale training, based on a huge corpus of text written by humans. … Even given all that training data, it’s certainly not obvious that a neural net would be able to successfully produce human-like text… But the big surprise – and discovery – of ChatGPT is that it’s possible at all. And that – in effect – a neural net with just 175 billion weights can make a reasonable model of text humans write”.
“Human language – and the process of thinking involved in generating it – have always seemed to represent a kind of pinnacle of complexity. And indeed it’s seemed somewhat remarkable that human brains – with their network of ‘mere’ 100 billion or so neurons (and maybe 100 trillion connections) could be responsible for it. Perhaps, one might have imagined, there’s something more to brains than their network of neurons – like some layer of undiscovered physics. But now with ChatGPT we’ve got an important new piece of information: we know that a pure, artificial neural network with about as many connections as brains have neurons is capable of doing a surprisingly good job of generating human language”.
Conclusion
Thanks for the suggestion, Mr. Siebel, I will keep my copy of the book.
Funnily enough, my main takeaway was not tackled in the book: it became clear to me that copyright is a nonissue in GenAI.
There is a common misconception about the nature of LLMs. People often think these models simply regurgitate the large amounts of data they have been trained on. But the truth is that LLMs don’t actually store the data they are trained on. Instead, they learn from it and predict plausible outcomes, much like humans do. After all, humans learn from other people’s works, and then synthesize that knowledge and experience in their own creative processes.