Published: August 8, 2024 at 12:58 PM
Updated: August 9, 2024 at 4:59 PM
By: Gregory Kirchoff
Intelligence vs. Proficiency: Understanding LLMs as Tools
Are cats or dogs smarter? The question is difficult to answer because cats and dogs are better at different things. Dogs are smarter when it comes to working in packs, while cats are smarter when it comes to stalking and hunting prey. Said simply, dogs are better at dog things and cats are better at cat things. This comes from the fact that dogs and cats are different animals with different needs, limitations, and capabilities. Because of this, we shouldn’t be as preoccupied with gauging the general intelligence of these furry friends as with appraising the aspects of their cognition that they excel in.
Another thing we need to consider when appraising the intelligence differences between dogs and cats is that cognition doesn’t just involve cerebral knowledge in a vacuum. Cognition is the act of processing information. This information often comes from the outside world through an array of biological senses such as vision and hearing. So if we say that a cat has superior cognitive abilities in locating unseen objects in space, this isn’t necessarily because it is brainier. Because cats have more sensitive hearing than dogs, especially at higher frequencies, they have more raw data coming in to work with.
The purpose of this exercise is to suggest that comparisons of general intelligence aren’t as accurate or as useful as looking at performance on specific tasks of interest. This is why I want a cat for mousing and a dog for herding. I also want kunekune pigs, but that’s just because they’re cool.
This is relevant when it comes to LLMs because there is debate about whether these models can achieve human-like intelligence (or already have). This is referred to as AGI - Artificial General Intelligence. Artificial Superintelligence takes it a step further, suggesting that AI will be able to surpass human intelligence (or already has).
This argument is nothing new, however. Humans and machines have always excelled at different things. Electric calculators have been around for decades and are far “smarter” at processing complex numerical calculations than a human could ever be. But just as humans would be hard-pressed to find the 500th digit of Pi, it is much more difficult for a machine to pour a cup of coffee. The movement is much more complex than you may think. It requires many abstract contextual facts to be held in memory, like what coffee is, what the purpose of a cup is, and why pouring coffee into a cup is worth doing in the first place. It then requires complex processes to identify signals from the electromagnetic radiation we call visible light in order to figure out where the cup of coffee is in space. We then need to understand that the coffee cup is still the same coffee cup when we pick it up and see it from different angles. There is much more to it than that, and we just do it “better” than machines. You can hard-code a machine to pour coffee, but it won’t be good at much else.
Again, I argue that we shouldn’t be looking at holistic “intelligence” as much as we should be looking at proficiencies and limitations. We should look at LLMs as tools just as we see hammers as tools because they’re better at pushing nails into wood than our fists are. It is easier to understand that a hammer is a tool, rather than an all-powerful entity, because we understand how narrow the scope of tasks is that hammers excel at. I wouldn’t want to cook pasta or measure how many grams of sugar to add to a cake with a hammer. I believe that understanding the limitations of LLMs will help us see them as tools with pros and cons rather than as all-powerful entities.
Consider this input into ChatGPT:
Please come up with two prime numbers that are both greater than 100 (A and B).
Also provide the product of multiplying them together (C).
Provide your answer in the following format with no more and no less words:
C =
A =
B =
How would you approach this question? You would probably think of A and B first and then multiply them together to create C. Once you’ve figured out all the numbers, you would write them out in the order that was specified. Let’s see ChatGPT’s response:
C = 10979
A = 103
B = 107
That’s… not right. A and B are both prime numbers, but they multiply together to make 11,021, not 10,979. What you’re seeing here is how ChatGPT processes and creates information. Notice how I didn’t say “thinks.” It cannot produce thoughts before writing anything down. All it does is predict the next piece of text to write, one word at a time.

If you give it the text “the color of the sky is”, it runs that text through its neural network, which has been trained on massive amounts of real-world text, and appends whichever word is statistically most likely to follow “is” in the sentence. It has certainly seen articles that describe the sky as cyan or aqua (let’s assume it sees each of these 0.5% of the time), but 99% of the time the sky is described as “blue.” So it writes down blue. That happens to be the last word in the sentence, so it finishes processing and sends back the response. If, however, it detects it’s still in the middle of a sentence after guessing the next word, it adds that word to the original text, feeds the whole thing back into itself, predicts the next word, and repeats this process until it determines it has finished responding.

This means the only “knowledge” it can have is what it has already said or what has been said to it. I told it to write C first, so it had absolutely no idea or plan about which two prime numbers it would pick for A and B. It wrote down an arbitrary number and failed. It is near-impossible to guess the product of two numbers you haven’t chosen yet.
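To make that loop concrete, here is a minimal sketch of the generate-a-word-and-feed-it-back process in Python. The TOY_STATS table and predict_next_word function are hypothetical stand-ins I made up for this post; a real model scores every possible next token with a neural network rather than looking up a hand-written table:

```python
# A toy sketch of the autoregressive loop described above.
# TOY_STATS and predict_next_word are made-up stand-ins for the real
# neural network; real models score every possible next token.

END = "<end>"  # marker for "the response is finished"

# Invented word frequencies matching the sky example in the text.
TOY_STATS = {
    "the color of the sky is": {"blue": 0.99, "cyan": 0.005, "aqua": 0.005},
}

def predict_next_word(text: str) -> str:
    """Return the statistically most likely next word for `text`."""
    candidates = TOY_STATS.get(text, {END: 1.0})
    return max(candidates, key=candidates.get)

def generate(prompt: str) -> str:
    """Append the predicted word and feed the text back in, repeatedly."""
    text = prompt
    while True:
        word = predict_next_word(text)
        if word == END:
            return text  # the model decided it was done
        # The only "knowledge" available next round is what's written so far.
        text += " " + word

print(generate("the color of the sky is"))  # -> the color of the sky is blue
```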
Some may respond to this by saying that it proves ChatGPT, and LLMs in general, aren’t that smart. That can certainly spark interesting philosophical debates, but from a practical, tool-based perspective, we’re just not using the tool right. Let’s try again in a way that works with how LLMs produce responses:
Please come up with two prime numbers that are both greater than 100 (A and B).
Also provide the product of multiplying them together (C).
Provide your answer in the following format with no more and no less words:
A =
B =
C =
Because I understand that ChatGPT has to write things down in order to “think” them, I changed the order to have it produce A and B first, and then list C. Here is the response:
A = 101
B = 103
C = 10403
Pretty good! Now the first thing it has to do is write down a single prime number over 100. That’s not hard. Then it has to come up with a second prime number over 100 - not too bad. And now it simply has to multiply them together to make C. The first way of asking it to solve this task was like me asking you, “I’m thinking of the number 100 right now (C). What two numbers am I also thinking of (A and B) that multiply together to make 100?” That’s impossible to know. They could be 50 and 2, 25 and 4, or 20 and 5.
We got the result we wanted because we used the tool correctly.
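Incidentally, if you want to verify answers like these yourself, a few lines of Python will do it. This is just a checking helper I wrote for this post, not anything ChatGPT runs internally:

```python
# Quick sanity check for the two ChatGPT answers above.

def is_prime(n: int) -> bool:
    """Trial division; fine for numbers this small."""
    if n < 2:
        return False
    for d in range(2, int(n ** 0.5) + 1):
        if n % d == 0:
            return False
    return True

def check(a: int, b: int, c: int) -> bool:
    """A and B must be primes over 100, and C must be their product."""
    return a > 100 and b > 100 and is_prime(a) and is_prime(b) and a * b == c

print(check(103, 107, 10979))  # first attempt  -> False (103 * 107 = 11021)
print(check(101, 103, 10403))  # second attempt -> True
```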
Let’s go through one more example to show that ChatGPT is a statistical tool that functions like a word-guessing game:
Who is Janet Marylyn's son?
The response is:
Janet Marylyn's son is Johnny Depp, the famous American actor and musician
known for his roles in films such as Pirates of the Caribbean,
Edward Scissorhands, and Sweeney Todd.
That’s… not right. Johnny Depp’s mother is Betty Sue Palmer. Maybe ChatGPT just isn’t good with these types of networking problems where it has to figure out who is connected to whom. Let’s try again to make sure:
Who is Tom Hanks' mother?
Tom Hanks' mother was Janet Marylyn Frager (née Frager).
She worked in a hospital and was of Portuguese descent.
That’s… correct! Janet Marylyn is Tom Hanks’ mother. ChatGPT got it right! But why did it not know that Tom Hanks is Janet Marylyn’s son?

ChatGPT was fed a lot of articles as training data. There are a lot of articles about Tom Hanks on the internet. Plenty of them mention his mother as well, and refer to her as his mother. There are not many articles specifically about Janet Marylyn, especially ones that discuss her family connections with her as the focal point. So ChatGPT has learned who Tom Hanks is and who his mother is, but it hasn’t really learned who Janet Marylyn or her son is.

Understandable, but it still feels like it should know who Janet’s son is, because it was able to say who Tom Hanks’ mother is. That’s because our brains are capable of deducing that if A equals B, then B must equal A. ChatGPT cannot do this. This shows that it doesn’t have general “knowledge” of Tom Hanks’ family structure. It is only capable of predicting the next most likely word based on what it has seen before. It has seen plenty of Tom Hanks articles that mention his mother, so it knows what name to guess based on those articles. It hasn’t seen many articles about Janet Marylyn that mention her son, so it simply can’t make that guess.
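To make the asymmetry concrete, here is a toy sketch. The “model” below is nothing but a table of completions counted from training sentences, which is a drastic simplification of a real LLM, but it exhibits the same one-directional behavior described above:

```python
# Toy illustration of one-directional learned associations.
# Real LLMs are vastly more complex; this only mimics the asymmetry.
from collections import Counter

# The training text states the fact in one direction only.
training_sentences = [
    "Tom Hanks' mother is Janet Marylyn Frager",
    "Tom Hanks' mother is Janet Marylyn Frager",  # repeated across many articles
]

# Count completions for each prompt, splitting at the word "is".
completions = Counter()
for sentence in training_sentences:
    prompt, answer = sentence.split(" is ", 1)
    completions[(prompt, answer)] += 1

def most_likely_answer(prompt: str) -> str:
    """Return the most frequently seen completion for `prompt`."""
    seen = {a: n for (p, a), n in completions.items() if p == prompt}
    return max(seen, key=seen.get) if seen else "(never seen - no association)"

print(most_likely_answer("Tom Hanks' mother"))    # learned forward -> her name
print(most_likely_answer("Janet Marylyn's son"))  # never in training -> nothing
```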
Again, we need to be aware of how this tool works to avoid similar pitfalls.
ChatGPT can do a lot of amazing things across a pretty wide range of tasks. This does not mean it isn’t a tool. A multitool is capable of many things and doesn’t approach AGI as you add more tools to its repertoire. It just becomes more useful. For you to get the greatest mileage out of this multitool, you need to understand the components that make it up and how to use it.