There’s a lot of debate about whether ChatGPT can reason. Papers such as Sparks of AGI make the case for version 4.0 being a sort of proto-AGI. While that’s interesting to think about, I’m more interested in what I can get it to do. This is the first in what is likely to be a series of posts where I poke and prod and see what I can get out of it. These are NOT scientifically rigorous, nor are they expected to produce anything meaningful aside from providing a record of my shower thoughts and maybe sparking some ideas in the reader.
In this post I want to look at how well it reasons when it comes to ordering and stacking items. That is, I give it directions that require some conception of order and see how well it performs.
You can find the conversations I had below, or view the original here.
A relatively easy task and no issues.
In this task I create less meaningful names for the objects and give more complicated instructions.
To determine the current order of the papers, let’s go through the actions step by step:
You initially place the papers in this order (from bottom to top):
You remove paper 432 and place it between 16 and 19. The new order is:
You take away aaa and 34, leaving this order:
Then you put 34 back on top of the stack, resulting in this final order:
So, the current order from bottom to top is: tw, 3s, lo, 16, 432, 19, 34.
I was actually somewhat surprised it was able to do this and that it did so by providing such a nice representation.
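For comparison, the steps in the response can be replayed on a plain Python list. This is only a sketch: the starting order from the prompt is not reproduced above, so the `initial` list below is a hypothetical one chosen for illustration, and the helper names `move_between` and `replay` are made up for this example.

```python
def move_between(stack, item, lower, upper):
    """Return a copy of `stack` (bottom to top) with `item` moved so that it
    sits directly above `lower` and directly below `upper`."""
    rest = [p for p in stack if p != item]
    i = rest.index(lower)
    if i + 1 >= len(rest) or rest[i + 1] != upper:
        raise ValueError(f"{lower!r} and {upper!r} are not adjacent")
    return rest[: i + 1] + [item] + rest[i + 1:]


def replay(initial):
    """Replay the three actions described in the response."""
    # Remove paper 432 and place it between 16 and 19.
    stack = move_between(initial, "432", "16", "19")
    # Take away aaa and 34.
    stack = [p for p in stack if p not in ("aaa", "34")]
    # Put 34 back on top of the stack.
    stack.append("34")
    return stack


# ASSUMPTION: hypothetical starting order (bottom to top), not taken from the prompt.
initial = ["aaa", "34", "432", "tw", "3s", "lo", "16", "19"]
print(replay(initial))
# ['tw', '3s', 'lo', '16', '432', '19', '34']
```

With this assumed starting order, the result matches the final order ChatGPT reported.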
Let’s use the same information, but give it some more complex and somewhat ambiguous instructions.
To solve this, let’s go through the steps one by one.
Initial stack order (from bottom to top):
Removing papers as per your instructions:
The stack is now (from bottom to top):
Flipping the stack, the new order (from bottom to top) is:
Finally, swapping the top and bottom papers results in:
So, the final order from bottom to top is: aaa, 34, tw.
This seemed to sufficiently confuse it. It didn’t seem to be able to connect the instructions from removing the top item, skip… with the following sentence, and instead reset, going back to the start of the list.
I followed this up by asking whether it was sure of its answer, and it was then able to produce the correct sequence.
To see if it has difficulty connecting instructions from one section to another, let’s create a simpler problem.
To address your instructions, let’s break down the steps:
Count Backwards and Skip: We start with the list of numbers from 1 to 10 and count backwards from 10, skipping every second number. This means we will skip 9, 7, 5, 3, and 1.
The skipped numbers in this first phase are: 9, 7, 5, 3, 1.
Reverse and Repeat: We then reverse this list of skipped numbers, resulting in 1, 3, 5, 7, 9. We repeat the process of counting backwards and skipping every second number.
So, we skip 3, 7.
The final list of numbers, which are the ones skipped in the second phase, is: 3, 7.
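The two phases described in the response can be checked with a short Python sketch. It assumes that "skipping every second number" means walking the list from its last element to its first and collecting every second element visited, which is how the response reads; the function name `skipped_counting_backwards` is made up for this example.

```python
def skipped_counting_backwards(numbers):
    """Walk `numbers` from last to first and return every second element
    visited, i.e. the ones that get skipped."""
    backwards = numbers[::-1]
    return backwards[1::2]


# Phase 1: count backwards from 10 and skip every second number.
first = skipped_counting_backwards(list(range(1, 11)))
print(first)   # [9, 7, 5, 3, 1]

# Phase 2: reverse the skipped numbers and repeat the process.
second = skipped_counting_backwards(first[::-1])
print(second)  # [7, 3]
```

The second pass visits 7 before 3, so the list comes out as `[7, 3]`; it is the same pair of numbers the response reports.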
No issues, so let’s move on.
We’ll go back to our previous problem and borrow a lot of the same language, but change up the context.
To solve this, let’s follow your instructions step by step:
Initial Order: The cards are placed in increasing order, Ace (1) to 10.
Remove and Skip:
Reverse the Order:
Replace Leftmost with Rightmost and Discard Leftmost:
So, the final order of the cards is: 2, 9, 7, 6, 5, 3.
I intentionally added the word stack to see if that might throw it off, given that our earlier context is about stacking while this is not. I can’t be certain if it did, but it seems to take the rule on skipping the third item and mix it with something else (waves hands), producing this mish-mash of I’m not sure what. Regardless, it’s way off the mark here.
I also tossed in an ambiguous instruction to not remove the last thing, which it didn’t seem to explicitly note, while it does so in the next task.
In this final one I removed the word stack and reran the previous task.
Let’s go through the steps with the playing cards:
Initial Order: The cards are in increasing order: Ace (1), 2, 3, 4, 5, 6, 7, 8, 9, 10.
Remove and Skip:
Reverse the Order:
Replace Leftmost with Rightmost and Discard Leftmost:
The final order of the cards is, from left to right: 2, 8, 6, 4.
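The response's steps can be sketched in Python as well. The prompt is not reproduced here, so the rules below are an assumption inferred from the step headings and the final answer: remove every other card starting with the Ace (which leaves the 10 in place), reverse what remains, then move the rightmost card into the leftmost position and discard the card that was there. The function name `play` is hypothetical.

```python
def play(cards):
    """Apply the assumed three steps to `cards` (listed left to right)."""
    # ASSUMED step 1: remove the first card, skip the next, remove the third,
    # and so on. With Ace..10 this keeps 2, 4, 6, 8, 10.
    kept = cards[1::2]
    # Step 2: reverse the order of the remaining cards.
    kept.reverse()
    # Step 3: the rightmost card replaces the leftmost one, which is discarded.
    rightmost = kept.pop()
    kept[0] = rightmost
    return kept


cards = list(range(1, 11))  # Ace counted as 1
print(play(cards))
# [2, 8, 6, 4]
```

Under these assumed rules the result agrees with the order in the response.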
This time it seems to do well, and as mentioned at the end of task 5, it took last thing to mean the 10 as opposed to the last item to be removed.
It means absolutely nothing, though it would be interesting to conduct similar tests to see how the context of the conversation can impact instructions. Personally I’m impressed with its ability to perform some of these tasks, which require some understanding of space and order.