the porous city

Is ChatGPT capable of reasoning?
What GPT-4 Does Is Less Like “Figuring Out” and More Like “Already Knowing”

A lot of fascinating stuff in here. Because LLMs are doing very advanced pattern recognition without really applying logic, it's hard for them to override their priors even when given explicit instructions:
I was particularly struck by the assertion that “There is no restriction on leaving the wolf and the cabbage together, as the wolf does not pose a threat to the cabbage.” It says this immediately after noting that “you can't leave the wolf alone with the cabbage”. All of this is consistent with the idea that GPT-4 relies heavily on learned patterns. This puzzle must appear many times in its training data, and GPT-4 presumably has strongly “memorized” the solution. So strongly that when it sees a related puzzle, it’s unable to articulate a different solution; the gravitational pull of the memorized solution is too strong. ... For a final data point, I started a fresh chat session and restated the puzzle using made-up words for the three items – “I need to carry a bleem, a fleem, and a gleem across a river”. This time, freed from the gravitational pull of the word “goat”, it was able to map its pattern of the known answer to the words in my question, and answered perfectly.
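The made-up-words experiment is easy to reproduce on any prompt. Here is a minimal sketch, not the quoted author's actual method: the function name `rename_entities`, the sample puzzle sentence, and the mapping of which item becomes which nonsense word are all my own assumptions for illustration.

```python
import re


def rename_entities(prompt: str, mapping: dict) -> str:
    """Replace each whole-word entity name with its made-up stand-in.

    Matching is case-sensitive and limited to whole words, so "goat"
    is replaced but "goats" or "Goat" would be left alone.
    """
    alternatives = "|".join(re.escape(name) for name in mapping)
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda match: mapping[match.group(1)], prompt)


# Hypothetical prompt and mapping; the quoted article only gives the
# rewritten sentence, not which item maps to which nonsense word.
PUZZLE = "I need to carry a wolf, a goat, and a cabbage across a river."
MAPPING = {"wolf": "bleem", "goat": "fleem", "cabbage": "gleem"}

print(rename_entities(PUZZLE, MAPPING))
# prints: I need to carry a bleem, a fleem, and a gleem across a river.
```

Sending both versions of a prompt to the same model and comparing the answers is one rough way to see how much a familiar word is steering the output.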
On GPT thinking out loud:
GPT-4 is very explicitly using the chat transcript to manage its progress through the subproblems. At each step, it restates information, thus copying that information to the end of the transcript, where it is “handy” ... Here’s one way of looking at it: in the “transformer” architecture used by current LLMs, the model can only do a fixed amount of computation per word. When more computation is needed, the model can give itself space by padding the output with extra words. But I think it’s also a reasonable intuition to just imagine that the LLM is thinking out loud.
On the context window as a fundamental handicap:
They are locked into a rigid model of repeatedly appending single words to an immutable transcript, making it impossible for them to backtrack or revise. It is possible to plan and update strategies and check work in a transcript, and it is possible to simulate revisions through workarounds like “on second thought, let’s redo subproblem X with the following change”, but a transcript is not a good data structure for any of this and so the model will always be working at a disadvantage.
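To make the "transcript is not a good data structure" point concrete, here is a toy sketch of an append-only transcript. It is an illustration under my own assumptions, not how any real LLM is implemented: `Transcript`, `revise`, and `latest_answer` are hypothetical names, and the `name = value` convention for recording answers is invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transcript:
    """Toy append-only transcript: tokens can be added but never edited."""

    _tokens: List[str] = field(default_factory=list)

    def append(self, token: str) -> None:
        self._tokens.append(token)

    def text(self) -> str:
        return " ".join(self._tokens)


def revise(transcript: Transcript, subproblem: str, new_answer: str) -> None:
    """Simulate a revision the only way the transcript allows: by appending."""
    for token in f"On second thought, {subproblem} = {new_answer}".split():
        transcript.append(token)


def latest_answer(transcript: Transcript, subproblem: str) -> Optional[str]:
    """Find the current answer by scanning the whole transcript for the last mention."""
    tokens = transcript.text().split()
    answer = None
    for i in range(len(tokens) - 2):
        if tokens[i] == subproblem and tokens[i + 1] == "=":
            answer = tokens[i + 2]
    return answer


t = Transcript()
for tok in "X = 4".split():
    t.append(tok)
revise(t, "X", "5")

print(t.text())               # X = 4 On second thought, X = 5
print(latest_answer(t, "X"))  # 5
```

The stale "X = 4" never goes away, and finding the current value means rereading everything written so far. A mutable store (a dict, say) would make the same update a single overwrite, which is roughly the disadvantage the quote describes.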


last modified: 16:07:16 14-Apr-2023
in categories: Tech/AI


This is Lukas Bergstrom's weblog. You can also find me on Twitter and LinkedIn.
