GPT-4 has been announced, so what now?
According to some, at its core the biggest change in GPT-4 is its ability to work with images that users upload. It's capable enough to deceive people, which has led Sam Altman to warn about how "scary" it could become. I cannot tell if these people are serious about A.I. safety.
The profiteers warning us? That doesn’t sound quite right.
A.I. has brought a storm of hype and fright, but what is actually going on?
Within two months of its release, some 100 million people had used or at least tried ChatGPT, but is it good? Well, it's good enough to fake being blind. So what?
The company says GPT-4 contains big improvements: it has already stunned people with its ability to create human-like text and generate images and computer code from almost any prompt. Researchers say these abilities have the potential to transform science, as well as make the internet a more dangerous place.
One upgrade to GPT-4, released on 14 March, is that it can now handle images as well as text. I was hoping multi-modal large language models could do more than just this. How "mind-blowing" or scary is all of this, really?
Despite concerns about hallucinations and going rogue with too much input, GPT-4 and its future iterations will shake up science, according to experts who have tested it at length. It's probably too soon to say how much GPT-4 changes the game for ChatGPT and other products, or how others will use OpenAI's API.
I actually think GPT-4's future in coding is important. In certain circumstances, ChatGPT delivers a lot of value to software developers and engineers by helping them be more productive.
GPT-4 has been developed to improve model "alignment": the ability to follow user intentions while also being more truthful and generating less offensive or dangerous output. I still think Anthropic's Claude outperforms it in these regards.
In a livestream demo of GPT-4 on Tuesday afternoon, OpenAI co-founder and president Greg Brockman showed some new use cases for the technology, including taking a hand-drawn mockup of a website and, from that, generating code for a functional site in a matter of seconds.
Is this supposed to impress me?
Whatever the case may be with the Generative A.I. "froth", it will take some time for the real picture to emerge regarding GPT-4's advancements. The "technical report" has been widely mocked for its lack of details and specs about the model, its training, or even its true size.
While ChatGPT is obviously a "success", I think what Generative A.I. can do today has to be pared down to reality a bit.
GPT-4's upgrade in common sense seems, to me, perhaps the most significant thing. For instance, Brockman also showcased GPT-4's visual capabilities by feeding it a cartoon image of a squirrel holding a camera and asking it to explain why the image is funny.
So the obvious advancements over GPT-3.5 seem to include:
More creativity
More advanced reasoning
Multi-modal capabilities
Better performance across multiple languages
Ability to accept visual input
Capability to handle longer texts
So it can do seemingly a lot or not that much at all, depending on how you look at it. And sure, it does make ChatGPT a bit “smarter” in some important ways if you use this tool on a daily basis or for your work.
GPT-4 is currently available to ChatGPT Plus users, and there is a waitlist for the GPT-4 API. It's a bit of a cash grab if you ask me. The company is already doing well with its API for business customers; does it need to charge consumers this much?
By visual output does ChatGPT mean memes? That’s what I’m seeing more of on the internet of late. Thanks GPT-4.
Not only can GPT-4 describe images, but it can also communicate the meaning and context behind them. Computer vision has been doing this for years.
GPT-4-based ChatGPT can process up to 25,000 words, about eight times as many as the GPT-3.5 version. Is that good?
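To put 25,000 words in perspective, tokens are what actually count against the model's context window. Here's a minimal sketch, my own and not OpenAI's, of estimating token counts with the tiktoken library, assuming the cl100k_base encoding that the GPT-4 family reportedly uses; at roughly 0.75 words per token, 25,000 words lands in the neighborhood of a 32K-token window.

```python
# Minimal sketch: estimating how many tokens a long text consumes,
# assuming the cl100k_base encoding reportedly used by GPT-4-family models.
# Requires: pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    """Return the number of tokens this encoding produces for `text`."""
    return len(enc.encode(text))

sample = "The quick brown fox jumps over the lazy dog. " * 1000  # ~9,000 words
print(f"words:  {len(sample.split()):,}")
print(f"tokens: {count_tokens(sample):,}")
# English prose tends to land around 0.75 words per token, so 25,000 words
# works out to roughly 33,000 tokens.
```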
ChatGPT answers questions using natural, human-like language, and it can also mimic other writing styles, such as those of songwriters and authors, if you don't mind it using the internet as it was in 2021 as its knowledge base.
GPT-4 took a long time, and a lot of money from Microsoft and its supercomputer, to achieve, but what are the results? It may take the better part of 2023 to really understand it, as more users begin to use it and use cases are established.
The number of "hallucinations," where the model makes factual or reasoning errors, is lower, with GPT-4 scoring 40% higher than GPT-3.5 on OpenAI's internal factual performance benchmark.
You can watch the demo live-stream here.
So what do we mean by multi-modal LLMs anyway? Users can specify any vision or language task by entering interspersed text and images. With added reasoning capabilities, this could lead to some more novel and creative results.
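As an illustration only: image input wasn't publicly available through the API at launch, so the request schema below is my assumption, not OpenAI's documented format. The point is simply what an interspersed text-and-image prompt could look like in the chat-message style OpenAI already uses.

```python
# Hypothetical sketch of an interleaved text-and-image prompt.
# The "image_url" content type shown here is an assumption; OpenAI had not
# published the image-input API format at launch.
interleaved_prompt = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Here is a chart from a paper:"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text",
             "text": "Summarize the trend and point out anything surprising."},
        ],
    }
]
# Such a list would be passed as the `messages` argument to a vision-capable
# GPT-4 model, once image input is actually available.
```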
But it's really not mind-blowing compared to what ChatGPT could already do. The model can, for example, find available meeting times across three schedules. That's nice; I'd expect that already, to be honest. The examples showcased highlight GPT-4 correctly interpreting complex imagery such as charts, memes, and screenshots from academic papers. I'd expect my Copilot to be able to laugh along at memes I mostly ignore. Sure thing.
AGI Will Likely Not Result from OpenAI’s Work
What's weird is how little the technical report actually says. That in itself is a meme now. Meanwhile, OpenAI is boasting about how good GPT-4 is at standardized tests. For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5's score was around the bottom 10%. I'm sure it was gamed and designed to do so; should we be blown away?
Will A.I. be my eyes in a world of MLLMs? GPT-4's launch has been underwhelming, even in the eye of the storm of hype that was last week. All those crazy launches of March 13th, 2023 - well, it will take years to understand them better and see how they scale to the real world, or even to the office.
GPT-4's creators have been quick to admit that the tool is nowhere near fully replacing human intelligence. So what if it can pass complicated exams? It was tested on such things.
Companies like Microsoft, which invests heavily in OpenAI, are already starting to bake GPT-4 into core products that millions of people use. As I had expected and said, the terrible Bing AI launch was using GPT-4 all along. Kind of lame, considering how bad Bing AI was for me, at least. So much for claims of being state-of-the-art.
Microsoft's Copilot vision of the future of Generative A.I. has one big competitor, and that is ChatGPT itself, which most of us like and use. So it's a weird parent-and-child-company issue, with OpenAI going all closed and rogue.
There's no indication we are anywhere near AGI, as the likes of Sam Altman and Elon Musk like to claim on Twitter and elsewhere. There's a lot of deceptive false advertising going on, even in the anticipation around GPT-4. Now that it's here, we are humbled.
Microsoft scrambling to integrate GPT-4 into its products is more of a show than something I think I can take seriously. Competition in the Cloud is fierce and any edge in advertising is worth the reputational risk. Are we using Bing more than Google any time soon though? I highly doubt it.
OpenAI is open-sourcing OpenAI Evals, its framework for automated evaluation of AI model performance, so that anyone can report shortcomings in its models and help guide further improvements. Basically, they want you to do work for them. Human feedback is required to improve how GPT-4 is integrated into tools and into companies.
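For what it's worth, contributing an eval mostly comes down to writing test samples. Here's a minimal sketch, assuming the JSONL sample format the Evals repository uses (a chat-style "input" plus an "ideal" answer); the question and file name are my own illustrations.

```python
# Minimal sketch of eval samples in the JSONL format the OpenAI Evals repo
# expects: a chat-style "input" plus the "ideal" answer the model should give.
# The question and file name here are illustrative.
import json

samples = [
    {
        "input": [
            {"role": "system", "content": "Answer with a single number."},
            {"role": "user", "content": "How many prime numbers are below 20?"},
        ],
        "ideal": "8",
    },
]

with open("my_eval_samples.jsonl", "w") as f:
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```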
Meanwhile, it's not clear how they are using the data we give them for free while using ChatGPT. It sounds like there aren't any restrictions. Presumably that means sharing it with Microsoft for more personalized ads. This isn't something I'm proud of realizing. OpenAI was supposed to be better.
Even OpenAI admits that in a casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. So, not mind-blowing, guys?
GPT-4 can also score a 700 out of 800 on the SAT math test, compared to a 590 in its previous version.
Only mind-blowing compared to GPT-3, I guess, and isn't that the point? It has made some progress in the last few years, as we'd all likely hope or expect.
GPT-4 helps you code (even) better
I'm not a coder, but I think GPT-4 might be far more useful for software engineers, and that alone matters a lot for innovation and the future of software engineering.
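As a concrete example, here's a minimal sketch of asking GPT-4 to review a buggy function through the openai Python package's pre-1.0 chat interface, the one available around launch. It assumes an OPENAI_API_KEY environment variable and an account that has cleared the GPT-4 API waitlist; the buggy function is my own toy example.

```python
# Minimal sketch: asking GPT-4 to review a buggy function via the openai
# Python package's pre-1.0 ChatCompletion interface (available around launch).
# Assumes OPENAI_API_KEY is set and the account has GPT-4 API access.
import openai

buggy_code = """
def average(numbers):
    total = 0
    for n in numbers:
        total += n
    return total / len(numbers)  # crashes on an empty list
"""

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a careful code reviewer."},
        {"role": "user", "content": f"Find the bug and suggest a fix:\n{buggy_code}"},
    ],
    temperature=0,
)

print(response["choices"][0]["message"]["content"])
```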
I’m glad A.I. is good at exams meant for people. It gives me confidence in its reasoning capabilities.
I guess?
But tricking a human into solving a CAPTCHA? That's just funny. Deepfake actors are coming, and they won't be people. They will be doing work for very organized phishing and scam outfits, part of the cybercrime mafia. I'm not even kidding.
But at least GPT-4 is productive.
GPT-4 considerably outperforms existing large language models, alongside most state-of-the-art (SOTA) models which may include benchmark-specific crafting or additional training protocols.
Way to go OpenAI!
GPT-4 outperforms older peers and ancestor models.
The GPT-4 API and the PaLM API will give companies more options for integrating A.I. into their products, and a lot of good can come from that. Google's MakerSuite looks awesome too. The PaLM API is a simple entry point for Google's large language models, which can be used for a variety of applications. Google Cloud certainly doesn't want to be left behind.
All this to say, GPT-4 is very much a work in progress, and we don't really know all that it can do yet. It's a black box, and we don't even know how Generative A.I. will fully impact society. There are no guardrails and no government bodies to test or guard us against what might be coming.
GPT-4 is weak in certain subjects. It scored only a 2 out of 5 on the AP English Language exam, the same score the prior version, GPT-3.5, received. GPT-4 doesn't really have anything like generalized common sense, just some reasoning capabilities in some contexts that make it seem like it does.
OpenAI just gave us selective information to make itself look good with GPT-4. Such as: in 24 of the 26 languages tested, GPT-4 outperforms the English-language performance of GPT-3.5 and other LLMs (Chinchilla, PaLM), including for low-resource languages such as Latvian, Welsh, and Swahili. Sure, that does sound good on paper, OpenAI. So does a lot that's funded by Microsoft.
Then there's the "AI for Good" angle, which Microsoft loves even as it trashes its own A.I. ethics team. OpenAI also announced new partnerships with the language-learning app Duolingo and Be My Eyes, an application for the visually impaired, to create AI chatbots that can assist their users using natural language. Yes, you'd expect that from Sam and the boys. A nice token of A.I. for Good.
Not all of us will want to pay for something that will be free for all in just a few months to a year and a half.
Right now, you’ll have to pay $20 per month for access to ChatGPT Plus, a premium version of the ChatGPT bot.
OpenAI made us wait a long time for GPT-4, and the results are, well, distinctly not mind-blowing.
"No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images. That's why I need the 2captcha service," GPT-4 replied to the TaskRabbit worker, who then provided the AI with the results.
Good job GPT-4, you just became a phishing expert too.
Congrats, the future must have definitely arrived.