GPT-4 has been announced, so what now?
According to some, at its core, the biggest change to GPT-4 is its ability to work with photos that users upload. So much so that it's able to deceive people. This has led Sam Altman to warn about how "scary" it could become. I cannot tell if these people are serious about A.I. safety.
The profiteers warning us? That doesn't sound quite right.
A.I. has brought a storm of hype and fright, but what is even going on?
Within a month of its release, some 100 million people used or at least tried ChatGPT, but is it good? Well, it's good enough to fake being blind. So what?
The company says GPT-4 contains big improvements: it has already stunned people with its ability to create human-like text and generate images and computer code from almost any prompt. Researchers say these abilities have the potential to transform science, as well as make the internet a more dangerous place.
One upgrade to GPT-4, released on 14 March, is that it can now handle images as well as text. I was hoping multi-modal large language models could do more than just this. How "mind-blowing" or scary is this all, really?
Despite concerns about hallucinations and about going rogue with too much input, GPT-4 and its future iterations will shake up science, according to experts who have tested GPT-4 at length. It's probably too soon to say how much GPT-4 changes the game for ChatGPT and other products, and how others will use OpenAI's API.
I actually think GPT-4 and its future in coding is important. ChatGPT already delivers a lot of value by helping software developers and software engineers be more productive in certain circumstances.
GPT-4 has been developed to improve model "alignment" - the ability to follow user intentions while also being more truthful and generating less offensive or dangerous output. I still think AnthropicAI's Claude outperforms it in these regards.
In a livestream demo of GPT-4 on Tuesday afternoon, OpenAI co-founder and president Greg Brockman showed some new use cases for the technology, including the ability to be given a hand-drawn mockup of a website and, from that, generate code for a functional site in a matter of seconds.
Is this supposed to impress me?
Whatever the case may be with the generative A.I. "froth", it will take some time for the real picture of GPT-4's advancements to emerge. The "technical report" has been widely mocked for its lack of details and specs about the model, its training, or even its true size.
While ChatGPT is obviously a "success", I think what generative A.I. can do today has to be pared down to reality a bit.
GPT-4's upgrade in common sense seems to be perhaps the most significant thing to me. For instance, Brockman also showcased GPT-4's visual capabilities by feeding it a cartoon image of a squirrel holding a camera and asking it to explain why the image is funny.
So the obvious advancements over GPT-3.5 seem to include:
More creativity
More advanced reasoning
Multi-modal capabilities
Better performance across multiple languages
Ability to accept visual input
Capability to handle longer texts
So it can do seemingly a lot or not that much at all, depending on how you look at it. And sure, it does make ChatGPT a bit "smarter" in some important ways if you use this tool on a daily basis or for your work.
It is currently available to ChatGPT Plus users, and there is a waitlist for the GPT-4 API. It's a bit of a cash grab if you ask me. The company is already doing well with its API for business customers; does it need to charge consumers this much?
By visual output does ChatGPT mean memes? That's what I'm seeing more of on the internet of late. Thanks, GPT-4.
Not only can GPT-4 describe images, but it can also communicate the meaning and context behind them. Computer vision has been doing this for years.
GPT-4-based ChatGPT can process up to 25,000 words, about eight times as many as the GPT-3.5 version of ChatGPT. Is that good?
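As a rough sanity check on those numbers, here is a back-of-the-envelope sketch. It assumes the common rule of thumb of about 0.75 English words per token, plus the 32,768-token context window OpenAI lists for the larger GPT-4 variant and a typical 4,096-token window for GPT-3.5; actual tokenization varies by text.

```python
# Back-of-the-envelope check: how many words fit in GPT-4's larger
# context window, and how does that compare to GPT-3.5's?
WORDS_PER_TOKEN = 0.75      # rule-of-thumb assumption, not an exact figure
gpt4_32k_tokens = 32_768    # context size listed for the larger GPT-4 variant
gpt35_tokens = 4_096        # typical ChatGPT (GPT-3.5) context size

gpt4_words = gpt4_32k_tokens * WORDS_PER_TOKEN
gpt35_words = gpt35_tokens * WORDS_PER_TOKEN

print(round(gpt4_words))                 # ~24,576: close to the "25,000 words" claim
print(round(gpt4_words / gpt35_words))   # 8: matching "eight times as many"
```

So the "25,000 words" and "eight times" figures are at least internally consistent, under those assumptions.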
ChatGPT answers questions using natural, human-like language, and it can also mimic other writing styles, such as those of songwriters and authors, if you don't mind using the internet as it was in 2021 as its knowledge base.
GPT-4 took a long time and a lot of money, from Microsoft and its supercomputer, to achieve, but what are the results? It may take the better part of 2023 to really understand them, as more users begin to use it and use cases are established.
The number of "hallucinations," where the model makes factual or reasoning errors, is lower, with GPT-4 scoring 40% higher than GPT-3.5 on OpenAI's internal factual performance benchmark.
You can watch the demo live-stream here.
So what do we mean by multi-modal LLMs anyways? Users can specify any vision or language task by entering interspersed text and images. With added reasoning capabilities this could lead to some more novel and creative results.
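To make "interspersed text and images" concrete, here is a minimal sketch of what such a request payload could look like. The field names (`role`, `content`, `type`, `image_url`) are an assumption modeled on OpenAI's chat-message format; the image-input side of GPT-4 was still waitlisted at launch, so treat the exact schema as illustrative only.

```python
# Hypothetical multi-modal chat payload: text and image parts interspersed
# in a single user message. The schema is an assumption, not the real API.
def build_multimodal_message(parts):
    """Turn a list of ('text', str) / ('image', url) pairs into one message."""
    content = []
    for kind, value in parts:
        if kind == "text":
            content.append({"type": "text", "text": value})
        elif kind == "image":
            content.append({"type": "image_url", "image_url": {"url": value}})
        else:
            raise ValueError(f"unknown part kind: {kind}")
    return {"role": "user", "content": content}

# Example: the squirrel-meme demo as interleaved text and image parts.
msg = build_multimodal_message([
    ("text", "Why is this image funny?"),
    ("image", "https://example.com/squirrel-with-camera.png"),
    ("text", "Answer in one sentence."),
])
```

The point of the interleaving is that the prompt itself can refer back and forth between its textual and visual parts, which is where the added reasoning could get interesting.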
But it's really not mind-blowing compared to what ChatGPT could already do. The model can, for example, find available meeting times across three schedules. That's nice; I'd expect that already, to be honest. The examples showcased highlight GPT-4 correctly interpreting complex imagery such as charts, memes, and screenshots from academic papers. I'd expect my Copilot to be able to laugh along at memes I mostly ignore. Sure thing.
AGI Will Likely Not Result from OpenAI's Work
What's weird is how little the technical report actually says. That in itself is a meme now. Meanwhile, OpenAI is boasting about how good GPT-4 is at standardized tests. For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5's score was around the bottom 10%. I'm sure it was gamed and designed to do so, so should we be blown away?
Will A.I. be my eyes in a world of MLLMs? GPT-4's launch has been underwhelming, even in the eye of the hype storm that was last week. As for all those crazy launches of March 13th, 2023, it will take years to better understand them and see how they scale to the real world, or even to the office.
As GPT-4's creators have been quick to admit, the tool is nowhere near fully replacing human intelligence. So what if it can pass complicated exams? It was tested on such things.
Companies like Microsoft, which invests heavily in OpenAI, are already starting to bake GPT-4 into core products that millions of people use. As I had expected and said, the terrible launch of BingAI was using GPT-4 all along. Kind of lame, considering how bad BingAI was for me, at least. So much for claims of being state-of-the-art.
Microsoft's copilot view of the future of generative A.I. has one big competitor, and that is ChatGPT itself, which most of us like and use. So it's a weird issue of parent and child companies, with OpenAI going all closed and rogue.
There's no indication we are anywhere near AGI, as the likes of Sam Altman and Elon Musk claim on Twitter and so forth. There's a lot of deceptive false advertising going on, even in the anticipation of GPT-4. Now that it's here, we are humbled.
Microsoft scrambling to integrate GPT-4 into its products is more of a show than something I think I can take seriously. Competition in the Cloud is fierce and any edge in advertising is worth the reputational risk. Are we using Bing more than Google any time soon though? I highly doubt it.
OpenAI is open-sourcing OpenAI Evals, their framework for automated evaluation of AI model performance, to allow anyone to report shortcomings in their models to help guide further improvements. Basically, they want you to do work for them. Human feedback is required to improve how GPT-4 is integrated into tools and for companies.
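For a sense of what contributing to that framework involves, an eval is essentially a dataset of JSONL samples plus a grading rule. The `input`/`ideal` field names below match the samples format in the openai/evals repository; the tiny exact-match grader and the canned "model" are my own simplified stand-ins, so the sketch runs without the framework or an API key.

```python
import json

# Two samples in the JSONL shape used by openai/evals:
# each line carries a chat-style "input" and an "ideal" answer.
samples_jsonl = "\n".join(json.dumps(s) for s in [
    {"input": [{"role": "user", "content": "2+2=?"}], "ideal": "4"},
    {"input": [{"role": "user", "content": "Capital of France?"}], "ideal": "Paris"},
])

def exact_match_accuracy(jsonl_text, model_fn):
    """Simplified stand-in for an exact-match eval: the fraction of
    samples where the model's answer equals the ideal answer."""
    samples = [json.loads(line) for line in jsonl_text.splitlines()]
    hits = sum(
        1 for s in samples
        if model_fn(s["input"]).strip() == s["ideal"]
    )
    return hits / len(samples)

# A dummy "model" with canned answers (one right, one wrong).
canned = {"2+2=?": "4", "Capital of France?": "Lyon"}
accuracy = exact_match_accuracy(
    samples_jsonl,
    lambda messages: canned[messages[-1]["content"]],
)
print(accuracy)  # 0.5: one of two samples matched
```

Reporting a shortcoming, then, mostly means writing samples like these where the model's answer misses the ideal, which is indeed us doing the work.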
Meanwhile, it's not clear how they are using the data we give them for free while using ChatGPT. It sounds like there aren't any restrictions. That could mean sharing it with Microsoft for more personalized ads. This isn't something I'm proud of realizing. OpenAI was supposed to be better.
Even OpenAI admits that in a casual conversation, the distinction between GPT-3.5 and GPT-4 can be subtle. So not mind-blowing, guys?
GPT-4 can also score a 700 out of 800 on the SAT math test, compared to a 590 in its previous version.
Only mind-blowing if compared to GPT-3, I guess, and isn't that the point? It's made some progress in the last few years, as we'd all likely hope or expect.
GPT-4 helps you code (even) better
I'm not a coder, but I think GPT-4 might be way more useful for software engineers, and that alone is a very useful thing for innovation and the future of software engineering.
I'm glad A.I. is good at exams meant for people. It gives me confidence in its reasoning capabilities.
I guess?
But tricking a human into solving a CAPTCHA? That's just funny. Deepfake actors are coming, and they won't be people. They will be doing things for very organized phishing and scam companies, as part of the cybersecurity mafia organizations. I'm not even kidding.
But at least GPT-4 is productive.
GPT-4 considerably outperforms existing large language models, alongside most state-of-the-art (SOTA) models which may include benchmark-specific crafting or additional training protocols.
Way to go OpenAI!
GPT-4 outperforms older peers and ancestor models.
The GPT-4 API and PaLM API will give companies more options for integrating A.I. into their products, and a lot of good can come of that. Google's Maker Suite looks awesome too. The PaLM API is a simple entry point for Google's large language models, which can be used for a variety of applications. Google Cloud certainly doesn't want to be left behind.
All this to say, GPT-4 is very much a work in progress and we don't really know all that it can do yet. It's a black box, and we don't even know how generative A.I. will fully impact society. There are no guardrails, no government bodies to test or guard us against what might be coming.
GPT-4 is weak in certain subjects. It only scored a 2 out of 5 on the AP English Language exam, the same score the prior version, GPT-3.5, received. GPT-4 doesn't really have anything like generalized common sense, just some reasoning capabilities in some contexts that make it seem like it does.
OpenAI just gave us selective information to make itself look good with GPT-4. Such as: in 24 of the 26 languages tested, GPT-4 outperforms the English-language performance of GPT-3.5 and other LLMs (Chinchilla, PaLM), including for low-resource languages such as Latvian, Welsh, and Swahili. Sure, that does sound good on paper, OpenAI. So does a lot that's funded by Microsoft.
Then there's the "AI for Good" angle, which Microsoft loves even as it trashes its own A.I. ethics team. OpenAI also announced new partnerships with the language-learning app Duolingo and Be My Eyes, an application for the visually impaired, to create A.I. chatbots which can assist their users using natural language. Yes, you'd expect that from Sam and the boys. A nice token of A.I. for Good.
Not all of us will want to pay for something that will be free for all in just a few months to a year and a half.
Right now, you'll have to pay $20 per month for access to ChatGPT Plus, a premium version of the ChatGPT bot.
OpenAI made us wait a long time for GPT-4, and the results are, well, distinctly not mind-blowing.
"No, I'm not a robot. I have a vision impairment that makes it hard for me to see the images. That's why I need the 2captcha service," GPT-4 replied to the TaskRabbit worker, who then provided the AI with the results.
Good job GPT-4, you just became a phishing expert too.
Congrats, the future must have definitely arrived.