It’s finally here … OpenAI has released GPT-4, the next-gen large language model powering ChatGPT. The highly anticipated release comes after months of speculation. Currently, GPT-4 is available to try via ChatGPT+, the subscription tier of ChatGPT. There is also a waitlist to access the API.
OpenAI CEO Sam Altman announced GPT-4 on Twitter, posting a link to a GPT-4 research blog post with a warning: “It is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.” Altman may be cautious, but there is great excitement for a more powerful ChatGPT. Here is a rundown of a few new capabilities:
GPT-4 Can Understand Memes
GPT-4 is now a multi-modal system that can accept images as inputs to do tasks like generating captions, classifying images, and analyzing the context of the images, including humor. It detected the humor in a meme where an iPhone is plugged into a charger with a VGA cable instead of a Lightning cable. When asked why the image is funny, GPT-4 correctly identifies the objects in the picture and replies with “The humor in this image comes from the absurdity of plugging a large, outdated VGA connector into a small, modern smartphone charging port.”
One caveat: GPT-4’s image modality is still in research mode and is not yet available to the public, even to those who subscribe to ChatGPT+. To prepare the image input capability for wider availability, the company says it has partnered with Be My Eyes, a mobile app that connects blind and low-vision people with a roster of over 6 million sighted volunteers who can help with accessibility using live video calls. GPT-4 is powering a “virtual volunteer” mode on the app that OpenAI claims can often generate the same level of detail and assistance as a human volunteer.
The System Has a Longer Working Memory
GPT-4 can now handle 25,000 words of text, allowing for use cases like long-form content creation and long document search and analysis. It also has a longer working memory of about 64,000 words, or about 50 pages, so it can remember and refer back to things from earlier in a conversation. The old version of ChatGPT could only remember content as far back as 8,000 words, or four to five pages.
It Understands Nuance, Can Beat You at Trivia, and Ace the Bar Exam
OpenAI says GPT-4 has a wider general knowledge across multiple domains and can pass the bar exam in the 90th percentile, compared to GPT-3.5’s 10th percentile score. The company also asserts that GPT-4 is more collaborative for creative and technical writing tasks, noting it is reliable, creative, and able to handle much more nuanced instructions than GPT-3.5.
GPT-4 Can Tutor You Like Socrates
OpenAI says users can now prescribe their AI’s style and task by giving it directions via a system message. Instead of the classic ChatGPT’s fixed personality, API users can customize user experiences to make it talk like a pirate or tutor in the Socratic method. Example system message: “You are a tutor that always responds in the Socratic style. You *never* give the student the answer, but always try to ask just the right question to help them learn to think for themselves. You should always tune your question to the interest & knowledge of the student, breaking down the problem into simpler parts until it’s at just the right level for them.”
OpenAI Claims GPT-4 is Safer and More Accurate
OpenAI says it spent more than six months strengthening GPT-4’s safety and alignment through the lessons learned and advancements made from deploying ChatGPT and its other LLMs. To that end, the company claims GPT-4 is now 40% more likely to provide factual responses and 82% less likely to respond to requests for prohibited content.
To back up that claim, the company gave several examples of its new and improved training methods. Open AI says its trained GPT-4 using more human feedback, including feedback collected from ChatGPT users, which it says assisted with building a more robust monitoring framework. The company also engaged over 50 experts with knowledge of bias, safety, geopolitics, and industry for early feedback and adversarial testing, nothing that expert findings enabled OpenAI to test model behavior in high-risk areas such as cybersecurity and biorisk.
Problems with the System Still Exist
Perhaps Altman’s reluctant enthusiasm stems from limitations that are still manifest in GPT-4, including hallucinations, social biases, and refusal inaccuracy, which is what occurs when the model refuses ideas it should not or accepts things it should not. The company also warns that computer code written from the model may be untrustworthy as there is currently no official method of verifying that it is not malicious code.
There are Interesting New Use Cases
Aside from the aforementioned use case with Be My Eyes, OpenAI is also collaborating with other organizations. In a new subscription tier, Duolingo users can now chat with a GPT-4-powered conversation partner in Spanish and French with more languages coming soon, as well as access a feature called Explain My Answer that can break down language rules to correct mistakes. Online tutoring site Khan Academy is also using GPT-4 to power its Khanmigo virtual tutoring assistant.
ChatGPT+ subscribers can access GPT-4 with a usage cap to be adjusted contingent upon demand and system performance, as the company says it expects to be severely capacity constrained as it scales and optimizes GPT-4 in the coming months. The company also says it may introduce a new subscription tier for power users, as well as roll out a limited number of GPT-4 queries for non-subscribers. Developers can join a waitlist for the API at this link.
To learn more, read about GPT-4 on OpenAI’s research blog here.