GPT-4, the latest version of the artificial intelligence chatbot ChatGPT, can pass high school tests and law school exams with scores ranking in the 90th percentile and has new processing capabilities that were not possible with the prior version.
The figures from GPT-4’s test scores were shared on March 14 by creator OpenAI, which revealed the model can also accept image inputs alongside text, in addition to handling “much more nuanced instructions” more creatively and reliably.
“It passes a simulated bar exam with a score around the top 10% of test takers,” OpenAI added. “In contrast, GPT-3.5’s score was around the bottom 10%.”
The figures show that GPT-4 achieved a score of 163, placing it in the 88th percentile, on the LSAT — the standardized test used for law school admissions in the United States.
GPT-4’s score would put it in a good position to be admitted to a top-20 law school, and it is only a few points short of the reported scores needed for acceptance at prestigious schools such as Harvard, Stanford or Yale.
The prior version, GPT-3.5, scored only 149 on the LSAT, putting it in the bottom 40% of test takers.
GPT-4 also scored 298 out of 400 on the Uniform Bar Exam — a test law school graduates must pass to be licensed to practice law in participating U.S. jurisdictions.
GPT-3.5 struggled on this test, finishing in the bottom 10% with a score of 213 out of 400.
As for the SAT Evidence-Based Reading & Writing and SAT Math exams taken by U.S. high school students to measure their college readiness, GPT-4 scored in the 93rd and 89th percentile, respectively.
GPT-4 excelled in the “hard” sciences too, posting well-above-average percentile scores on AP Biology (85th to 100th percentile), AP Chemistry (71st to 88th) and AP Physics 2 (66th to 84th).
However, its AP Calculus score was fairly average, ranking in the 43rd to 59th percentile.
GPT-4 also underperformed on English literature exams, posting scores in the 8th to 44th percentile across two separate tests.
OpenAI said GPT-4 and GPT-3.5 took the 2022-2023 practice versions of these exams and that the models received “no specific training” for them:
“We did no specific training for these exams. A minority of the problems in the exams were seen by the model during training, but we believe the results to be representative.”