We asked ChatGPT your questions about astronomy. It didn't go so well.


An image generated by the deep-learning image-generation model DALL·E 2 in response to the prompt: “An oil painting of an AI language model trying to answer questions about astronomy.”

DALL·E 2/OpenAI

The experimental chatbot ChatGPT is having a moment.

Developed with artificial-intelligence techniques by the Silicon Valley research institute OpenAI and trained on a massive database of written text, the chatbot was released to the public as a free research preview last month — and quickly took the internet by storm. Users can ask it to answer questions, generate lesson plans, even write poetry and comedy sketches. No matter what prompt you throw at it, ChatGPT has an uncanny ability to generate fluid answers in simple, sturdy sentences that appear informed and knowledgeable.

Naturally, these capabilities have led some to speculate about how the technology could upend science education.

“Um… I just had like a 20 minute conversation with ChatGPT about the history of modern physics,” tweeted Peter Wang, a tech founder and self-described former physicist. “If I had this […] as a tutor during high school and college…. OMG. I think we can basically re-invent the concept of education at scale. College as we know it will cease to exist.”

So, we wondered: How well does ChatGPT answer astronomy questions? We figured the best way to find out was to ask the bot some of the questions our readers have submitted to the Ask Astro section of Astronomy.

To be blunt, it didn’t go so well. Based on about a dozen questions from recent issues that we put to ChatGPT (admittedly, a small sample), we can firmly recommend against relying on it as an astronomy tutor.

To be fair to ChatGPT, it handles many basic questions impressively. Ask it something like “What is an astronomical unit?” or “What is the life cycle of a star?” — something you might easily find in a textbook or a Wikipedia article — and it can generate a perfectly reasonable response.

But once you start asking questions whose answers are not so obvious and dig into the subtleties of astrophysics — the types of informed, discerning questions that our readers regularly send us — ChatGPT becomes prone to giving answers that seem plausible, but are very wrong. To make things worse, ChatGPT is also very bad at knowing when it is making things up, and it can’t cite any sources.

That’s not surprising. ChatGPT doesn’t perform any calculations or query any database of facts or statistics. It’s a language model, generating its responses word-by-word based on the patterns it has learned from its training dataset. And although the reinforcement learning (RL) process by which it learned to generate responses has granted it the ability to generate natural-sounding prose, as OpenAI writes, “during RL training, there’s currently no source of truth.” So it makes sense that ChatGPT does well with answers that can easily be found in the reams of text that it has processed (or are similar), but struggles to explain complicated astrophysics.
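To make the word-by-word idea concrete, here is a deliberately tiny sketch of a next-word predictor. It is not how ChatGPT works internally (ChatGPT is a large neural network, not a lookup table of word pairs), and the training sentence, the function names `train_bigrams` and `generate`, and the most-common-follower rule are all assumptions made up for this illustration. What it shares with a real language model is the key limitation: it only knows which words tend to follow which, and has no way to check whether the result is true.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def generate(follows, start, max_words=8):
    """Extend `start` one word at a time, always picking the most common follower."""
    out = [start]
    for _ in range(max_words - 1):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # no word ever followed this one in the training text
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)


# Hypothetical training text: "bright" follows "is" more often than "cold" does.
corpus = "the star is bright . the star is bright . the planet is cold ."
model = train_bigrams(corpus)
print(generate(model, "the", max_words=4))  # the star is bright
```

Ask this toy model about "the planet" and it will still say "is bright," because that is the most common pattern it has seen, not because it is correct.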

Trial and error

Here’s one example of a question that led ChatGPT astray.

Q: On average, how many stars are formed each year in the Milky Way?

ChatGPT: It is difficult to say exactly how many stars are formed in the Milky Way each year, as the rate of star formation can vary over time. However, it is generally thought that the Milky Way produces about one new star per year on average. This may seem like a small number, but keep in mind that the Milky Way is a vast galaxy with billions of stars, so even a small number of new stars added each year can add up quickly.

This answer is built around a kernel of fact, but makes a crucial error — confusing the number of stars with the amount of material that makes them up. As astrophysicist Annette Ferguson of the University of Edinburgh in Scotland wrote in our November 2022 issue, astronomers estimate that roughly 1 to 2 solar masses of stars form each year in our galaxy. But, she explains: “Of course, not all of this mass goes into a single star. Most stars that form are low-mass stars with masses smaller than that of our Sun — the most common stars in our galaxy are red dwarfs. So, on average, we expect that roughly six to seven new stars form in the Milky Way every year.”
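The difference between mass and number is simple arithmetic: divide the mass of stars formed per year by the number of stars formed per year, and you get the average mass of a newborn star. The sketch below does only that, using the midpoints of the ranges quoted above (1.5 solar masses and 6.5 stars per year). The midpoints and the function name `implied_mean_mass` are assumptions for illustration, not figures from Ferguson’s answer.

```python
def implied_mean_mass(mass_rate_msun_per_yr, stars_per_yr):
    """Average mass per new star (in solar masses) implied by the two rates."""
    return mass_rate_msun_per_yr / stars_per_yr


# Assumed midpoints of the quoted ranges: 1 to 2 solar masses, 6 to 7 stars.
mass_rate = 1.5
stars_expert = 6.5
stars_chatgpt = 1.0  # ChatGPT's "about one new star per year"

mean_expert = implied_mean_mass(mass_rate, stars_expert)
mean_chatgpt = implied_mean_mass(mass_rate, stars_chatgpt)

print(f"Expert figures imply {mean_expert:.2f} solar masses per star")
print(f"ChatGPT's figure implies {mean_chatgpt:.2f} solar masses per star")
```

With these inputs, the expert figures imply an average of about 0.23 solar masses per star, consistent with a population dominated by red dwarfs. ChatGPT’s figure would require the average new star to be heavier than the Sun.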

ChatGPT also struggled with the intricacies of black hole research. This answer starts out okay (albeit tautologically), but goes downhill when it tries to sum up the current state of the art.

Q: What would happen if two supermassive black holes merged?

ChatGPT: When two supermassive black holes merge, the result is a single, larger black hole. The process of merging black holes is called a “black hole merger.”

During the merger, the black holes will release a tremendous amount of energy in the form of gravitational waves. These waves are ripples in the fabric of space-time that are created by the acceleration of massive objects. The waves produced by the merging of two supermassive black holes can be detected using highly sensitive instruments called gravitational wave detectors.

The merging of two supermassive black holes can also produce a bright burst of light called a quasar. Quasars are some of the brightest objects in the universe and are thought to be powered by the accretion of matter onto a supermassive black hole.

It’s worth noting that the merger of two supermassive black holes is a rare event, and it’s likely to take place over a timescale of millions or billions of years. However, the detection of gravitational waves from such a merger would be a major scientific discovery and could provide insight into the nature of black holes and the evolution of galaxies.

Contrary to ChatGPT’s confident assertion, current gravitational wave detectors cannot detect supermassive black hole (SMBH) mergers. All the detections of black hole mergers to date by facilities like the Laser Interferometer Gravitational-Wave Observatory (LIGO) have been of stellar-mass black holes. Supermassive black holes at the centers of galaxies, with millions of Suns’ worth of mass, produce gravitational waves with wavelengths so long that it can take years for a single wave to pass Earth. There are ongoing attempts to detect this background sea of waves, but not with the instruments we usually call gravitational wave detectors. One project, the NANOGrav collaboration, has gathered tantalizing evidence with radio telescopes by measuring variations in the timing of pulsars, whose signals may be perturbed by gravitational waves as they cross the galaxy.
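To get a feel for why these waves are out of reach for a detector like LIGO, it helps to turn "years per wave" into a frequency. The sketch below is a back-of-the-envelope conversion only: the one-year and ten-year periods are assumed example values, the function names are made up for illustration, and the waves are taken to travel at the speed of light.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def frequency_hz(period_years):
    """Frequency of a wave that takes `period_years` to complete one cycle."""
    return 1.0 / (period_years * SECONDS_PER_YEAR)


def wavelength_m(period_years):
    """Wavelength of that wave, assuming it travels at the speed of light."""
    return SPEED_OF_LIGHT_M_PER_S * period_years * SECONDS_PER_YEAR


for years in (1.0, 10.0):  # assumed example periods
    print(f"{years:4.0f} yr period: {frequency_hz(years):.2e} Hz, "
          f"{wavelength_m(years):.2e} m")
```

A wave with a one-year period has a frequency of about 3 × 10⁻⁸ hertz and a wavelength of one light-year. LIGO, by contrast, is sensitive to waves of roughly tens to thousands of hertz, which is why pulsar timing, with pulsars spread across the galaxy, is the tool of choice here.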

ChatGPT made another error in its answer — momentarily conflating a black hole merger with the process of accreting matter. A quasar is an active galaxy whose central black hole is steadily feasting on matter and giving off strong radio waves, not a sudden flash of light that could accompany a supermassive black hole merger. (For more on why an SMBH merger could generate detectable light when stellar-mass black hole mergers do not, see the answer from Xin Liu, astrophysicist at the University of Illinois, in our June 2022 issue.)


An image generated by the deep-learning image-generation model DALL·E 2 in response to the prompt: “A schematic diagram of a pulsar timing array detecting gravitational waves.”

DALL·E 2/OpenAI

Here’s another question about black holes that produced a flat-out wrong answer from ChatGPT:

Q: I’ve read that the strength of a neutron star’s magnetic field is greater than any other found in the universe. Wouldn’t a supermassive black hole have a stronger one?

ChatGPT: No, the strength of a neutron star’s magnetic field is not greater than that of a black hole. In fact, it is generally believed that the magnetic fields of neutron stars are weaker than those of black holes. This is because the magnetic fields of black holes are generated by the extremely strong gravitational forces in their vicinity, while the magnetic fields of neutron stars are generated by the motion of charged particles within the star. Because black holes are much more massive than neutron stars, their gravitational forces are much stronger, which allows them to generate much stronger magnetic fields.

This is wrong, and based on a false premise. As astronomer Andrei Igoshev of the University of Leeds, UK, wrote in our August 2022 issue, “a black hole by itself does not have any measurable magnetic field.” This is because of the “no-hair theorem, which basically states that only three observable parameters can be determined for each black hole: its mass, electric charge, and rotation. The hair here is a metaphor for all other possible parameters, including magnetic fields, which disappear inside the black hole and become inaccessible to scientists.” As Igoshev explains, magnetic fields can be found surrounding supermassive black holes, but they are generated by the swirling disk of material outside of the event horizon, not the black hole itself. And they are much weaker than the magnetic fields produced by spinning neutron stars like pulsars and magnetars.

ChatGPT also generated an incomplete, contradictory response to this question:

Q: Why do the giant planets rotate so much faster than the terrestrial planets in our solar system?

ChatGPT: The giant planets in our solar system, such as Jupiter and Saturn, rotate much faster than the terrestrial planets, such as Earth and Mars, because they are much larger and more massive. The more massive an object is, the more inertia it has, which means it takes more force to cause it to change its direction of motion. This is why the giant planets are able to spin so quickly despite their large size. Additionally, the giant planets are also more likely to be affected by the gravitational pull of other objects in the solar system, which can cause them to rotate faster over time.

Left unaddressed are how giant planets got to spin so fast in the first place, the distinction between a giant planet’s atmosphere and its interior, and the fact that scientists aren’t sure of the answers themselves.

As planetary scientist Ravi Helled of the University of Zurich explained in our October 2022 issue, “We still don’t fully understand what determines the rotation rate of giant planets; this is a key open topic in planetary science.” The clouds on the visible surface of the planets can whip once around in a matter of hours, but we don’t know whether the planets’ deep interior rotates in lockstep with them. However, Helled adds, scientists “do have some idea of what makes the outer planets such fast rotators. We believe that it’s linked to their formation history and their relatively high masses… As gas is accreted onto the planet, it increases the total angular momentum of the world, which, in turn, leads to rapid rotation.”

Your mileage may vary

Another limitation of ChatGPT — which OpenAI fully acknowledges — is that it can wildly change its answers when the wording of a question is tweaked only slightly. So if you sign up for an account and try asking it these questions, you may get something different, and perhaps even correct. But without outside knowledge, it’s impossible to know when ChatGPT is right and when it’s simply making things up.

Of course, it’s still early days for this technology. OpenAI’s language models are improving rapidly, and Google has reportedly developed an even better one that it has so far declined to release to the public. Yet, the recent history of AI development has shown that while deep-learning techniques can produce AI bots that are superhuman at some tasks — like playing Go or folding proteins — they can be dismally unreliable at others, like safely driving a car through a busy city. It remains to be seen whether the ability to reason through complicated concepts that can sometimes trip up professional astronomers falls into the former or latter category.

If nothing else, these answers are a clear demonstration that a language model has no “intelligence.” It might be better to think of ChatGPT as an extremely good version of the predictive typing feature on your smartphone’s keyboard. The bot’s reinforcement learning induces it to generate something similar to what humans would want to produce, informed by the incredible amount of data that it has been trained on. The results can often be surprising and amusing — even what we would call creative. But so far, at least in astronomy, you wouldn’t want to rely on it to give you the truth.

By the way, if you’d like us to answer your questions, email them to us at askastro@astronomy.com. We promise we won’t ask ChatGPT.
