ChatGPT is an unbelievably bad tutor if what you want a tutorial about is even a little bit obscure (e.g. the answer you want isn't already included in Wikipedia). It just confidently states vaguely plausible-sounding, made-up nonsense; when you ask whether it was mistaken, it shamelessly makes up different total nonsense; when you ask it for sources, it makes up non-existent ones; and when you finally look the thing up for yourself, you spend twice as long as you originally would have, chasing down wrong paths and trying to work out exactly which parts ChatGPT got wrong (most of them).
And that's assuming you are a very savvy and media literate inquirer with plenty of domain expertise.
In cases where the answer you want was already easily findable, ChatGPT is still wrong about a lot of it, and you could more easily have gotten a (mostly) correct answer by looking at standard sources, or, if you want to be more careful, by tracking down the sources those actually cite or doing a skim search through the academic literature.
If you ask it about a topic you are not already an expert in, or if you are, say, an ordinary high school or college student, you are almost certainly coming away from the conversation with serious misconceptions.
> ChatGPT is an unbelievably bad tutor if what you want a tutorial about is even a little bit obscure
That has absolutely not been my experience at all. It's brought me up to speed in areas from ML to advanced DSP that I'd been struggling with for a long time.
How long has it been since you used it, and what did you ask it?
If the code I wrote based on my newly-acquired insight works, which it does, that's good enough for me.
Beyond that, there seems to be some kind of religious war in play on this topic, about which I have no opinion... at least, none that would be welcomed here.
> The “possible error surface” is large, logical (as opposed to syntactic), and very tricky to unit test. For example, perhaps you forgot to flip your labels when you left-right flipped the image during data augmentation. Your net can still (shockingly) work pretty well because your network can internally learn to detect flipped images and then it left-right flips its predictions. Or maybe your autoregressive model accidentally takes the thing it’s trying to predict as an input due to an off-by-one bug. Or you tried to clip your gradients but instead […]
> Therefore, your misconfigured neural net will throw exceptions only if you’re lucky; Most of the time it will train but silently work a bit worse.
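To make his first example concrete: here is a rough toy sketch (mine, not code from his post) of exactly that kind of silent bug, where the augmentation flips the pixels but forgets the label and nothing ever crashes:

```python
import numpy as np

def augment_buggy(image, keypoint_x):
    """Horizontal-flip augmentation that forgets to mirror the label.

    image: (H, W) array; keypoint_x: x-coordinate of a landmark in pixels.
    The image is flipped, but keypoint_x is returned unchanged -- the
    'silent' bug: nothing raises, the model just trains on wrong targets.
    """
    flipped = image[:, ::-1]
    return flipped, keypoint_x            # BUG: should be (W - 1) - keypoint_x

def augment_correct(image, keypoint_x):
    flipped = image[:, ::-1]
    W = image.shape[1]
    return flipped, (W - 1) - keypoint_x  # label mirrored along with the pixels

# Tiny demonstration: a 1x8 "image" with a bright pixel at x=2.
img = np.zeros((1, 8)); img[0, 2] = 1.0
print(augment_buggy(img, 2)[1])    # 2 -- label no longer matches the pixel (now at x=5)
print(augment_correct(img, 2)[1])  # 5 -- label still points at the bright pixel
```

The buggy version still "works" in the sense that training runs fine; the labels are just quietly wrong for every flipped sample.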
Actually Karpathy is a good example to cite. I took a few months off last year and went through his "Zero to hero" videos among other things, following along to reimplement his examples in C++ as an introductory learning exercise. I spent a lot of time going back and forth with ChatGPT to understand various aspects of backpropagation through operations including matmuls and softmax. I ended up well ahead of where I would otherwise have been, starting out as a rank noob.
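To give a sense of the territory, here's a minimal numpy sketch of the matmul case (not my actual C++, just an illustration of the kind of thing I was trying to get straight; the softmax backward pass is a similar exercise):

```python
import numpy as np

# Forward: Y = X @ W, with a scalar loss L = sum(Y * G) so that dL/dY = G.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))
W = rng.standard_normal((3, 5))
G = rng.standard_normal((4, 5))   # plays the role of the upstream gradient dL/dY

def loss(W):
    return np.sum((X @ W) * G)

# Backprop rules for a matmul: dL/dX = G @ W.T and dL/dW = X.T @ G
dW_analytic = X.T @ G

# Check one entry against a centered finite difference.
i, j, eps = 1, 2, 1e-6
Wp, Wm = W.copy(), W.copy()
Wp[i, j] += eps
Wm[i, j] -= eps
dW_numeric = (loss(Wp) - loss(Wm)) / (2 * eps)
print(dW_analytic[i, j], dW_numeric)   # should agree to ~6 decimal places
```

A finite-difference check like this catches most of the silent mistakes in a hand-rolled backward pass.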
Look: again, this is some kind of religious thing where a lot of people with vested interests (e.g., professors) are trying to plug the proverbial dike. Just how much water there is on the other side remains to be seen. But finding ways to trip up a language model by challenging its math skills isn't the flex a lot of you folks think it is... and when you discourage students from taking advantage of every tool available to them, you aren't doing them the favor you think you are. AI got a hell of a lot smarter over the past few years, along with many people who have found ways to use it effectively. Did you?
With regard to being fooled by buggy code or being satisfied with mistaken understanding, you don't know me from Adam, but if you did you'd give me a little more credit than that.
I'm not a professor and I don't have any vested interest in ChatGPT being good or bad. It just isn't currently useful for me, so I don't use it. In my experience so far it's basically always a waste of my time, but I haven't really put in that much work to find places where it isn't.
It's not a religious thing. If it suddenly becomes significantly better at answering nontrivial questions and stops confidently making up nonsense, I might use it more.
You are obviously experienced and have knowledge of advanced abstract topics.
For you, using ChatGPT as an NLP-based, flawed search mechanism is fine, and even more efficient than some alternatives.
Advocating that it would be just as useful and manageable by inexperienced young students with far less context in their minds is disingenuous at best.
I have tried asking it all sorts of questions about specific obscure word etymologies and translations, obscure people's biographies (ancient and modern), historical events, organizations, academic citations, mathematical definitions and theorems, physical experiments, old machines, native plants, chemical reactions, diseases, engineering methods, ..., and it almost invariably flubs every question I throw at it, sometimes subtly and sometimes quite dramatically, often making up abject nonsense out of whole cloth. As a result I don't bother too much; I've found it to waste more time than it saves. To be fair, the kinds of questions I would want a tool like this to answer are usually ones I would have to spend some time and effort hunting to answer properly, and I'm pretty fast and effective at finding information.
I haven't tried asking it many questions that I could trivially answer some other way. If what you want to know can be found in any intro undergrad textbook or standard dictionary (or Wikipedia), it's plausible that it would be better able to parrot back more or less the correct thing. But again, I haven't done much of this, preferring to just get hold of the relevant dictionary or textbook and read it directly.
I'll give you an example. I just now asked chatgpt.com what Lexell's theorem is, and it said this:
> Lexell's theorem is a result in geometry related to spherical triangles. Named after the mathematician Michel Léonard Jean Leclerc, known as Lexell, it states: ¶ In a spherical triangle, if the sum of the angles is greater than π radians (or 180 degrees), then the spherical excess (the amount by which the sum of the angles exceeds π) is equal to the area of the spherical triangle on a unit sphere. ¶ In simpler terms, for a spherical triangle, the difference between the sum of its angles and π radians (180 degrees) gives the area of the triangle when the sphere is of unit radius. This theorem is fundamental in spherical geometry and helps relate angular measurements directly to areas on a sphere.
This gets the basic topic right ("is a result in geometry related to spherical triangles", involves area or spherical excess) but everything else about the answer, starting with the mathematician's identity, is completely wrong.
If I tell it that this is incorrect, it repeats a random assortment of other statements, none of which is actually the theorem I am asking about. E.g.
> [...] In a spherical triangle, if you have a spherical triangle with vertices A, B, and C, and the sides of the triangle are a, b, and c (measured in radians), then: ¶ cos(a)cos(b) + sin(a)sin(b)cos(C) = cos(c). [...]
or
> [...] In a spherical polyhedron, the sum of the angles at each vertex is equal to 2π radians minus the sum of the interior angles of the faces meeting at that vertex. [...]
For the record, the actual theorem says:

> every spherical triangle with the same surface area on a fixed base has its apex on a small circle, called Lexell's circle or Lexell's locus, passing through each of the two points antipodal to the two base vertices.
The problem ChatGPT has is that it's not able to just say something true but incomplete such as "I'm not sure what Lexell's theorem is or who Lexell was, but I know the theorem has something to do with spherical trigonometry; maybe it could be found in the more comprehensive books about the subject such as Todhunter & Leathem 1901 or Casey 1889".
Instead it just authoritatively spouts one bit of nonsense after another. (Every topic I have ever tried asking it about in detail is more or less the same.) The incorrect statements range from subtly wrong (e.g. two different things with similar names got conflated and some of the properties of the more common one were incorrectly applied to the other) to complete nonsense (jumbles of technical jargon strung together that are more or less gibberish). It's clear if you read carefully about any technical topic that it doesn't actually understand what it is saying, and is just combining bits of vaguely related material. Answers to technical questions are almost never entirely technically accurate unless you ask a very standard question about a very basic topic.
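For what it's worth, the actual theorem quoted above is easy to check numerically. Here is a rough numpy sketch (mine, not anything ChatGPT produced) that slides the apex of a spherical triangle along the circle through the antipodes of the base vertices and confirms the area doesn't change:

```python
import numpy as np

def tri_area(a, b, c):
    """Spherical excess (= area on the unit sphere) of the triangle with
    vertices a, b, c, via the Van Oosterom-Strackee solid-angle formula."""
    num = np.dot(a, np.cross(b, c))
    den = 1.0 + np.dot(a, b) + np.dot(b, c) + np.dot(c, a)
    return abs(2.0 * np.arctan2(num, den))

A, B, C = np.eye(3)              # the octant triangle, area pi/2
antiA, antiB = -A, -B            # antipodes of the two base vertices

# Lexell circle: the small circle through antiA, antiB, and the apex C,
# i.e. where the plane through those three points cuts the unit sphere.
n = np.cross(antiB - antiA, C - antiA)
n /= np.linalg.norm(n)           # unit normal of that plane
d = np.dot(n, antiA)             # plane offset from the origin
center, r = d * n, np.sqrt(1.0 - d * d)
u = (C - center) / r             # radial direction pointing at C
v = np.cross(n, u)               # completes an orthonormal frame in the plane

# Slide the apex a little way along the circle: the area should not change.
for theta in np.linspace(-0.5, 0.5, 5):
    P = center + r * (np.cos(theta) * u + np.sin(theta) * v)
    print(round(tri_area(A, B, P), 6))   # all ~1.570796 (= pi/2)
```

Every printed value is pi/2, the area of the original octant triangle, no matter where on that arc the apex sits.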
Anyone using it for any purpose should (a) be already pretty media literate with some domain expertise, and (b) be willing to carefully verify every part of every statement.
Can't argue with that. Your earlier point is the key: "e.g. the answer you want isn't already included in Wikipedia." Anything specialized enough not to be covered by Wikipedia or similar resources -- or where, in your specific example, the topic was only recently added -- is not a good subject for ChatGPT. Not yet, anyway.
Now, pretend you're taking your first linear algebra course, and you don't quite understand the whole determinant thing. Go ask it for help with that, and you will have a very different experience.
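To make that concrete, here are the kinds of sanity checks such a student could run alongside whatever explanation they get (plain numpy; nothing here depends on the explanation's source):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# det is multiplicative: det(AB) = det(A) det(B)
print(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))

# Swapping two rows flips the sign of the determinant
A_swapped = A[[1, 0, 2], :]
print(np.linalg.det(A), np.linalg.det(A_swapped))

# Linearly dependent rows give determinant 0 (the map squashes volume flat)
A_singular = A.copy()
A_singular[2] = A[0] + A[1]
print(np.linalg.det(A_singular))   # ~0 up to floating-point noise
```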
In my own case, what opened my eyes was asking it for some insights into computing the Cramér-Rao bound in communications theory. I needed to come up to speed in that area a while back, but I'm missing some prereqs, so textbook chapters on the topic aren't as helpful as an interactive conversation with an in-person tutor would be. I was blown away at how effective GPT-4o was at answering follow-up questions and imparting actionable insights.
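To give a flavor without dragging in the communications details: the standard textbook warm-up (not the actual problem I was working on, just an illustration) is estimating a constant level from noisy samples and comparing the estimator's variance to the bound.

```python
import numpy as np

# Estimate a constant A from N noisy samples x[n] = A + w[n], w ~ N(0, sigma^2).
# The Cramer-Rao bound for any unbiased estimator of A is sigma^2 / N,
# and the sample mean attains it.
rng = np.random.default_rng(0)
A_true, sigma, N, trials = 1.7, 0.5, 50, 20000

x = A_true + sigma * rng.standard_normal((trials, N))
A_hat = x.mean(axis=1)                      # sample-mean estimate, one per trial

print("estimator variance:", A_hat.var())   # ~0.005
print("Cramer-Rao bound:  ", sigma**2 / N)  # 0.005
```

The two printed numbers agree, which is what it means for the sample mean to attain the bound.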
A problem, though, is that it is not binary. There is a whole spectrum of nonsense, and if you are not a specialist it is not easy to judge how accurate a reply is. Sometimes, by chance, you happen to ask about something the model actually knows; very often you don't. That is the dangerous part. Students might rely on it in their first year because it worked a couple of times, and then absorb a lot of nonsense mixed in with the truthy-sounding facts LLMs tend to produce.
The main problem is not simply that they are wrong; it would almost be simpler if they always were. The problem is that they are wrong unpredictably, and convincingly. Given that, recommending that students use them as tutors is really not a good idea, unless what you want is overconfidently wrong students (I mean more so than some of them already are). It's not random doomsayers saying this; it's university professors and researchers with advanced knowledge of their fields. Exactly the people who should be trusted on this kind of thing, more than AI techbros.
We could probably find a middle ground for agreement if we said, "Don't use current-gen LLMs as a tutor in fields where the answer can't be checked easily."
So... advanced math? Maybe not such a good idea, at least for independent study where you don't have access to TAs or profs.
I do think there's a lot of value in the ELI5 sense, though. Someone who spends time asking ChatGPT4 about Galois theory may not come away with the skills to actually pass a math test. But if they pursue the conversation, they will absolutely come away with a good understanding of the fundamentals, even with minimal prior knowledge.
Programming? Absolutely. You were going to test that code anyway, weren't you?
Planning and specification stages for a complex, expensive, or long-term project? Not without extreme care.
Generating articles on quantum gravity for Social Text? Hell yeah.
A statement I would support is: "Don't use LLMs for anything where correctness or accuracy matters, period. If you do use them, carefully check every statement they make against some more reliable source before relying on it. And if you use LLMs for any purpose, make sure you have a good understanding of their limitations, some relevant domain experience, and a willingness to accept that the output may be wrong in a wide variety of ways, from subtle to total."
There are many uses where accuracy may not matter: loose machine translation to get a basic sense of what topic some text is about; good-enough OCR or text to speech to make a keyword index for searching; generation of acceptably buggy code to do some basic data formatting for a non-essential purpose; low-fidelity summarization of long texts you don't have time to read; ... (or more ethically questionably, machine generating mediocre advertising copy / routine newspaper stories / professional correspondence / school essays / astroturf propaganda on social media / ...)
But "tutoring naïve students" seems currently like a poor use case. It would be better to spend some time teaching those students to better find and critically examine other information sources, so they can effectively solve their own problems.
Again, it's not only old theorems where LLMs make up nonsense, but also (examples I personally tried) etymologies, native plants, diseases, translations, biographies of moderately well known people, historical events, machines, engineering methods, chemical reactions, software APIs, ...
Other people have complained about LLMs making stuff up about pop culture topics like songs, movies, and sports.
> good understanding of the fundamentals
This does not seem likely in general. But it would be worth doing some formal study.
> Anything specialized enough not to be covered by Wikipedia or similar resources [...] is not a good subject for ChatGPT.
Things don't have to be incredibly obscure to make ChatGPT completely flub them (while authoritatively pretending it knows all the answers); they just have to be slightly beyond the most basic details of a common subject discussed at about the undergraduate level. Lexell's theorem, to take my previous example, is discussed in a wide variety of sources over the past 2.5 centuries, including books and papers by several of the most famous mathematicians in history, canonical undergraduate-level spherical trigonometry textbooks from the mid-20th century, and several easy-to-find papers from the past couple of decades, including historical and mathematical surveys of the topic. It just doesn't happen to be included in the training data of reddit comments and github commit messages or whatever, because it doesn't get covered in intro college courses, so nobody is asking for homework help about it.
If you stick to asking single questions like "what is Pythagoras's theorem" or "what is the most common element in the Earth's atmosphere" or "who was the 4th president of the USA" or "what is the word for 'dog' in French", you are fine. But as soon as you start asking questions that require knowledge beyond copy/pasting sections of introductory textbooks, ChatGPT starts making (often significant) errors.
As a different kind of example, I have asked ChatGPT to translate straightforward sentences and gotten back a translation with exactly the opposite meaning intended by the original (as verified by asking a native speaker).
The limits of its knowledge and response style make ChatGPT mostly worthless to me. If something I want to know can be copy/pasted from obvious introductory sources, I can already find it trivially and quickly. And I can't really trust it even for basic routine stuff, because it doesn't link to reliable sources, which makes its claims unnecessarily difficult to verify. Even published work by professionals often contains factual errors, but when you read it you can judge the author's name and reputation, look at any cited sources, compare claims from one source to another, and so on. But if ChatGPT tells you something, you have no idea whether it read it on a conspiracist blog, found it in the canonical survey paper about the topic, or just made it up.
> Go ask it for help [understanding determinants], and you will have a very different experience.
It's going to give you the right basic explanation (more or less copy/pasted from some well-written textbook or website), but if you start asking follow-up questions that get more technically involved, you are likely to hit serious errors within not too many hops, errors which reveal that it doesn't actually understand what a determinant is, but only knows how to selectively regurgitate/paraphrase from its training corpus (and routinely picks the wrong source to paraphrase, or mashes up two unrelated topics).
You can get the same accurate basic explanation by doing a quick search for "determinant" in a few introductory linear algebra textbooks, without really that much more trouble; the overhead of finding sources is small compared to the effort required to read and think about them.
Are you using the free version?
GPT 4 Turbo (which is paid) gives this:
> Lexell's theorem is a result in geometry related to triangles and circles. Named after the mathematician Anders Johan Lexell, the theorem describes a special relationship between a triangle and a circle inscribed in one of its angles. Here's the theorem:
> Given a triangle \(ABC\) and a circle that passes through \(B\) and \(C\) and is tangent to one of the sides of the angle at \(A\) (say \(AB\)), the theorem states that the circle's other tangent point with \(AB\) will lie on the circumcircle of triangle \(ABC\).
> In other words, if you have a circle that touches two sides of a triangle and passes through the other two vertices, the point where the circle touches the third side externally will always lie on the triangle’s circumcircle. This theorem is useful in solving various geometric problems involving circles and triangles.