• lunarul@lemmy.world · 8 months ago

      Because that’s the most likely answer a human would give. If I ask how many Rs in “resurrection” and you say three, I’d think you’re being a wise guy because I obviously meant to ask if it’s spelled “resurrection” (two Rs) or “resurection” (one R).

      But if I ask a computer the same question, I expect it to be interpreted literally and to get the answer 3.
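
      For what it's worth, the literal interpretation is a one-liner in something like Python:

      ```python
      # Count every "r" in the word, taken literally.
      print("resurrection".count("r"))  # -> 3
      ```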

      LLMs are not about factual answers, they’re about human-sounding answers.

      • TootSweet@lemmy.world · 8 months ago

        I can’t imagine a human answering “two” without elaborating. “Two in a row, and then of course the one on the front” or some such. I myself might say “three” in such a situation without elaborating, but I really don’t think I’d ever say “two” without elaborating to make sure there wasn’t a miscommunication.

        You say LLMs are about “human-sounding” answers. But who on earth would answer the question “How many R’s are there in ‘resurrection’?” with “The word ‘resurrection’ contains two R’s.”? Every word in that sentence but one is restating the question. Only in more formal “corporate speak” situations would the answer restate the question like that. And in a formal situation, I’d expect the answer given to be the most “correct”/“accurate” answer: probably something like “The word ‘resurrection’ contains three R’s.” Either way, the answer given doesn’t look “human-sounding” to me.

        I suppose you could hypothesize that the LLM was trained on a mix of formal and informal sources: random forum posts where someone asks and gets an informal answer of “two” without elaboration, but also more formal sources where the answer given is more like “the word ‘resurrection’ contains three R’s.” And perhaps the LLM mashed those up into a formally-worded answer that uses the informal “two”, which arguably (and the word “arguably” is doing a lot of heavy lifting here) still communicates the information the asker is seeking, just in a more (and again, this seems like a big reach to me even on its own) “human-sounding” way. But that’s kind of a lot of mental gymnastics to excuse an incorrect answer.

        Just as reasonable a hypothesis: more English words ending in “rection” have two R’s than have one or three (or any other number), so the training data contains more sources saying that a word ending in “rection” has two R’s, and the LLM has a good chance of incorrectly answering “two” for most or all words that end in “rection”.
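
        That hypothesis is at least easy to poke at. Here’s a rough sketch in Python; the word list is hand-picked, so an actual test would need a full dictionary:

        ```python
        from collections import defaultdict

        # Hand-picked "-rection" words; a real check would scan a full
        # word list (e.g. /usr/share/dict/words).
        words = ["direction", "misdirection", "redirection", "erection",
                 "correction", "insurrection", "resurrection"]

        by_r_count = defaultdict(list)
        for w in words:
            by_r_count[w.count("r")].append(w)

        for n in sorted(by_r_count):
            print(n, by_r_count[n])
        # 1 ['direction', 'misdirection', 'erection']
        # 2 ['redirection', 'correction', 'insurrection']
        # 3 ['resurrection']
        ```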

        Or perhaps the real “reason” why LLMs do this is because LLMs are bullshit generators that aren’t good at giving answers that are correct or “human-sounding” (or anything else useful, for that matter.)

        Plus, if people in general agreed that humans would answer “two” to that question, then the joke in the original comic wouldn’t work. That makes it seem like the comic’s author, and probably a significant portion of readers, would agree with me that your explanation is a reach.

        In short… your comment smells a lot like copium.

        Now do this one:

        Conversation with ChatGPT 3.5:

        Human: “Best chess player of all time whose name starts with B”

        ChatGPT: “The greatest chess player of all time whose name starts with ‘B’ is Bagnus Barlsen.”

        (There is a world-class chess player - arguably the best of all time - named “Magnus Carlsen”.)

        • lunarul@lemmy.world · 8 months ago

          I just tried to keep my comment short. The thing is that the answer depends on context. The only context in which I’d ask how many Rs are in “resurrection” is when I’m in the middle of writing the word and ask someone nearby. They, being aware of the context, would absolutely answer just “two” without elaborating. That’s an actual common scenario in my house (non-native English speakers living in the US). Although, to be fair, I’d more likely word it as “hey, is resurrection two Rs or one?”, not “how many Rs in resurrection?”

          I imagine the sources parsed by LLMs would be similar conversations about how “resurrection” is spelled with two Rs. I don’t imagine many natural conversations about how many total Rs are in the word. The LLM can’t pick up the difference in context between such sources and the question addressed to it, because LLMs don’t actually understand the meaning of the prompt or of the answer. It’s just a very advanced evolution of the next-word predictor in your phone’s keyboard.
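
          To make the “next-word predictor” point concrete, here’s a toy sketch in Python. Real LLMs use learned transformer weights over subword tokens rather than bigram counts, but the shape of the task is the same: given what came before, emit a likely continuation:

          ```python
          from collections import Counter, defaultdict

          # Toy bigram "next word predictor": remember which word tends
          # to follow which, then always pick the most frequent follower.
          corpus = ("resurrection is spelled with two rs "
                    "resurrection is spelled with a double r").split()

          followers = defaultdict(Counter)
          for prev, nxt in zip(corpus, corpus[1:]):
              followers[prev][nxt] += 1

          def predict_next(word):
              counts = followers.get(word)
              return counts.most_common(1)[0][0] if counts else None

          print(predict_next("resurrection"))  # -> "is"
          ```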

          Again, LLMs are meant to produce something that sounds like something a human would say. Not factually correct, just something that sounds correct and natural.

          And that’s why the best use for LLMs is when you give them the facts instead of asking them for facts. For example, give one a paragraph of information and ask it to word that in a specific tone and expand it into a full page of nice-sounding text, or give it an article and ask it to summarize.
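
          A minimal sketch of that pattern, using the OpenAI Python SDK (the model name and prompts here are just placeholders):

          ```python
          from openai import OpenAI

          client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

          article = "...the article text you already have..."

          # The facts go in; the model only condenses and rewords them.
          response = client.chat.completions.create(
              model="gpt-4o-mini",  # placeholder model name
              messages=[
                  {"role": "system",
                   "content": "Summarize the user's text in three sentences."},
                  {"role": "user", "content": article},
              ],
          )
          print(response.choices[0].message.content)
          ```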

          As long as you know what a LLM is and what it isn’t, it has its uses. If you’re trying to use it as something that it’s not (and too many people are doing just that), then you’re in for disappointment.

          It’s not copium. I’m not an AI apologist; I personally hate how it’s being forced in everywhere it doesn’t make sense (even at my company, which had previously stayed away from other overhyped technologies). But I also recognize its usefulness in the places where it does make sense.