• Voroxpete@sh.itjust.works
    17 days ago

    Why? No one ever accused chatbots of always being wrong. In fact, it would actually be better if they were. The biggest problem with LLMs is that they’re right just often enough that it’s hard to catch when they’re wrong.

    • ipkpjersi@lemmy.ml
      15 days ago

      To be fair, I have actually seen fringe cases where people accuse AI of always being wrong.

      You’re right, it would be easier if we could just ignore it, but sadly it’s correct often enough to be useful, which is why it’s seeing widespread usage. Like always, trust but verify, or just don’t trust it. lol

      • Voroxpete@sh.itjust.works
        16 days ago

        You can find fringe cases of anything. That’s why they’re fringe. I refuse to constantly add ten pages of fucking legal disclaimers to every comment I make just to account for the possibility that one idiot tweeted something one time to their ten followers.

      • Voroxpete@sh.itjust.works
        16 days ago

        Not even remotely, and it’s really important to understand a) why there is a difference, and b) why that difference matters, or else you are going to hoover up every bit of propaganda these desperate conmen feed you.

        People are not automated systems, and automated systems are not people.

        Something that people are generally pretty good at is understanding that a process has failed, even if we can’t understand how it has failed. As the adage goes, “I don’t need to be a helicopter pilot to see one stuck in a tree and immediately conclude that someone fucked up.”

        LLMs can’t do that. A human and an LLM will both cheerfully produce the wrong answer to “How many Rs in strawberry?” But a human, even one who knows nothing about cooking, will generally suspect that something might be up when asked to put glue on pizza. That’s because the human is capable of two things the LLM isn’t: reasoning and context. The human can use their reasoning to draw upon the context provided by their real-life experience and deduce that “Glue is not food, and I’ve never previously heard of it being used in food. So something here seems amiss.”

        That’s the first key difference. The second is in how these systems are deployed. You see, the conmen trying to sell us all on their “AI” solutions will use exactly the kind of reasoning that you bought (“Hey, humans fuck up too, it’s OK”) in order to convince us that these AI systems can take the place of human beings. But in the process, that requires us to place an automated system in the position of a human system.

        There’s a reason why we don’t do that.

        When we use automation well, it’s because we use it for tasks where the error rate on the automated system can be reduced to something far, far lower than that of a well-trained human. We don’t expect an elevator to just have a brain fart and take us to the wrong floor every now and then. We don’t expect that our emails will sometimes be sent to a completely different address to the one we typed in. We don’t expect that there’s a one in five chance that our credit card will be billed a different amount from what was shown on the machine. None of those systems would ever have seen widespread adoption if they had a standard error rate of even 5%, or 1%.

        Car manufacturing is something that can be heavily automated, because many of the procedures are simple, repeatable, and controllable. The last part is especially important. If you move all the robots in a GM plant to new spots, they will instantly fail. If you move the humans to new spots, they’ll be quite annoyed, but perfectly capable of moving themselves back to the correct places. Yet despite how automatable car manufacturing is, it still employs a LOT of humans, because so many of those tasks do not automate sufficiently well.

        And at the end of the day, a fucked up car is just a fucked up car. Healthcare uses a lot less automation than car manufacturing. That’s not because healthcare companies are stupid. Healthcare is one of the largest industries in North America. They will gladly take any automation they can get. I know this because my line of work involves healthcare companies regularly asking me for automation. But they also have a very, very low threshold for failure. If one of our systems fails even one time, they will demand a full investigation of the failure.

        This is because automated systems, when they are employed, have to be load-bearing. They have to be reliable enough that people can stop thinking about them, even though that same level of reliability isn’t demanded from the human components of these systems.

        This is largely because, generally speaking, humans have much more ability to recognize and correct the failures of other humans. Medical facilities organise themselves around multiple layers of trust and accountability. One of the demands we get most is for more tools to give oversight into what the humans in the system are doing. But that’s because a human is well equipped to recognize when another human is in a failure state. A human can spot that another human came into work hungover. A human can build a context for which of their fellow humans are reliable and which aren’t. Human systems are largely self-healing. High risk work is doled out to high reliability humans. Low reliability humans have their work checked more often.

        But it’s very hard for a human to build context for how reliable an automated system is. This is because the workings of that system are opaque; they do not have the context to understand why the system fails when it fails. In fact, when presented with an automated system that sometimes fails, the way most humans will react is to treat the system as if it always fails. If a button fails to activate on the first press one or two times, you will come back to that same facility a year later to find that it has become common practice for every staff member to press the button five times in a row, because they’ve all been told that sometimes it fails on the first press.

        When presented with an unreliable automated system, humans will choose to use a human instead, because they have assessed that they can better determine when the human has failed and what to do about it.

        And, paradoxically, because we have such a low tolerance for failure in automated systems, when presented with an automated system that will be taking on the work of a human, humans naturally expect that system to be more or less perfect. They expect it to meet the threshold that we tend to set for automated systems. So they don’t check its work, even when told to.

        The lie that LLMs fuck up in the same way that humans do is used to get a foot in the door, to sell LLM-driven systems as a replacement for human labour. But as soon as that replacement is actually being sold, the lie goes away, replaced by a different lie (often a lie by omission): that this will be as reliable as every other automated system you use. Or, at the very least, that “It will be more reliable than a human.” The sellers say this meaning, say, 5% more reliable (in reality the actual failure rate of humans in these tasks is often much, much lower than that of LLMs, especially when you account for false positives, which are usually ignored whenever someone touts numbers saying that an LLM did a job better than a human). But the people using the system naturally assume it means “More reliable in the way you expect automated systems to be reliable.”

        All of this creates a massive possibility for real, meaningful hazard. And all of this is before you even get into the specific ways in which LLMs fuck up, and how those fuck-ups are much more difficult to correct or control for. But that’s a whole separate rant.