The research from Purdue University, first spotted by news outlet Futurism, was presented earlier this month at the Computer-Human Interaction Conference in Hawaii and looked at 517 programming questions on Stack Overflow that were then fed to ChatGPT.

“Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose,” the new study explained. “Nonetheless, our user study participants still preferred ChatGPT answers 35% of the time due to their comprehensiveness and well-articulated language style.”

Disturbingly, programmers in the study didn’t always catch the mistakes being produced by the AI chatbot.

“However, they also overlooked the misinformation in the ChatGPT answers 39% of the time,” according to the study. “This implies the need to counter misinformation in ChatGPT answers to programming questions and raise awareness of the risks associated with seemingly correct answers.”

  • NotMyOldRedditName@lemmy.world
    7 months ago

    My experience with an AI coding tool today.

    Me: Can you optimize this method.

    AI: Okay, here’s an optimized method.

    Me seeing the AI completely removed a critical conditional check.

    Me: Hey, you completely removed this check with variable xyz

    AI: Oops, you’re right, here you go, I fixed it.

    It did this 3 times on 3 different optimization requests.

    It was 0 for 3

    There were some good suggestions, though, once you got past the blatant first error.

    • Zos_Kia@lemmynsfw.com
      7 months ago

      Don’t mean to victim blame, but I don’t understand why you would use ChatGPT for hard problems like optimization. And I say this as a heavy ChatGPT/Copilot user.

      From my observation, LLMs approach code through its linguistic and syntactic aspects, not through its technical effects.

      • NotMyOldRedditName@lemmy.world
        7 months ago

        Because I had some methods I thought were too complex and I wanted to see what it’d come up with?

        In one case, part of the method checked whether a value was within one of 4 ranges, and the output just dropped 2 of the ranges.

        I don’t think that’s asking too much of it.
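A dropped range like this is the kind of regression a quick comparison against the old method can catch before anyone has to spot it by eye. Below is a minimal sketch, not the commenter's actual code: the four ranges, the function names, and the sample window are all hypothetical, since the thread does not say what the real ones were. It compares a rewrite against the original over a sweep of inputs and reports the first value where they disagree.

```python
# Hypothetical ranges; the thread does not say what the real four were.
RANGES = [(0, 9), (20, 29), (40, 49), (60, 69)]


def in_any_range_original(value: int) -> bool:
    """Verbose original: one explicit check per range."""
    if 0 <= value <= 9:
        return True
    if 20 <= value <= 29:
        return True
    if 40 <= value <= 49:
        return True
    if 60 <= value <= 69:
        return True
    return False


def in_any_range_bad_rewrite(value: int) -> bool:
    """Stand-in for an 'optimized' rewrite that silently dropped two ranges."""
    return 0 <= value <= 9 or 20 <= value <= 29


def in_any_range_refactored(value: int) -> bool:
    """A shorter rewrite that keeps all four ranges."""
    return any(lo <= value <= hi for lo, hi in RANGES)


def first_mismatch(candidate, reference, samples):
    """Return the first sample where candidate and reference disagree, else None."""
    for v in samples:
        if candidate(v) != reference(v):
            return v
    return None


# Sweep a window that covers every range boundary plus some values outside.
SAMPLES = range(-5, 80)
print(first_mismatch(in_any_range_bad_rewrite, in_any_range_original, SAMPLES))
print(first_mismatch(in_any_range_refactored, in_any_range_original, SAMPLES))
```

With these assumed ranges, the first call reports a disagreement inside one of the dropped ranges and the second reports none. A sweep like this only works when the input space is small enough to enumerate; for larger inputs, boundary values of each range would be the samples to keep.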

        • Zos_Kia@lemmynsfw.com
          7 months ago

          “I don’t think that’s asking too much of it.”

          Apparently it was :D I mean, the tool is very limited, despite what the Devin.ai cult would like to believe.

    • cassie 🐺@lemmy.blahaj.zone
      7 months ago

      That’s been my experience with GPT: every answer is a hallucination to some extent, so nearly every answer I receive is inaccurate in some way. However, the same would apply if I were asking a human colleague unfamiliar with a particular system to help me debug something. Their answers would be quite inaccurate too, but I’m not expecting them to be accurate, just to have helpful suggestions of things to try.

      I still prefer the human colleague in most situations, but if that’s not possible or convenient GPT sometimes at least gets me on the right path.

    • piecat@lemmy.world
      7 months ago

      My favorite is when I ask for something and it gets stuck in a loop, pasting the same comment over and over.