Study Finds That 52 Percent of ChatGPT Answers to Programming Questions Are Wrong

ekZepp@lemmy.world · 2 years ago

dgmib@lemmy.world · 2 years ago

Sometimes ChatGPT/copilot’s code predictions are scary good. Sometimes they’re batshit crazy. If you have the experience to be able to tell the difference, it’s a great help.

fossilesque@mander.xyz · 2 years ago

I find the mistakes it makes, and troubleshooting them, really good for learning. I’m self-taught.

Potatos_are_not_friends@lemmy.world · 2 years ago

Pretty much this. Experienced developers see AI as just a next-level lorem ipsum.

0x01@lemmy.ml · 2 years ago

I’m a 10 year pro, and I’ve changed my workflows completely to include both chatgpt and copilot. I have found that for the mundane, simple, common patterns copilot’s accuracy is close to 9/10 correct, especially in my well maintained repos.

It seems like the accuracy of simple answers is directly proportional to the precision of my function and variable names.

I haven’t typed a full for loop in a year thanks to copilot, I treat it like an intent autocomplete.

Chatgpt on the other hand is remarkably useful for super well laid out questions, again with extreme precision in the terms you lay out. It has helped me in greenfield development with unique and insightful methodologies to accomplish tasks that would normally require extensive documentation searching.

Anyone who claims llms are a nothingburger is frankly wrong. With the right guidance my output has increased dramatically and my error rate has dropped slightly. I used to be able to put out about 1000 quality lines of change in a day (a poor metric, but a useful one) and my output has expanded to at least double that using the tools we have today.

Are LLMs miraculous? No, but they are incredibly powerful tools in the right hands.

Don’t throw out the baby with the bathwater.

MajorHavoc@programming.dev · 2 years ago

As a fellow pro, who has no issues calling myself a pro, because I am…

You’re spot on.

The stuff most people think AI is going to do - it’s not.

But as an insanely convenient auto-complete, modern LLMs absolutely shine!

sylver_dragon@lemmy.world · 2 years ago

I think AI is good at giving answers to well-defined problems. The issue is that companies keep trying to throw it at poorly defined problems and the results are less useful. I work in the cybersecurity space and you can’t swing a dead cat without hitting a vendor talking about AI in their products. It’s the new, big marketing buzzword. The problem is that finding the bad stuff on a network is not a well-defined problem. So instead, you get the unsupervised models faffing about, generating tons and tons of false positives. The only useful implementations of AI I’ve seen in these tools actually mirror your own: they can be scary good at generating data queries from natural language prompts. Which is, once again, a well-defined problem.

Overall, AI is a tool and used in the right way, it’s useful. It gets a bad rap because companies keep using it in bad ways and the end result can be worse than not having it at all.

jsomae@lemmy.ml · 2 years ago

In fairness, it’s possible that if 100 companies try seemingly bad ideas, 1 of them will turn out to be extremely profitable.

TrickDacy@lemmy.world · 2 years ago

Refreshing to see a reasonable response to coding with AI. Never used chatgpt for it but my copilot experience mirrors yours.

I find it shocking how many developers seem to think so negatively about programming with AI. Some guy recently said “everyone in my shop finds it useless”. Hard for me to believe they actually tried copilot if they think that.

Specal@lemmy.world · 2 years ago

I’ve found that the better I’ve gotten at writing prompts and giving enough information for it to not hallucinate, the better answers I get. It has to be treated as what it is, a calculator that can talk, make sure it has all of the information and it will find the answer.

One thing I have found to be super helpful with GPT4o is the ability to give it full API pages so it can update and familiarise itself with what it’s working with.

nephs@lemmygrad.ml · 2 years ago

Omg, I feel sorry for the people cleaning up after those codebases later. Maintaining that kind of careless “quality” lines of code is going to be a job for actual veterans.

And when we’re all retired or dead, the whole world will be a pile of alien artifacts from a time when people were still able to figure stuff out, and llms will still be ridiculously inefficient for precise tasks, just like today.

https://youtu.be/dDUC-LqVrPU

raspberriesareyummy@lemmy.world · 2 years ago

I’m a 10 year pro,

You wish. The sheer idea of calling yourself a “pro” disqualifies you. People who actually code and know what they are doing wouldn’t dream of giving themselves a label beyond “coder” / “programmer” / “SW Dev”. Because they don’t have to. You are a muppet.

TrickDacy@lemmy.world · 2 years ago

A lot of rage for a small amount of confidence

Gsus4@mander.xyz · 2 years ago

elon?

Boozilla@lemmy.world · edit-2 2 years ago

It’s been a tremendous help to me as I relearn how to code on some personal projects. I have written 5 little apps that are very useful to me for my hobbies.

It’s also been helpful at work with some random database type stuff.

But it definitely gets stuff wrong. A lot of stuff.

The funny thing is, if you point out its mistakes, it often does better on subsequent attempts. It’s more like an iterative process of refinement than one prompt gives you the final answer.

Downcount@lemmy.world · 2 years ago

The funny thing is, if you point out its mistakes, it often does better on subsequent attempts.

Or it gets stuck in an endless loop of two different but wrong solutions.

Me: This is my system, version x. I want to achieve this.

ChatGpt: Here’s the solution.

Me: But this only works with Version y of given system, not x

ChatGpt: <Apology> Try this.

Me: This is using a method that never existed in the framework.

ChatGpt: <Apology> <Gives first solution again>

UberMentch@lemmy.world · 2 years ago

I used to have this issue more often as well. I’ve had good results recently by **not** pointing out mistakes in replies, but by going back to the message before GPT’s response and saying “do not include y.”

Boozilla@lemmy.world · 2 years ago

Ha! That definitely happens sometimes, too.

BrianTheeBiscuiteer@lemmy.world · 2 years ago

While explaining BTRFS I’ve seen ChatGPT contradict itself in the middle of a paragraph. Then when I call it out it apologizes and then contradicts itself again with slightly different verbiage.

WalnutLum@lemmy.ml · 2 years ago

This is because all LLMs function primarily based on the token context you feed them.

The best way to use any LLM is to completely fill up its history with relevant context, then ask your question.
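A rough sketch of that context-first idea, in case it helps. It just assembles one prompt string with all the background up front and the question at the very end. The function name `build_prompt`, the `[Context N]` / `[Question]` labels, and the sample text are made up for illustration and are not tied to any particular model or API.

```python
def build_prompt(context_chunks, question):
    """Put every non-empty piece of context first, then the question last."""
    # Drop blank chunks so the numbering stays contiguous.
    chunks = [chunk.strip() for chunk in context_chunks if chunk.strip()]
    sections = [
        f"[Context {i}]\n{chunk}" for i, chunk in enumerate(chunks, start=1)
    ]
    # The question goes at the end, after everything the model should know.
    sections.append(f"[Question]\n{question.strip()}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Hypothetical context, loosely echoing the "version x" example above.
    prompt = build_prompt(
        [
            "I am on version x of the framework, not version y.",
            "Relevant API page text pasted here.",
        ],
        "How do I achieve this on version x?",
    )
    print(prompt)
```

Whether a single long prompt or several earlier messages works better will depend on the tool; this only shows the ordering.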

Boozilla@lemmy.world · 2 years ago

I worked on a creative writing thing with it and the more I added, the better its responses. And 4 is a noticeable improvement over 3.5.

Crisps@lemmy.world · 2 years ago

In the short term it really helps productivity, but in the end the reward for working faster is more work. Just doing the hard parts all day is going to burn developers out.

birbs@lemmy.world · 2 years ago

I program for a living and I think of it more as doing the interesting tasks all day, rather than the mundane and repetitive. Chat GPT and GitHub Copilot are great for getting something roughly right that you can tweak to work the way you want.

Epzillon@lemmy.ml · 2 years ago

I worked for a year developing in Magento 2 (an open source e-commerce suite which was later bought up by Adobe; it is not well maintained and is just all around not nice to work with). I tried to ask ChatGPT some Magento 2 questions to figure out solutions to my problems, but clearly the only data it was trained with was a lot of really bad solutions from forum posts.

The solutions did kinda work some of the time, but the way it was suggesting to do it was absolutely horrifying. We’re talking opening so many vulnerabilities, breaking many parts of the suite as a whole, or just editing database tables directly. If you do not know enough about the tools you are working with, implementing solutions from ChatGPT can be disastrous, even if they end up working.

muhyb@programming.dev · 2 years ago

Ask “are you sure?” and it will apologize right away.

jsomae@lemmy.ml · 2 years ago

Sure, but by randomly guessing code you’d get 0%. Getting 48% right is actually very impressive for an LLM compared to just a few years ago.

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 2 years ago

Exactly, I also find that it tends to do a pretty good job pointing you in the right direction. It’s way faster than googling or going through sites like stackoverflow because the answers are contextual. You can ask about a specific thing you want to do, and get an answer that gives you a general idea of what to do. For example, I’ve found it to be great for crafting complex sql queries. I don’t really care if the answer is perfect, as long as it gives me an idea of what I need to do.

InvaderDJ@lemmy.world · 2 years ago

You can also play with it to try and get closer to correct. I had problems with getting an Excel macro working and getting unattended-upgrades working on my pihole. GPT was wrong at first, but got me partly there and I could massage the question and Google and get closer to the right answer. Without it, I wouldn’t have been able to get any of it, especially with the macro.

sturlabragason@lemmy.world · 2 years ago

For someone doing a study on LLMs they don’t seem to know much about LLMs.

They don’t even mention which model was used…

Here’s the study used for this clickbait garbage:

https://dl.acm.org/doi/pdf/10.1145/3613904.3642596

THCDenton@lemmy.world · 2 years ago

It was pretty good for a while! They lowered the power of it like Immortan Joe. Do not become addicted to AI.

Samueru@lemmy.ml · 2 years ago

I find that thumbnail with a “fail” funny. I’m actually surprised that it got 48% right.

finestnothing@lemmy.world · 2 years ago

I use chatgpt semi-often… for generating stuff in a repeating pattern. Any time I have used it to make code, I don’t save any time because I have to debug most of the generated code anyway. My main use case lately is making python dicts with empty values (e.g. key1, key2,… becomes "key1": "", "key2": "",…) or making a gold/prod level SQL view by passing in the backend names and frontend names (e.g. value_1, value_2,… plus Value 1, Value 2,… becomes value_1 as Value 1,…).
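For what it’s worth, both of those repeating patterns can also be produced deterministically with a few lines of Python. This is only a sketch of the idea, not what the commenter actually runs: the function names `empty_dict` and `view_aliases` are made up, and wrapping the frontend names in double quotes is an assumption (identifier quoting differs between SQL dialects).

```python
def empty_dict(keys):
    """Map each key to an empty string, preserving the input order."""
    return {key: "" for key in keys}


def view_aliases(backend_names, frontend_names):
    """Build one 'backend AS "Frontend"' select-list entry per column pair."""
    if len(backend_names) != len(frontend_names):
        raise ValueError("backend and frontend name lists must be the same length")
    # Double-quoted aliases are an assumption; adjust for your SQL dialect.
    return ",\n".join(
        f'{backend} AS "{frontend}"'
        for backend, frontend in zip(backend_names, frontend_names)
    )


if __name__ == "__main__":
    print(empty_dict(["key1", "key2"]))
    print(view_aliases(["value_1", "value_2"], ["Value 1", "Value 2"]))
```

The length check is there because silently dropping a column from a view is the kind of mistake that is easy to miss later.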

ramirezmike@programming.dev · 2 years ago

I know this is gonna sound annoying but I just use vim for stuff like this. Even notepad++ has a macro thing too, right? My coworkers keep saying how much of a productivity boost it is, but all I see it do is mess up stuff like this that only takes a few seconds in vim to set up, and I know it’ll be correct every time.

finestnothing@lemmy.world · 2 years ago

I use vim keybinds (via doom emacs) for this sort of stuff if I’m doing it for personal projects, my professional work is all done in an online platform (no way around it) so it’s just faster and easier to throw the pattern and columns at the integrated chatgpt terminal rather than hop to a local editor and back

Evotech@lemmy.world · 2 years ago

Probably more than 52% of what programmers type is wrong too

habl@lemmy.world · 2 years ago

We mostly suck in emails.

paddirn@lemmy.world · 2 years ago

I wonder if the AI is using bad code pulled from threads where people are asking questions about why their code isn’t working, but ChatGPT can’t tell the difference and just assumes all code is good code.

Optional@lemmy.world · 2 years ago

AI Defenders! Assemble!

Veraxus@lemmy.world · edit-2 2 years ago

I’m surprised it scores that well.

Well, ok… that seems about right for languages like JavaScript or Python, but try it on languages with a reputation for being widely used to write terrible code, like Java or PHP (hence having been trained on terrible code), and it’s actively detrimental to even experienced developers.