I'm tired of LLM bullshitting. So I fixed it.

SuspciousCarrot78@lemmy.world · 2 months ago

bilouba@jlai.lu · 5 months ago

Very impressive! Do you have a benchmark to test its reliability? A paper would be an awesome contribution to the science.

SuspciousCarrot78@lemmy.world · 2 months ago

[deleted by user]

bilouba@jlai.lu · 5 months ago

I understand. No idea how to do it. I've heard about SWE‑Bench‑Lite, which seems to focus on real-world usage. Maybe try contacting “AI Explained” on YT; he’s the best IMO. Your solution may or may not be novel, but he might help you figure that out. If it is indeed novel, it might be worth sharing with the larger community. Of course, I totally get that you might not want to do any of that. Thank you for your work!

llama-conductor