[deleted by user]

  • bilouba@jlai.lu
    4 months ago

    Very impressive! Do you have a benchmark to test the reliability? A paper would be an awesome contribution to the science.

      • bilouba@jlai.lu
        4 months ago

        I understand. I have no idea how to do it. I've heard of SWE‑Bench‑Lite, which seems to focus on real-world usage. Maybe try contacting “AI Explained” on YT, he’s the best IMO. Your solution might or might not be novel, but he might help you figure that out. If it is indeed novel, it might be worth sharing with the larger community. Of course, I totally get that you might not want to do any of that. Thank you for your work!