• RIotingPacifist@lemmy.world
    3 days ago

    Seems like the easiest fix is to consider the output of LLMs to be derivative works of the training data.

    No need for a new license: if you’re training an LLM on GPL code, the code it produces is GPL.

    • Ferk@lemmy.ml
      3 days ago

      You are not gonna protect abstract ideas using copyright. Essentially, what he’s proposing implies turning this “TGPL” into some sort of viral NDA, which is a different category of contract.

      It’s harder to convince someone that a content-focused license like the GPLv3 also protects abstract ideas than it is to create a new form of contract/license designed specifically to protect abstract ideas (not just the content itself) from being spread in ways you don’t want.

      • RIotingPacifist@lemmy.world
        3 days ago

        LLMs don’t have anything to do with abstract ideas; they quite literally produce derivative content based on their training data & prompt.

        • Ferk@lemmy.ml
          3 days ago

          LLMs abstract information collected from the content through an algorithm (what they store is not the content itself but the result of a series of tests/analyses: a set of characteristics/ideas). If that makes it derivative, then all abstractions are derivative. It’s not possible to make abstractions without collecting data derived from a source you are observing.

          If derivative abstractions were already something that copyright could protect, then litigants wouldn’t resort to patents, etc.