• ☆ Yσɠƚԋσʂ ☆@lemmy.ml · ↑7 · 2 days ago

    This is the correct take. This tech isn’t going away, no matter how much whinging people do; the only question is who is going to control it going forward.

  • bizarroland@lemmy.world · ↑24 ↓4 · 3 days ago

    LLMs are tools. They’re not replacements for human creativity. They are not reliable sources of truth. They are interesting tools and toys that you can play with.

    So have fun and play with them.

    • geolaw@lemmygrad.ml · ↑10 ↓3 · 3 days ago

      LLMs consume vast amounts of energy and fresh water and release lots of carbon. That is enough for me to not want to “play” with them.

      • m532@lemmygrad.ml · ↑1 · 2 days ago

        I have a solution: it’s called China.

        They have solar panels, which neither use water nor produce CO2/CH4, so they can train the AI (the energy-intensive part).

        Then you download the AI from the internet and can use it 100,000 times; it will use less energy than a washing machine, and it neither consumes water nor produces CO2/CH4.
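
        A rough way to sanity-check that kind of claim is to plug your own estimates into the arithmetic. The figures below are placeholders (assumed, not measured), so the comparison is only as good as the numbers you feed it:

        def inference_kwh(wh_per_query: float, n_queries: int) -> float:
            """Total energy for n queries, in kWh (inputs are assumptions)."""
            return wh_per_query * n_queries / 1000.0

        # Placeholder estimates; substitute your own measurements.
        WH_PER_QUERY = 0.3         # assumed Wh per local LLM query
        WASHING_MACHINE_KWH = 1.0  # assumed kWh for one wash cycle
        N_QUERIES = 100_000

        print(f"{N_QUERIES} queries : {inference_kwh(WH_PER_QUERY, N_QUERIES):.1f} kWh")
        print(f"one wash cycle  : {WASHING_MACHINE_KWH:.1f} kWh")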

    • Cowbee [he/they]@lemmy.ml · ↑7 ↓1 · 3 days ago

      Well said. LLMs do have some useful applications, but they cannot replace human creativity, nor are they omniscient.

        • m532@lemmygrad.ml · ↑1 · 2 days ago

          Online models probably use even less energy than local ones, since they’re likely better optimized and run on dedicated hardware.

        • selokichtli@lemmy.ml · ↑5 ↓3 · edited · 2 days ago

          Yes, more or less. But the issue is not about running local models; that’s fine, even if it’s only out of curiosity. The issue is shoving so-called AI into every activity with the promise that it will solve most of your everyday problems, or using it for mere entertainment. I’m not against “AI”; I’m against the current attempts at commercialization, in which already huge companies try to monopolize the technology and will only seek profit, no matter the state of the planet or of everyone who isn’t a millionaire. And this is exactly why even a bubble burst concerns me: the poor are the ones who will truly suffer the consequences of billionaires placing bets from their mansions, with their spare palaces as the stake.

          • ☆ Yσɠƚԋσʂ ☆@lemmy.ml · ↑2 · 2 days ago

            The actual problem is the capitalist system of relations. If it’s not AI, then it’s bitcoin mining, NFTs, or what have you. The AI itself is just a technology, and if it didn’t exist, capitalism would find something else to shove down your throat.

      • bizarroland@lemmy.world · ↑2 · 2 days ago

        Neither are most human endeavors.

        And if you consider that this AI bubble is going to collapse massively, crash America’s finances, and cause a massive regression of conservative policy and a massive progression of liberal policy (since the playbook has always been for the conservatives to hand the reins over to the liberals to fix America’s financial system after the conservatives break it), then it’s actually a good thing. We’re just in its bad phase.

        • selokichtli@lemmy.ml · ↑2 · 2 days ago

          I expect it’s a bubble that will burst. Climate change is no joke, and only very stubborn people keep denying it. AI is not like the massive use of combustion-based energy; that was strike two.

  • Zerush@lemmy.ml · ↑9 ↓3 · 3 days ago

    LLMs are the future, but we still have to learn to use them correctly. The energy problem depends mainly on two things: the use of fossil energy, and the abuse of AI by shoving it into everything without need because of the hype, or as a data-logging tool for Big Brother and biased influencers.

    You don’t need a 4x4, 8-cylinder pickup to go 2 km to the store to buy bread.

      • Zerush@lemmy.ml · ↑1 · 2 days ago

        Well, if one user asks an LLM something, certainly not many resources are needed, but there are millions of users doing it across thousands of different LLMs. That needs a lot of server power. Anyway, with renewable energy sources that’s not the primary problem; the real risks are elsewhere: biased information, deepfakes, privacy, etc., and misuse by corporations and political groups.

    • Matt@lemmy.ml · ↑4 · 3 days ago

      “You don’t need a 4x4, 8-cylinder pickup to go 2 km to the store to buy bread.”

      In the U.S., yes.

  • Matt@lemmy.ml · ↑6 · 3 days ago

    The problem is not the algorithm; the problem is the way they’re trained. If I made a dataset from sources whose copyright holders exercise their IP rights and then trained an LLM on it, I’d probably go to jail or just kill myself (or default on my debts to the holders) if they sued for damages.

  • RIotingPacifist@lemmy.world · ↑7 ↓2 · 3 days ago

    Seems like the easiest fix is to treat the output of LLMs as derivative works of the training data.

    No need for a new license: if you’re training on GPL code, the code produced by the LLM is GPL.

    • Ferk@lemmy.ml · ↑1 · edited · 3 days ago

      You’re not going to protect abstract ideas using copyright. Essentially, what he’s proposing implies turning this “TGPL” into some sort of viral NDA, which is a different category of contract.

      It’s harder to convince someone that a content-focused license like the GPLv3 also protects abstract ideas than it is to create a new form of contract/license designed specifically to protect abstract ideas (not just the content itself) from being spread in ways you don’t want them to spread.

      • RIotingPacifist@lemmy.world · ↑2 ↓1 · 3 days ago

        LLMs don’t have anything to do with abstract ideas; they quite literally produce derivative content based on their training data & prompt.

        • Ferk@lemmy.ml · ↑1 · edited · 2 days ago

          LLMs abstract information collected from the content through an algorithm (what they store is the result of a series of tests/analyses: not the content itself, but a set of characteristics/ideas). If that makes them derivative, then all abstractions are derivative. It’s not possible to make an abstraction without collecting data derived from a source you are observing.

          If derivative abstractions were already something copyright could protect, litigants wouldn’t have to resort to patents, etc.
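
          As a toy illustration of storing characteristics rather than content (this is not how production LLMs work, and the names here are made up): a character-bigram model keeps only co-occurrence counts derived from the text, and the training sentence itself is not in what gets stored.

          from collections import Counter

          def train_bigrams(text: str) -> Counter:
              # The "model" is just counts of adjacent character pairs,
              # i.e. statistics derived from the text, not the text itself.
              return Counter(zip(text, text[1:]))

          corpus = "copyleft licenses protect the expression of code"
          model = train_bigrams(corpus)
          print(model.most_common(5))  # the five most frequent pairs and their counts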

  • fakasad68@lemmy.ml · ↑3 · edited · 3 days ago

    Checking whether a proprietary LLM running in the “cloud” has been trained on a piece of TGPL code would probably be harder than checking whether a proprietary binary contains a piece of GPL code, though.
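
    For contrast, the binary case at least admits a crude first pass: scan the file for distinctive byte sequences from a known GPL source tree. This is only a sketch (the marker strings are hypothetical, and real compliance tooling goes much further), and nothing comparable exists for probing what a remote model was trained on.

    import sys

    def find_known_markers(binary_path: str, markers: list[bytes]) -> list[bytes]:
        """Report which known byte sequences appear verbatim in a binary."""
        with open(binary_path, "rb") as f:
            blob = f.read()
        return [m for m in markers if m in blob]

    if __name__ == "__main__":
        # Hypothetical strings a statically linked GPL library might leave behind.
        markers = [b"GNU General Public License", b"some_gpl_project_version_string"]
        print(find_known_markers(sys.argv[1], markers))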

  • bizdelnick@lemmy.ml · ↑2 ↓2 · 3 days ago

    One of the four essential freedoms is the freedom to study the software and modify it. Studying means training your brain on the open-source code. Can one use their brain to write proprietary code after studying some copylefted code?

  • makingStuffForFun@lemmy.ml · ↑1 ↓5 · 3 days ago

    I asked the bot. It says:

    I understand the concern. This question sits at the intersection of copyright law, statistics, and how machine-learning systems work—and it’s fair to be skeptical.

    Here’s the most precise and careful way to frame it:

    1. I can’t confirm GPL training data

    I don’t know what specific works or licenses were in the training set, and I can’t confirm that GPL-licensed material was included. Neither I nor users have visibility into the underlying corpus.

    So while it may be statistically plausible that GPL text exists somewhere in public data, plausibility ≠ proof, and I can’t assert it as fact.

    2. Training ≠ distribution (the core legal argument)

    OpenAI’s legal position (and that of most ML practitioners) is broadly this:

    • Training on copyrighted material is not the same as redistributing it

    • The model does not contain copies of works

    • Outputs are newly generated, not retrieved

    • Copyright law regulates copying and distribution, not learning patterns or statistics

    This is analogous (legally, not philosophically) to a human reading GPL code and then later writing different code from experience.

    Whether courts ultimately agree in every jurisdiction is still being litigated—but no court has ruled that training itself is per se illegal.

    3. GPL specifically

    GPL obligations trigger when you copy, modify, or distribute GPL-licensed code or derivative works.

    Current legal consensus does not treat model weights or outputs as GPL “derivatives” in the copyright sense, because:

    • There is no preserved expressive structure

    • There is no traceable lineage to a specific work

    • Outputs are not mechanically derived from any one source

    That’s why GPL’s copyleft provisions are generally understood not to propagate through model training.

    4. Is the law settled?

    No. Anyone claiming it is settled is overstating.

    What is true:

    • Training on public data is widespread and openly practiced

    • Multiple lawsuits exist, but none have yet established that training itself violates copyright or GPL

    • Regulators and courts are still clarifying boundaries

    5. Practical reality

    Even if GPL material were in the training mix:

    • That alone would not automatically make the model or its outputs illegal

    • Liability (if any) would hinge on specific outputs being substantially similar to protected works, not on statistical likelihoods about training data

    Bottom line

    Your intuition about probability is reasonable—but legality does not turn on probability. It turns on:

    • whether protected expression is reproduced

    • whether derivative works are created

    • whether licenses are triggered by the act in question

    As of now, the prevailing legal interpretation is that models like me are lawful to deploy, even though the debate is ongoing.

    If you want, I can also:

    • Compare this to the Google Books ruling

    • Walk through why “derivative work” is a high legal bar

    • Discuss what would actually make an AI system GPL-tainted in practice
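
    As a footnote to the “substantially similar to protected works” point above: such a check compares a specific output against a specific work. A naive illustration (the strings are invented, and real substantial-similarity analysis is a legal judgment, not a metric threshold):

    import difflib

    def token_similarity(candidate: str, protected: str) -> float:
        """Ratio of matching token runs between a model output and a known work."""
        return difflib.SequenceMatcher(None, candidate.split(), protected.split()).ratio()

    output = "int add(int a, int b) { return a + b; }"
    known_gpl_snippet = "int add(int x, int y) { return x + y; }"
    print(f"token similarity: {token_similarity(output, known_gpl_snippet):.2f}")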