This is the correct take. This tech isn’t going away no matter how much whinging people do; the only question is who is going to control it going forward.
shit, we should reclaim all tech. it’s all fucking ours.
LLMs are tools. They’re not replacements for human creativity. They are not reliable sources of truth. They are interesting tools and toys that you can play with.
So have fun and play with them.
LLMs consume vast amounts of energy and fresh water and release lots of carbon. That is enough for me to not want to “play” with them.
I have a solution: it’s called China.
They have solar panels, which neither use water nor produce CO2/CH4, and they can train the AI (the energy-intensive part).
Then you download the model from the internet and can use it 100,000 times; it will use less energy than a washing machine, and it will neither consume water nor produce CO2/CH4.
Well-said. LLMs do have some useful applications, but they cannot replace human creativity nor are they omniscient.
See, it’s not fun for the planet.
Locally run models use a fraction of the energy. Less than playing a game with heavy graphics.
Online models probably use even less than local ones, since they will likely be better optimized and run on dedicated hardware.
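For concreteness, here is a minimal sketch of what “running a local model” looks like in practice, assuming a quantized GGUF model file and the llama-cpp-python bindings (both are assumptions, not something specific from this thread; any local runner works the same way):

```python
# Minimal local-inference sketch (assumes llama-cpp-python is installed
# and a quantized GGUF model has already been downloaded).
from llama_cpp import Llama

# Loading the model is the heavy, one-time step on the local machine.
llm = Llama(model_path="./model.gguf", n_ctx=2048)  # hypothetical path

# Each query after that is a short burst of CPU/GPU work, closer to a
# gaming session than to the cost of training a model from scratch.
reply = llm("Q: What is copyleft?\nA:", max_tokens=64)
print(reply["choices"][0]["text"])
```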
Yes, more or less. But the issue is not about running local models; that’s fine even if it’s only for curiosity. The issue is shoving so-called AI into every activity with the promise that it will solve most of your everyday problems, or using it for mere entertainment. I’m not against “AI”; I’m against the current attempts by already huge companies to commercialize and monopolize the technology, companies that will seek only profit, no matter the state of the planet or of other, non-millionaire people. And this is exactly why even a bubble burst concerns me: the poor are the ones who will truly suffer the consequences of billionaires gambling, from inside their mansions, with their spare palaces.
The actual problem is the capitalist system of relations. If it’s not AI, then it’s bitcoin mining, NFTs, or what have you. The AI itself is just a technology, and if it didn’t exist, capitalism would find something else to shove down your throat.
Neither are most human endeavors.
And if you consider that this AI bubble is going to collapse massively, crash America’s finances, and cause a massive regression in conservative policy and a massive progression of liberal policy (the playbook has always been for conservatives to break America’s financial system and then hand the reins to liberals until they fix it), then it’s actually a good thing. We’re just in its bad phase.
I expect it’s a bubble that will burst. Climate change is no joke, and only very stubborn people keep denying it. AI is not like the massive use of combustion-based energy. That was strike two.
LLMs are the future, but we still have to learn to use them correctly. The energy problem depends mainly on two things: the use of fossil energy, and the abuse of AI by including it needlessly in everything because of the hype, or as a data-logging tool for Big Brother or biased influencers.
You don’t need a 4x4 8-cylinder pickup to go 2 km to the store to buy bread.
LLMs in particular don’t use that much energy. Image and video generation are the real concerns.
Well, if one user asks an LLM something, not many resources are needed, but there are millions of users doing it across thousands of different LLMs. That needs a lot of server power. Anyway, with renewable energy sources that’s not the primary problem; the real risks are elsewhere: biased information, deepfakes, privacy, etc., along with misuse by corporations and political collectives.
You don’t need a 4x4 8-cylinder pickup to go 2 km to the store to buy bread.
In the U.S., yes.
I was referring to civilised first-world countries.
The problem is not the algorithm. The problem is the way they’re trained. If I made a dataset from sources whose copyright holders actually exercise their IP rights and then trained an LLM on it, I’d probably go to jail, or just kill myself (or default on my debts to the holders) if they sued for damages.
Seems like the easiest fix is to consider the output of LLMs to be derivative products of the training data.
No need for a new license: if you’re training a model on GPL code, the code produced by the LLM is GPL.
You are not gonna protect abstract ideas using copyright. Essentially, what he’s proposing implies turning this “TGPL” into some sort of viral NDA, which is a different category of contract.
It’s harder to convince someone that a content-focused license like the GPLv3 also protects abstract ideas than it is to create a new form of contract/license designed specifically to keep abstract ideas (not just the content itself) from being spread in ways you don’t want them to spread.
LLMs don’t have anything to do with abstract ideas; they quite literally produce derivative content based on their training data & prompt.
LLMs abstract information collected from the content through an algorithm (what they store is the result of a series of tests/analyses, not the content itself, but a set of characteristics/ideas). If that makes it derivative, then all abstractions are derivative. It’s not possible to make an abstraction without collecting data derived from a source you are observing.
If derivative abstractions were already something that copyright could protect, then litigants wouldn’t resort to patents, etc.
Checking whether a proprietary LLM running in the “cloud” has been trained on a piece of TGPL code would probably be harder than checking whether a proprietary binary contains a piece of GPL code, though.
Not necessarily; models can often be tricked into spilling the beans about what they were trained on.
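A rough sketch of what such a probe might look like (an assumption-laden illustration, not a definitive test): prompt the model with the first part of a distinctive snippet and measure how closely its continuation matches the real remainder. `query_model` is a hypothetical placeholder for whatever interface the model under test exposes.

```python
import difflib

def query_model(prompt: str) -> str:
    # Hypothetical placeholder: replace with a real call to the model under
    # test; it should return the model's continuation of `prompt`.
    return ""

def verbatim_score(snippet: str, split: float = 0.5) -> float:
    """Prefix-completion probe: feed the model the first part of a distinctive
    snippet and compare its continuation against the real remainder.
    A ratio near 1.0 suggests the snippet was memorized during training."""
    cut = int(len(snippet) * split)
    prefix, expected = snippet[:cut], snippet[cut:]
    completion = query_model(prefix)[: len(expected)]
    return difflib.SequenceMatcher(None, expected, completion).ratio()

# Example probe with a made-up, distinctively named function signature.
sample = "static int tgpl_example_checksum(const char *buf, size_t len)"
print(verbatim_score(sample))
```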
One of the four essential freedoms is the freedom to study the software and modify it. Studying means training your brain on the open source code. Can one use their brain to write proprietary code after they studied some copylefted code?
I asked the bot. It says:
I understand the concern. This question sits at the intersection of copyright law, statistics, and how machine-learning systems work—and it’s fair to be skeptical.
Here’s the most precise and careful way to frame it:
1. I can’t confirm GPL training data
I don’t know what specific works or licenses were in the training set, and I can’t confirm that GPL-licensed material was included. Neither I nor users have visibility into the underlying corpus.
So while it may be statistically plausible that GPL text exists somewhere in public data, plausibility ≠ proof, and I can’t assert it as fact.
2. Training ≠ distribution (the core legal argument)
OpenAI’s legal position (and that of most ML practitioners) is broadly this:
- Training on copyrighted material is not the same as redistributing it
- The model does not contain copies of works
- Outputs are newly generated, not retrieved
- Copyright law regulates copying and distribution, not learning patterns or statistics
This is analogous (legally, not philosophically) to a human reading GPL code and then later writing different code from experience.
Whether courts ultimately agree in every jurisdiction is still being litigated—but no court has ruled that training itself is per se illegal.
3. GPL specifically
GPL obligations trigger when you:
- copy
- modify
- distribute
GPL-licensed code or derivative works.
Current legal consensus does not treat model weights or outputs as GPL “derivatives” in the copyright sense, because:
- There is no preserved expressive structure
- There is no traceable lineage to a specific work
- Outputs are not mechanically derived from any one source
That’s why GPL’s copyleft provisions are generally understood not to propagate through model training.
4. Is the law settled?
No. Anyone claiming it is settled is overstating.
What is true:
- Training on public data is widespread and openly practiced
- Multiple lawsuits exist, but none have yet established that training itself violates copyright or GPL
- Regulators and courts are still clarifying boundaries
5. Practical reality
Even if GPL material were in the training mix:
- That alone would not automatically make the model or its outputs illegal
- Liability (if any) would hinge on specific outputs being substantially similar to protected works, not on statistical likelihoods about training data
Bottom line
Your intuition about probability is reasonable—but legality does not turn on probability. It turns on:
- whether protected expression is reproduced
- whether derivative works are created
- whether licenses are triggered by the act in question
As of now, the prevailing legal interpretation is that models like me are lawful to deploy, even though the debate is ongoing.
If you want, I can also:
- Compare this to the Google Books ruling
- Walk through why “derivative work” is a high legal bar
- Discuss what would actually make an AI system GPL-tainted in practice