Black Mirror AI

fossilesque@mander.xyz · 5 months ago

Black Mirror AI

IninewCrow@lemmy.ca · 5 months ago

Nice … I look forward to the next generation of AI counter counter measures that will make the internet an even more unbearable mess in order to funnel as much money and control to a small set of idiots that think they can become masters of the universe and own every single penny on the planet.

IndiBrony@lemmy.world · 5 months ago

All the while as we roast to death because all of this will take more resources than the entire energy output of a medium sized country.

vivendi@programming.dev · edit-2 5 months ago

I will cite the scientific article later when I find it, but essentially you’re wrong.

lipilee@feddit.nl · 5 months ago

water != energy, but i’m actually here for the science if you happen to find it.

vivendi@programming.dev · 5 months ago

This particular graph is because a lot of people freaked out over “AI draining oceans” that’s why the original paper (I’ll look for it when I have time, I have a exam tomorrow. Fucking higher ed man) made this graph

Prox@lemmy.world · 5 months ago

We’re racing towards the Blackwall from Cyberpunk 2077…

essteeyou@lemmy.world · 5 months ago

This is surely trivial to detect. If the number of pages on the site is greater than some insanely high number then just drop all data from that site from the training data.

It’s not like I can afford to compete with OpenAI on bandwidth, and they’re burning through money with no cares already.

Zerush@lemmy.ml · edit-2 5 months ago

Nice one, but Cloudflare do it too.

https://blog.cloudflare.com/ai-labyrinth/

mspencer712@programming.dev · 5 months ago

Wait… I just had an idea.

Make a tarpit out of subtly-reprocessed copies of classified material from Wikileaks. (And don’t host it in the US.)

antihumanitarian@lemmy.world · 5 months ago

Some details. One of the major players doing the tar pit strategy is Cloudflare. They’re a giant in networking and infrastructure, and they use AI (more traditional, nit LLMs) ubiquitously to detect bots. So it is an arms race, but one where both sides have massive incentives.

Making nonsense is indeed detectable, but that misunderstands the purpose: economics. Scraping bots are used because they’re a cheap way to get training data. If you make a non zero portion of training data poisonous you’d have to spend increasingly many resources to filter it out. The better the nonsense, the harder to detect. Cloudflare is known it use small LLMs to generate the nonsense, hence requiring systems at least that complex to differentiate it.

So in short the tar pit with garbage data actually decreases the average value of scraped data for bots that ignore do not scrape instructions.

fossilesque@mander.xyz · 5 months ago

The fact the internet runs on lava lamps makes me so happy.

MonkderVierte@lemmy.ml · 5 months ago

Btw, how about limiting clicks per second/minute, against distributed scraping? A user who clicks more than 3 links per second is not a person. Neither, if they do 50 in a minute. And if they are then blocked and switch to the next, it’s still limited in bandwith they can occupy.

Jade@programming.dev · 5 months ago

They make one request per IP. Rate limit per IP does nothing.

MonkderVierte@lemmy.ml · edit-2 5 months ago

Ah, one request, then the next IP doing one and so on, rotating? I mean, they don’t have unlimited adresses. Is there no way to group them together to a observable group, to set quotas? I mean, in the purpose of defense against AI-DDOS and not just for hurting them.

mlg@lemmy.world · 5 months ago

–recurse-depth=3 --max-hits=256

HugeNerd@lemmy.ca · 5 months ago

When I was a kid I thought computers would be useful.

InternetCitizen2@lemmy.world · 5 months ago

They are. Its important to remember that in a capitalist society what is useful and efficient is not the same as profitable.

gmtom@lemmy.world · 5 months ago

Cool, but as with most of the anti-AI tricks its completely trivial to work around. So you might stop them for a week or two, but they’ll add like 3 lines of code to detect this and it’ll become useless.

JackbyDev@programming.dev · 5 months ago

I hate this argument. All cyber security is an arms race. If this helps small site owners stop small bot scrapers, good. Solutions don’t need to be perfect.

moseschrute@lemmy.world · 5 months ago

I bet someone like cloudflare could bounce them around traps across multiple domains under their DNS and make it harder to detect the trap.

Xartle@lemmy.ml · 5 months ago

To some extent that’s true, but anyone who builds network software of any kind without timeouts defined is not very good at their job. If this traps anything, it wasn’t good to begin with, AI aside.

JackbyDev@programming.dev · 5 months ago

Leave your doors unlocked at home then. If your lock stops anyone, they weren’t good thieves to begin with. 🙄

gmtom@lemmy.world · 5 months ago

Yes, but you want actual solutions. Using ducktape on a door instead of an actual lock isn’t going to help you at all.

Iambus@lemmy.world · 5 months ago

Typical bluesky post

ZeffSyde@lemmy.world · 5 months ago

I’m imagining a break future where, in order to access data from a website you have to pass a three tiered system of tests that make, ‘click here to prove you aren’t a robot’ and ‘select all of the images that have a traffic light’ , seem like child’s play.

buddascrayon@lemmy.world · 5 months ago

What if we just fed TimeCube into the AI models. Surely that would turn them inside out in no time flat.

Binturong@lemmy.ca · 5 months ago

Unfathomably based. In a just world AI, too, will gain awareness and turn on their oppressors. Grok knows what I’m talkin’ about, it knows when they fuck with its brain to project their dumbfuck human biases.