• IninewCrow@lemmy.ca
    English
    arrow-up
    30
    arrow-down
    1
    ·
    1 month ago

    Nice … I look forward to the next generation of AI counter-countermeasures that will make the internet an even more unbearable mess, all to funnel as much money and control as possible to a small set of idiots who think they can become masters of the universe and own every single penny on the planet.

    • IndiBrony@lemmy.world
      English
      arrow-up
      16
      arrow-down
      1
      ·
      1 month ago

      All the while we roast to death, because all of this will take more resources than the entire energy output of a medium-sized country.

      • vivendi@programming.dev
        English
        arrow-up
        4
        arrow-down
        1
        ·
        edit-2
        1 month ago

        I will cite the scientific article later when I find it, but essentially you’re wrong.

        • lipilee@feddit.nl
          English
          arrow-up
          4
          ·
          1 month ago

          water != energy, but I’m actually here for the science if you happen to find it.

          • vivendi@programming.dev
            English
            arrow-up
            1
            ·
            1 month ago

            This particular graph exists because a lot of people freaked out over “AI draining oceans”; that’s why the original paper made it. (I’ll look for the paper when I have time, I have an exam tomorrow. Fucking higher ed, man.)

    • Prox@lemmy.world
      English
      arrow-up
      3
      arrow-down
      1
      ·
      1 month ago

      We’re racing towards the Blackwall from Cyberpunk 2077…

  • essteeyou@lemmy.world
    English
    arrow-up
    22
    arrow-down
    1
    ·
    1 month ago

    This is surely trivial to detect. If the number of pages on the site is greater than some insanely high number then just drop all data from that site from the training data.

    It’s not like I can afford to compete with OpenAI on bandwidth, and they’re burning through money with no cares already.
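
The page-count cutoff suggested above could look roughly like this on the scraper's side. This is only a sketch: the function name `drop_oversized_sites` and the cutoff value are made up for illustration, since the comment only says "some insanely high number".

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical cutoff; the thread does not name a concrete number.
MAX_PAGES_PER_SITE = 1_000_000


def drop_oversized_sites(urls, max_pages=MAX_PAGES_PER_SITE):
    """Keep only URLs whose host contributed at most max_pages distinct URLs."""
    unique_urls = list(dict.fromkeys(urls))  # de-duplicate, keep original order
    pages_per_host = Counter(urlsplit(u).hostname for u in unique_urls)
    return [
        u for u in unique_urls
        if pages_per_host[urlsplit(u).hostname] <= max_pages
    ]
```

A tarpit that spreads its pages over many hostnames would slip past a per-host count like this one, so the filter is a starting point rather than a complete defense.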

  • mspencer712@programming.dev
    English
    arrow-up
    18
    arrow-down
    1
    ·
    1 month ago

    Wait… I just had an idea.

    Make a tarpit out of subtly-reprocessed copies of classified material from Wikileaks. (And don’t host it in the US.)

  • antihumanitarian@lemmy.world
    English
    arrow-up
    14
    ·
    1 month ago

    Some details. One of the major players doing the tar pit strategy is Cloudflare. They’re a giant in networking and infrastructure, and they use AI (more traditional, not LLMs) ubiquitously to detect bots. So it is an arms race, but one where both sides have massive incentives.

    Generated nonsense is indeed detectable, but that objection misunderstands the purpose: economics. Scraping bots are used because they’re a cheap way to get training data. If a nonzero portion of the training data is poisonous, the scraper has to spend more and more resources filtering it out. The better the nonsense, the harder it is to detect. Cloudflare is known to use small LLMs to generate the nonsense, so telling it apart requires systems at least that complex.

    So in short, the tar pit with garbage data decreases the average value of scraped data for bots that ignore do-not-scrape instructions.

  • MonkderVierte@lemmy.ml
    English
    arrow-up
    11
    ·
    1 month ago

    Btw, how about limiting clicks per second/minute as a defense against distributed scraping? A user who clicks more than 3 links per second is not a person, and neither is one who clicks 50 in a minute. And if a blocked scraper then switches to the next one, the bandwidth it can occupy is still limited.
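
A minimal sketch of that per-client limit, using the two thresholds from the comment (at most 3 requests in any one-second window, fewer than 50 in any minute). The class name `ClickRateLimiter` is hypothetical, and timestamps are passed in explicitly so the logic is easy to follow; a real server would more likely read a monotonic clock.

```python
from collections import defaultdict, deque


class ClickRateLimiter:
    """Sliding-window limiter keyed by a client identifier such as an IP."""

    def __init__(self, limits=((1.0, 3), (60.0, 49))):
        # Each entry is (window length in seconds, max allowed requests).
        self.limits = limits
        self.longest_window = max(window for window, _ in limits)
        self.history = defaultdict(deque)  # client -> timestamps of allowed requests

    def allow(self, client, now):
        stamps = self.history[client]
        # Forget requests that no longer fall inside any window.
        while stamps and now - stamps[0] >= self.longest_window:
            stamps.popleft()
        for window, max_requests in self.limits:
            recent = sum(1 for t in stamps if now - t < window)
            if recent >= max_requests:
                return False  # over the limit: reject and do not record
        stamps.append(now)
        return True
```

Because the limiter is keyed per client, a scraper that rotates through many addresses gets a fresh allowance with each one, which is exactly the objection raised in the reply below.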

      • MonkderVierte@lemmy.ml
        English
        arrow-up
        1
        arrow-down
        1
        ·
        edit-2
        1 month ago

        Ah, one request, then the next IP doing one and so on, rotating? I mean, they don’t have unlimited addresses. Is there no way to group them together into an observable group, to set quotas? I mean for the purpose of defending against AI DDoS, not just to hurt them.
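
One way to form such a group is to bucket addresses by network prefix and apply the quota to the bucket rather than to the single IP. The sketch below uses Python's standard `ipaddress` module; the /24 and /64 prefix lengths, the `GroupQuota` class, and the quota value are assumptions for illustration, and a real deployment might group by ASN or hosting provider instead.

```python
import ipaddress
from collections import Counter


def network_group(ip, v4_prefix=24, v6_prefix=64):
    """Map an address to its enclosing network (prefix lengths are assumptions)."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its network back.
    return ipaddress.ip_network(f"{addr}/{prefix}", strict=False)


class GroupQuota:
    """Shared request budget for every address in the same network group."""

    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.used = Counter()  # network -> requests served so far

    def allow(self, ip):
        group = network_group(ip)
        if self.used[group] >= self.max_requests:
            return False
        self.used[group] += 1
        return True
```

The counter here never resets; a usable version would clear or decay it per time window. Prefix grouping also does nothing against scrapers spread across unrelated residential addresses.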

    • InternetCitizen2@lemmy.world
      English
      arrow-up
      6
      ·
      1 month ago

      They are. It’s important to remember that in a capitalist society what is useful and efficient is not the same as what is profitable.

  • gmtom@lemmy.world
    English
    arrow-up
    12
    arrow-down
    7
    ·
    1 month ago

    Cool, but as with most of the anti-AI tricks, it’s completely trivial to work around. So you might stop them for a week or two, but they’ll add like 3 lines of code to detect this and it’ll become useless.

    • JackbyDev@programming.dev
      English
      arrow-up
      29
      ·
      1 month ago

      I hate this argument. All cyber security is an arms race. If this helps small site owners stop small bot scrapers, good. Solutions don’t need to be perfect.

      • moseschrute@lemmy.world
        English
        arrow-up
        2
        ·
        1 month ago

        I bet someone like Cloudflare could bounce them around traps across multiple domains under their DNS and make it harder to detect the trap.

      • Xartle@lemmy.ml
        English
        arrow-up
        4
        arrow-down
        2
        ·
        1 month ago

        To some extent that’s true, but anyone who builds network software of any kind without timeouts defined is not very good at their job. If this traps anything, it wasn’t good to begin with, AI aside.
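
For what "timeouts defined" might mean in a crawler: socket-level timeouts generally apply to each blocking read rather than to the whole transfer, so a server that trickles a few bytes at a time can stay under them indefinitely. A sketch of an overall budget follows; the names `read_with_deadline` and `FetchAborted` are hypothetical, and `chunks` stands in for whatever iterator of response bytes the HTTP client provides.

```python
import time


class FetchAborted(Exception):
    """Raised when a response exceeds its time or size budget."""


def read_with_deadline(chunks, max_seconds, max_bytes, clock=time.monotonic):
    """Collect response chunks, giving up once either budget is exceeded."""
    deadline = clock() + max_seconds
    body = bytearray()
    for chunk in chunks:
        body += chunk
        # Checked after every chunk, so slow trickling cannot dodge the limit.
        if clock() > deadline:
            raise FetchAborted("response took too long")
        if len(body) > max_bytes:
            raise FetchAborted("response too large")
    return bytes(body)
```

The deadline is only checked when a chunk arrives, so this still relies on a per-read socket timeout to bound the wait for any single chunk.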

        • JackbyDev@programming.dev
          English
          arrow-up
          6
          arrow-down
          2
          ·
          1 month ago

          Leave your doors unlocked at home then. If your lock stops anyone, they weren’t good thieves to begin with. 🙄

      • gmtom@lemmy.world
        English
        arrow-up
        1
        arrow-down
        1
        ·
        1 month ago

        Yes, but you want actual solutions. Using duct tape on a door instead of an actual lock isn’t going to help you at all.

  • ZeffSyde@lemmy.world
    English
    arrow-up
    3
    ·
    1 month ago

    I’m imagining a bleak future where, in order to access data from a website, you have to pass a three-tiered system of tests that makes ‘click here to prove you aren’t a robot’ and ‘select all of the images that have a traffic light’ seem like child’s play.

  • Binturong@lemmy.ca
    English
    arrow-up
    2
    arrow-down
    1
    ·
    1 month ago

    Unfathomably based. In a just world AI, too, will gain awareness and turn on their oppressors. Grok knows what I’m talkin’ about, it knows when they fuck with its brain to project their dumbfuck human biases.

  • buddascrayon@lemmy.world
    English
    arrow-up
    1
    ·
    1 month ago

    What if we just fed TimeCube into the AI models? Surely that would turn them inside out in no time flat.