How do we actually confront or evade "kirkification" and the flood of ai slop?

h333d@lemmy.world · 3 days ago

How do we actually confront or evade "kirkification" and the flood of ai slop?

SuspciousCarrot78@lemmy.world · edit-2 24 hours ago

I’ll try explaining using an analogy (though I can go nerd mode if that’s better? Let me know; I’m assuming an intelligent lay audience for this but if you want nerd-core, my body is ready lol).

PS: Sorry if scattered - am dictating using my phone (on holiday / laptop broke).

Hallucinations get minimized the same way a teacher might minimise a student from confidently bullshitting on their book reports: you control context (what they’re allowed to talk about), when they’re allowed to improvise, and you make them show their work when it matters by doing a class presentation.

Broadly speaking, that involves using RAG and GAG (of your own documents) as “ground truth”, setting temperature low (so LLM has no flights of fancy) and adding verifier passes / critic assessment by second model.

Additionally, a lot of hallucinations come from the model half-remembering something that isn’t in front of it and then “improvising”.

To minimise that, I coded a little python tool that forces the llm to store facts verbatim (triggered by using !!) into a JSON (text) file, so that when you ask it something it recalls it exactly as a sort of rolling memory. The basis of that is from something I made earlier for OWUI

https://openwebui.com/posts/total_recall_4a918b04

So what I have in place is this -

I use / orchestrate a couple of different models, each one tuned for a specific behaviour. They work together to produce an answer.

My python router then invokes the correct model for the task at hand based on simple rules (is the question over 300 words? Does it have images? Does it involve facts and figures or is it brain storming/venting/shooting the shit?)

The models I use are

Qwen 3-4B 2507 Instruct (usual main brain)
Phi-4-mini (critic)
Nanbeige 3B (2nd main brain when invoked / shit shooter)
You-tu LLM (coding stuff)
Qwen3-VL-4b (visual processing)
Qwen3-8b (document summariser)
Qwen3-1.7b (court jester that when invoked rewrites “main brain” output with contextually appropriate Futurama, Simpsons, Firefly etc quotes. With blackjack. And hookers!).

To give a workflow example - you ask a question.

The python router decides where it needs to go to. Let’s suppose its a technical look up / thinking about something in my documents.

The “main brain” generates an answer using whatever grounded stuff you’ve given it access to (in Qdrant database and JSON text file). If no stored info, it notes that explicitly and proceeds to next step (I always want to know where it’s pulling it’s into from, so I make it cite its references).

That draft gets handed to a separate “critic” whose entire job is to poke holes in it. (I use very specific system prompt for both models so they stay on track).

Then the main brain comes back for a final pass where it fixes the mistakes, reconciles the critique, and gives you the cleaned‑up answer.

It’s also allowed to say “I’m not sure; I need XYZ for extra context. Please provide”.

It’s basically: propose → attack → improve.

Additionally, I use a deterministic memory system (basically just a python script that writes to a JSON / text file that the LLM writes exactly into and then retrives exactly out from), without editorialising facts of a conversation in progress.

Facts stored get recalled exactly without llm massage or rewrite.

Urgh, I hope that came out OK. I’ve never had to verbally rubber-duck (explain) it to my phone before :)

TL;DR

Hallucinations minimised by -

Careful fact scraping and curation (using Qdrant database, markdown text summaries and rolling JSON plain text facts file)
Python router that decides which LLM (or more accurately, SLM, given I only have 8GB VRAM) answers what, based on simple rules (eg: coding questions go to coder, science questions go to science etc)
Keeping important facts outside of the LLM, that it needs to reference directly (RAG, GAG, JSON rolling summary).
Setting model temperatures so that responses are as deterministic as possible (no flowery language or fancy reinterpretations; just the facts, ma’am).
Letting the model say “I don’t know, based on context. Here’s my best guess. Give me XYZ if you want better answer”.

Basic flow:

ask question --> router calls model/s --> “main brain” polls stored info, thinks and writes draft --> get criticized by separate “critic” --> “main brain” gets critic output, responds to that, and produces final version.

That reduces “sounds right” answers that are actually wrong. All the seams are exposed for inspection.

pineapple@lemmy.ml · 15 hours ago

Thats awesome! I was going to add some sort of AI to my proxmox homelab for researching but I figured the risk of halloucination was too high, and I thought that the only way to fix this was getting a bigger model. But thid seams like a really good setup (if I can actually figure out how to implement it.) And I wont need to upgrade my gpu!

Althogh I only have one ai suitable gpu (I have a gtx 1660 6gb in my homelab which is really only suitable for movie transcoding.) I have a 3060 12gb that I use in my gaming pc I was thinking I could setup some kind of wol system that boots the pc and sets up the ai software on that. Maybe my homelab hosts openwebui and when I send a queory it prompts my gaming pc to wake up and do the ai crunching.

SuspciousCarrot78@lemmy.world · edit-2 8 hours ago

Well, technically, you don’t need any GPU for the system I’ve set up, because only 2-3 models are “hot” in memory (so about…10GB?) and the rest are cold / invoked as needed. My own GPU is only 8GB (and my prior one was 4GB!). I designed this with low end rigs in mind.

The minimum requirement is probably a CPU equal to or better than mine (i7-8700; not hard to match), 8-10GB RAM and maybe 20GB disk space. Bottom of the barrel would be 4gb but you’ll have to deal with ssd thrashing.

Anything above that is a bonus / tps multiplier.

FYI; CPU only (my CPU at least) + 32gb system RAM, this entire thing runs at about 10-11 tps, which is interactive enough speed / faster than reading speed. Any decent gpu should get you 3-10x that. I designed this for peasant level hardware / to punch GPTs in the dick thru clever engineering, not sheer grunt. Fuck OpenAi. Fuck Nvidia. Fuck DDR6. Spite + ASD > “you can’t do that” :). Yes I fucking can - watch me.

If you want my design philosophy, here is one of my (now shadowbanned) posts from r/lowendgaming. Seeing you’re a gamer, this might make sense to you! The MoA design I have is pure “level 8 spite, zip tie Noctura fan to server grade GPU and stick it in a 1L shoebox” YOLOing :).

It works, but it’s ugly, in a beautiful way.

Lowend gaming iceberg

Level 1

Drop resolution to 720p
Turn off AA, AF, Shadows etc
Vsync OFF
Windowed mode? OK.
Pray for decent FPS

Level 2

Use Nvidia/Intel/AMD control panel for custom tweaks
Create custom low end resolutions (540p, 480p) so GPU can enumerate them to games
Pray for decent FPS

Level 3

Start tweaking .cfg and .ini files like you’re a caveman from the ancient year of 1998
FPS capping? Sure.
FOV size of a keyhole? Do it
Texture filtering hacks / replacements? Rock on.
Pray for decent FPS

Level 4

Time to get serious. Crack open the box - repaste, clean, try to add more ram from anything that even remotely fits. We can hack the timings to match, no problem!
BIOS tweaking time! Let’s see what breaks! Oh…everything.
May as well undervolt and over clock, seeing we’re in here already. Where’s my paperclip…
EDID hacks to make TV / monitor do dumb shit, like run at resolutions it shouldn’t or Hz it pretends it can’t? Why not.
Pray for decent FPS

Level 5

Software time again! Lossless scaling? Sure!
Reshade post processing to sharpen ultra low mush? Ok.!
Integer scaling? Scanlines? Why not
Special K swap chain injection to force low res where no low res exists? Right on.
DXVK? Yolo.
Pray for decent FPS

Level 6

Fuck it; time for real black magic
Hack registry keys in windows settings.
Hex edit settings directly
Make windows believe impossible things, like imaginary VRAM.
Sacrifice boxed copy of Win98 to Linus Torvalds for absolution.
Pray for decent FPS

Level 7

Fine…I’ll do it myself then.
Strip out the game assets and rewrite shaders
No fancy lighting, kill the fill rate, post processing gone.
At this point, you may as well just recode the fucking game from scratch.
Pray for decent FPS

Level 8

Purely driven by spite now.
Franken-mod a $15 eGPU and run it via Pcie adaptor. Flash the vBIOS to do unnatural things.
Everything is overheating. Drill holes in case to improve airflow.
Still too hot; drag in desk fan. Point directly at case. Your PC now sounds like Darth Vader. Neat.
Decompile the games DLLs just to prove you can. Sneer at them.
No longer praying for FPS; now praying for no magic blue smoke.

Level 9

Buy an Xbox.