AI Megathread

The typical criticism of AI is that over-reliance on AI can lead to users losing the ability to exercise creativity and individual thought*, as well as sometimes just making shit up that isn't real.

How much of an issue is this when it comes to using AI for coding?

*I don't discount that those with true creativity can use AI to augment their autistic pursuits and not shit out complete slop, but many people who use AI are midwits and think AI is magic and treat it like magic, offloading any effort dedicated to thinking onto AI.
 
How much of an issue is this when it comes to using AI for coding?
It's an open question with lots of debate on both sides. As a guy who had an affinity for learning tricky programming languages, I think capable programmers who understand the fundamentals of computing will be as vital as ever, and their skills will remain intact because they'll need them even if they're just giving orders to an LLM. The pajeets and other mediocrities who used to have to stitch together StackOverflow snippets are going to become even dumber.

If the "AGI soon" people turn out to be wrong, things are gonna be fucked. The talent pipeline by which a guy becomes experienced enough to lead a team or architect a major system was already strained by the H-1B visa, but AI is going to kill it outright.
 
The typical criticism of AI is that over-reliance on AI can lead to users losing the ability to exercise creativity and individual thought*, as well as sometimes just making shit up that isn't real.

How much of an issue is this when it comes to using AI for coding?

*I don't discount that those with true creativity can use AI to augment their autistic pursuits and not shit out complete slop, but many people who use AI are midwits and think AI is magic and treat it like magic, offloading any effort dedicated to thinking onto AI.
AI makes me faster at repetitive things like writing unit tests or making DTOs. If you are using AI to save time, you'll be fine. If you are letting AI plan out your project and make its architectural decisions, you are in for a bad time.

I use AI to do things I'm not good at, and don't have an affinity for. It's good for writing rough drafts of copy for websites, making promotional stock images for mockups and creating basic logos. I can also use it for helping me style a webpage, but it doesn't do it as efficiently as a webdev from a code standpoint.

Edit: The era of AI slop devs is coming to a close. Token costs are just too high unless you have a card with 24GB VRAM to run locally. I think coding is going to look like AutoCAD does for professionals, where paying a lot of money per seat per month is worth it for the added productivity.

I see another potential dystopian outcome where the best engineers jockey for the companies with the highest token budget. Of course that would lead to a funny outcome if inefficient engineers just started burning tokens to make slop.
 
I've been flipping between Qwen 3.6, Qwen 3 Coder Next, Gemma 4 and GPT-OSS-120b in writing the code for my LCD display for my retro gaming system. I need to do more formal testing, but other than 3.6 sometimes getting stuck in loops they've all seemed to figure out what I wanted and gotten it done. Admittedly it's bog standard C on a microcontroller.
I have been trying these for agentic tasks, and Apriel from ServiceNow did well and seems to be a scaled-down variant of GPT-OSS 120B. Just be aware it uses a special chat template format, and that sucks ass.
How much of an issue is this when it comes to using AI for coding?
AI is a force multiplier (with a very small + base). If you don't understand how code works, or how the architecture is designed, or what your constraints are, then you're gonna fall a lot or it's going to be jank.
Oh wow, another model that can't be run on consumer hardware. I just don't understand what this company is doing.
They don't have to compete (too much) because lmao EU diktat. Plus gotta get their hands on the sweet tokenomics SaaS. Seeing this has actually made me double down on a plan to build a 2-4TB RAM home server and a stack of cheap 3060 12GBs to run inference on.

Plus the real solution is hybrid Mamba/transformer models like (ironically) Nvidia's Nemotron. But that'd take too much out of the "line go up" by making RAM prices crash (assuming the Chinese RAM doesn't).
Pure transformers should be relegated to where precision and accuracy actually matter, and hybrids can be used for the every day "AI boyfriend/girlfriend make me horny", "what are the oranges for" chats.
 
They don't have to compete (too much) because lmao EU diktat. Plus gotta get their hands on the sweet tokenomics SaaS. Seeing this has actually made me double down on a plan to build a 2-4TB RAM home server and a stack of cheap 3060 12GBs to run inference on.
That's a pretty bad idea. LLM performance across multiple cards is bottlenecked by memory bandwidth and PCIe transfer speeds.

3060s have a narrow 192-bit bus with GDDR6 memory. It's so bad it beats a P40 running GDDR5 by less than 5% in memory bandwidth, and with the P40 you at least have 24GB VRAM even if you have to ghetto rig CUDA 11.
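
To put rough numbers on that comparison: theoretical peak memory bandwidth is the bus width in bytes times the per-pin data rate. A minimal sketch, assuming approximate spec-sheet data rates (15 Gbps GDDR6 on the 3060 12GB; roughly 7.2 Gbps GDDR5 on a 384-bit bus for the P40). Those rates are assumptions added for illustration, not figures from the post.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    bus_width_bits / 8 gives bytes moved per transfer across the whole bus;
    data_rate_gbps is the effective per-pin rate (assumed spec-sheet value).
    """
    return bus_width_bits / 8 * data_rate_gbps


# Assumed, approximate spec figures (not from the thread).
rtx3060 = peak_bandwidth_gbs(192, 15.0)
p40 = peak_bandwidth_gbs(384, 7.2)

# How far ahead the 3060 is, as a percentage of the P40's bandwidth.
advantage_pct = (rtx3060 / p40 - 1) * 100

print(f"3060: {rtx3060:.1f} GB/s, P40: {p40:.1f} GB/s, 3060 ahead by {advantage_pct:.1f}%")
```

Under those assumed rates the gap comes out a little over 4%, in line with the "less than 5%" figure.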

You need 32GB of VRAM to run Qwen Coder at 30B in 4-bit with a decent context. That's 3 RTX 3060s. If you want to run comms between them, you are going to want 3 PCIe slots running x8 CPU lanes each. x4 might work, but I don't recommend it. At that point you're looking at a dual-CPU system, since most single-CPU motherboards don't have 4 PCIe slots that support bifurcation down to x4. That's not even getting into the power/cooling requirements, which will screw over most rack servers from Dell/HP. Maybe you run it on that weird Dell tower you can cram three 2-slot blower cards into, but it's gonna get real hot and loud real fast.
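
The card count above is just a ceiling division, and the weights-only footprint is parameters times bits per weight. A rough sketch, assuming the 32GB target already includes context (KV cache) and runtime overhead on top of the weights; the function names are made up for illustration.

```python
import math


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def cards_needed(required_gb: float, vram_per_card_gb: float) -> int:
    """Smallest number of identical cards whose combined VRAM covers required_gb."""
    return math.ceil(required_gb / vram_per_card_gb)


weights = weight_gb(30, 4)              # weights only, no context
cards_3060 = cards_needed(32, 12)       # 12GB per RTX 3060
cards_24gb = cards_needed(32, 24)       # e.g. 24GB cards like a 3090 or P40

print(f"weights: {weights:.0f} GB, 3060s needed: {cards_3060}, 24GB cards needed: {cards_24gb}")
```

The gap between the weights-only figure and the 32GB target is what's left for context and overhead; how much context that actually buys depends on the model and runtime.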

My advice is to either toss in two 3090s or a 3090/(3060 or 3080 12GB) combo on the low end, or any combination of two 4090/4080/4070 Supers on the high end. If you're on crack, a 5090/5080/5070 Ti duo would work, but at that point you're better off just considering an RTX 6000 Pro.
 
Seeing this has actually made me double down on a plan to build a 2-4TB RAM home server and a stack of cheap 3060 12GBs to run inference on.
I looked at getting 2-4 cheaper cards for this rather than a 5090 or something, and the performance / power usage, etc., doesn't add up, unfortunately.

Some people have bought older Tesla kits with 16GBs using an SXM2 adapter and used NVLink with a regular 4080/4090, but you are stuck with an older version of CUDA.

The only thing that I've seen that is reasonable is the AMD PRO AI 9700, but it doesn't have CUDA, and I can tell you from experience getting anything that relies on TensorFlow working is a PITA.
I can't find any definite performance numbers either, but it is a third of the price of a 5090.
 
The typical criticism of AI is that over-reliance on AI can lead to users losing the ability to exercise creativity and individual thought*, as well as sometimes just making shit up that isn't real.

How much of an issue is this when it comes to using AI for coding?
Same as the experienced vs beginner conundrum. Saw a tech jam where Claude was allowed, and junior progs would basically prompt and then sit on their asses while waiting for the output. Meanwhile a senior is standing behind them, pulling his hair out thinking "What the fuck are you doing? You're free to do other stuff, that's the point of LLMs!".

This is repeated throughout this thread at this point.
 
The typical criticism of AI is that over-reliance on AI can lead to users losing the ability to exercise creativity and individual thought*, as well as sometimes just making shit up that isn't real.

How much of an issue is this when it comes to using AI for coding?

*I don't discount that those with true creativity can use AI to augment their autistic pursuits and not shit out complete slop, but many people who use AI are midwits and think AI is magic and treat it like magic, offloading any effort dedicated to thinking onto AI.
Most of the studies about people deskilling because of AI seem to have an experiment setup where participants do the same thing with and without AI, but if you're using coding agents to do exactly the same stuff you were doing before coding agents, you're NGMI. You should be expanding your ambition and working on bigger projects.
Same as the experienced vs beginner conundrum. Saw a tech jam where Claude was allowed, and junior progs would basically prompt and then sit on their asses while waiting for the output. Meanwhile a senior is standing behind them, pulling his hair out thinking "What the fuck are you doing? You're free to do other stuff, that's the point of LLMs!".

This is repeated throughout this thread at this point.
Case in point. Always have at least three coding agents open at once, working on different projects/features in parallel.
 
Case in point. Always have at least three coding agents open at once, working on different projects/features in parallel.
I see this take a lot, but I never hear about what these super-users are actually doing. For me, at least, figuring out how to do what I want to do is the bulk of the work, and the implementation is a pretty straight shot after that, to the point where I wouldn't really use an LLM for it most of the time. It's a negligible share of the effort, and knowing exactly how my codebase works (and knowing the LLM didn't bugger up and hide a bug in there somewhere) is worth the time.

Is it an AutoResearch kind of deal, where you're an ML engineer, you write some placeholder training scripts for a few different component modules, and then you have the LLM iterate on them in a loop to optimize each module's metrics while you handle the higher-level stuff? I've tried that a few times.
 
3060s have a narrow 192-bit bus with GDDR6 memory. It's so bad it beats a P40 running GDDR5 by less than 5% in memory bandwidth, and with the P40 you at least have 24GB VRAM even if you have to ghetto rig CUDA 11.
To be fair, these
1) were 50 bucks apiece, bought when muh crypto GPUs crashed
2) come in a stack of 10
Plus I plan on just leaving the large models in RAM-only mode. It's a used ARM server board. RIP energy efficiency though.
The only thing that I've seen that is reasonable is the AMD PRO AI 9700, but it doesn't have CUDA, and I can tell you from experience getting anything that relies on TensorFlow working is a PITA.
Rocm is such a pain I'd rather be forced to compile all of chromadb's dependencies from scratch, and with them requiring extremely stupid and specific python package versions. Lord forgive the retards who make python dependencies hell.
Is it an AutoResearch kind of deal, where you're an ML engineer, you write some placeholder training scripts for a few different component modules, and then you have the LLM iterate on them in a loop to optimize each module's metrics while you handle the higher-level stuff? I've tried that a few times.
I've seen a few folks do this for red teaming/debating idea merits. One guy sets up several agents with different personalities to round-table debate each other. I don't know how the fuck that works or if it's useful, but hey, he's the millionaire, not me.
 
I looked at the DGX Spark and AMD 395+ and decided to just "upgrade" my AI server to 4x Intel B70 cards. Eventually each one will get PCIe 5 x4. For now it's a random mix, including 2 at x8 and 2 at PCIe 4 x4. The next step after this would be either a server board and more cards or a PCIe switch and more cards, either of which would be another $3k or so, before GPUs. One annoyance right now is that there's sort of a missing middle for 128GB-capable systems. The new models are either smaller models you can run full-sized or massive models you'd have to run at maybe Q3, if you're lucky, with offloading.

So far getting stuff to run hasn't been much of a problem. ComfyUI runs fine, but I don't use any special nodes. LLMs, on the other hand, are a mixed bag. llama.cpp works fine, but I'm sure there are some missing optimizations. Intel's LLM-Scaler build of vLLM has all the optimizations but is horribly out of date for the latest models. Mainline vLLM works OK, but is probably slower. I haven't really benchmarked much yet, mostly just "Did it write the code before I switched back to the coding window?" If "yes" then "Fast Enough".
 
1) were 50 bucks apiece, bought when muh crypto GPUs crashed
2) come in a stack of 10
Plus I plan on just leaving the large models in RAM-only mode. It's a used ARM server board. RIP energy efficiency though.
Large models in RAM will be way too slow to interact with effectively.

If you really have a stack of 10, just say fuck it and buy a mining rig and some PSUs. Run 'em at x2. It'll be faster than RAM. Gonna eat shit on power though, and you will want as much RAM as possible on it. Usually those boards only have one or two slots, so you're capped at 32/64GB.
 
I see this take a lot, but I never hear about what these super-users are actually doing. For me, at least, figuring out how to do what I want to do is the bulk of the work, and the implementation is a pretty straight shot after that, to the point where I wouldn't really use an LLM for it most of the time. It's a negligible share of the effort, and knowing exactly how my codebase works (and knowing the LLM didn't bugger up and hide a bug in there somewhere) is worth the time.

Is it an AutoResearch kind of deal, where you're an ML engineer, you write some placeholder training scripts for a few different component modules, and then you have the LLM iterate on them in a loop to optimize each module's metrics while you handle the higher-level stuff? I've tried that a few times.
I spend most of my time in Plan Mode on Claude Code. "I want X feature, it should work like this: [bullet points]," "investigate different options for implementing Y," "I'm encountering a bug where Z," "change the background to blue." I refine the plan through a back and forth with the model until I have something I'm happy for it to implement. If any part of the plan is unclear to me, I get it to explain that. If any part seems like the wrong approach, I suggest something else. Then while it's implementing, I move over to another agent and start planning something else. I don't generally try to one-shot complete projects, I just build them up feature by feature. Usually I'll have two agents planning and one agent implementing.

Sometimes if I don't know what the best approach for a given problem is, I'll have different agents implement different approaches in parallel on different branches and then compare the results. And every so often I have agents do passes over the whole codebase or areas I'm concerned about for bugs, dead code, redundant code, antipatterns, security issues, etc. If the code's written by Claude, I'll often do this with Codex.

None of this was possible before Opus 4.5. Last year I tried vibe-coding something in Cursor once and ended up with a total mess; this year it's going much better.

I have multiple decades of experience programming the old-fashioned way, so there's probably a lot of background knowledge informing how I prompt that I take for granted. For most of my projects I have a high-level idea of how to do what I want to and could do it manually if I had to, but before AI it either wouldn't have been worth the time investment or would have just taken way longer.
 
I looked at the DGX Spark and AMD 395+ and decided to just "upgrade" my AI server to 4x Intel B70 cards. Eventually each one will get PCIe 5 x4. For now it's a random mix, including 2 at x8 and 2 at PCIe 4 x4. The next step after this would be either a server board and more cards or a PCIe switch and more cards, either of which would be another $3k or so, before GPUs. One annoyance right now is that there's sort of a missing middle for 128GB-capable systems. The new models are either smaller models you can run full-sized or massive models you'd have to run at maybe Q3, if you're lucky, with offloading.

So far getting stuff to run hasn't been much of a problem. ComfyUI runs fine, but I don't use any special nodes. LLMs, on the other hand, are a mixed bag. llama.cpp works fine, but I'm sure there are some missing optimizations. Intel's LLM-Scaler build of vLLM has all the optimizations but is horribly out of date for the latest models. Mainline vLLM works OK, but is probably slower. I haven't really benchmarked much yet, mostly just "Did it write the code before I switched back to the coding window?" If "yes" then "Fast Enough".
I'm curious what models you are trying to run. Obviously for diffusion models you need a big single card, but I'm wondering what you are shooting for with 128GB. The only one that comes to mind is DeepSeek. 96GB is more than enough to run a 70B with a massive context at 4-bit.

I also think people miss out on the benefits of running multiple models of different sizes agentically. An 8B is more than capable of exploring a codebase and running a harness which dispatches code generation to something like Qwen. Something to think about.
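
A toy sketch of that split, where cheap exploration goes to the small model and code generation gets dispatched to the big one. Everything here is hypothetical: `small_model` and `large_coder` are stand-in stubs rather than any real inference API, and the task kinds are invented for illustration.

```python
from typing import Callable, Dict

Model = Callable[[str], str]


def small_model(prompt: str) -> str:
    """Stub for a local 8B-class model (hypothetical)."""
    return f"[small-8b] {prompt}"


def large_coder(prompt: str) -> str:
    """Stub for a larger coding model such as a Qwen coder (hypothetical)."""
    return f"[large-coder] {prompt}"


# Routing table: which model handles which kind of task.
ROUTES: Dict[str, Model] = {
    "explore": small_model,     # reading files, grepping, summarizing
    "summarize": small_model,
    "generate": large_coder,    # actual code generation
}


def dispatch(task_kind: str, prompt: str) -> str:
    """Send the prompt to the model registered for task_kind; default to the small one."""
    model = ROUTES.get(task_kind, small_model)
    return model(prompt)


print(dispatch("explore", "list the modules that touch the LCD driver"))
print(dispatch("generate", "write the SPI init routine"))
```

In a real harness the stubs would be calls to whatever local inference servers are running; the point is only that the routing layer itself is trivial.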
 
I'm curious what models you are trying to run. Obviously for diffusion models you need a big single card, but I'm wondering what you are shooting for with 128GB. The only one that comes to mind is DeepSeek. 96GB is more than enough to run a 70B with a massive context at 4-bit.
For diffusion models I can just run 4 image gens in parallel. I haven't looked at any of the video stuff to see if I can leverage the multiple cards.
For LLMs, as I mentioned there's a bit of a gap in the 70-200 range. GPT-OSS-120B is working pretty well, it's really only a 96GB VRAM model as well. The other goal is to keep quants bigger, if needed at all.
On the plus side, I finally did some "benchmarks" and Qwen 3.6 and Qwen3-Coder-Next both came in dead last as far as code not sucking goes. 3.6 was at full bf16 and Coder was at Q8. Gemma 4 and GPT-OSS-120B both beat them soundly. Luckily I don't really need anything to evaluate my code, as the largest one will probably have less than 10 files.
 
Large models in RAM will be way too slow to interact with effectively.
If you really have a stack of 10, just say fuck it and buy a mining rig and some PSUs. Run 'em at x2. It'll be faster than RAM. Gonna eat shit on power though, and you will want as much RAM as possible on it. Usually those boards only have one or two slots, so you're capped at 32/64GB.
Shh... Don't ruin my dreams of running GLM 5.2 (slowly) at int8! Plus the Ampere board has 8 memory channels of DDR4-3200.

To be fair, I'm not a corp, nor am I a hyperscaler, so I'm fine with it being slow. I'm mostly set up with models in the <100B range so far. My original goal for this idea was to run MiniMax 2.7 (230B total, 38B active), with a stack of cheap 3060s to run smaller models (and maybe offload a teeny bit), but then GLM came out, soooo.
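
For a sense of how slow "slow" is: decoding is mostly memory-bound, so a crude ceiling on tokens per second is memory bandwidth divided by the bytes of active weights read per token. A back-of-envelope sketch using the 8 channels of DDR4-3200 and the 38B-active figure quoted above, and assuming 64-bit (8-byte) channels, int8 weights, and no other overhead. Real throughput will land below this.

```python
def ddr_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bytes_per_transfer: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (assumes 64-bit channels by default)."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


def max_tokens_per_s(bandwidth_gbs: float, active_params_billion: float, bytes_per_weight: float) -> float:
    """Upper bound on decode speed if every active weight is read once per token."""
    return bandwidth_gbs / (active_params_billion * bytes_per_weight)


ram_bw = ddr_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
ceiling = max_tokens_per_s(ram_bw, active_params_billion=38, bytes_per_weight=1)

print(f"peak RAM bandwidth: {ram_bw:.1f} GB/s, decode ceiling: {ceiling:.1f} tok/s")
```

That is a best case in the single digits of tokens per second for a 38B-active model at int8, which fits both the "way too slow to interact with" and the "I'm fine with it being slow" positions.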
 
https://www.lesswrong.com/posts/6RZvGd6RfbkLDnTfu/how-does-such-unprofessional-ai-get-the-job

[Image attachment: 1782393518887.png]


Pretty wild how many anecdotes like this we see. I went in thinking "Yeah, Bing Chat was deranged, but that's just Microsoft being inept", but then I remember the airline chatbot that made up an imaginary deal for a customer, and I start to think that there's a broader issue here.

Nonetheless, I'd still rather deal with ChatGPT than a pajeet. You could close the Indian call centers right now and just hand the script to any given LLM and it would be an enormous improvement.
 
After four months of entangling with AI stuff, I'm about done with it. I'm going to let my ChatGPT subscription expire for good. Talking with an LLM for prolonged periods was giving me a headache, and it takes too much work to get anything useful out of a conversation with it. That, or it might be a limit of my intelligence; I can't tell sometimes.

How AI cannot tell you the time and date is still something I'm bewildered about, personally.

I also got very tired of how repetitive and annoying it got when it kept breaking down the same summaries over every little thing you tell it about progress. I'd ask it not to, but then it'd do it again immediately after.

I give it credit where it is due: it has helped me get comfortable with credit cards. It has also helped me understand a bit about how I behave around other people, and how the people in my circle act around me in response, when I asked it to be critical and fed it stuff from my end to analyze.

But after all of that, it's just becoming one of those things where I talk to it just to talk to it and I'd like for it to not become a problem where I'm paying it just to talk to it. Better cut it off now than let it develop, I was already seeing some signs.
 
I managed to partially automate more of the boring stuff at work earlier this week.

The normal flow is:
  1. Pick up the story from sprint / backlog
  2. Create new branch from latest dev
  3. I have OpenCode / Claude work on ticket
  4. Commit
  5. Push branch
  6. Pull Request
  7. Assign ticket to QA
I used a Jira MCP server, @aashari/mcp-server-atlassian-jira, and integrated it with OpenCode.

JSON:
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "jira-aashari": {
      "type": "local",
      "command": [
        "npx",
        "-y",
        "@aashari/mcp-server-atlassian-jira"
      ],
      "environment": {
        "ATLASSIAN_SITE_NAME": "your-company-subdomain",
        "ATLASSIAN_USER_EMAIL": "your.email@company.com",
        "ATLASSIAN_API_TOKEN": "your_atlassian_api_token_here"
      }
    }
  }
}

The ATLASSIAN_* variables can instead be exported as environment variables in your ~/.bashrc or similar.

You need to restart OpenCode.

If you've set this up right, it can then communicate with Jira.

Then you can add custom commands in OpenCode:

JSON:
{
  "commands": {
    "start-ticket": "Use jira-aashari to assign the ticket $1 to me and transition it to 'In Progress', then run bash to pull dev and checkout feature/$1",
    "finish-ticket": "Run bash to commit all changes as 'feat($1): complete' and push feature/$1, then create a GitHub PR to dev, and finally use jira-aashari to transition $1 to QA"
  }
}
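
For reference, the shell side of those two prompts boils down to a handful of git commands. A minimal sketch that only builds the command lists (it doesn't run anything), assuming the remote is called `origin`, the PR is opened with the GitHub CLI (`gh`), and using a made-up ticket key `PROJ-123`. The Jira transitions are left to the MCP server and aren't modelled here.

```python
from typing import List


def start_ticket_cmds(ticket: str, base: str = "dev") -> List[List[str]]:
    """Commands to update the base branch and create feature/<ticket> from it."""
    branch = f"feature/{ticket}"
    return [
        ["git", "checkout", base],
        ["git", "pull"],
        ["git", "checkout", "-b", branch],
    ]


def finish_ticket_cmds(ticket: str, base: str = "dev", remote: str = "origin") -> List[List[str]]:
    """Commands to commit everything, push the feature branch, and open a PR against base."""
    branch = f"feature/{ticket}"
    return [
        ["git", "add", "-A"],
        ["git", "commit", "-m", f"feat({ticket}): complete"],
        ["git", "push", "-u", remote, branch],
        # Assumes the GitHub CLI is installed and authenticated.
        ["gh", "pr", "create", "--base", base, "--head", branch, "--fill"],
    ]


# Hypothetical ticket key, for illustration only.
for cmd in start_ticket_cmds("PROJ-123") + finish_ticket_cmds("PROJ-123"):
    print(" ".join(cmd))
```

Having the exact commands written down somewhere also makes it easier to check what the agent actually ran against what you meant it to run.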

You can add agents. But I've not got that far with it yet.
 