How DeepSeek caused a shock and sent the NASDAQ south

DeepSeek has shocked the Nasdaq. Shares of Nvidia, the leading maker of AI chips and the biggest winner of last year's AI hype, fell more than 12% in pre-market trading. Other Nasdaq stocks are heading south as well. A wild week on the stock markets is in store.

Below is a good thread on the revolution China's DeepSeek has set off. Through inventive simplifications (sanctions cut them off from the most advanced Nvidia chips, so they had to work with less capable ones), the Chinese drastically reduced compute time and processor load, which means that similar or better AI-model results can be achieved with fewer and less powerful processors. It reminds me of a conversation a while back with Marko Golob about their line of business, where the Chinese likewise chose useful simplifications over perfection, simplifications that still deliver excellent results, and thereby achieved enormous cost savings.

What matters is what this revolution means for expectations about demand for advanced high-performance chips, and for projections about how AI models can be commercialized. Chinese companies have once again proven their disruptiveness: they can copy anything and then make it better and several times cheaper. Frightening.

_____________

Let me break down why DeepSeek’s AI innovations are blowing people’s minds (and possibly threatening Nvidia’s $2T market cap) in simple terms…

0/ first off, shout out to @doodlestein who wrote the must-read on this here:

1/ First, some context: Right now, training top AI models is INSANELY expensive. OpenAI, Anthropic, etc. spend $100M+ just on compute. They need massive data centers with thousands of $40K GPUs. It’s like needing a whole power plant to run a factory.

2/ DeepSeek just showed up and said “LOL what if we did this for $5M instead?” And they didn’t just talk – they actually DID it. Their models match or beat GPT-4 and Claude on many tasks. The AI world is (as my teenagers say) shook.

3/ How? They rethought everything from the ground up. Traditional AI stores every number with 32 bits of precision. DeepSeek was like “what if we just used 8 bits? It’s still accurate enough!” Boom – 75% less memory needed.
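
To see where that “75% less memory” figure comes from, here is a minimal Python sketch (not DeepSeek's actual code) that quantizes a float32 weight matrix to 8-bit integers with a single symmetric scale factor; the layer size and helper names are illustrative only.

```python
import numpy as np

def quantize_to_int8(weights: np.ndarray):
    """Symmetric linear quantization: map float32 weights to int8 plus one scale."""
    scale = np.abs(weights).max() / 127.0          # one scale for the whole tensor
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original weights."""
    return q.astype(np.float32) * scale

# Toy "layer": 4096 x 4096 float32 weights, roughly a mid-sized transformer block.
w = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_to_int8(w)

print(f"float32: {w.nbytes / 2**20:.1f} MiB")      # 64 MiB
print(f"int8:    {q.nbytes / 2**20:.1f} MiB")      # 16 MiB, i.e. 75% less memory
print(f"max abs error: {np.abs(w - dequantize(q, scale)).max():.4f}")
```

Storing 8 bits instead of 32 per weight is exactly a 4x reduction; the accuracy question is whether the small rounding error shown above is tolerable, which is what low-precision training and inference schemes are engineered around.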

4/ Then there’s their “multi-token” system. Normal AI reads like a first-grader: “The… cat… sat…” DeepSeek reads in whole phrases at once. 2x faster, 90% as accurate. When you’re processing billions of words, this MATTERS.
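
A toy sketch of the idea (with a hypothetical stand-in `model` callable, not a real LLM): classic decoding does one forward pass per generated token, while a multi-token scheme accepts a short “phrase” of k tokens per pass. Real systems add a verification step that this sketch omits.

```python
from typing import Callable, List

def decode_one_at_a_time(model: Callable[[List[int]], List[int]],
                         prompt: List[int], n: int) -> List[int]:
    """Classic autoregressive decoding: one forward pass per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out)[0])          # take only the first predicted token
    return out

def decode_multi_token(model: Callable[[List[int]], List[int]],
                       prompt: List[int], n: int, k: int = 4) -> List[int]:
    """Multi-token style: each forward pass proposes k tokens, so ~n/k passes."""
    out = list(prompt)
    while len(out) - len(prompt) < n:
        out.extend(model(out)[:k])         # accept a whole k-token "phrase" at once
    return out[:len(prompt) + n]

# Stand-in model: always "predicts" the next four integers after the last token.
toy_model = lambda ctx: [ctx[-1] + i + 1 for i in range(4)]

print(decode_one_at_a_time(toy_model, [0], 8))   # 8 forward passes
print(decode_multi_token(toy_model, [0], 8))     # 2 forward passes, same output
```

The point is the pass count, not the toy outputs: when each pass is a full run through a large model, cutting the number of passes roughly in half is a direct throughput win.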

5/ But here’s the really clever bit: They built an “expert system.” Instead of one massive AI trying to know everything (like having one person be a doctor, lawyer, AND engineer), they have specialized experts that only wake up when needed.

6/ Traditional models? All 1.8 trillion parameters active ALL THE TIME. DeepSeek? 671B total but only 37B active at once. It’s like having a huge team but only calling in the experts you actually need for each task.
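
A minimal mixture-of-experts routing sketch, in the same spirit as (but far smaller and simpler than) DeepSeek's architecture: a softmax gate picks the top-k experts per token, so only a small fraction of the expert weights is ever touched. All sizes and names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

D, N_EXPERTS, TOP_K = 64, 8, 2                   # hidden size, experts, experts used per token

# Each "expert" is just one weight matrix here; a real MoE layer uses small MLPs.
experts = [rng.standard_normal((D, D)) * 0.02 for _ in range(N_EXPERTS)]
gate_w = rng.standard_normal((D, N_EXPERTS)) * 0.02

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route a single token vector to its top-k experts and mix their outputs."""
    logits = x @ gate_w
    top = np.argsort(logits)[-TOP_K:]            # indices of the k best-scoring experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()
    # Only TOP_K of the N_EXPERTS weight matrices are multiplied for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D)
print(moe_forward(token).shape)                  # (64,)
print(f"experts touched per token: {TOP_K}/{N_EXPERTS} "
      f"≈ {TOP_K / N_EXPERTS:.0%} of expert weights")
```

Scale the same idea up and you get the 37B-active-out-of-671B-total pattern: the full parameter count sits in memory, but each token only pays the compute cost of the few experts its gate selects.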

7/ The results are mind-blowing:
– Training cost: $100M → $5M
– GPUs needed: 100,000 → 2,000
– API costs: 95% cheaper
– Can run on gaming GPUs instead of data center hardware

8/ “But wait,” you might say, “there must be a catch!” That’s the wild part – it’s all open source. Anyone can check their work. The code is public. The technical papers explain everything. It’s not magic, just incredibly clever engineering.

9/ Why does this matter? Because it breaks the model of “only huge tech companies can play in AI.” You don’t need a billion-dollar data center anymore. A few good GPUs might do it.

10/ For Nvidia, this is scary. Their entire business model is built on selling super expensive GPUs with 90% margins. If everyone can suddenly do AI with regular gaming GPUs… well, you see the problem.

11/ And here’s the kicker: DeepSeek did this with a team of <200 people. Meanwhile, Meta has teams where the compensation alone exceeds DeepSeek’s entire training budget… and their models aren’t as good.

12/ This is a classic disruption story: Incumbents optimize existing processes, while disruptors rethink the fundamental approach. DeepSeek asked “what if we just did this smarter instead of throwing more hardware at it?”

13/ The implications are huge:
– AI development becomes more accessible
– Competition increases dramatically
– The “moats” of big tech companies look more like puddles
– Hardware requirements (and costs) plummet

14/ Of course, giants like OpenAI and Anthropic won’t stand still. They’re probably already implementing these innovations. But the efficiency genie is out of the bottle – there’s no going back to the “just throw more GPUs at it” approach.

15/ Final thought: This feels like one of those moments we’ll look back on as an inflection point. Like when PCs made mainframes less relevant, or when cloud computing changed everything.

AI is about to become a lot more accessible, and a lot less expensive. The question isn’t if this will disrupt the current players, but how fast.

/end

P.S. And yes, all this is available open source. You can literally try their models right now. We’re living in wild times! 🚀

Source: Morgan Brown via X

Responses

  1. The next step is for the Chinese to figure out how to mine cryptocurrencies faster and crash that market too.

    And maybe that will finally prompt more people in the West to start using their own heads to develop competitive services and products, instead of living off capital.


  2. Decades ago, when I was still working for IBM Australia, I sat in on a conference with developers from Palo Alto, California. It was about new disk systems that were already eagerly awaited in Australia. One of the attendees asked:

    How many people are working on the project?

    150.

    How long will it take?

    A year and a half.

    If we give you another 150 people, how long will it take?

    Two and a half years.

    This true story shows that it is not all about quantity. The Russians understood this very well: they compensated for their lag in hardware with superior software development. I remember a conversation at a German company where a CAD system simulation was being presented to us (a Russian engineer was among us). Our Russian colleague just smiled and said:

    “True, our processors are 3 to 5 times slower, but our software is a hundred times faster.”
