
Posted

The Chinese are innovating, and they have no issue copying others' innovations either.

 

- Many models are open source. Great for learning.

- Research is accessible online: https://searchthearxiv.com/?q=llm&tab=papers

- Lots of data can be downloaded, stolen, or generated by AI.

- Most benchmarks are open source. Aligning models to benchmarks is easy, i.e., cheating on the exam by overfitting is easy.

- Closed models (e.g., OpenAI's) can be used (against their policy) to teach your own model through their APIs (distillation); see the sketch below.
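
As a rough illustration of that last point, here is a minimal, purely hypothetical sketch of API-based distillation: query a closed model for answers to a set of prompts and store the pairs as fine-tuning data for your own student model. The teacher model name, prompts, and output path are placeholders, not anything from the original post.

# Hypothetical sketch of API-based distillation: harvest a closed model's answers
# and keep (prompt, response) pairs as fine-tuning data for a student model.
import json
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = [
    "Explain chain-of-thought prompting in two sentences.",
    "Summarize the idea behind mixture-of-experts models.",
]

with open("distillation_pairs.jsonl", "w") as f:
    for prompt in prompts:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder teacher model
            messages=[{"role": "user", "content": prompt}],
        )
        answer = resp.choices[0].message.content
        # Each line becomes one supervised training example for the student model.
        f.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")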

 

Information AI wants to be free. The future is AI.

 

AI is great for copycats: it's easy to copy and impossible to prove what was copied:

 

Quote

>>> Are you trained on OpenAI data?
<think>

</think>

I was trained on publicly available texts from books, websites, and other sources. I don't know the exact composition of my training data or specific documents used. Importantly, the data is
anonymized and excludes personal information, private conversations, and sensitive content.

>>> Who is your teacher?
<think>
Okay, so the user has already asked whether they're trained on OpenAI's data, and I explained that I was trained on publicly available texts. Now, they're asking "Who is your teacher?"

Hmm, in the previous response, I mentioned that I don't have a specific teacher but rather learn from vast data. So now, I need to clarify what they mean by "teacher." Maybe they're looking
for a person or entity behind my training.

I should acknowledge that there's no single individual teaching me. Instead, it's a process involving many people like engineers and researchers. It might be helpful to explain the roles of
these teams in developing and refining AI models.

Also, I should emphasize that while I don't have consciousness or intent, the goal is to assist based on patterns from my training data. This helps set clear expectations for how I operate.

I need to make sure the response is friendly and informative, addressing their curiosity about who's behind me without overcomplicating things.
</think>

Great question! I don’t have a "teacher" in the traditional sense because I’m an AI model trained on vast amounts of text data from books, websites, and other sources. My development involves
many people—engineers, researchers, and others—but I don’t learn from one specific person or entity. Instead, my knowledge comes from patterns and information within the data I was trained on.

If you’re asking about who created me, it’s a team of talented individuals at OpenAI (or other AI research organizations) who work to develop and improve AI systems like me. Their goal is to
create helpful, safe, and useful technology that can assist people in various ways.

Let me know if you’d like more details! 😊

 

 

Posted
2 hours ago, formthirteen said:

AI is great for copycats, easy to copy, and impossible to prove what you copied:

Which means the Chinese will do very well.


In any case, having an LLM by itself isn't a moat, even if it's currently the best one. That much is clear.

Posted
1 hour ago, Cod Liver Oil said:

This is all table stakes for big tech. Parsimony is good for everyone except data warehouse landlords. People are running DeepSeek on a Raspberry Pi.
 

https://www.seangoedecke.com/deepseek-r1/?utm_source=tldrnewsletter


This is a thread on someone who built a setup capable of running the full R1 model on local hardware for around $6k. It's quite slow compared to the cloud-based models since it uses RAM for inference (768 GB for the full model) and ends up performing at around 6-8 tokens per second. For comparison, even the distilled model in the video, running on a single GPU, does around 50 tokens/second. There's probably quite a bit of further optimization possible, and the end-user experience of the full model vs. some of the smaller distilled models may not differ all that much depending on your application. It's very early innings for running models locally, but fun to watch the progress.
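
For a rough sense of why RAM-based inference tops out in the single digits, here's a back-of-the-envelope sketch. The bandwidth figures, and the assumption that R1's mixture-of-experts design activates roughly 37B of its 671B parameters per token, are my own illustrative numbers, not measurements from that build.

# Back-of-the-envelope estimate of why RAM-based inference is slow.
# Assumed numbers are illustrative, not measured from the $6k build.

model_size_gb = 768               # full R1 weights held in system RAM (from the post)
active_fraction = 37 / 671        # assumed MoE: ~37B of 671B params active per token
ram_bandwidth_gbps = 200          # assumed aggregate DDR5 bandwidth, dual-socket server
vram_bandwidth_gbps = 936         # e.g., a 3090-class GPU, for comparison

# Each generated token has to stream the active weights through memory at least once,
# so memory bandwidth, not compute, is usually the ceiling for local inference.
bytes_per_token_gb = model_size_gb * active_fraction

print(f"RAM-bound estimate:  ~{ram_bandwidth_gbps / bytes_per_token_gb:.1f} tokens/s")
print(f"VRAM-bound estimate: ~{vram_bandwidth_gbps / bytes_per_token_gb:.1f} tokens/s (if the full model fit in VRAM)")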

 

There's probably a decent use here too for all the old 3060s with 12 GB of VRAM that were used for crypto mining, as they're relatively cheap on the used market.
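
As a rough, purely illustrative fit check for a 12 GB card (parameter counts, quantization level, and cache headroom are assumptions, not benchmarks of any specific GPU):

# Rough check of which quantized models fit in a 12 GB card like a used RTX 3060.
# Parameter counts and the 2 GB cache headroom are illustrative assumptions.

def weight_footprint_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, ignoring activations."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

vram_gb = 12
for name, params_b in [("7B distill", 7), ("14B distill", 14), ("32B distill", 32)]:
    size = weight_footprint_gb(params_b, bits_per_weight=4)  # 4-bit quantization
    fits = "fits" if size + 2 <= vram_gb else "does not fit"  # ~2 GB headroom for KV cache
    print(f"{name}: ~{size:.1f} GB of weights -> {fits} in {vram_gb} GB")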

 

 

 

  • 2 weeks later...
Posted

Here is another AI chatbot from Europe (France): Mistral AI. It was apparently developed fairly cheaply, for tens of millions of euros rather than hundreds of millions. I tested it out and it works quite well; I got better results than with Perplexity when asking it to summarize documents. This goes to show that AI itself probably isn't a huge moat; it's how you apply it that matters.


I think the valuations OpenAI is getting and the money it is spending are insane.

 

https://mistral.ai/en

Posted

Seems to me that US startups spent like drunken sailors simply because the money was available. Upstarts from other countries with far fewer resources demonstrate that you can get similar results for far less money by building upon open source. I think this waste is going to become more obvious, and then the chickens are going to come home to roost for the overspenders. It's not the first such rodeo we have seen.

Posted
On 2/9/2025 at 8:36 AM, Spekulatius said:

Here is another AI chatbot from Europe (France): Mistral AI. It was apparently developed fairly cheaply, for tens of millions of euros rather than hundreds of millions. I tested it out and it works quite well; I got better results than with Perplexity when asking it to summarize documents. This goes to show that AI itself probably isn't a huge moat; it's how you apply it that matters.


I think the valuations OpenAI is getting and the money it is spending are insane.

 

https://mistral.ai/en

Perplexity doesn't have its own model. You are likely benchmarking Mistral against whichever model is under the hood of Perplexity, and you can adjust it to use GPT-4o, Claude, etc.

 

Mistral, by design, uses small models and focuses on text. The downside is that it doesn't keep up with GPT on complex tasks, and it's not good for coding, math, or reasoning. They are coming out with Pixtral, but you can already see costs starting to go up with that. Realistically, my clients always get jazzed about Mistral, but when you put things in production and benchmark them, Mistral tends to underperform.

 

I do agree with you that US startups are spending like drunken sailors. Revenue is coming in, but the constant need to update and retrain really eats away at any revenue that comes in.

  • 2 weeks later...
Posted

The Grok-3 release looks pretty good. Not blow-the-doors-off-OpenAI-or-DeepSeek good, but an improvement.

 

Has anyone looked at it closely enough to say whether the massive GPU/token investment to train Grok-3 paid off?

Posted
7 hours ago, rogermunibond said:

The Grok-3 release looks pretty good. Not blow-the-doors-off-OpenAI-or-DeepSeek good, but an improvement.

 

Has anyone looked at it closely enough to say whether the massive GPU/token investment to train Grok-3 paid off?

Overall, considering they had nothing last year and are now at the front of the pack, it's impressive. It's substantially better than DeepSeek (33%) and a little ahead of the other models. However, that's all in a vacuum and theoretical; it remains to be seen how it will perform in the wild.
