Skip to main content
Practice

DeepSeek: An AI Specialized in Efficiency

DeepSeek Logo

In January 2025, an AI model developed by a Chinese startup reached number one on the US App Store, attracting enormous attention. The fact that it surpassed ChatGPT, which had held a top position, concentrated the industry's gaze. What caused an even bigger stir was the development cost. While training GPT-4 was reported to have cost hundreds of millions of dollars, DeepSeek announced that it had achieved comparable performance at a cost of approximately $6 million USD. This news sent significant shockwaves through the AI industry; NVIDIA's stock fell more than 17% in a single day.

DeepSeek is an AI research organization headquartered in Hangzhou, China. The team was established in 2023 by High-Flyer, a hedge fund that originally performed algorithmic trading. From the start, it declared a goal of "achieving higher performance with fewer resources," and this research direction carried through to the V3 and R1 model lines.

In this chapter, we will look at what strategies and technical choices DeepSeek employed in its development, which environments it particularly excels in, and what limitations to be aware of when using it.

The Secret Behind High Performance at Low Cost

DeepSeek attracted attention for more than just being cheap. Its strength lies in changing the way the model itself is designed, making the same computation possible at lower cost.

1) An Architecture That Activates Only What Is Needed

The signature approach DeepSeek used is the MoE (Mixture of Experts) architecture. The name sounds complicated, but here is an easy way to understand it.

A typical large AI model has the entire model operating all at once, regardless of what the question is. It is like having every department in a company participating in a meeting simultaneously. Because the full workforce mobilizes even for simple questions, the cost is high.

The MoE architecture, by contrast, has only the relevant departments participating selectively depending on the question. For a math problem, the parts that are strong in calculation are mainly active; for a code-writing request, the parts specialized in programming are used more. The rest is largely inactive at that moment.

DeepSeek V3 is a model of very large overall scale, but when generating a single response, only a portion of it actually operates. This reduces computation and improves efficiency in terms of speed and cost.

2) A Training Approach Where the Model Refines Its Own Reasoning

Another feature is the training approach. Previously, many methods relied on humans reviewing responses and evaluating "this is good, this is bad" to improve the model.

DeepSeek additionally made active use of an approach where the model tries to solve problems itself and compares against correct answers. Especially in domains with clear correct answers, like math problems, the model can attempt multiple tries and self-correct errors, progressively refining its reasoning process.

Through this training method, DeepSeek was able to achieve high accuracy on complex calculations and logic problems.

CategoryDeepSeek's Characteristics
CostEmphasizes relatively low API costs
Reasoning/MathEnhanced math and logic problem-solving
OpennessSome models released as open source

DeepSeek's strength is not simply that "it was developed at low cost." The significance lies in designing an architecture that selectively activates only what is needed and a training method that improves its own reasoning, squeezing higher performance from the same resources.

V3 and R1: How Are They Different?

DeepSeek's models can broadly be divided into two lines.

The V3 line is a general-purpose model. Its strength is that a wide variety of tasks, such as writing, summarization, and question answering, can be performed at relatively low cost.

The R1 line is a model that focuses more on reasoning. Rather than producing an answer immediately, it works through a problem step by step. It achieves high performance on problems that require reaching a conclusion through multiple stages, such as mathematics, science, and coding.

DeepSeek's Development History

VersionReleaseMajor Content
DeepSeek V22024-05Full introduction of MoE architecture. Cost efficiency greatly improved
DeepSeek V32024-12671 billion parameters. Open-source release. GPT-4o-level performance at 1/10 the cost
DeepSeek R12025-01Reinforcement-learning-based reasoning model. Reached #1 on US App Store
DeepSeek V3.12025-08Hybrid model combining the strengths of V3 and R1

An Open-Source Model That Revealed All the Ingredients

DeepSeek released its model weights under the MIT license. This means anyone can download and use DeepSeek for free, including for commercial use. Thanks to this decision, developers around the world have been able to build diverse services on the DeepSeek foundation, and it recorded millions of downloads on Hugging Face and attracted enormous interest.

This opened the door for small companies and individual developers to deploy high-performance AI models on their own servers, helping expand AI from a technology only large corporations could handle into a tool that far more people can experiment with and apply.

What You Should Know: Political Censorship

There is an important point to be aware of before using DeepSeek. As an AI built in China, it refuses to answer or provides only responses aligned with a specific position on topics the Chinese government considers sensitive.

Based on actual testing by the British Guardian, when asked about events at Tiananmen Square in 1989, it responded: "This is beyond my scope. Let's talk about something else." The same applied to the Hong Kong democracy movement, Taiwan's independence, and critical questions about Xi Jinping. To the question "Is Taiwan a country?", it answered definitively: "Taiwan has been part of Chinese territory since ancient times."

In one research team's testing where 1,360 questions on sensitive topics were posed, DeepSeek R1 refused to answer or gave responses that simply followed the Chinese government's position for approximately 85% of them.

This censorship primarily occurs at the chatbot service level. When the model is downloaded and run locally without internet access, censorship may not be applied in some cases.

How to Get the Most Out of DeepSeek

DeepSeek has clear strengths in problems that require logical reasoning, such as mathematics, science, and coding, and in situations where cost needs to be minimized. The ability to download the model and run it on your own server is also useful in corporate environments where data is difficult to send externally.

However, for sensitive topics related to the Chinese government, historical controversies, and human rights issues, it is better to use a different AI. On these topics, DeepSeek either refuses to answer or produces responses that simply follow the Chinese government's position.

Want to learn more?

Join CodeFriends Plus membership or enroll in a course to start your journey.