
Open the Ollama model library and you'll find hundreds of models available to pull: Llama, Qwen, DeepSeek, Gemma, Phi, Mistral… each in multiple sizes (1B, 7B, 14B, 70B), and on top of that quantization levels like q4, q8, and fp16. It's easy for a beginner to feel overwhelmed.

This article will help you understand each popular model family in 2025, compare them with each other, and most importantly: choose which model suits your VPS configuration.

How to Read Model Names in Ollama

Before diving into comparisons, you need to know how to read model names. For example:

qwen2.5:7b-instruct-q4_K_M

This name consists of 4 parts:

  • qwen2.5 — Model family (model series, developed by Alibaba)
  • 7b — Number of parameters (7 billion parameters). The larger the number, the “smarter” the model but also the more RAM it consumes
  • instruct — Variant that has been fine-tuned for chat/instruction. If not present, it’s a base model (used for further training, not for chat)
  • q4_K_M — Quantization level, a compression technique that reduces RAM usage. q4 = 4-bit (lightest), q8 = 8-bit (heavier but closer to full quality), fp16 = full 16-bit precision (heaviest, highest quality)

Tip: When using Ollama, the default pull will be the q4_K_M version. This is a well-balanced quantization level between quality and RAM usage. You don’t need to change anything unless you have special requirements.
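The four parts can be pulled apart with plain shell string operations. A minimal sketch, using the example tag from above:

```shell
# Split an Ollama model tag of the form family:size-variant-quant
# into its four parts with POSIX parameter expansion.
tag="qwen2.5:7b-instruct-q4_K_M"

family="${tag%%:*}"        # before the first ":"   -> qwen2.5
rest="${tag#*:}"           # after the first ":"    -> 7b-instruct-q4_K_M
size="${rest%%-*}"         # up to the first "-"    -> 7b
variant="${rest#*-}"; variant="${variant%%-*}"   #  -> instruct
quant="${rest#*-*-}"       # after the second "-"   -> q4_K_M

echo "$family | $size | $variant | $quant"
# → qwen2.5 | 7b | instruct | q4_K_M
```

Note that short tags like `qwen2.5:7b` omit the variant and quantization parts entirely; Ollama then resolves them to the instruct variant at q4_K_M by default.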

Most Popular Model Families Currently

Below are the 6 model families you’ll encounter most when self-hosting AI on VPS.


Qwen 2.5 (Alibaba)

This is the model family I recommend most for Vietnamese users. Qwen 2.5 was trained by Alibaba with a very large multilingual dataset, including Vietnamese. The result is that it understands and responds in Vietnamese much better than other models of the same size.

  • Size: 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B
  • License: Apache 2.0 (permissive; fine for commercial use)
  • Strengths: Excellent Vietnamese, has dedicated Coder version for programming, diverse sizes
  • Weaknesses: Reasoning not as good as DeepSeek-R1 at the same size

Llama 3.x (Meta)

Llama is the open-source AI community's go-to model family. Meta invested heavily in it, and community support is the strongest of any family: most tools, frameworks, and tutorials use Llama as the default model.

  • Size: 1B, 3B, 8B, 70B, 405B
  • License: Llama Community License (free for most use cases, restricted if over 700 million MAU)
  • Strengths: All-rounder, large community, abundant documentation, stable
  • Weaknesses: Vietnamese not as good as Qwen, 70B+ versions are very heavy

DeepSeek-R1 (DeepSeek)

DeepSeek-R1 is a model specialized in reasoning. It uses chain-of-thought, meaning the model will “think step by step” before giving an answer. Very suitable for logic problems, analysis, and complex programming.

  • Size: 1.5B, 7B, 8B, 14B, 32B, 70B (distill versions)
  • License: MIT
  • Strengths: Superior reasoning, good coding, precise logical inference
  • Weaknesses: Longer output (due to chain-of-thought), slower than regular models of the same size

Gemma 3 (Google)

Google joined the open-weight game with Gemma. Its highlight is very good quality at small sizes: if you have a low-spec VPS but still want stable quality, Gemma is worth trying.

  • Size: 1B, 4B, 12B, 27B
  • License: Gemma License (similar to Apache, allows commercial use)
  • Strengths: Compact, high efficiency for size, multimodal (image support)
  • Weaknesses: Average Vietnamese, few size choices

Phi-3.5 / Phi-4 (Microsoft)

Microsoft focuses on a "small but mighty" approach. Phi-3.5 has only 3.8B parameters yet beats many 7B models in benchmarks. If you only have a 4GB RAM VPS, it is a choice well worth considering.

  • Size: 3.8B (Phi-3.5), 14B (Phi-4)
  • License: MIT
  • Strengths: Extremely light, high quality for size, MIT license
  • Weaknesses: Few size choices, limited Vietnamese, smaller community

Mistral / Mixtral (Mistral AI)

A French startup famous for its Mixture of Experts (MoE) architecture. Mixtral 8x7B has 47B total parameters but activates only about 13B per token, so inference is fast while quality stays high.

  • Size: 7B (Mistral), 8x7B / 8x22B (Mixtral)
  • License: Apache 2.0
  • Strengths: Fast, efficient MoE architecture, good coding
  • Weaknesses: Mixtral needs a lot of RAM (though inference is fast), average Vietnamese

Overall Comparison Table

The table below compares models at the most popular sizes (7B-14B), running quantization q4_K_M on CPU:

| Model | Size | RAM needed (q4) | License | Vietnamese | Coding | Reasoning |
|---|---|---|---|---|---|---|
| Qwen 2.5 7B | 7B | ~5.5 GB | Apache 2.0 | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Llama 3.1 8B | 8B | ~6 GB | Llama License | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| DeepSeek-R1 8B | 8B | ~6 GB | MIT | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Gemma 3 12B | 12B | ~8 GB | Gemma License | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Phi-3.5 | 3.8B | ~3 GB | MIT | ⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ |
| Mistral 7B | 7B | ~5.5 GB | Apache 2.0 | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Qwen 2.5 14B | 14B | ~10 GB | Apache 2.0 | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| DeepSeek-R1 14B | 14B | ~10 GB | MIT | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |

Choosing Models by VPS Configuration

This is the most important part. No matter how good a model is, if your VPS doesn’t have enough RAM, it’s useless. Below are specific suggestions for each configuration level:

4GB RAM VPS: Works, but Choose Carefully

With 4GB RAM, you need to reserve about 1GB for the operating system and Ollama, leaving ~3GB for the model. Best choices:

  • Qwen 2.5 3B (q4_K_M) ~ 2.3 GB: Best Vietnamese quality in this segment
  • Phi-3.5 3.8B (q4_K_M) ~ 2.8 GB: Higher benchmarks but weaker Vietnamese
  • Gemma 3 1B (q4_K_M) ~ 1 GB: Extremely light, suitable for simple tasks
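The headroom arithmetic above can be sanity-checked in one line. The numbers are the article's estimates (2.3 GB for Qwen 2.5 3B at q4_K_M, ~1 GB reserved for the OS and Ollama):

```shell
# Does a 2.3 GB model fit in 4 GB RAM after reserving ~1 GB for OS + Ollama?
awk 'BEGIN {
  total = 4.0; reserve = 1.0; model = 2.3
  avail = total - reserve
  printf "available: %.1f GB, fits: %s\n", avail, (model <= avail ? "yes" : "no")
}'
# → available: 3.0 GB, fits: yes
```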

8GB RAM VPS: Sweet Spot for Beginners

8GB RAM is the level I recommend starting with if you’re serious about self-hosting AI. You can run most 7-8B models with good quality:

  • Qwen 2.5 7B (q4_K_M) ~ 5.5 GB: #1 choice if using Vietnamese
  • Llama 3.1 8B (q4_K_M) ~ 6 GB: All-rounder, abundant documentation and community
  • DeepSeek-R1 8B (q4_K_M) ~ 6 GB: If you need strong reasoning
  • Gemma 3 4B (q4_K_M) ~ 3 GB: Light, saves RAM for other applications

16GB RAM VPS: Step Up to a New Level

With 16GB, you unlock 14B models. This is where output quality starts to be truly impressive:

  • Qwen 2.5 14B (q4_K_M) ~ 10 GB: Best pick. Vietnamese + coding + reasoning all good
  • DeepSeek-R1 14B (q4_K_M) ~ 10 GB: Extremely strong reasoning at this level
  • Gemma 3 12B (q4_K_M) ~ 8 GB: Lighter, runs faster

32GB+ RAM VPS: Running Large Models

32GB RAM allows running 32B models on CPU, but speed will be slow. If you want a smooth experience at this size, you should have a GPU:

  • Qwen 2.5 32B (q4_K_M) ~ 22 GB: Quality close to commercial models
  • DeepSeek-R1 32B (q4_K_M) ~ 22 GB: Reasoning rivals much larger models
  • Gemma 3 27B (q4_K_M) ~ 18 GB: Good, slightly lighter

Note: The RAM numbers above are estimates for q4_K_M quantization models running on CPU. Actual usage may vary depending on context length and operating system. You should reserve at least 1-2 GB RAM for the system.
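Estimates like these follow a common rule of thumb (an assumption, not an official Ollama figure): weights take roughly parameters × bits ÷ 8 bytes, plus ~50-60% overhead for KV cache and runtime. For a 7B model at q4:

```shell
# 7e9 params * 4 bits / 8 = 3.5 GB of weights; multiply by ~1.6 for
# KV cache and runtime overhead (rule-of-thumb assumption).
awk 'BEGIN {
  params_b = 7; bits = 4
  weights = params_b * bits / 8
  printf "weights ~%.1f GB, total ~%.1f GB\n", weights, weights * 1.6
}'
# → weights ~3.5 GB, total ~5.6 GB
```

That lands close to the ~5.5 GB figure quoted above; longer contexts push the real number higher.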

Choosing Models by Use Case

Besides hardware configuration, you should also choose models based on your main usage purpose:

Chat and Vietnamese Processing → Qwen 2.5

If you need a chatbot that responds in Vietnamese, writes Vietnamese content, or processes Vietnamese documents, then Qwen 2.5 is the clearest choice. At every size from 3B to 72B, Qwen’s Vietnamese capabilities are superior to competitors in the same class.

Programming / Coding → DeepSeek or Qwen 2.5 Coder

Need AI to help write code? Two top choices:

  • DeepSeek-R1: Good logical reasoning, great debugging, understands complex problems
  • Qwen 2.5 Coder: Specialized version for coding, supports many programming languages, has 7B version that runs comfortably on 8GB VPS
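For example, pulling the coder variant and asking it a one-off question straight from the terminal (this assumes a working Ollama install; `ollama run` with a quoted prompt returns a single answer without entering interactive chat):

```shell
# Pull the coding-specialized 7B variant (~5.5 GB RAM at q4_K_M)
ollama pull qwen2.5-coder:7b

# One-off prompt, no interactive session
ollama run qwen2.5-coder:7b "Write a bash one-liner that finds the 10 largest files in /var/log"
```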

Reasoning and Analysis → DeepSeek-R1

Logic problems, data analysis, explaining complex issues? DeepSeek-R1 with chain-of-thought will give higher quality output. In return, responses will be longer and slower because the model needs to “think” before answering.

General Purpose / Multi-use → Llama 3.1

If you don’t have specific needs and want a model that can do everything reasonably well, Llama 3.1 8B is a safe choice. Large community, abundant documentation, easy to find solutions when encountering errors.

Real Test: Same Prompt, 4 Models Respond Differently

I tested with the same Vietnamese prompt on 4 popular models (all around 7-8B size, q4_K_M quantization) so you can see the differences more clearly:

Prompt: “Explain how DNS works in simple language, about 3-4 sentences.”
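To reproduce this comparison yourself, you can loop the same prompt over several models (assuming the four tags below are already pulled):

```shell
# Run one prompt against several models and compare the answers
for m in qwen2.5:7b llama3.1:8b deepseek-r1:8b gemma3:4b; do
  echo "===== $m ====="
  ollama run "$m" "Explain how DNS works in simple language, about 3-4 sentences."
done
```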

Qwen 2.5 7B: Natural response, correct Vietnamese grammar, clear explanation. No vocabulary errors or awkward sentences. This is output you can use immediately without much editing.

Llama 3.1 8B: Correct content but Vietnamese phrasing sounds a bit “machine translated”. Some phrases don’t sound natural. Still usable but needs review.

DeepSeek-R1 8B: Very detailed response, with long reasoning section before giving the main answer. Good content quality but much longer response than requested. Vietnamese at decent level.

Gemma 3 4B: Short and to the point. Vietnamese somewhat stiff but acceptable for a 4B model. Fastest response among the 4 models.

Conclusion: Which Model Should You Choose?

After testing and comparing, here are my recommendations:

  • Just starting, want simplicity: Just ollama pull qwen2.5:7b and use it. This is the best choice for most Vietnamese users.
  • Weak VPS (4GB RAM): ollama pull qwen2.5:3b or ollama pull phi3.5
  • Need coding assistant: ollama pull qwen2.5-coder:7b or ollama pull deepseek-r1:8b
  • Want highest quality (16GB RAM): ollama pull qwen2.5:14b
  • Want to try multiple models: Just pull 2-3 models and test with the same prompt. Ollama allows easy model switching, no need to commit to any single model.

Tip: You can install multiple models side by side in Ollama; all you need is enough disk space. Ollama only loads a model into RAM when you actually use it, and automatically unloads it after a few minutes of inactivity.
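You can watch this load/unload behavior yourself: `ollama ps` lists the models currently held in RAM, and the `OLLAMA_KEEP_ALIVE` environment variable controls how long a model stays loaded (the default is around five minutes; exact behavior may vary by Ollama version):

```shell
# See which models are loaded in RAM right now, and for how much longer
ollama ps

# Keep models in RAM for 30 minutes instead of the ~5-minute default
# (set this in the environment of the Ollama server process)
export OLLAMA_KEEP_ALIVE=30m
```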

The AI model world is developing extremely fast. Every few months there are new, better models. But with what’s currently available, Qwen 2.5 is the most comprehensive choice for Vietnamese users self-hosting AI on VPS. Start with it, then try other models when you get familiar.

This article has been reviewed by AZDIGI Team

About the author

Trần Thắng

Expert at AZDIGI with years of experience in web hosting and system administration.

