
EmbeddingGemma (Google, 308M): On-Device Embeddings for 100+ Languages in Under 200MB, RAG-Ready

EmbeddingGemma is a 308M-parameter embedding model built on the Gemma 3 encoder: 768D embeddings that scale down to 128D via MRL, a 2048-token context, support for 100+ languages, and a top MTEB ranking in its size class. This article covers on-device RAG, deployment on Android/iOS/macOS, Python and JavaScript quick starts, and latency benchmarks.

AI · Updated: January 8, 2026

EmbeddingGemma: Google DeepMind's 768D Embedding Model That Runs Offline in Under 2GB of RAM and Supports 100+ Languages

EmbeddingGemma is a 308M-parameter embedding model based on the Gemma 3 encoder. It runs on phones and laptops in under 200MB when quantized, handles a 2048-token context, supports 100+ languages, sits at the top of the MTEB leaderboard for its size class, and uses Matryoshka Representation Learning to shrink embeddings from 768 to 128 dimensions, making it well suited to on-device RAG.

EmbeddingGemma Technical Specs

Spec | Details
Params | 308M
Memory | <200MB (INT8)
Embedding Dim | 768 (MRL: 512/256/128)
Context | 2048 tokens
Languages | 100+
Architecture | Gemma 3 bi-directional encoder
Training Data | 320B filtered tokens
License | Open weights

Matryoshka Representation Learning (MRL)

768D → 512D: 1.2% perf drop
768D → 256D: 3.5% perf drop
768D → 128D: 8.2% perf drop
Storage: 768D=2.3MB → 128D=400KB
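A minimal sketch of how MRL truncation is typically applied: keep the leading dimensions and re-normalize before comparing embeddings with cosine similarity. The model id follows the quick-start snippet later in this article:

import numpy as np
from sentence_transformers import SentenceTransformer

# Encode at full 768D, then truncate to a smaller Matryoshka dimension
model = SentenceTransformer('google/embedding-gemma')
full = model.encode(["ตัวอย่างประโยค", "example sentence"])    # shape (2, 768)

dim = 128
truncated = full[:, :dim].copy()                               # keep the first 128 dims
truncated /= np.linalg.norm(truncated, axis=1, keepdims=True)  # re-normalize for cosine similarity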

MTEB Leaderboard: Sub-500M Category

1. EmbeddingGemma 308M: 64.12
2. E5-small-v2 33M: 62.53
3. BGE-small 33M: 61.22
4. Snowflake 300M: 60.85

On-Device Deployment Targets

📱 Android (TensorFlow Lite)
🍎 iOS (CoreML)
💻 macOS (Metal)
🖥️ Windows (DirectML)
🤖 Edge TPU / NPU

Quick Start Code Examples

Python (HuggingFace):

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('google/embedding-gemma')
embeds = model.encode(["สวัสดี", "Hello"])
# Cosine similarity between the Thai and English embeddings
similarity = np.dot(embeds[0], embeds[1]) / (np.linalg.norm(embeds[0]) * np.linalg.norm(embeds[1]))

JavaScript (ONNX):

import { InferenceSession } from 'onnxruntime-web';

// Sessions are created asynchronously from the model file
const session = await InferenceSession.create('embedding-gemma.onnx');
// `input` maps input names (e.g. input_ids, attention_mask) to Tensors produced by the tokenizer
const embeddings = await session.run(input);
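The embedding-gemma.onnx file above has to be produced first. A hedged sketch of one way to export it with Hugging Face Optimum (the model id is taken from the Python example; the output directory name is an arbitrary choice):

from optimum.onnxruntime import ORTModelForFeatureExtraction
from transformers import AutoTokenizer

# Export the checkpoint to ONNX and save it alongside its tokenizer
model = ORTModelForFeatureExtraction.from_pretrained('google/embedding-gemma', export=True)
tokenizer = AutoTokenizer.from_pretrained('google/embedding-gemma')

model.save_pretrained('embedding-gemma-onnx/')
tokenizer.save_pretrained('embedding-gemma-onnx/')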

RAG Pipeline with EmbeddingGemma

1. Chunk docs → 512-token segments
2. Embed at 256D (MRL)
3. Build a FAISS index (on-device)
4. Query → retrieve top-K chunks
5. Generate the answer with Gemma 2B (see the sketch below)

Latency: 45ms/query on Snapdragon 8 Gen 3
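A compact sketch of steps 1-4 using FAISS and the sentence-transformers API from the quick start. Chunking is simplified to fixed-size character windows, the Gemma 2B generation step is omitted, and the query text is an assumption:

import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('google/embedding-gemma')
DIM = 256  # MRL-truncated dimension, as in step 2

def embed(texts):
    # Encode, truncate to 256D, and re-normalize so inner product == cosine
    vecs = model.encode(texts)[:, :DIM]
    return (vecs / np.linalg.norm(vecs, axis=1, keepdims=True)).astype(np.float32)

# 1) Chunk documents (naive fixed-size chunks for illustration)
docs = ["... long document text ..."]
chunks = [d[i:i + 2000] for d in docs for i in range(0, len(d), 2000)]

# 2)-3) Embed chunks and build an in-memory FAISS index
index = faiss.IndexFlatIP(DIM)
index.add(embed(chunks))

# 4) Retrieve the top-3 chunks for a query; step 5 (Gemma 2B) would consume them
scores, ids = index.search(embed(["What does the document say about pricing?"]), 3)
top_chunks = [chunks[i] for i in ids[0]]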

Use Cases for On-Device Embeddings

🔍 Semantic search (notes, docs)
💬 Chatbot RAG (privacy-first)
🎵 Music recommendation
📧 Email clustering
📚 Book passage retrieval

Multi-Language Performance

🇹🇭 Thai: 92% MTEB multilingual
🇯🇵 Japanese: 89%
🇰🇷 Korean: 91%
🇻🇳 Vietnamese: 87%
🇮🇩 Indonesian: 88%

Model Quantization Options

Precision | Size | Speedup | Perf Drop
FP16 | 600MB | 1x | 0%
INT8 | 200MB | 1.8x | 0.5%
INT4 | 120MB | 3.2x | 1.2%
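A hedged sketch of how an INT8 variant could be produced from an ONNX export using ONNX Runtime's dynamic quantization (file names are assumptions; this is one generic route, not necessarily how the official quantized weights are built):

from onnxruntime.quantization import quantize_dynamic, QuantType

# Quantize the exported model's weights to INT8
quantize_dynamic(
    model_input='embedding-gemma-onnx/model.onnx',   # path from the export sketch above (assumed)
    model_output='embedding-gemma-int8.onnx',        # arbitrary output file name
    weight_type=QuantType.QInt8,
)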

Edge Hardware Compatibility

✅ Snapdragon 8 Gen 3 (12ms/inference)
✅ Apple A18 (8ms)
✅ MediaTek Dimensity 9400 (15ms)
✅ Intel Lunar Lake NPU (10ms)