EmbeddingGemma: Google DeepMind's 768D Embedding Model That Runs in Under 2GB of RAM - 100+ Languages, Fully Offline
EmbeddingGemma is a 308M-parameter encoder built on Gemma 3 that runs on phones and laptops: under 200MB when quantized, a 2048-token context, 100+ languages, a top MTEB ranking among sub-500M-parameter models, Matryoshka embeddings from 768D down to 128D, and support for on-device RAG.
EmbeddingGemma Technical Specs
| Spec | Details |
|---|---|
| Params | 308M |
| Memory | <200MB (INT8) |
| Embedding Dim | 768 (MRL: 512/256/128) |
| Context | 2048 tokens |
| Languages | 100+ |
| Architecture | Gemma 3 encoder (bidirectional attention) |
| Training Data | 320B filtered tokens |
| License | Open weights |
Matryoshka Representation Learning (MRL)
Embeddings can be truncated to their first N dimensions and re-normalized (sketch below), trading a small amount of accuracy for storage:
768D → 512D: 1.2% performance drop
768D → 256D: 3.5% performance drop
768D → 128D: 8.2% performance drop
Storage: 768D = 2.3MB → 128D = 400KB
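A minimal sketch of MRL truncation with sentence-transformers. The Hugging Face model ID google/embeddinggemma-300m is an assumption here; verify it against the official release.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumption: model ID on Hugging Face; check the official release.
model = SentenceTransformer("google/embeddinggemma-300m")

# Full 768D embeddings
full = model.encode(["on-device semantic search"])

# MRL: keep the first 256 dimensions, then re-normalize for cosine similarity
truncated = full[:, :256]
truncated = truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

# Alternatively, let sentence-transformers truncate for you:
# model = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=256)
```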
MTEB Leaderboard: Sub-500M Category
1. EmbeddingGemma 308M: 64.12
2. E5-small-v2 33M: 62.53
3. BGE-small 33M: 61.22
4. Snowflake 300M: 60.85
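To spot-check scores locally, the open-source mteb package can evaluate the model on individual tasks. The sketch below assumes the same model ID as above and a single small task, so its numbers will not match the leaderboard aggregate.

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Assumption: model ID on Hugging Face; check the official release.
model = SentenceTransformer("google/embeddinggemma-300m")

# Evaluate one small task; the leaderboard score averages many tasks.
evaluation = MTEB(tasks=["Banking77Classification"])
results = evaluation.run(model, output_folder="results/embeddinggemma")
print(results)
```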
On-Device Deployment Targets
📱 Android (TensorFlow Lite)
🍎 iOS (CoreML)
💻 macOS (Metal)
🖥️ Windows (DirectML)
🤖 Edge TPU / NPU
Quick Start Code Examples
Python (HuggingFace):
```python
from sentence_transformers import SentenceTransformer, util

# Assumption: model ID on Hugging Face; check the official release.
model = SentenceTransformer("google/embeddinggemma-300m")
embeds = model.encode(["สวัสดี", "Hello"])
similarity = util.cos_sim(embeds[0], embeds[1])  # cosine similarity
```
JavaScript (ONNX):
```javascript
import { InferenceSession } from 'onnxruntime-web';

// Load an ONNX export of the model (file name is illustrative)
const session = await InferenceSession.create('embedding-gemma.onnx');

// `feeds` maps input names to Tensors produced by your tokenizer,
// e.g. input_ids and attention_mask for a Gemma-style encoder
const output = await session.run(feeds);
```
RAG Pipeline with EmbeddingGemma
1. Chunk documents into 512-token segments
2. Embed each chunk at 256D (MRL)
3. Build a FAISS index on-device
4. Embed the query and retrieve the top-K chunks (sketch below)
5. Generate the answer with Gemma 2B
Latency: 45ms/query on Snapdragon 8 Gen 3
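A minimal sketch of the retrieval half of this pipeline (steps 2-4), assuming the google/embeddinggemma-300m model ID and faiss-cpu; the Gemma 2B generation step is omitted.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumption: model ID on Hugging Face; 256D via Matryoshka truncation
model = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=256)

chunks = ["Chunk one of a document...", "Chunk two...", "Chunk three..."]

# Step 2: embed chunks (normalized so inner product == cosine similarity)
doc_vecs = model.encode(chunks, normalize_embeddings=True).astype(np.float32)

# Step 3: build an in-memory FAISS index
index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(doc_vecs)

# Step 4: embed the query and retrieve the top-k chunks
query_vec = model.encode(["What does chunk two say?"],
                         normalize_embeddings=True).astype(np.float32)
scores, ids = index.search(query_vec, k=2)
retrieved = [chunks[i] for i in ids[0]]
# Step 5 would feed `retrieved` plus the query to Gemma 2B for answer generation.
```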
Use Cases for On-Device Embeddings
🔍 Semantic search (notes, docs)
💬 Chatbot RAG (privacy-first)
🎵 Music recommendation
📧 Email clustering
📚 Book passage retrieval
Multi-Language Performance
🇹🇭 Thai: 92% MTEB multilingual
🇯🇵 Japanese: 89%
🇰🇷 Korean: 91%
🇻🇳 Vietnamese: 87%
🇮🇩 Indonesian: 88%
Model Quantization Options
| Precision | Size | Speedup | Performance Drop |
|---|---|---|---|
| FP16 | 600MB | 1x | 0% |
| INT8 | 200MB | 1.8x | 0.5% |
| INT4 | 120MB | 3.2x | 1.2% |
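A sketch of producing an INT8 variant with ONNX Runtime's dynamic quantization, assuming an ONNX export already exists at embedding-gemma.onnx (path is illustrative); officially released quantized checkpoints will generally be more accurate than this post-training approach.

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Post-training dynamic quantization of an existing ONNX export (paths assumed)
quantize_dynamic(
    model_input="embedding-gemma.onnx",
    model_output="embedding-gemma.int8.onnx",
    weight_type=QuantType.QInt8,  # 8-bit weights, activations stay float
)
```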
Edge Hardware Compatibility
✅ Snapdragon 8 Gen 3 (12ms/inference)
✅ Apple A18 (8ms)
✅ MediaTek Dimensity 9400 (15ms)
✅ Intel Lunar Lake NPU (10ms)
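The per-inference figures above come from the source. As a rough way to measure latency on your own hardware, the sketch below times repeated single-sentence encodes; CPU or desktop results will differ from the NPU numbers listed.

```python
import time
from sentence_transformers import SentenceTransformer

# Assumption: model ID on Hugging Face; check the official release.
model = SentenceTransformer("google/embeddinggemma-300m")

sentence = ["measure embedding latency on this device"]
model.encode(sentence)  # warm-up run

runs = 50
start = time.perf_counter()
for _ in range(runs):
    model.encode(sentence)
elapsed_ms = (time.perf_counter() - start) * 1000 / runs
print(f"avg latency: {elapsed_ms:.1f} ms/inference")
```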