DriteStudio
DRITESTUDIOCloud Infrastructure
Home
ArticlesAbout UsContactStatus
0%
EmbeddingGemma Google 308M: On-Device Embedding 100+ ภาษา 200MB RAG Ready
Back to articles

EmbeddingGemma Google 308M: On-Device Embedding 100+ ภาษา 200MB RAG Ready

EmbeddingGemma 308M Gemma 3 encoder 768D→128D MRL 2048 tokens 100+ ภาษา MTEB #1 on-device RAG Android/iOS/macOS Python JS deployment latency benchmarks

ai-September 6, 2025-Updated: February 24, 2026

EmbeddingGemma: Google DeepMind 2GB RAM Embedding 768D - รองรับ 100+ ภาษา ออฟไลน์

EmbeddingGemma 308M params Gemma 3 encoder รองรับมือถือ/แล็ปท็อป 200MB quantized 2048 tokens context 100+ ภาษา MTEB top rank Matryoshka 768→128D on-device RAG

EmbeddingGemma Technical Specs

Specรายละเอียด
Params308M
Memory<200MB (INT8)
Embedding Dim768 (MRL: 512/256/128)
Context2048 tokens
Languages100+
ArchitectureGemma 3 Bi-directional
Training Data320B filtered tokens
LicenseOpen weights

Matryoshka Representation Learning (MRL)

768D → 512D: 1.2% perf drop
768D → 256D: 3.5% perf drop  
768D → 128D: 8.2% perf drop
Storage: 768D=2.3MB → 128D=400KB

MTEB Leaderboard: Sub-500M Category

1. EmbeddingGemma 308M: 64.12
2. E5-small-v2 33M: 62.53
3. BGE-small 33M: 61.22
4. Snowflake 300M: 60.85

On-Device Deployment Targets

📱 Android (TensorFlow Lite)
🍎 iOS (CoreML)
💻 macOS (Metal)
🖥️ Windows (DirectML)
🤖 Edge TPU / NPU

Quick Start Code Examples

Python (HuggingFace):

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('google/embedding-gemma')
embeds = model.encode(["สวัสดี", "Hello"])
similarity = cosine_similarity(embeds[0], embeds[1])

JavaScript (ONNX):

import { InferenceSession } from 'onnxruntime-web';
const session = new InferenceSession();
await session.loadModel('embedding-gemma.onnx');
const embeddings = await session.run(input);

RAG Pipeline with EmbeddingGemma

1. Chunk docs → 512 token segments
2. Embed with 256D (MRL)
3. FAISS index (on-device)
4. Query → Top-K retrieve
5. Gemma 2B generate answer

Latency: 45ms/query on Snapdragon 8 Gen 3

Use Cases On-Device Embedding

🔍 Semantic search (notes, docs)
💬 Chatbot RAG (privacy-first)
🎵 Music recommendation
📧 Email clustering
📚 Book passage retrieval

Multi-Language Performance

🇹🇭 Thai: 92% MTEB multilingual
🇯🇵 Japanese: 89%
🇰🇷 Korean: 91%
🇻🇳 Vietnamese: 87%
🇮🇩 Indonesian: 88%

Model Quantization Options

PrecisionSizeSpeedupPerf Drop
FP16600MB1x0%
INT8200MB1.8x0.5%
INT4120MB3.2x1.2%

Edge Hardware Compatibility

✅ Snapdragon 8 Gen 3 (12ms/inference)
✅ Apple A18 (8ms)
✅ MediaTek Dimensity 9400 (15ms)
✅ Intel Lunar Lake NPU (10ms)
Share article:
View more articles
D

DriteStudio | ไดรท์สตูดิโอ

Cloud, VPS, Hosting and Colocation provider in Thailand

Operated by Craft Intertech (Thailand) Co., Ltd.

DRITESTUDIOCloud Infrastructure

100/280 Soi 17, Delight Village, Bang Khun Thian - Chaitalay, Phanthai Norasing, Samut Sakhon 74000

Services

  • VPS Hosting
  • Dedicated Server
  • Web Hosting
  • Security Solutions

Company

  • About Us
  • Contact Us
  • System Status

Support

  • Support Ticket
  • Documentation
  • Help Center

© 2026 Craft Intertech (Thailand) Co., Ltd. All rights reserved.

Privacy PolicyTerms of ServiceRefund Policy

We use cookies

We use cookies to enhance your browsing experience, analyze site traffic, and personalize content. By clicking "Accept All", you consent to our use of cookies. Privacy Policy