LLAMA3.2 Nepali 318M Model
Overview
A 318M parameter LLAMA3.2 model fine-tuned on a Nepali text dataset for generating coherent and contextually relevant Nepali text.
Resources
- Base Model: Hugging Face
- Chat Interface: Hugging Face Space
- Dataset: IRIISNEPAL/Nepali-Text-Corpus and nepberta
- Reference Book: Build a Large Language Model (From Scratch) by Sebastian Raschka, PhD
Installation
To install the required dependencies, run:
pip install datasets huggingface_hub matplotlib transformers torch --quiet
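After the install finishes, a quick check can confirm that each package is importable. The snippet below is a minimal sketch, not part of the original project: the `missing_packages` helper and the `REQUIRED` list are hypothetical names, and the list simply mirrors the packages in the `pip install` command above.

```python
from importlib.util import find_spec

# Top-level module names for the packages installed above (assumed to match
# the pip package names one-to-one).
REQUIRED = ["datasets", "huggingface_hub", "matplotlib", "transformers", "torch"]


def missing_packages(names):
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, rerunning the install command without `--quiet` shows the full pip output.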
Usage Guide
1. Download Model Weights
from huggingface_hub import hf_hub_download
hf_hub_download(
    repo_id="Aananda-giri/LLAMA3-Nepali",
    filename="parameters_300m/model_pg_398000_steps.pth",
    local_dir="./"
)
2. Load the Tokenizer
from transformers import PreTrainedTokenizerFast
tokenizer = PreTrainedTokenizerFast.from_pretrained("Aananda-giri/LLAMA3-Nepali")
tokenizer.save_pretrained("NepaliBPE")
3. Download Additional Scripts
import requests
url = "https://raw.githubusercontent.com/Aananda-giri/LLAMA3-Nepali/main/3.%20training_loop/previous_chapters.py"
res = requests.get(url, timeout=30)
res.raise_for_status()  # stop here if the download failed
with open("previous_chapters.py", "w", encoding="utf-8") as f:
    f.write(res.text)
4. Load the Model
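import torch
from previous_chapters import Llama3Model, ChatFormat, Tokenizer, generate_and_print_sample
# Initialize tokenizer
_tokenizer = Tokenizer("NepaliBPE/tokenizer.json")
chat_tokenizer = ChatFormat(_tokenizer)
# Define model configuration
LLAMA32_CONFIG = {
    "vocab_size": 50006,
    "context_length": 512,
    "emb_dim": 1320,
    "n_heads": 20,
    "n_layers": 10,
    "hidden_dim": 5280,
    "n_kv_groups": 5,
    "rope_base": 500_000.0,
    "dtype": torch.bfloat16
}
# Build the model and load the weights downloaded in step 1.
# Assumption: the checkpoint is a dict that keeps the weights under "model_state_dict";
# if the file holds a bare state dict instead, it is used directly.
# weights_only=False unpickles arbitrary objects, so only load files you trust.
model = Llama3Model(LLAMA32_CONFIG)
checkpoint = torch.load("parameters_300m/model_pg_398000_steps.pth", map_location="cpu", weights_only=False)
model.load_state_dict(checkpoint.get("model_state_dict", checkpoint))
# Select a device; the generation calls below expect both `model` and `device`.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()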
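The configuration values can be used to sanity-check the 318M figure in the title. The sketch below is a back-of-the-envelope count, not the author's own accounting. It assumes a Llama-3.2-style layout that the page does not spell out: bias-free linear layers, grouped-query attention, a three-matrix gated feed-forward block, two normalization layers per block plus a final one, and an output head that shares the embedding weights. `count_parameters` and `CONFIG` are hypothetical names used only here.

```python
# Shape-related values copied from LLAMA32_CONFIG.
CONFIG = {
    "vocab_size": 50006,
    "emb_dim": 1320,
    "n_heads": 20,
    "n_layers": 10,
    "hidden_dim": 5280,
    "n_kv_groups": 5,
}


def count_parameters(cfg, tie_output_head=True):
    """Estimate the parameter count under the assumptions listed above."""
    emb_dim = cfg["emb_dim"]
    head_dim = emb_dim // cfg["n_heads"]        # 1320 / 20 = 66
    kv_dim = cfg["n_kv_groups"] * head_dim      # 5 * 66 = 330

    embedding = cfg["vocab_size"] * emb_dim
    # Query and output projections are emb_dim x emb_dim;
    # key and value projections are emb_dim x kv_dim (shared across head groups).
    attention = 2 * emb_dim * emb_dim + 2 * emb_dim * kv_dim
    feed_forward = 3 * emb_dim * cfg["hidden_dim"]
    norms = 2 * emb_dim
    per_layer = attention + feed_forward + norms

    total = embedding + cfg["n_layers"] * per_layer + emb_dim  # + final norm
    if not tie_output_head:
        total += emb_dim * cfg["vocab_size"]
    return total


tied_total = count_parameters(CONFIG)
untied_total = count_parameters(CONFIG, tie_output_head=False)
print(f"tied output head:   {tied_total:,}")    # 318,683,640
print(f"untied output head: {untied_total:,}")  # 384,691,560
```

Under these assumptions the tied-head total is about 318.7M, which is consistent with the stated model size. With 20 query heads and 5 key/value groups, each key/value head would serve 4 query heads.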
5. Generate Text
# Generate text sample
generate_and_print_sample(
    PROMPT="रामले भात",
    tokenizer=_tokenizer,
    chat_tokenizer=chat_tokenizer,
    model=model,
    device=device,
    context_length=LLAMA32_CONFIG["context_length"]
)
Advanced Text Generation
from previous_chapters import generate_chat_optimized
import time
start_time = time.time()
output_text = generate_chat_optimized(
    prompt="रामले भात",
    tokenizer=tokenizer,
    chat_tokenizer=chat_tokenizer,
    model=model,
    max_new_tokens=20,
    context_size=512,
    device=device,
    temperature=0.3,
    top_k=5,
    repetition_penalty=1.2
)
print(f"time:{time.time() - start_time}\n output_text: {output_text}")
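To give a feel for what `temperature`, `top_k`, and `repetition_penalty` usually do, here is a small pure-Python sketch of one sampling step over a toy list of logits. It illustrates the common conventions for these parameters and is not taken from `generate_chat_optimized`, whose internals may differ. `sample_next_token` and `toy_logits` are hypothetical names.

```python
import math
import random


def sample_next_token(logits, generated_ids, temperature=0.3, top_k=5,
                      repetition_penalty=1.2, rng=None):
    """Pick one token id from raw logits (illustrative only)."""
    rng = rng or random.Random()
    scores = list(logits)

    # Repetition penalty: lower the score of tokens that were already generated
    # (divide positive scores, multiply negative ones).
    for token_id in set(generated_ids):
        if scores[token_id] > 0:
            scores[token_id] /= repetition_penalty
        else:
            scores[token_id] *= repetition_penalty

    # Top-k: keep only the k highest-scoring token ids.
    top_ids = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]

    # Temperature: values below 1 sharpen the distribution, values above 1 flatten it.
    scaled = [scores[i] / temperature for i in top_ids]
    max_scaled = max(scaled)
    weights = [math.exp(s - max_scaled) for s in scaled]  # stable softmax numerators

    return rng.choices(top_ids, weights=weights, k=1)[0]


toy_logits = [2.0, 1.5, 0.3, -1.0, 4.0, 0.1, 3.2, -0.5]
next_id = sample_next_token(toy_logits, generated_ids=[4], rng=random.Random(0))
print("sampled token id:", next_id)
```

With a low temperature such as 0.3 and a small `top_k`, sampling stays close to the highest-scoring tokens, while the repetition penalty nudges the model away from ids it has already produced.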
Technologies Used
Python, PyTorch, Transformers, Hugging Face, LLAMA3.2
🚀 Happy coding and enjoy experimenting with LLAMA3.2 Nepali! 🤗🎉