LLAMA3.2 Nepali 318M Model

A 318M parameter LLAMA3.2 model fine-tuned on a Nepali text dataset for generating coherent and contextually relevant Nepali text.

Python, PyTorch, Transformers, Hugging Face, LLAMA3.2

Resources

Base Model: Hugging Face
Chat Interface: Hugging Face Space
Dataset: IRIISNEPAL/Nepali-Text-Corpus and nepberta
Reference Book: Build a Large Language Model (From Scratch) by Sebastian Raschka, PhD

Installation

To install the required dependencies, run:

pip install datasets huggingface_hub matplotlib transformers torch --quiet

Usage Guide

1. Download Model Weights

from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="Aananda-giri/LLAMA3-Nepali",
    filename="parameters_300m/model_pg_398000_steps.pth",
    local_dir="./"
)
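
The later loading step needs to know where the checkpoint ends up on disk. With `local_dir` set, `hf_hub_download` keeps the repo-relative path under that directory and returns the local file path. The sketch below wraps that in two helpers so the file is fetched only once. The helper names `expected_checkpoint_path` and `fetch_checkpoint` are hypothetical and not part of the project.

```python
from pathlib import Path

REPO_ID = "Aananda-giri/LLAMA3-Nepali"
CHECKPOINT_FILE = "parameters_300m/model_pg_398000_steps.pth"


def expected_checkpoint_path(local_dir="./", filename=CHECKPOINT_FILE):
    """Return where the checkpoint should land: the repo-relative path under local_dir."""
    return Path(local_dir) / filename


def fetch_checkpoint(local_dir="./"):
    """Download the checkpoint unless it is already on disk, and return its path."""
    target = expected_checkpoint_path(local_dir)
    if target.is_file():
        return target
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download
    downloaded = hf_hub_download(
        repo_id=REPO_ID, filename=CHECKPOINT_FILE, local_dir=local_dir
    )
    return Path(downloaded)
```

Calling `fetch_checkpoint()` once and passing the returned path to the loading code avoids hard-coding the checkpoint location in two places.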

2. Load the Tokenizer

from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast.from_pretrained("Aananda-giri/LLAMA3-Nepali")

tokenizer.save_pretrained("NepaliBPE")

3. Download Additional Scripts

import requests

res = requests.get("https://raw.githubusercontent.com/Aananda-giri/LLAMA3-Nepali/main/3.%20training_loop/previous_chapters.py")
res.raise_for_status()  # stop early if the download failed

with open("previous_chapters.py", "w", encoding="utf-8") as f:
    f.write(res.text)

4. Load the Model

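import torch
from previous_chapters import Llama3Model, ChatFormat, Tokenizer, generate_and_print_sample

# Initialize tokenizer
_tokenizer = Tokenizer("NepaliBPE/tokenizer.json")
chat_tokenizer = ChatFormat(_tokenizer)

# Define model configuration
LLAMA32_CONFIG = {
    "vocab_size": 50006,
    "context_length": 512,
    "emb_dim": 1320,
    "n_heads": 20,
    "n_layers": 10,
    "hidden_dim": 5280,
    "n_kv_groups": 5,
    "rope_base": 500_000.0,
    "dtype": torch.bfloat16
}

# Create the model and load the weights downloaded in step 1.
# Assumption: Llama3Model takes the config dict, and the checkpoint is either a
# bare state dict or a dict that keeps the weights under "model_state_dict".
# Adjust the key (or strip a "_orig_mod." prefix left by torch.compile) if your
# file differs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Llama3Model(LLAMA32_CONFIG)
checkpoint = torch.load(
    "parameters_300m/model_pg_398000_steps.pth",
    map_location=device,
    weights_only=False
)
model.load_state_dict(checkpoint.get("model_state_dict", checkpoint))
model.to(device)
model.eval()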
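
The configuration above can be checked against the advertised 318M parameters with a little arithmetic. The sketch below assumes a standard Llama-style block: grouped-query attention without biases, a feed-forward layer with three projections, two RMSNorm scale vectors per block, and an output head that shares its weights with the token embedding. None of these details are stated on this page, and `count_llama_parameters` is a hypothetical helper, so treat the result as a plausibility check rather than a description of the actual model.

```python
def count_llama_parameters(cfg, tie_embeddings=True):
    """Count weights for a bias-free Llama-style decoder described by cfg."""
    emb, hidden = cfg["emb_dim"], cfg["hidden_dim"]
    head_dim = emb // cfg["n_heads"]
    # Keys and values are shared across query heads, so they are narrower.
    kv_dim = head_dim * cfg["n_kv_groups"]

    attention = 2 * emb * emb + 2 * emb * kv_dim  # query + output, key + value
    feed_forward = 3 * emb * hidden               # gate, up, and down projections
    norms = 2 * emb                               # two RMSNorm scale vectors per block
    per_layer = attention + feed_forward + norms

    token_embedding = cfg["vocab_size"] * emb
    output_head = 0 if tie_embeddings else token_embedding
    final_norm = emb
    return token_embedding + cfg["n_layers"] * per_layer + final_norm + output_head


config = {
    "vocab_size": 50006, "emb_dim": 1320, "n_heads": 20,
    "n_layers": 10, "hidden_dim": 5280, "n_kv_groups": 5,
}
print(f"{count_llama_parameters(config):,}")                        # 318,683,640
print(f"{count_llama_parameters(config, tie_embeddings=False):,}")  # 384,691,560
```

Under these assumptions the tied-weight count is about 318.7M, which is consistent with the model name. With 20 query heads and 5 key/value groups, each key/value head serves 4 query heads.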

5. Generate Text

# Generate text sample
generate_and_print_sample(
    PROMPT="रामले भात",
    tokenizer=_tokenizer,
    chat_tokenizer=chat_tokenizer,
    model=model,
    device=device,
    context_length=LLAMA32_CONFIG["context_length"]
)

Advanced Text Generation

from previous_chapters import generate_chat_optimized
import time

start_time = time.time()

output_text = generate_chat_optimized(
    prompt="रामले भात",
    tokenizer=tokenizer,
    chat_tokenizer=chat_tokenizer,
    model=model,
    max_new_tokens=20,
    context_size=512,
    device=device,
    temperature=0.3,
    top_k=5,
    repetition_penalty=1.2
)

print(f"time:{time.time() - start_time}\n output_text: {output_text}")
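
To give a feel for the `temperature`, `top_k`, and `repetition_penalty` arguments, here is a minimal pure-Python sketch of how such settings are commonly applied to a model's next-token scores. It is not the code inside `generate_chat_optimized`. The function names are hypothetical, and the penalty rule (divide positive scores, multiply negative ones) is one widely used convention, assumed here for illustration.

```python
import math
import random


def next_token_distribution(logits, generated_ids, temperature=0.3,
                            top_k=5, repetition_penalty=1.2):
    """Return {token_id: probability} after penalty, top-k, and temperature."""
    scores = list(logits)
    # Repetition penalty: make tokens that were already generated less likely.
    for tok in set(generated_ids):
        if scores[tok] > 0:
            scores[tok] /= repetition_penalty
        else:
            scores[tok] *= repetition_penalty
    # Top-k: keep only the k highest-scoring token ids.
    top_ids = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [scores[i] / temperature for i in top_ids]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - max_scaled) for s in scaled]
    total = sum(exps)
    return {i: e / total for i, e in zip(top_ids, exps)}


def sample_token(logits, generated_ids, rng=None, **kwargs):
    """Draw one token id from the filtered distribution."""
    rng = rng or random.Random()
    dist = next_token_distribution(logits, generated_ids, **kwargs)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]
```

With `temperature=0.3` and `top_k=5`, sampling stays close to the highest-scoring tokens, while `repetition_penalty=1.2` nudges the model away from tokens it has already produced.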

Technologies Used

Python, PyTorch, Transformers, Hugging Face, LLAMA3.2

Links

GitHub Repository →

Live Demo →

🚀 Happy coding and enjoy experimenting with LLAMA3.2 Nepali! 🤗🎉