Building Edge AI Infrastructure with KVM, openSUSE, and Ollama

26. Aug 2025 | Rudraksh Karpe | CC-BY-SA-3.0

Community Contribution

Edge AI infrastructure is transforming how we deploy machine learning workloads, bringing computation closer to data sources while maintaining privacy and reducing latency. This guide walks through building an edge analytics platform using KVM virtualization, openSUSE Leap 15.6, K3s, and Ollama for local AI inference.

Our architecture leverages a KVM homelab infrastructure originally set up by my Google Summer of Code mentor. The setup was built to create specialized AI nodes in a distributed cluster, with Longhorn providing shared storage for models and application data. Each component was chosen for reliability, scalability, and suitability for edge deployments.

Prerequisites and Architecture Overview

This setup requires:

  • KVM hypervisor with existing infrastructure
  • Minimum 8GB RAM per VM (16GB recommended for larger models)
  • Network storage for distributed file system
  • Basic Kubernetes and networking knowledge

The final architecture includes multiple specialized nodes, distributed storage, monitoring, and load balancing for high-availability AI inference.

VM Foundation Setup

Creating the Edge AI Node

Start with a clean VM deployment using established automation tools:

cd ~geeko/bin/v-i
sudo ./viDeploy -c ./VM-K3s.cfg -n edge-ai-01
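
Once the deployment script finishes, it is worth confirming the guest is defined and running from the hypervisor before continuing (a quick check, assuming the KVM guests are managed through libvirt/virsh):

# Confirm the new guest is listed and running
virsh list --all
# Show any addresses libvirt has learned for the guest
virsh domifaddr edge-ai-01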

System Configuration

Complete the openSUSE installation with consistent settings across all nodes:

Installation Settings:

  • Keyboard: US
  • Timezone: UTC
  • Root password: gsoc (consistent across cluster)

Network Configuration:

# Configure static networking
cd /etc/sysconfig/network
cp ifcfg-eth1 ifcfg-eth0
vi ifcfg-eth0

Edit network configuration:

STARTMODE=auto
BOOTPROTO=static
IPADDR=172.xx.xxx.xx/24

Set hostname and disable firewall for edge deployment:

hostnamectl set-hostname edge-ai-01
echo "172.xx.xxx.xx edge-ai-01.local edge-ai-01" >> /etc/hosts
systemctl disable --now firewalld
systemctl restart network
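
A quick sanity check confirms the static address and hostname took effect (the gateway address below is a placeholder for your environment):

ip addr show eth0
hostnamectl status
ping -c 3 172.xx.xxx.1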

Essential Package Installation

Install required components for Kubernetes and distributed storage:

zypper refresh
zypper install -y open-iscsi kernel-default e2fsprogs xfsprogs apparmor-parser
systemctl enable --now iscsid

Storage Configuration for Longhorn

Prepare dedicated storage for distributed AI workloads:

lsblk
fdisk /dev/vdb
# Create new GPT partition table and primary partition
mkfs.ext4 /dev/vdb1
mkdir -p /var/lib/longhorn
echo "/dev/vdb1 /var/lib/longhorn ext4 defaults 0 0" >> /etc/fstab
mount -a
systemctl reboot
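
If you prefer to script this step rather than drive fdisk interactively, the same layout can be created with parted (a short sketch, assuming /dev/vdb is the dedicated Longhorn disk):

# Create a GPT label and a single partition spanning the disk
parted -s /dev/vdb mklabel gpt
parted -s /dev/vdb mkpart primary ext4 0% 100%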

Kubernetes Cluster Integration

Joining the Edge AI Cluster

Access your Rancher management interface to create a dedicated AI cluster:

  1. Navigate to Rancher WebUI: http://172.16.200.15
  2. Create → Custom cluster
  3. Name: edge-ai-cluster
  4. Select K3s version
  5. Copy and execute registration command:
curl -fsSL https://get.k3s.io | K3S_URL=https://172.xx.xxx.xx:6443 K3S_TOKEN=your-token sh -

Verify cluster connectivity:

kubectl get nodes
kubectl get pods --all-namespaces
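
Registration can take a minute or two; waiting on the Ready condition avoids racing the next steps (node name assumed to match the hostname set earlier):

kubectl wait --for=condition=Ready node/edge-ai-01 --timeout=300s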

Ollama Installation and Configuration

Installing Ollama

Deploy Ollama for local LLM inference:

curl -fsSL https://ollama.com/install.sh | sh
systemctl enable --now ollama

Cluster Access Configuration

Configure Ollama for distributed access:

mkdir -p /etc/systemd/system/ollama.service.d
vi /etc/systemd/system/ollama.service.d/override.conf

Add cluster binding:

[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

Apply configuration:

systemctl daemon-reload
systemctl restart ollama
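
To confirm Ollama is now reachable on all interfaces rather than only on localhost, query the API from another machine on the network (replace the placeholder with the node's address):

curl http://172.xx.xxx.xx:11434/api/tags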

Model Deployment Strategy

Deploy models based on hardware capabilities:

# For 8GB RAM nodes - quantized models (the default phi3 tag pulls the quantized mini variant)
ollama pull phi3

# For 16GB+ RAM nodes - a larger, higher-precision variant (phi3:medium is one example)
ollama pull phi3:medium

# Verify installation
ollama list

Quantized models (e.g. q4_K_M) reduce memory usage by roughly 75% compared to full-precision weights while maintaining acceptable performance for edge analytics.
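
A quick way to validate end-to-end inference is to send a single prompt to the Ollama HTTP API (a minimal example against the phi3 model pulled above):

curl http://localhost:11434/api/generate -d '{
  "model": "phi3",
  "prompt": "Summarize the benefits of edge AI in one sentence.",
  "stream": false
}'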

Edge Analytics Platform Deployment

Repository Setup

Clone the Edge Analytics ecosystem:

git clone https://github.com/rudrakshkarpe/Edge-analytics-ecosystem-workloads-openSUSE.git
cd Edge-analytics-ecosystem-workloads-openSUSE

Configuration for Cluster Deployment

Update Kubernetes manifests for distributed deployment:

vi k8s-deployment/streamlit-app-deployment.yaml

Modify Ollama endpoint:

- name: OLLAMA_BASE_URL
  value: "http://172.xx.xxx.xx:11434"

Application Deployment

Deploy in correct order for dependency resolution:

kubectl apply -f k8s-deployment/namespace.yaml
kubectl apply -f k8s-deployment/storage.yaml
kubectl apply -f k8s-deployment/streamlit-app-deployment.yaml
kubectl apply -f k8s-deployment/ingress.yaml

Verify deployment status:

kubectl get pods -n edge-analytics
kubectl logs -f deployment/edge-analytics-app -n edge-analytics
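
If the pods start but the UI cannot reach the model, check connectivity from inside the cluster to the Ollama endpoint configured above (a sketch that assumes curl is available in the application image; reuse your node's address for the placeholder):

kubectl exec -it deployment/edge-analytics-app -n edge-analytics -- \
  curl -s http://172.xx.xxx.xx:11434/api/tags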

Distributed Storage with Longhorn

Longhorn Deployment

Deploy distributed storage system:

kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml

Wait for all pods to be running:

kubectl get pods -n longhorn-system -w

Configure Default Storage Class

Set Longhorn as default for persistent volumes:

kubectl patch storageclass longhorn -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
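
To verify the annotation took effect, list the storage classes and optionally provision a small throwaway PersistentVolumeClaim against the new default (a test sketch; the claim can be deleted afterwards):

kubectl get storageclass

# Create a 1Gi test claim against the default (Longhorn) storage class
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF

kubectl get pvc longhorn-test-pvc
kubectl delete pvc longhorn-test-pvc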

Multi-Node Scaling and Specialization

Additional Node Deployment

Scale the cluster with specialized nodes:

Node IP Assignment:

  • edge-ai-02: 172.16.220.11
  • edge-ai-03: 172.16.220.12

Node Labeling for Workload Distribution

Label nodes based on capabilities:

# GPU-enabled nodes
kubectl label node edge-ai-02 node-type=gpu-inference

# CPU-optimized nodes
kubectl label node edge-ai-03 node-type=cpu-inference
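
The labels only pay off once workloads select on them; for example, the analytics deployment can be pinned to GPU-labelled nodes with a nodeSelector patch (a sketch, assuming inference-heavy pods should land on edge-ai-02):

kubectl patch deployment edge-analytics-app -n edge-analytics \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"node-type":"gpu-inference"}}}}}'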

Specialized Model Deployment

Deploy appropriate models per node type:

# GPU nodes - larger models (phi3:medium as an example)
ssh root@172.16.220.11
ollama pull phi3:medium

# CPU nodes - optimized quantized models
ssh root@172.16.220.12
ollama pull phi3

Production Monitoring and Operations

Monitoring Stack Deployment

Deploy comprehensive observability:

kubectl apply -f k8s-deployment/monitoring.yaml

Service Access

For development and testing access:

# Edge Analytics application
kubectl port-forward svc/edge-analytics-service 8501:8501 -n edge-analytics

# Prometheus metrics
kubectl port-forward svc/prometheus 9090:9090 -n monitoring

# Grafana dashboards  
kubectl port-forward svc/grafana 3000:3000 -n monitoring

Operational Commands

Model Management:

# Check model status across cluster
kubectl exec -it daemonset/ollama -n edge-analytics -- ollama list

# Update models cluster-wide
kubectl exec -it daemonset/ollama -n edge-analytics -- ollama pull llama3:latest

Scaling Operations:

# Horizontal scaling
kubectl scale deployment edge-analytics-app --replicas=3 -n edge-analytics

# Resource monitoring
kubectl top nodes
kubectl top pods -n edge-analytics
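
Manual scaling works for predictable load; for bursty inference traffic the same deployment can instead be placed behind a Horizontal Pod Autoscaler (a sketch that assumes a metrics server is available in the cluster):

kubectl autoscale deployment edge-analytics-app -n edge-analytics \
  --cpu-percent=70 --min=1 --max=5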

Access Points and Integration

Service URLs:

  • Edge Analytics UI: http://172.xx.xxx.xx:8501
  • Rancher Management: http://172.16.200.15
  • Prometheus Metrics: http://172.xx.xxx.xx:9090
  • Grafana Dashboards: http://172.xx.xxx.xx:3000 (admin/admin)

Key Advantages of This Architecture

  1. Privacy-First: All AI inference happens locally, ensuring data never leaves your infrastructure
  2. Scalable: Kubernetes orchestration enables easy horizontal scaling as workloads grow
  3. Resilient: Distributed storage and multi-node deployment provide high availability
  4. Cost-Effective: Utilizes existing hardware infrastructure without cloud dependencies
  5. Flexible: Support for various model sizes and quantization levels based on hardware

Troubleshooting Common Issues

VM Connectivity:

virsh list --all
virsh console edge-ai-01

Kubernetes Issues:

kubectl describe node edge-ai-01
kubectl get events --sort-by=.metadata.creationTimestamp

Ollama Service Problems:

systemctl status ollama
journalctl -u ollama -f
curl http://172.xx.xxx.xx:11434/api/tags

This edge AI infrastructure provides a robust foundation for deploying local LLMs with enterprise-grade reliability, enabling organizations to leverage AI capabilities while maintaining complete control over their data and compute resources.

For advanced configurations and additional features, explore the complete repository documentation and consider integrating with external tools like vector databases for enhanced RAG capabilities.

Huge shoutout to my mentors Bryan Gartner, Terry Smith, and Ann Davis for making this setup possible.
