
Edge AI using LLMs

Active Project · Machine Learning

This project explores the deployment of Large Language Models (LLMs) on the NVIDIA Orin Nano Super — a powerful yet compact edge computing platform. The goal is to assess performance, latency, and real-world usability of AI inference workloads in constrained environments. It combines hardware acceleration with software optimization to demonstrate intelligent edge solutions capable of processing natural language locally, without relying on cloud infrastructure. This research provides foundational insight for building secure, low-latency AI systems for robotics, IoT, and offline applications.

Total Feedback: 5
Last Updated: Dec 2

Key Features

Hardware · AI/ML · Software

Tech Stack

LLM · NVIDIA Jetson Orin Nano · ONNX Runtime / TensorRT · Python (+5 more)

Image Gallery

11 images
EdgeSense - Chat Window
EdgeSense - Available and Download Models
EdgeSense - System Performance
EdgeSense - Settings Page
EdgeSense - API Server
EdgeSense - MCP Server
v0.2.0 Working Resources and Selectable Models
v0.1.0 Ollama front end with markdown rendering
v0.1.0 Ollama front end working
TensorFlow and Ollama install complete
Nvidia Jetson Orin Nano


Project Roadmap


Upcoming Features

4 features

As a user, I want to be able to securely connect to the Orin Nano via SSH and RDC to facilitate remote installation, testing, and debugging. This will affect the system's network security configurations and require updates to the deployment script (`deploy_and_run.sh`) to automate the setup of these services.
Planned · Medium Priority

As a user, I want to be able to enhance my research by using a new search function that integrates with the existing AI model. This feature will allow me to input queries that the AI can use to retrieve additional contextual data from external sources, improving the depth and relevance of the responses I receive. This will affect the main application interface, specifically by introducing a new search bar component on the homepage and displaying search results in a designated section alongside the existing model outputs.
Planned · Medium Priority

As a user, I want to be able to add, view, and edit textual documentation for each model, because it will provide additional context and insights, making model management more informative and efficient. This will affect the model management interface, specifically introducing a new component for text input and display within the model entries.
Planned · Medium Priority

New button on home page.
Completed · High Priority

Known Issues

0 issues
No known issues

Documents & Files

2 files
EdgeSense v0.1 MacOS Application
Install Document - ReadMe

Project Challenges

Challenge 1: Real-time System Monitoring
The first challenge was getting live CPU, GPU, and RAM data from the Jetson into the browser. The tegrastats utility provides this information, but it is a command-line tool, so its output is not directly accessible to a web frontend.

Challenge 2: Managing Memory on a Constrained Device
The Jetson Orin Nano is powerful, but its 8 GB of RAM can be quickly consumed by larger language models. Early in testing, I found the system would become unresponsive or even freeze if I tried to load a model that was too large for the available memory.

Challenge 3: Smoothly Streaming Chat Responses
I wanted the chat to feel interactive, with the model's response appearing token by token, just like in popular applications. Ollama's API supports streaming, but handling this correctly on the frontend was tricky. Initial implementations resulted in garbled text or the entire response appearing at once.

Project Solutions & Learnings

Learning: I learned that for security reasons, a browser cannot directly execute local commands. The solution was to create the Python Flask "stats helper" API. This reinforced the principle of using a simple, dedicated microservice to bridge gaps between different parts of a system.
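
To make the idea concrete, here is a minimal sketch of what such a stats helper can look like. The endpoint name, port, and tegrastats regexes are illustrative assumptions rather than the project's actual code, and the exact tegrastats line format varies by JetPack version.

```python
# stats_helper.py: a minimal sketch of the Flask "stats helper" microservice.
# It runs tegrastats once, parses a single output line, and exposes the
# numbers as JSON so the browser can poll them. The regexes assume a line
# like "RAM 3162/7620MB ... GR3D_FREQ 0% ..."; adjust for your JetPack version.
import re
import subprocess

from flask import Flask, jsonify

app = Flask(__name__)


def read_tegrastats_line() -> str:
    """Start tegrastats, capture one line of output, then stop it."""
    proc = subprocess.Popen(
        ["tegrastats", "--interval", "500"],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        return proc.stdout.readline()
    finally:
        proc.terminate()


@app.route("/stats")
def stats():
    line = read_tegrastats_line()
    ram = re.search(r"RAM (\d+)/(\d+)MB", line)
    gpu = re.search(r"GR3D_FREQ (\d+)%", line)
    return jsonify(
        ram_used_mb=int(ram.group(1)) if ram else None,
        ram_total_mb=int(ram.group(2)) if ram else None,
        gpu_util_pct=int(gpu.group(1)) if gpu else None,
    )


if __name__ == "__main__":
    # Bind to all interfaces so the React frontend on another device can reach it.
    app.run(host="0.0.0.0", port=5005)
```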

Learning: I gained a much deeper understanding of how to work with streaming APIs in React. It required careful state management to append new chunks of data to the existing message and re-render the component efficiently, creating the smooth, "typing" effect I was aiming for.
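
The frontend work is in React, but the chunk-handling pattern is easier to show as a short Python sketch against Ollama's streaming /api/chat endpoint, which emits newline-delimited JSON objects. Accumulating each chunk's content field is the same append-and-render logic the React state management follows; the model name is only an example.

```python
# stream_chat.py: a sketch of consuming Ollama's streaming chat endpoint.
# Each line of the response body is a JSON object carrying a partial message;
# appending the chunks in order reproduces the token-by-token "typing" effect.
import json

import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # default Ollama port


def stream_chat(model: str, prompt: str) -> str:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    full_reply = []
    with requests.post(OLLAMA_URL, json=payload, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            piece = chunk.get("message", {}).get("content", "")
            print(piece, end="", flush=True)  # show the reply as it arrives
            full_reply.append(piece)
            if chunk.get("done"):
                break
    return "".join(full_reply)


if __name__ == "__main__":
    stream_chat("llama3.2:1b", "Explain edge AI in one sentence.")
```
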

Learning: Resource management is paramount on edge devices. This led to the implementation of the "RAM guard-rail" feature. Before allowing a user to chat, the frontend fetches the model's size from the Ollama API and checks it against the free system RAM reported by the stats helper. If there isn't enough memory (with a safety margin), the chat input is disabled. This simple check dramatically improved the stability and user experience of the application.