Introduction
Early AI systems struggled with basic concepts and failed to understand context. Things have changed dramatically in the past few years: the shift to transformer-based architectures made it possible to build models that "store" and connect information across billions of parameters.
Visual models
Among the most useful AI systems currently available are Vision Large Language Models (Vision LLMs), a new class of systems designed for real-world operations.
Unlike traditional technologies, their deep visual understanding and continuous monitoring enable information gathering at a level businesses have never had access to before.
Whether you are in healthcare, manufacturing, or even primary sectors such as mining, Vision LLMs unlock pattern recognition that past systems fundamentally could not provide.
Our approach
Most big-tech AI is trained on enormous volumes of data, essentially teaching it to know everything. This type of AI can reason well, but at what cost? Our approach is the opposite. We use the same principles, but on a smaller, more practical scale. Our goal isn't building general-purpose AI that tries to answer everything. Instead, we focus on compact, specialized models designed for niche-specific applications and problems.
Why it matters
Imagine using a massive AI model that understands nearly every concept, instead of a neatly designed, niche-specific model built by us. Large models require significantly more compute, and since AI works in cycles (sending data, evaluating, sending again, and so on), they run far slower than smaller models. Of course, you could argue that a huge AI can help with every task in your company, but ask yourself one question: what's the point of having one non-specialized "person" do all the specialized parts of the work? It's the same reason lawyers write legal documents and not code. They could certainly learn to, over time, but would the result be better than a developer's? We think you get the point.
Training
In an ideal world, we would train every model from scratch for each client to make it truly specific. But full training requires enormous amounts of data, computing power, and time - which makes it too expensive and inefficient for most use cases. Instead, we rely on fine-tuning: we teach existing pre-trained models to adapt and perform specialized tasks. We choose a high-quality base model and customize it to achieve nearly the same performance as training from scratch. There are some limitations, of course, but nothing you will notice once we deploy it. So, how do we do it?
Through QLoRA
Or, in full, Quantized Low-Rank Adaptation - our go-to method for building custom models. We fine-tune a model using only a small fraction of its parameters while retaining up to 90% of the quality of full fine-tuning. Here's how it works in simpler terms: we take a base model, "freeze" most of it, shrink it through quantization (to 4-bit or 8-bit precision), and then add and train small adapter layers that teach the model the new information.
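To make the two ingredients concrete, here is a toy sketch in plain Python (illustrative numbers only, not our production pipeline): one function maps a weight onto a 4-bit grid, the other compares how many parameters a rank-r adapter trains versus full fine-tuning.

```python
# Toy illustration of the two QLoRA ingredients: 4-bit quantization
# and low-rank adapters. Dimensions and rank are made-up examples.

def quantize_4bit(w, w_min=-1.0, w_max=1.0):
    """Map a float weight onto one of 16 levels (4 bits) and dequantize it."""
    levels = 16
    step = (w_max - w_min) / (levels - 1)
    idx = round((w - w_min) / step)
    idx = max(0, min(levels - 1, idx))  # clamp to the 4-bit range
    return w_min + idx * step

def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters: full fine-tuning vs. a rank-r LoRA adapter."""
    full = d_in * d_out                 # every weight in the layer
    lora = rank * (d_in + d_out)        # only the two small matrices A and B
    return full, lora

full, lora = lora_param_counts(d_in=1024, d_out=1024, rank=8)
print(f"full fine-tune: {full:,} trainable params")
print(f"LoRA (r=8):     {lora:,} trainable params ({100 * lora / full:.1f}% of full)")
print(f"0.37 quantized to 4-bit: {quantize_4bit(0.37):.4f}")
```

For a single 1024x1024 layer, the rank-8 adapter trains under 2% of the weights, which is why the frozen, quantized base can stay in cheap low-precision memory while only the tiny adapters are updated.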
Adding RAG
To further enhance the behavior of some of our models we add RAG (Retrieval-Augmented Generation). It lets the model pull data from verified sources - your company documents (photos, data, databases, etc.) - which creates a "validation barrier" for the model. This means that if your model is finance-oriented, RAG keeps its outputs grounded in your finance-oriented sources.
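The retrieval step can be sketched in a few lines. This is a deliberately simplified keyword-overlap ranker (real deployments use embedding search over the document store); the documents and query are invented examples.

```python
# Minimal sketch of the RAG retrieval step: rank documents by keyword
# overlap with the query, then prepend the best matches to the prompt.
# Toy scoring only - production systems use vector/embedding search.

def _words(text):
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,?!") for w in text.lower().split()}

def retrieve(query, documents, top_k=2):
    """Return the top_k documents sharing the most words with the query."""
    q = _words(query)
    scored = sorted(documents, key=lambda d: len(q & _words(d)), reverse=True)
    return scored[:top_k]

def build_prompt(query, documents):
    """Ground the model: answer only from the retrieved context."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Q3 revenue grew 12% driven by the finance division.",
    "The cafeteria menu changes every Monday.",
    "Operating costs in the finance division fell 4% after automation.",
]
print(build_prompt("How did the finance division perform?", docs))
```

Because the prompt only contains retrieved company sources, an off-topic document (here, the cafeteria menu) never reaches the model - that is the "validation barrier" in miniature.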
Ready to work with us?