RAGged Edge Box:
A private, single-user
AI-based IR system

Presenter Notes

Project page

Presenter Notes

Screenshots: (1) Upload

Upload The Art of Feature Engineering

Presenter Notes

Screenshots: (2) Upload results

4 minutes later...

Presenter Notes

Screenshots: (3) Query

When it is advisable to use feature normalization?

Presenter Notes

Screenshots: (4) Answered

Half a minute

Presenter Notes

Screenshots: (5) Other passages

Answer from here

Presenter Notes

Attendee personas

  • Scientist [Sci]
    • Research outreach without hosting expensive servers
  • Potential user [Usr]
    • Values privacy, doesn't want to pay server fees
  • Potential re-user [Tech]
    • Likes the VM technology for a different stack

Presenter Notes

Attendee personas (cont.)

  • Potential ISV [Tech$]
    • Likes to build (and sell) solutions on top
    • Knows PHP
  • Potential contributor [Tech]
    • Values Free software
    • Interested in AI

Presenter Notes

This talk

  1. RAGged Edge Box demo [User]
  2. AI Concepts (RAG, LLMs, Embeddings) [User, Tech]
  3. RAG Concepts (IR, chunk, prompt) [User, Tech]
  4. RAGged Edge Box (concept, advantages) [User]
  5. RAGged Edge Box Architecture [Tech]
  6. Enabling Technology Bits (ONNX, PHP Semantic Search) [Sci, Tech]
  7. Extension Points [Sci, Tech$]
  8. VM Packaging [Tech]
  9. RAGged Edge Box as a Platform [Tech$]

Presenter Notes

AI Concepts
[User, Tech]

Presenter Notes

What is RAG

  • Retrieval Augmented Generation: combine LLMs with existing information

  • Retrieve information

    • Search engines, including embedding-based ones
  • Give it to the LLM as input
    • In the prompt
  • Ask the LLM to do things with that information
    • For example, to answer a question
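
A rough sketch of that flow in PHP (illustrative only: the retrieved chunks are hard-coded here; in the real system they come from the search engine and the prompt goes to the local LLM):

// RAG flow sketch: retrieve, augment, generate.
$question = "Why are people obsessed with the number 42?";

// 1. Retrieve: in the real system these chunks come from the index.
$chunks = [
    "42 is a number featured in the Hitchhiker's Guide to the Galaxy,",
    "the comedy work by Douglas Adams.",
];

// 2. Augment: place the retrieved text inside the prompt.
$prompt = "Use the following pieces of context to answer the question.\n\n"
        . implode("\n", $chunks)
        . "\n\nQuestion: $question\nAnswer:";

// 3. Generate: send $prompt to the LLM (here we just show what would be sent).
echo $prompt;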

Presenter Notes

What are LLMs

  • Cellphone autocomplete is a language model
  • It "knows" things
    • I really want to...
      • eat empanadas
      • learn about LLMs
  • A large language model is similar, but it has seen tons of text
    • Some of those texts contain exams and their answers
    • Or directly "instructions" to make the LLM more useful

Presenter Notes

What are Embeddings

  • Encode information by projecting it into a fixed-dimensional space
  • In the case of RAG and "semantic search", the input is a text span and the output is a fixed-length vector of floating-point numbers (e.g., 384 of them)
  • The hope is that texts conveying similar meaning will be represented by vectors close to each other in Euclidean space
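
A minimal PHP sketch of that closeness test, using made-up 4-dimensional vectors instead of real 384-dimensional embeddings:

// Cosine similarity between two embedding vectors (toy 4-d vectors here).
function cosine(array $a, array $b): float {
    $dot = 0.0; $na = 0.0; $nb = 0.0;
    foreach ($a as $i => $v) {
        $dot += $v * $b[$i];
        $na  += $v * $v;
        $nb  += $b[$i] * $b[$i];
    }
    return $dot / (sqrt($na) * sqrt($nb));
}

$cat = [0.8, 0.1, 0.3, 0.0];   // "the cat sat on the mat"
$dog = [0.7, 0.2, 0.3, 0.1];   // "a dog lay on the rug"
$tax = [0.0, 0.9, 0.1, 0.8];   // "file your taxes by April"

echo cosine($cat, $dog), "\n"; // high: similar meaning
echo cosine($cat, $tax), "\n"; // low: unrelated meaning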

Presenter Notes

Embeddings Issues

  • Many open questions
    • How to generate these embeddings?
      • Siamese neural networks
    • What type of semantic information is being captured?
    • How big should the span of text be for comparing the generated vectors to make sense?
    • What if one of the texts is short (like a question) and the other is long (like a paragraph)?
      • Asymmetric embeddings
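
One common workaround, as a sketch: encode the two sides differently. The "query:"/"passage:" prefixes below are a convention used by some asymmetric embedding models, not necessarily by the model shipped in the box:

// Asymmetric embeddings sketch: short queries and long passages are
// encoded with different role prefixes before calling the embedder.
function embedAsymmetric(callable $embed, string $text, bool $isQuery): array {
    $prefix = $isQuery ? "query: " : "passage: ";
    return $embed($prefix . $text);
}

// Usage with any embedder exposing a text -> vector callable, e.g.:
// $qVec = embedAsymmetric(fn($t) => $model->embeddings($t), "What is 42?", true);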

Presenter Notes

RAG Concepts
[User, Tech]

Presenter Notes

Answer extraction using LLMs

  • LLMs can do many things (most of them badly)

  • Answer extraction is one they are good at. Given the text:

This text talks about many things. Among them is how 42 is a number featured in the Hitchhiker's Guide to the Galaxy, the comedy work by Douglas Adams.

  • We can ask the LLM to extract the answer to the question "Why people are obsessed with the number 42?"

People are obsessed with the number 42 because it is featured in the Hitchhiker's Guide to the Galaxy, a comedy work by Douglas Adams.

Presenter Notes

RAG vs 'Hallucinations'

  • Asked GPT-4 "Why people are obsessed with the number 42?"

The obsession with the number 42 largely comes from its significance in the popular science fiction novel "The Hitchhiker's Guide to the Galaxy" by Douglas Adams. In the story, a group of hyper-intelligent beings builds a supercomputer named Deep Thought to calculate the "Answer to the Ultimate Question of Life, the Universe, and Everything." After much anticipation, Deep Thought reveals that the answer is simply the number 42.


Since the publication of the book, ...

Presenter Notes

IR

  • Information retrieval has been around since the 1950s
    • One of the reasons we wanted computers, to begin with
  • GenAI is great but it hasn't improved IR as much
    • RAG hinges on good IR
    • If the information to answer the question is not retrieved, there is not much the LLM can do about it
    • That doesn't mean the LLM will admit defeat: getting it to say "I don't know" is a tall order
  • It's the GenAI revolution, not the IR revolution

Presenter Notes

IR SOTA

  • Currently we have IR systems using:

    • keyword search
    • complex search queries (keywords plus operators)
    • embeddings
  • The best performance uses a combination of these approaches
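
A sketch of one simple combination: a weighted sum of a squashed keyword score and an embedding similarity (the weight and the squashing are illustrative, not what any particular engine does):

// Hybrid retrieval sketch: mix a keyword score (e.g., BM25) with an
// embedding similarity for the same document.
function hybridScore(float $keywordScore, float $embeddingSim, float $alpha = 0.5): float {
    $kw = $keywordScore / (1.0 + $keywordScore);   // squash into [0, 1)
    return $alpha * $kw + (1.0 - $alpha) * $embeddingSim;
}

// Rank documents by the combined score.
$docs = [
    "doc1" => hybridScore(7.2, 0.61),
    "doc2" => hybridScore(1.3, 0.83),
];
arsort($docs);
print_r($docs);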

Presenter Notes

Chunk

  • The information provided to the LLM in a RAG system is usually a segment of the relevant document
    • This segment is called a "chunk" of the document
  • The full document usually cannot be processed by the LLM, which has a maximum processing size (context window) covering both the input information and the answer output
    • Local LLMs handle roughly 400 to 1,500 words
    • Commercial LLMs available through APIs can process full books
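
A naive word-window chunker as a PHP sketch (real systems usually prefer sentence or section boundaries; the sizes are illustrative):

// Naive chunker: split a document into windows of at most $maxWords words,
// repeating $overlap words between consecutive chunks to keep context.
function chunkText(string $text, int $maxWords = 400, int $overlap = 50): array {
    $words  = preg_split('/\s+/', trim($text));
    $chunks = [];
    for ($i = 0; $i < count($words); $i += $maxWords - $overlap) {
        $chunks[] = implode(' ', array_slice($words, $i, $maxWords));
    }
    return $chunks;
}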

Presenter Notes

Chunk Issues

  1. Chunk size:
    • The size should be large enough to answer questions
    • But small enough to fit into the LLM input and be semantically coherent to produce viable embeddings
  2. Multi-chunk processing
    • Provide multiple chunks to the LLM at once
    • That might exhaust the input and confuse the LLM
    • Some questions need information from multiple sources
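
For point 2, one simple strategy is a fixed word budget when packing chunks into the prompt (a sketch; the 1,000-word budget stands in for whatever the local LLM actually supports):

// Pack as many retrieved chunks as fit into a word budget, in retrieval
// order, so the prompt stays within the LLM's input limit.
function packChunks(array $rankedChunks, int $wordBudget = 1000): string {
    $context = [];
    $used    = 0;
    foreach ($rankedChunks as $chunk) {
        $len = str_word_count($chunk);
        if ($used + $len > $wordBudget) {
            break;                       // stop before exhausting the input
        }
        $context[] = $chunk;
        $used     += $len;
    }
    return implode("\n\n", $context);
}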

Presenter Notes

Prompt

  • How to structure the input is called the "prompt" of the LLM
  • Different LLMs need different prompts
    • They can be sensitive to minuscule changes (like a carriage return character at the end of the prompt)

Presenter Notes

RAG Prompt Example

Use the following pieces of context to answer the question
at the end. If you don't know the answer, just say that you
don't know, don't try to make up an answer.

{context}

Question: {question}
Answer:
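
Filling the template is plain string substitution; a sketch:

// Fill the RAG prompt template with the retrieved context and the question.
$template = "Use the following pieces of context to answer the question\n"
          . "at the end. If you don't know the answer, just say that you\n"
          . "don't know, don't try to make up an answer.\n\n"
          . "{context}\n\nQuestion: {question}\nAnswer:";

$context  = "42 is a number featured in the Hitchhiker's Guide to the Galaxy.";
$question = "Why are people obsessed with the number 42?";

echo strtr($template, [
    '{context}'  => $context,
    '{question}' => $question,
]);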

Presenter Notes

RAGged Edge Box
[User]

Presenter Notes

RAGged Edge Box

  • RAGged Edge Box is a RAG system implemented as edge computing
    • There is no back-end, nothing runs "in the cloud"
  • A self-contained virtual machine with a bare-bones Linux setup containing the key components for RAG:
    • An embedding model and associated execution code
    • A local LLM and its associated server
    • A standalone search engine supporting both keywords and embeddings
    • Web-accessible software to upload documents, index them and query them

Presenter Notes

Privacy

  • Searching in a document collection should not involve giving access to the documents to a third party
  • "Cloud" is a misnomer: it is just someone else's computer

Presenter Notes

Technical Sovereignty

  • Solutions relying on APIs hosted in other countries are not particularly sovereign
  • Solutions relying on expensive hardware are not particularly sovereign

Presenter Notes

Against Planned Obsolescence

  • Tired of tools that no longer work after a few months?
    • System Python upgrades break virtual environments
  • VM life cycle independent of the operating system life cycle
    • Download a tool that will remain useful for years
  • Dependencies packed in long-term storage solutions:
    • Docker
    • Debian stable
    • Composer

Presenter Notes

RAGged Edge Box Architecture [Tech]

Presenter Notes

Enabling Technology Bits
[Sci, Tech]

Presenter Notes

ONNX

  • https://onnx.ai/

  • Open Neural Network Exchange

  • Specify the neural network graph in a vendor-independent manner
  • Train using any framework, execute using a different framework
    • Execute without large dependencies
  • Open source ONNX runtime created and maintained by Microsoft
    • The runtime phones home unless special parameters are set
  • Packaged for PHP by Andrew Kane: https://packagist.org/packages/ankane/onnxruntime
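
Usage looks roughly like this (a sketch based on the package documentation; the model file and the input name depend on the exported model, so check $model->inputs() for the real names):

// Load an ONNX model and run inference with ankane/onnxruntime.
require 'vendor/autoload.php';

$model = new OnnxRuntime\Model('model.onnx');   // path is an example
print_r($model->inputs());                      // inspect expected inputs
$output = $model->predict(['input' => [[1.0, 2.0, 3.0]]]);
print_r($output);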

Presenter Notes

llama.cpp

  • https://github.com/ggerganov/llama.cpp

  • Transformer implementation in C++

  • High performance execution on CPU
  • llama-server allows for local API calls
  • Supports LLMs in GGUF format
    • Allows for mixed CPU/GPU execution on low-RAM GPUs
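
Calling a locally running llama-server from PHP might look like this (a sketch; the port, the /completion endpoint and the JSON fields follow llama.cpp's server defaults at the time of writing and should be checked against your install):

// Ask a local llama-server for a completion over its HTTP API.
$payload = json_encode([
    'prompt'    => "Question: Why are people obsessed with 42?\nAnswer:",
    'n_predict' => 128,                 // maximum tokens to generate
]);

$ch = curl_init('http://127.0.0.1:8080/completion');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $payload,
    CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
    CURLOPT_RETURNTRANSFER => true,
]);
$response = json_decode(curl_exec($ch), true);
curl_close($ch);

echo $response['content'] ?? '';        // the generated answer text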

Presenter Notes

Why PHP

  • Large installed code bases (Wikimedia, WordPress, Nextcloud)
  • Used by nearly 80% of all websites whose server-side language is known
  • Brainpower available worldwide
  • A PHP install is simpler and smaller than a Python/Java/etc install
    • The parse/compile/execute/cleanup request cycle approximates a functional paradigm; side effects go to the DB
  • 11th most popular programming language in 2023

Presenter Notes

PHP Semantic Search classes

Presenter Notes

Example

\Textualization\SemanticSearch\Ingester::ingest([
    "location" => "index.db",
    "class"    => "\\Textualization\\SemanticSearch\\VectorIndex"
], [], "docs.jsonl");

Presenter Notes

Sentence Transformers (Embeddings)

use \Textualization\SentenceTransphormers\SentenceRopherta;

$model = new SentenceRopherta();
$emb = $model->embeddings("Text");

// alt. using the semantic search classes

$e = new \Textualization\SemanticSearch\SentenceTransphormerEmbedder();
$emb = $e->encode("Text");

Presenter Notes

Reverse Engineering Huggingface Components

  • This was by far the most time-consuming aspect of the work last year

Presenter Notes

Extending RAGged Edge Box
[Sci, Tech]

Presenter Notes

New IR

Presenter Notes

New LLM

Presenter Notes

Better document handling

Presenter Notes

VM Packaging
[Tech]

Presenter Notes

Generating VirtualBox images programmatically

Presenter Notes

RAGged Edge Box as a Platform
[Tech]

Presenter Notes

Business

  • My hope is that the project enables ISVs to adapt the PHP code for customer-specific needs:
    • Specific document segmentation and detagging
    • Improved IR using faceted search
    • Handling additional file formats
    • Plugging in more performant IR engines (e.g., Manticore/MariaDB vector)
  • Not all GenAI money should find its way back to Nvidia/Microsoft

Presenter Notes

Status

  • Fully automatic VM creation
  • Missing functionality:
    • Deletion
    • Hybrid embeddings + keywords
    • Keyword search most probably doesn't work well due to the chunk size
    • Upgrade
    • API

Presenter Notes

Multilinguality

Presenter Notes

Multilinguality: Needs

  • We need a small local LLM that can do multilingual answer extraction
    • Or at least in Spanish
    • Ideas?

Presenter Notes

Contributing to the Project

Presenter Notes

Other Announcements

Presenter Notes

Conclusions

  • It is time to go back to the P in NLP

    • Natural Language Processing
  • Successful LLM deployments need a lot of programming and smarts outside the LLM bits

  • The RAGged Edge Box project allows new players versed in traditional programming to join the field

Presenter Notes