What does X mean?
Gen AI: This can mean two things: General Artificial Intelligence (more commonly written as Artificial General Intelligence, or AGI) or Generative Artificial Intelligence. The shared abbreviation is a misleading play on words.
General Artificial Intelligence is the subject of philosophical debates, theories about World Models, and much more.
Generative Artificial Intelligence is our current technology. It refers to the technological insights that led to the transformer architecture, which underpins today's Large Language Models. Specifically, it is the generative part of that technology that produces new text from your inputs.
AI Breakdown:
AI is used to describe a multitude of technologies, many of which data scientists and decision scientists would not consider AI.
Most AI is still Machine Learning: a decades-old (some would even say centuries-old) discipline that works with numeric data, such as spreadsheets and quarterly reports, and powers applications like anomaly detection in industrial machinery and, yes, the majority of robotics.
Gen AI and the technology of Large Language Models are part of NLP (Natural Language Processing), itself a decades-old field. NLP is the ability to process text in a way that lets it be handled mathematically, as numbers instead of words.
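As a rough illustration of what "text as numbers" can mean, the Python sketch below turns sentences into word-count vectors (a bag-of-words representation). This is one of the oldest and simplest NLP techniques and is far simpler than what an LLM does. The sample sentences and function names are made up for this example.

```python
from collections import Counter

def build_vocabulary(texts):
    """Collect every distinct lowercase word, in sorted order."""
    words = set()
    for text in texts:
        words.update(text.lower().split())
    return sorted(words)

def to_vector(text, vocabulary):
    """Count how often each vocabulary word appears in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Made-up sample sentences, for illustration only.
sentences = ["the pump failed", "the pump is running", "sales are running high"]
vocabulary = build_vocabulary(sentences)
vectors = [to_vector(s, vocabulary) for s in sentences]
```

Each sentence is now a list of numbers of the same length, which is the kind of input that mathematical methods can compare and learn from.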
Machine Learning vs Generative AI
Machine learning is limited by what it’s seen. It cannot categorize text into a new category it was not trained on. It expects tomorrow to be the same as today, and today to be the same as yesterday. Do not mistake this as a reason not to use it, but instead as a thoughtful consideration.
Generative AI can work directly on text, be it code or news articles. With its vast training sources, it largely works around these limitations by having “already seen” huge numbers of possibilities.
Generative AI is not good with numerical data, like stock market analysis, predicting when a machine will fail, or the success of a sales promotion.
Choosing the right tool, and making sure you have good data, is a universal constant for all AI
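LLM: Large Language Models are the core technology of text-based Generative AI. When you hear of a “model”, this is what is being referred to. LLMs are built on the transformer architecture, introduced by Google researchers in 2017, which followed earlier Google work from 2013 (word2vec) on representing words as numbers. Together these unlocked the ability for neural networks to effectively engage with text.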
RAG: Retrieval Augmented Generation is a fusion of an LLM with information retrieval. An LLM, just like Machine Learning, only knows what it has seen. Internal documents, unique to your company, have not been seen by it.
Information Retrieval is what a search engine does. It can be simple or complex. Google built its business on perfecting information retrieval.
A search engine retrieves documents you have and adds them to the prompt you entered.
The LLM’s capabilities allow it to partially adapt to this information and provide a more accurate and more tailored answer.
It can reduce hallucinations, but it can also increase them when the retrieved material is irrelevant, incomplete, or stripped of context.
Many RAG-based solutions fail because the retrieval system is not tailored to the problem. The LLM cannot fix missing data or overcome bad data.
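The retrieve-then-add-to-prompt flow described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular product works: the "search engine" here just counts shared words, the documents and function names are invented, and the final call to an LLM is left out. A real system would use a proper search engine tuned to the problem.

```python
def score(question, document):
    """Count how many distinct question words also appear in the document."""
    question_words = set(question.lower().split())
    document_words = set(document.lower().split())
    return len(question_words & document_words)

def retrieve(question, documents, top_k=1):
    """Return the top_k documents with the highest word-overlap score."""
    ranked = sorted(documents, key=lambda d: score(question, d), reverse=True)
    return ranked[:top_k]

def build_prompt(question, documents, top_k=1):
    """Append the retrieved documents after the user's question."""
    retrieved = retrieve(question, documents, top_k)
    return question + "\n\nRetrieved documents:\n" + "\n".join(retrieved)

# Hypothetical internal documents the LLM has never seen.
documents = [
    "Expense reports are due on the fifth business day of each month.",
    "The cafeteria is open from 8am to 3pm.",
]
prompt = build_prompt("When are expense reports due?", documents)
```

The resulting prompt is what would be sent to the LLM. If the retrieval step picks the wrong document, the LLM has no way to recover, which is why the quality of retrieval matters so much.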
Grounding: The domain-specific information that is added through a RAG process.
When you type in a question or prompt about a document, some or all of that document is fetched and added after your question or prompt.
This is done behind the scenes by a RAG process, often implemented as Python code using tools such as LangChain.
Context: The grounding plus an explanation of what the grounding is and how it relates to the problem being solved.
Only providing parts of documents (often called Chunks) can increase hallucinations because the documents are added with no context about how to use them or what they mean.
Think of it in human terms: if you saw only parts of documents for the first time, how likely are you to be able to answer a question?
Context provides additional text around the grounding to explain what it is, how to use it, when it matters, and why it’s meaningful.
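One way to picture this is a small Python helper that wraps a retrieved chunk in explanatory text before it goes into the prompt. This is only a sketch: the field labels, the function name add_context, and the sample policy text are all hypothetical, and the right wording will depend on your documents and your problem.

```python
def add_context(chunk, source, section, guidance):
    """Surround a retrieved chunk with text explaining what it is and how to use it."""
    return (
        f"Source document: {source}\n"
        f"Section: {section}\n"
        f"How to use this: {guidance}\n"
        f"Excerpt:\n{chunk}"
    )

# Hypothetical chunk and metadata, for illustration only.
chunk = "Reports are due on the fifth business day."
grounded = add_context(
    chunk,
    source="Finance Policy Handbook",
    section="Expense reporting",
    guidance="Treat this as current company policy; it applies to all employees.",
)
```

Without the extra lines, the LLM would see only the bare excerpt and would have to guess which reports it refers to and whether the rule still applies.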
Chunks: Parts of a document that have been broken up through some type of algorithmic process.
When you load all your documents into a RAG based solution, only parts of those documents are retrieved based on the prompt or question
Most chunking is done by counting characters and/or “tokens”, and the resulting chunks are often around 300 words long.
These chunks are of fixed length, produced by a window that slides across your documents to break them up.
The idea is that chunks will have enough information to satisfy specific requests made to the Large Language Model
Chunking strategy is critical to making sure you are getting all the information needed to support an answer. 300 words at a time is often not enough.
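A fixed-length sliding window can be sketched in Python as follows. This version counts words for readability, whereas real tools usually count characters or tokens. The 300-word default comes from the description above; the 50-word overlap and the function name chunk_words are assumptions made for this example.

```python
def chunk_words(text, chunk_size=300, overlap=50):
    """Split text into fixed-length word windows that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window slides each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks

# Tiny demonstration with 10 placeholder words and a 4-word window.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
```

Notice that the window cuts wherever the count runs out, with no regard for sentences, sections, or meaning. That is the main reason a chunk can end up missing the information needed to support an answer.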
Tokens: A token is a piece of a word. It is similar to how languages have a “root” of a word, like “spend”, and suffixes like “-ing” and “-s”, although tokens do not always line up neatly with roots and suffixes. A word is broken up into these pieces, and each piece is assigned a numerical value in a large language model. Through that, the pieces can be processed by the neural network in the Large Language Model. The tokens the LLM produces are then converted back from these numerical values into the characters that make up words.
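The round trip from word to numbers and back can be shown with a toy tokenizer in Python. The four-entry vocabulary and the greedy longest-match rule are assumptions made for this sketch; real LLM tokenizers learn their pieces from data and have vocabularies of tens of thousands of entries or more.

```python
# Hypothetical toy vocabulary mapping word pieces to numerical IDs.
VOCAB = {"spend": 0, "ing": 1, "s": 2, "er": 3}
ID_TO_PIECE = {token_id: piece for piece, token_id in VOCAB.items()}

def encode(word):
    """Greedily split a word into the longest known pieces and return their IDs."""
    ids = []
    position = 0
    while position < len(word):
        # Try the longest remaining slice first, then shorter ones.
        for end in range(len(word), position, -1):
            piece = word[position:end]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                position = end
                break
        else:
            raise ValueError(f"no token covers {word[position:]!r}")
    return ids

def decode(ids):
    """Convert token IDs back into the characters that make up the word."""
    return "".join(ID_TO_PIECE[token_id] for token_id in ids)

ids = encode("spending")
word = decode(ids)
```

Here “spending” becomes two numbers, one for “spend” and one for “ing”, and decoding those numbers gives the original word back.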
Vector Database
Vector Search
Full Text Search
Hybrid Search
Dense Search