What You Will Learn
- What Retrieval-Augmented Generation means and why it matters.
- How text embeddings help AI understand semantic meaning.
- Why Azure AI Search is useful as the retrieval layer in RAG.
- How to manually chunk documents, generate embeddings, store vectors, and retrieve relevant content.
- How to use retrieved content to generate grounded AI responses.
Introduction
Modern AI applications are no longer limited to answering questions based only on pre-trained knowledge. Today, businesses want AI systems that can search through documents, understand internal knowledge, and generate grounded responses based on real data.
This is where RAG, or Retrieval-Augmented Generation, becomes important.
Before we understand Azure AI Search, we first need to understand what RAG is and why it has become such an important concept in modern AI applications.
What is RAG?
RAG is an AI approach that combines the power of Large Language Models with external data sources. Instead of relying only on the information the AI model was trained on, RAG first retrieves relevant information from documents, databases, or search indexes and then uses that information to generate a more accurate and context-aware response.
This approach is especially useful for enterprise applications where AI needs access to up-to-date business data, internal documents, PDFs, knowledge bases, or SharePoint content. Rather than guessing an answer, the AI can retrieve the most relevant information and generate responses based on actual data.
- Retrieve relevant information from a data source.
- Augment the AI prompt with that information.
- Generate a smarter and more accurate response.
This is where Azure AI Search becomes extremely important, as it helps power the retrieval layer of a RAG-based solution.
What is Azure AI Search?
Azure AI Search is a managed search service from Microsoft Azure. You send it structured documents, define which fields should be searchable, filterable, or retrievable, and then call an API to return ranked results.
It is commonly used for knowledge bases, product catalogs, internal portals, document discovery, and retrieval-augmented generation patterns.
The simplest mental model is this:
Why Start with the Free Tier?
The Free tier is useful for learning, demos, tutorials, and small proof-of-concept work. As of May 15, 2026, Microsoft documents the Free tier as an always-free service that does not consume Azure free account credits, provides 50 MB of storage, and allows one free search service per Azure subscription.
When working with the Free tier of Azure AI Search, it is important to understand some limitations, especially when building modern RAG solutions with vector search.
Although Azure AI Search supports vector search capabilities, some advanced AI enrichment and integrated vectorization scenarios are more suitable for higher pricing tiers. In many learning or proof-of-concept scenarios, developers manually generate embeddings using services such as Azure OpenAI embedding models and then upload those vectors into Azure AI Search. That is exactly what we will do in this blog.
Understanding Text Embeddings
Before continuing, it is important to understand what a text embedding model actually does.
A text embedding model converts human-readable text into numerical representations called vectors or embeddings. These vectors capture the semantic meaning of the text rather than just matching keywords.
For example, consider these two sentences:
How do I reset my password?
I forgot my account password.
Even though the wording is different, an embedding model understands that both sentences are contextually related and generates similar vector representations for them.
This allows Azure AI Search to perform semantic similarity searches instead of relying only on exact keyword matching.
Typical RAG Workflow
In a typical RAG workflow:
- Documents are split into smaller chunks.
- Each chunk is converted into embeddings using an embedding model.
- The embeddings are stored inside Azure AI Search.
- User queries are also converted into embeddings.
- Azure AI Search finds the most semantically relevant content using vector similarity search.
- The retrieved content is sent to the Large Language Model to generate the final response.
What We Will Build
Now that we understand the basics of RAG and why embeddings are important, let us get our feet wet and build a complete manual Vector RAG pipeline.
In this blog, we will not rely on fully automated indexing or integrated vectorization. Instead, we will manually build each step of the process so we can clearly understand how the different components work together.
Azure Blob Storage
Stores the original source documents.
Azure OpenAI
Generates text embeddings and final AI responses.
Azure AI Search
Stores vector data and performs similarity searches.
Python
Connects the complete pipeline together.
Final Architecture
Step 1 — Why Manual RAG is Valuable
Before jumping into frameworks and prebuilt AI tools, it is important to understand how RAG actually works internally.
For this guide, imagine building an AI assistant that can answer questions from religious teachings such as the Bhagavad Gita and the Bible.
For example:
- What does the Bhagavad Gita teach about selfless action?
- What does the Bible say about forgiveness?
- Compare teachings about duty and compassion.
Instead of training a custom AI model from scratch, we can:
- Store religious documents inside Azure Blob Storage.
- Convert the text into embeddings using Azure OpenAI.
- Store the vectors inside Azure AI Search.
- Retrieve the most semantically relevant passages.
- Send those passages to GPT to generate grounded answers.
This allows the AI to answer based on actual teachings rather than relying only on the model’s memory.
Prerequisites
1. Azure Blob Storage
Used to store the source documents for indexing and retrieval.
Documentation: Introduction to Azure Blob Storage
Brief steps: Create a Storage Account > Open the Storage Account > Create a Blob Container > Upload sample documents.
2. Azure OpenAI
Used to generate text embeddings and AI responses.
You will need an embedding model deployment such as text-embedding-3-small or text-embedding-3-large, and a chat model deployment such as gpt-4o-mini or gpt-4.
Documentation: Azure OpenAI documentation
Brief steps: Create Azure OpenAI Resource > Deploy Embedding Model > Deploy Chat Model > Copy Endpoint and API Key.
3. Azure AI Search
Used to store vector embeddings and perform similarity searches.
Documentation: What is Azure AI Search?
Brief steps: Create Azure AI Search Resource > Select Free Tier if suitable > Copy Search Endpoint and Admin Key.
4. Python
Used to build the complete RAG pipeline.
Download: Python downloads
Brief steps: Install Python > Open Terminal or VS Code > Install required packages.
pip install openai azure-search-documents azure-storage-blob python-dotenv
Step 2 — Deploy the Embedding Model
Go to Azure AI Foundry, open Models + Endpoints, and deploy the embedding model:
text-embedding-3-small
Recommended deployment type for many demos:
Global Standard
Why Embeddings Matter
Embeddings convert text into mathematical vectors. For example:
Duty without attachment
becomes a vector that looks conceptually like this:
[0.0234, -0.8831, 0.1922, ...]
These numbers represent semantic meaning, which allows similar ideas to be found even when the exact words are different.
Step 3 — Create the Azure AI Search Vector Index
Go to Azure AI Search, open Indexes, and add a new index.
Field 1 — id
| Property | Value |
|---|---|
| Name | id |
| Type | String |
| Key | true |
| Retrievable | true |
Field 2 — content
| Property | Value |
|---|---|
| Name | content |
| Type | String |
| Searchable | true |
| Retrievable | true |
Field 3 — contentVector
| Property | Value |
|---|---|
| Name | contentVector |
| Type | Collection(Edm.Single) |
| Dimensions | 1536 |
| Vector Profile | vector-profile |
text-embedding-3-small embedding model commonly uses 1536 dimensions. The Azure AI Search index dimensions must match the vector dimensions generated by your embedding model deployment.
Step 4 — Create the Vector Profile
Create a vector profile called:
vector-profile
Use the algorithm:
HNSW
What is HNSW?
HNSW stands for Hierarchical Navigable Small World. It is a nearest-neighbor vector search algorithm used to find semantically similar vectors quickly.
Step 5 — Upload Documents to Blob Storage
Create a container called:
documents
Upload your sample files, such as:
- Bible markdown files
- Bhagavad Gita documents
- TXT files
- PDF files
- DOCX files
Step 6 — Build the Indexing Script
Create a file called:
index_documents.py
This script will follow this flow:
Complete Indexing Script
import uuid
from openai import AzureOpenAI
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential
from azure.storage.blob import BlobServiceClient
# =========================================================
# AZURE OPENAI
# =========================================================
openai_client = AzureOpenAI(
api_key="YOUR_OPENAI_KEY",
api_version="2024-02-01",
azure_endpoint="https://YOUR_OPENAI_RESOURCE.openai.azure.com"
)
# =========================================================
# AZURE SEARCH
# =========================================================
search_client = SearchClient(
endpoint="https://YOUR_SEARCH_NAME.search.windows.net",
index_name="documents-index",
credential=AzureKeyCredential("YOUR_SEARCH_ADMIN_KEY")
)
# =========================================================
# AZURE BLOB STORAGE
# =========================================================
blob_connection_string = "YOUR_CONNECTION_STRING"
blob_service_client = BlobServiceClient.from_connection_string(
blob_connection_string
)
container_client = blob_service_client.get_container_client(
"documents"
)
# =========================================================
# CHUNKING FUNCTION
# =========================================================
def chunk_text(text, chunk_size=500, overlap=100):
chunks = []
start = 0
while start < len(text):
end = start + chunk_size
chunk = text[start:end]
chunks.append(chunk)
start += chunk_size - overlap
return chunks
# =========================================================
# PROCESS DOCUMENTS
# =========================================================
documents = []
blobs = container_client.list_blobs()
for blob in blobs:
print(f"Processing blob: {blob.name}")
try:
blob_client = container_client.get_blob_client(blob.name)
downloaded_blob = blob_client.download_blob()
document_text = downloaded_blob.readall().decode("utf-8")
chunks = chunk_text(document_text)
print(f"Total chunks: {len(chunks)}")
for chunk in chunks:
if not chunk.strip():
continue
embedding_response = openai_client.embeddings.create(
input=chunk,
model="text-embedding-3-small"
)
vector = embedding_response.data[0].embedding
documents.append({
"id": str(uuid.uuid4()),
"content": chunk,
"contentVector": vector
})
except Exception as e:
print(f"Error processing {blob.name}: {str(e)}")
# =========================================================
# UPLOAD TO AZURE SEARCH
# =========================================================
if documents:
print(f"Uploading {len(documents)} chunks to Azure Search...")
result = search_client.upload_documents(documents)
print("Upload completed.")
else:
print("No documents found to upload.")
Understanding Chunking
Chunking splits large documents into smaller semantic sections.
Without chunking:
Entire Bible → One Vector
This usually produces poor retrieval quality because the vector represents too much information at once.
With chunking:
Bible
→ Hundreds of chunks
→ Better semantic retrieval
Why Overlap Matters
Overlap preserves context between chunks.
Chunk 1 ends at verse 5
Chunk 2 starts slightly before verse 5
This improves retrieval continuity and reduces the chance that important context is lost at chunk boundaries.
Step 7 — Verify the Upload
Run the indexing script:
python index_documents.py
You should see output similar to:
Upload completed.
This means vectors were successfully uploaded into Azure AI Search.
Step 8 — Build the Retrieval Script
Create a file called:
search_vectors.py
This script will follow this flow:
Complete Retrieval Script
from openai import AzureOpenAI
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential
# =========================================================
# AZURE OPENAI
# =========================================================
openai_client = AzureOpenAI(
api_key="YOUR_OPENAI_KEY",
api_version="2024-02-01",
azure_endpoint="https://YOUR_OPENAI_RESOURCE.openai.azure.com"
)
# =========================================================
# AZURE SEARCH
# =========================================================
search_client = SearchClient(
endpoint="https://YOUR_SEARCH_NAME.search.windows.net",
index_name="documents-index",
credential=AzureKeyCredential("YOUR_SEARCH_ADMIN_KEY")
)
# =========================================================
# USER QUESTION
# =========================================================
query = "What does Bhagavad Gita teach about selfless action?"
# =========================================================
# CREATE QUERY EMBEDDING
# =========================================================
embedding_response = openai_client.embeddings.create(
input=query,
model="text-embedding-3-small"
)
query_vector = embedding_response.data[0].embedding
# =========================================================
# VECTOR SEARCH
# =========================================================
results = search_client.search(
search_text="",
vector_queries=[
{
"kind": "vector",
"vector": query_vector,
"fields": "contentVector",
"k_nearest_neighbors": 3
}
],
select=["content"]
)
# =========================================================
# PRINT RESULTS
# =========================================================
print("\nTop Semantic Matches:\n")
for result in results:
print(result["content"])
print("\n----------------------\n")
How Semantic Search Works
Imagine we store this text:
Perform your duties without attachment to outcomes
The user asks:
What is selfless action?
Even though the wording is different, the meaning is similar:
selfless action ≈ detached duty
The embeddings become mathematically close. This is the core idea behind semantic vector search.
Full RAG Flow
Example RAG Prompt
Context:
"You have a right to perform your prescribed duties..."
Question:
What is karma yoga?
Example GPT Response
Karma yoga refers to selfless action performed without attachment to outcomes.
What You Successfully Built
By following this approach, you now have the foundation of a manual Vector RAG pipeline:
- Blob ingestion pipeline
- Chunking pipeline
- Embedding generation
- Vector database
- Semantic search
- Retrieval engine
- Azure-based RAG infrastructure
Next Steps
Once the manual pipeline is working, you can extend it further by building:
- A Copilot Studio integration that calls your retrieval API.
- A web application where users can ask questions over uploaded documents.
- A SharePoint document assistant that indexes files from SharePoint libraries.
- A hybrid search experience combining keyword search and vector search.
- A production-grade API layer with authentication, logging, and monitoring.
- Improved document extraction for PDF, DOCX, HTML, and scanned documents.
Conclusion
RAG has become one of the most important patterns in modern AI applications because it allows AI systems to answer questions using real, current, and domain-specific data.
Azure AI Search plays a key role in this architecture by providing the retrieval layer that helps find relevant content quickly. By manually generating embeddings, storing vectors, and performing similarity search, you gain a strong understanding of how RAG works behind the scenes.
This foundation will help you confidently move toward more advanced enterprise scenarios such as SharePoint-based knowledge assistants, Copilot Studio integrations, and production-ready AI search experiences.
References
- Try Azure AI Search for free — Microsoft Learn
- Vector search in Azure AI Search — Microsoft Learn
- Vector indexes in Azure AI Search — Microsoft Learn
- Generate embeddings with Azure OpenAI — Microsoft Learn
- Integrated vectorization in Azure AI Search — Microsoft Learn
- Introduction to Azure Blob Storage — Microsoft Learn