Building a Manual Vector RAG Pipeline Using Azure AI Search Free Tier

What You Will Learn

What Retrieval-Augmented Generation means and why it matters.
How text embeddings help AI understand semantic meaning.
Why Azure AI Search is useful as the retrieval layer in RAG.
How to manually chunk documents, generate embeddings, store vectors, and retrieve relevant content.
How to use retrieved content to generate grounded AI responses.

Introduction

Modern AI applications are no longer limited to answering questions based only on pre-trained knowledge. Today, businesses want AI systems that can search through documents, understand internal knowledge, and generate grounded responses based on real data.

This is where RAG, or Retrieval-Augmented Generation, becomes important.

Before we understand Azure AI Search, we first need to understand what RAG is and why it has become such an important concept in modern AI applications.

What is RAG?

RAG is an AI approach that combines the power of Large Language Models with external data sources. Instead of relying only on the information the AI model was trained on, RAG first retrieves relevant information from documents, databases, or search indexes and then uses that information to generate a more accurate and context-aware response.

This approach is especially useful for enterprise applications where AI needs access to up-to-date business data, internal documents, PDFs, knowledge bases, or SharePoint content. Rather than guessing an answer, the AI can retrieve the most relevant information and generate responses based on actual data.

In simple terms, RAG follows three steps:

Retrieve relevant information from a data source.
Augment the AI prompt with that information.
Generate a smarter and more accurate response.

This is where Azure AI Search becomes extremely important, as it helps power the retrieval layer of a RAG-based solution.

What is Azure AI Search?

Azure AI Search is a managed search service from Microsoft Azure. You send it structured documents, define which fields should be searchable, filterable, or retrievable, and then call an API to return ranked results.

It is commonly used for knowledge bases, product catalogs, internal portals, document discovery, and retrieval-augmented generation patterns.

The simplest mental model is this:

Your application owns the original data, and Azure AI Search owns the optimized index that makes that data easy to search.

Why Start with the Free Tier?

The Free tier is useful for learning, demos, tutorials, and small proof-of-concept work. As of May 15, 2026, Microsoft documents the Free tier as an always-free service that does not consume Azure free account credits, provides 50 MB of storage, and allows one free search service per Azure subscription.

Important: The Free tier is great for learning, but production workloads usually require a higher tier such as Basic or Standard, depending on scale, storage, authentication, performance, and feature requirements.

When working with the Free tier of Azure AI Search, it is important to understand some limitations, especially when building modern RAG solutions with vector search.

Although Azure AI Search supports vector search capabilities, some advanced AI enrichment and integrated vectorization scenarios are more suitable for higher pricing tiers. In many learning or proof-of-concept scenarios, developers manually generate embeddings using services such as Azure OpenAI embedding models and then upload those vectors into Azure AI Search. That is exactly what we will do in this blog.

Understanding Text Embeddings

Before continuing, it is important to understand what a text embedding model actually does.

A text embedding model converts human-readable text into numerical representations called vectors or embeddings. These vectors capture the semantic meaning of the text rather than just matching keywords.

For example, consider these two sentences:

How do I reset my password?
I forgot my account password.

Even though the wording is different, an embedding model understands that both sentences are contextually related and generates similar vector representations for them.

This allows Azure AI Search to perform semantic similarity searches instead of relying only on exact keyword matching.

Typical RAG Workflow

In a typical RAG workflow:

Documents are split into smaller chunks.
Each chunk is converted into embeddings using an embedding model.
The embeddings are stored inside Azure AI Search.
User queries are also converted into embeddings.
Azure AI Search finds the most semantically relevant content using vector similarity search.
The retrieved content is sent to the Large Language Model to generate the final response.

What We Will Build

Now that we understand the basics of RAG and why embeddings are important, let us get our feet wet and build a complete manual Vector RAG pipeline.

In this blog, we will not rely on fully automated indexing or integrated vectorization. Instead, we will manually build each step of the process so we can clearly understand how the different components work together.

Azure Blob Storage

Stores the original source documents.

Azure OpenAI

Generates text embeddings and final AI responses.

Azure AI Search

Stores vector data and performs similarity searches.

Python

Connects the complete pipeline together.

Final Architecture

Documents (Bible / Bhagavad Gita) ↓ Azure Blob Storage ↓ Python Script ↓ Chunking ↓ Azure OpenAI Embeddings ↓ Azure AI Search Vector Database ↓ Semantic Vector Search ↓ Retrieved Chunks ↓ GPT Response

Step 1 — Why Manual RAG is Valuable

Before jumping into frameworks and prebuilt AI tools, it is important to understand how RAG actually works internally.

For this guide, imagine building an AI assistant that can answer questions from religious teachings such as the Bhagavad Gita and the Bible.

For example:

What does the Bhagavad Gita teach about selfless action?
What does the Bible say about forgiveness?
Compare teachings about duty and compassion.

Instead of training a custom AI model from scratch, we can:

Store religious documents inside Azure Blob Storage.
Convert the text into embeddings using Azure OpenAI.
Store the vectors inside Azure AI Search.
Retrieve the most semantically relevant passages.
Send those passages to GPT to generate grounded answers.

This allows the AI to answer based on actual teachings rather than relying only on the model’s memory.

Prerequisites

Note: This blog assumes you already understand the basics of Azure and know how to deploy and configure Azure resources. If you are new to Azure, start with the official Azure training at Microsoft Learn for Azure.

1. Azure Blob Storage

Used to store the source documents for indexing and retrieval.

Documentation: Introduction to Azure Blob Storage

Brief steps: Create a Storage Account > Open the Storage Account > Create a Blob Container > Upload sample documents.

2. Azure OpenAI

Used to generate text embeddings and AI responses.

You will need an embedding model deployment such as text-embedding-3-small or text-embedding-3-large, and a chat model deployment such as gpt-4o-mini or gpt-4.

Documentation: Azure OpenAI documentation

Brief steps: Create Azure OpenAI Resource > Deploy Embedding Model > Deploy Chat Model > Copy Endpoint and API Key.

3. Azure AI Search

Used to store vector embeddings and perform similarity searches.

Documentation: What is Azure AI Search?

Brief steps: Create Azure AI Search Resource > Select Free Tier if suitable > Copy Search Endpoint and Admin Key.

4. Python

Used to build the complete RAG pipeline.

Download: Python downloads

Brief steps: Install Python > Open Terminal or VS Code > Install required packages.

pip install openai azure-search-documents azure-storage-blob python-dotenv

Step 2 — Deploy the Embedding Model

Go to Azure AI Foundry, open Models + Endpoints, and deploy the embedding model:

text-embedding-3-small

Recommended deployment type for many demos:

Global Standard

Why Embeddings Matter

Embeddings convert text into mathematical vectors. For example:

Duty without attachment

becomes a vector that looks conceptually like this:

[0.0234, -0.8831, 0.1922, ...]

These numbers represent semantic meaning, which allows similar ideas to be found even when the exact words are different.

Step 3 — Create the Azure AI Search Vector Index

Go to Azure AI Search, open Indexes, and add a new index.

Field 1 — id

Property	Value
Name	id
Type	String
Key	true
Retrievable	true

Field 2 — content

Property	Value
Name	content
Type	String
Searchable	true
Retrievable	true

Field 3 — contentVector

Property	Value
Name	contentVector
Type	Collection(Edm.Single)
Dimensions	1536
Vector Profile	vector-profile

Why 1536? The text-embedding-3-small embedding model commonly uses 1536 dimensions. The Azure AI Search index dimensions must match the vector dimensions generated by your embedding model deployment.

Step 4 — Create the Vector Profile

Create a vector profile called:

vector-profile

Use the algorithm:

HNSW

What is HNSW?

HNSW stands for Hierarchical Navigable Small World. It is a nearest-neighbor vector search algorithm used to find semantically similar vectors quickly.

Step 5 — Upload Documents to Blob Storage

Create a container called:

documents

Upload your sample files, such as:

Bible markdown files
Bhagavad Gita documents
TXT files
PDF files
DOCX files

Note: The sample Python script below assumes UTF-8 text-based files. If you upload PDFs or DOCX files, you will need additional extraction logic before chunking the content.

Step 6 — Build the Indexing Script

Create a file called:

index_documents.py

This script will follow this flow:

Blob Storage → Read Documents → Chunk Text → Generate Embeddings → Upload Vectors

Complete Indexing Script

import uuid
from openai import AzureOpenAI
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential
from azure.storage.blob import BlobServiceClient

# =========================================================
# AZURE OPENAI
# =========================================================
openai_client = AzureOpenAI(
    api_key="YOUR_OPENAI_KEY",
    api_version="2024-02-01",
    azure_endpoint="https://YOUR_OPENAI_RESOURCE.openai.azure.com"
)

# =========================================================
# AZURE SEARCH
# =========================================================
search_client = SearchClient(
    endpoint="https://YOUR_SEARCH_NAME.search.windows.net",
    index_name="documents-index",
    credential=AzureKeyCredential("YOUR_SEARCH_ADMIN_KEY")
)

# =========================================================
# AZURE BLOB STORAGE
# =========================================================
blob_connection_string = "YOUR_CONNECTION_STRING"

blob_service_client = BlobServiceClient.from_connection_string(
    blob_connection_string
)

container_client = blob_service_client.get_container_client(
    "documents"
)

# =========================================================
# CHUNKING FUNCTION
# =========================================================
def chunk_text(text, chunk_size=500, overlap=100):
    chunks = []
    start = 0

    while start < len(text):
        end = start + chunk_size
        chunk = text[start:end]
        chunks.append(chunk)
        start += chunk_size - overlap

    return chunks

# =========================================================
# PROCESS DOCUMENTS
# =========================================================
documents = []
blobs = container_client.list_blobs()

for blob in blobs:
    print(f"Processing blob: {blob.name}")

    try:
        blob_client = container_client.get_blob_client(blob.name)
        downloaded_blob = blob_client.download_blob()
        document_text = downloaded_blob.readall().decode("utf-8")

        chunks = chunk_text(document_text)

        print(f"Total chunks: {len(chunks)}")

        for chunk in chunks:
            if not chunk.strip():
                continue

            embedding_response = openai_client.embeddings.create(
                input=chunk,
                model="text-embedding-3-small"
            )

            vector = embedding_response.data[0].embedding

            documents.append({
                "id": str(uuid.uuid4()),
                "content": chunk,
                "contentVector": vector
            })

    except Exception as e:
        print(f"Error processing {blob.name}: {str(e)}")

# =========================================================
# UPLOAD TO AZURE SEARCH
# =========================================================
if documents:
    print(f"Uploading {len(documents)} chunks to Azure Search...")
    result = search_client.upload_documents(documents)
    print("Upload completed.")
else:
    print("No documents found to upload.")

Understanding Chunking

Chunking splits large documents into smaller semantic sections.

Without chunking:

Entire Bible → One Vector

This usually produces poor retrieval quality because the vector represents too much information at once.

With chunking:

Bible
→ Hundreds of chunks
→ Better semantic retrieval

Why Overlap Matters

Overlap preserves context between chunks.

Chunk 1 ends at verse 5
Chunk 2 starts slightly before verse 5

This improves retrieval continuity and reduces the chance that important context is lost at chunk boundaries.

Step 7 — Verify the Upload

Run the indexing script:

python index_documents.py

You should see output similar to:

Upload completed.

This means vectors were successfully uploaded into Azure AI Search.

Step 8 — Build the Retrieval Script

Create a file called:

search_vectors.py

This script will follow this flow:

Question → Embedding → Vector Search → Retrieve Relevant Chunks

Complete Retrieval Script

from openai import AzureOpenAI
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential

# =========================================================
# AZURE OPENAI
# =========================================================
openai_client = AzureOpenAI(
    api_key="YOUR_OPENAI_KEY",
    api_version="2024-02-01",
    azure_endpoint="https://YOUR_OPENAI_RESOURCE.openai.azure.com"
)

# =========================================================
# AZURE SEARCH
# =========================================================
search_client = SearchClient(
    endpoint="https://YOUR_SEARCH_NAME.search.windows.net",
    index_name="documents-index",
    credential=AzureKeyCredential("YOUR_SEARCH_ADMIN_KEY")
)

# =========================================================
# USER QUESTION
# =========================================================
query = "What does Bhagavad Gita teach about selfless action?"

# =========================================================
# CREATE QUERY EMBEDDING
# =========================================================
embedding_response = openai_client.embeddings.create(
    input=query,
    model="text-embedding-3-small"
)

query_vector = embedding_response.data[0].embedding

# =========================================================
# VECTOR SEARCH
# =========================================================
results = search_client.search(
    search_text="",
    vector_queries=[
        {
            "kind": "vector",
            "vector": query_vector,
            "fields": "contentVector",
            "k_nearest_neighbors": 3
        }
    ],
    select=["content"]
)

# =========================================================
# PRINT RESULTS
# =========================================================
print("\nTop Semantic Matches:\n")

for result in results:
    print(result["content"])
    print("\n----------------------\n")

How Semantic Search Works

Imagine we store this text:

Perform your duties without attachment to outcomes

The user asks:

What is selfless action?

Even though the wording is different, the meaning is similar:

selfless action ≈ detached duty

The embeddings become mathematically close. This is the core idea behind semantic vector search.

Full RAG Flow

User Question ↓ Embedding Model ↓ Vector Search ↓ Relevant Chunks Retrieved ↓ Send Chunks to GPT ↓ Grounded Final Answer

Example RAG Prompt

Context:
"You have a right to perform your prescribed duties..."

Question:
What is karma yoga?

Example GPT Response

Karma yoga refers to selfless action performed without attachment to outcomes.

What You Successfully Built

By following this approach, you now have the foundation of a manual Vector RAG pipeline:

Blob ingestion pipeline
Chunking pipeline
Embedding generation
Vector database
Semantic search
Retrieval engine
Azure-based RAG infrastructure

Next Steps

Once the manual pipeline is working, you can extend it further by building:

A Copilot Studio integration that calls your retrieval API.
A web application where users can ask questions over uploaded documents.
A SharePoint document assistant that indexes files from SharePoint libraries.
A hybrid search experience combining keyword search and vector search.
A production-grade API layer with authentication, logging, and monitoring.
Improved document extraction for PDF, DOCX, HTML, and scanned documents.

Conclusion

RAG has become one of the most important patterns in modern AI applications because it allows AI systems to answer questions using real, current, and domain-specific data.

Azure AI Search plays a key role in this architecture by providing the retrieval layer that helps find relevant content quickly. By manually generating embeddings, storing vectors, and performing similarity search, you gain a strong understanding of how RAG works behind the scenes.

This foundation will help you confidently move toward more advanced enterprise scenarios such as SharePoint-based knowledge assistants, Copilot Studio integrations, and production-ready AI search experiences.

What You Will Learn

Introduction

What is RAG?

What is Azure AI Search?

Why Start with the Free Tier?

Understanding Text Embeddings

Typical RAG Workflow

What We Will Build

Azure Blob Storage

Azure OpenAI

Azure AI Search

Python

Final Architecture

Step 1 — Why Manual RAG is Valuable

Prerequisites

1. Azure Blob Storage

2. Azure OpenAI

3. Azure AI Search

4. Python

Step 2 — Deploy the Embedding Model

Why Embeddings Matter

Step 3 — Create the Azure AI Search Vector Index

Field 1 — id

Field 2 — content

Field 3 — contentVector

Step 4 — Create the Vector Profile

What is HNSW?

Step 5 — Upload Documents to Blob Storage

Step 6 — Build the Indexing Script

Complete Indexing Script

Understanding Chunking

Why Overlap Matters

Step 7 — Verify the Upload

Step 8 — Build the Retrieval Script

Complete Retrieval Script

How Semantic Search Works

Full RAG Flow

Example RAG Prompt

Example GPT Response

What You Successfully Built

Next Steps

Conclusion

References