In Part 1 of this series, we leveraged LangChain’s built-in capabilities to load data from a FlashBlade® S3 bucket and reviewed various settings to optimize throughput performance. In this article, we’ll walk through the next steps: taking the data residing on FlashBlade, splitting it into chunks, embedding those chunks, loading them into a vectorstore, persisting the vectorstore to storage for easy future retrieval, and demonstrating how our chunking choices affect the accuracy of similarity search.
Step 1: Chunking
We last left off using LangChain’s S3DirectoryLoader to load documents into memory for processing. We now have to take the contents of those documents and split them into chunks for quicker retrieval later on in the chatbot pipeline. This can be done seamlessly by leveraging S3DirectoryLoader’s load_and_split() function.
In order to use this function, we’ll need to define our text splitter along with an appropriate chunk size and chunk overlap. I set the chunk_size to something very small to illustrate what the output will look like below; normally you would tune this for your environment’s compute capabilities, your data set’s contents, and the desired Q&A output accuracy. Play around with the chunk size and chunk overlap values and you’ll see how they affect the accuracy of the similarity search results, something we’ll cover later in this article.
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=30,
    chunk_overlap=0
)
documents = loader.load_and_split(text_splitter) |
The code above will result in an output similar to this, showing our data was chunked successfully into more document objects:
[Document(page_content='I have many leather-bound books', metadata={'source': 's3://flashblade-bucket/anchorman.txt'}),
 Document(page_content='and my apartment smells of rich mahogany.', metadata={'source': 's3://flashblade-bucket/anchorman.txt'}),
 Document(page_content='I award you no points, and may', metadata={'source': 's3://flashblade-bucket/billymadison.txt'}),
 Document(page_content='God have mercy on your soul.', metadata={'source': 's3://flashblade-bucket/billymadison.txt'})]
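To get a feel for how these settings interact, here’s a small, hypothetical experiment that reruns the splitter with a few different chunk_size and chunk_overlap values (the values below are only illustrative) and prints how many chunks each combination produces:

# hypothetical experiment: rerun the splitter with different settings
# and compare how many chunks each combination produces
for size, overlap in [(30, 0), (200, 20), (500, 50)]:
    splitter = RecursiveCharacterTextSplitter(chunk_size=size, chunk_overlap=overlap)
    chunks = loader.load_and_split(splitter)
    print(f"chunk_size={size}, chunk_overlap={overlap} -> {len(chunks)} chunks")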
Step 2: Embedding and Storing to a Vectorstore
The next step is to embed the chunks in preparation for storing them in a vectorstore. Embedding is the process of converting a chunk of text into a vector of numerical values that captures its meaning; this lets us search not just for individual words but for text with similar meaning and surrounding context.
This step varies depending on your model of choice. For example, the code would be different if we’re leveraging OpenAI’s API versus using local models sourced from Hugging Face. Both of these (among many others) are supported in LangChain. Since most enterprises are unable to send their proprietary data to OpenAI for various legal reasons, let’s work with local models from Hugging Face.
We need to import LangChain’s HuggingFaceEmbeddings class and pass in our preferred sentence transformer model, which will handle the embedding logic. In the example below, I picked one of the most popular sentence transformer models, but others are available with different pros and cons. Using a different model from Hugging Face is as simple as changing the model name.
from langchain.embeddings import HuggingFaceEmbeddings

# model names can be found on https://huggingface.co/models
model_name = "sentence-transformers/all-mpnet-base-v2"
embeddings = HuggingFaceEmbeddings(model_name=model_name)
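As a quick sanity check (an optional step, not part of the original walkthrough), you can embed a single sentence and inspect the resulting vector; the all-mpnet-base-v2 model produces 768-dimensional vectors:

# optional sanity check: embed one sentence and inspect the vector
sample_vector = embeddings.embed_query("I love lamp.")
print(len(sample_vector))  # all-mpnet-base-v2 returns a 768-dimensional vector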
Now that we have our embedding instructions ready, let’s create our vectorstore and finally glue our chunks, embedding logic, and vectorstore pieces together. There are many different vectorstore technologies available. In this tutorial, we’ll leverage FAISS from Meta due to its ease of deployment, scalability, and search performance.
Let’s install the CPU version of FAISS for this tutorial (a GPU version is also available):
pip install faiss-cpu
And now our Python code to create the vectorstore:
from langchain.vectorstores.faiss import FAISS

# create the FAISS vectorstore
vectorstore = FAISS.from_documents(documents, embeddings)
We now have a FAISS vectorstore loaded with embedding representations of our chunked documents and are ready for similarity searching.
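As an optional check (not required by the pipeline), the underlying FAISS index reports how many vectors it holds, which should match the number of chunked documents we passed in:

# optional check: number of vectors stored in the underlying FAISS index
print(vectorstore.index.ntotal)  # should equal len(documents)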
But before we start searching, we’ve got one more important step to do. Right now, that vectorstore is in memory, and we would need to redo the above steps every time we launched the application. Let’s see how we can persist this to storage and recall it back.
Step 3: Persisting the Vectorstore to FlashBlade
In this step, we’re going to leverage pickle in conjunction with boto3 to store the vectorstore object in a FlashBlade S3 bucket. Pickle is a Python module that serializes a Python object into a byte stream, and boto3 handles transmitting that byte stream to FlashBlade.
import boto3
import pickle

# prep boto3 for sending data to FlashBlade S3 bucket
s3_client = boto3.client(
    "s3",
    aws_access_key_id="FB User Access Key",
    aws_secret_access_key="FB User Secret Key",
    endpoint_url="https://FB Data VIP"
)

# use pickle to create vectorstore file and send to FlashBlade via boto3
pickle_byte_obj = pickle.dumps(vectorstore)
bucket = "FlashBlade Bucket Name"
key = "vectorstore.pkl"
s3_client.put_object(Body=pickle_byte_obj, Bucket=bucket, Key=key)
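If you want to confirm the upload succeeded (an optional check, not part of the original walkthrough), boto3’s head_object call returns the object’s metadata:

# optional check: confirm the pickled vectorstore landed in the bucket
head = s3_client.head_object(Bucket=bucket, Key=key)
print(f"{key} written, {head['ContentLength']} bytes")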
Now when we need the vectorstore in production, we can simply load it back with the following code instead of having to reload, chunk, embed, and rebuild the vectorstore every time:
# retrieve the pickled vectorstore from FlashBlade and load it back into memory
response = s3_client.get_object(
    Bucket="FlashBlade Bucket Name",
    Key="vectorstore.pkl"
)
body = response['Body'].read()
vectorstore = pickle.loads(body)
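In an application, it can be convenient to wrap that retrieval logic in a small startup helper. The sketch below is one way to do it, reusing the s3_client we defined earlier (the function name is ours, not LangChain’s):

# hypothetical helper: fetch and unpickle the vectorstore at application startup
def load_vectorstore(s3_client, bucket, key="vectorstore.pkl"):
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return pickle.loads(response['Body'].read())

vectorstore = load_vectorstore(s3_client, "FlashBlade Bucket Name")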
Step 4: Querying the Vectorstore
We’re finally at an important stage of the chatbot pipeline: testing whether the vectorstore returns accurate document chunks for a given query. A FAISS vectorstore supports several search methods, such as similarity search and max marginal relevance search, each with synchronous and asynchronous versions, as well as the option to return a relevancy score for each chunk. We’ll use the simple similarity search call to demonstrate that our vectorstore is working:
query = "What does my apartment smell like?"

# k specifies the number of documents to return, default is 4
docs = vectorstore.similarity_search(query, k=2)
print(docs)

[Document(page_content='books and my apartment smells', metadata={'source': 's3://flashblade-bucket/anchorman.txt'}),
 Document(page_content='of rich mahogany.', metadata={'source': 's3://flashblade-bucket/anchorman.txt'})]
The similarity search worked and returned enough content to contain our answer. But notice how our chunking and k-value choices affect the results: we chunked to 30 characters earlier (a very small value, for demonstration purposes) and set the similarity search k-value to 2 so it would return two chunks. If we had set k=1, we would not have gotten the full context (the “rich mahogany” text). Alternatively, if we had increased our chunk size and overlap and left k=1, we would have received the correct context in a single chunk. This is why it’s important to find the right balance of chunk size, overlap, and k-value: large enough to capture the full context, but small enough that we don’t have to feed a ton of text into our LLM in the following tutorials.
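The other search methods mentioned above follow the same pattern. For example, a scored similarity search and a max marginal relevance search against the same query look like this (a sketch; output omitted):

# similarity search that also returns a relevancy score for each chunk
docs_and_scores = vectorstore.similarity_search_with_score(query, k=2)

# max marginal relevance search balances relevance with result diversity
mmr_docs = vectorstore.max_marginal_relevance_search(query, k=2)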
Stay Tuned for More Tutorials
Let’s review what we’ve accomplished in Part 1 and Part 2 of this blog series so far. We’ve set up a LangChain environment that pulls documents from a FlashBlade S3 bucket into memory and reviewed various data movement tools for performance considerations. We then chunked and embedded those documents, created a FAISS vectorstore loaded with the chunked embeddings, persisted the vectorstore as a pkl file to a FlashBlade S3 bucket, showed how to retrieve that pkl file from FlashBlade back into memory, queried the vectorstore, and received a document chunk containing the answer to our question.
In our next blog post in the series, we’ll cover:
Logging, tracing, and debugging a chain
Passing the relevant document into an LLM chain for inference where we’ll receive a definitive answer and not just a chunk of documentation