Manage AVS indexes
Overview
AVS offers an extensive set of configuration options for tuning the performance, storage, and recall
of your index. This guide gives examples of the most common configurations to consider. For the complete
set of configuration options, review the types package in our Python documentation or
run asvec index create --help.
Index Modes
A key configuration setting for your index is the mode it operates in. There are two modes:
- Distributed - This is the default index mode. It keeps your index searchable while you stream in changes to your index. Use this mode if you're not sure which one you need.
- Standalone - Standalone indexing builds your entire index in memory. Once the standalone build finishes, the index is automatically set to distributed.
Use a standalone index if you are building an index from scratch or rebuilding an index for a new embedding model. Standalone mode builds the index much faster, then automatically makes it available for search.
Fixed configurations
The following configuration parameters affect the storage and graph of your index. They cannot be updated once the index is created. You will need to create a new index if you wish to change any of these values.
- Description
- Python
- asvec
Parameter | Description |
---|---|
Namespace | Namespace for creating the index, which must already exist in your Aerospike cluster. See configuring Aerospike Database for details about setting up a namespace for your index. |
Sets | Specifies where your data is stored in Aerospike Database. Uses the null set by default. |
Index name | Name of the index. Used primarily for performing searches. |
Dimensions | Number of dimensions (length) of the vector. See generating vector embeddings for details about determining the number of dimensions in your vector. |
Vector distance metric | Distance calculation used by your index. Options include: SQUARED EUCLIDEAN - Reports squared Euclidean (L2) distance, avoiding the square root. This preserves the ordering of Euclidean distance, but if you need exact distances, take the square root of the result. COSINE - Measures the cosine of the angle between two vectors to determine how similar their direction is, regardless of magnitude. DOT PRODUCT - Takes into account both angle similarity and vector magnitude. MANHATTAN - Sums the absolute values of the differences between vector components. HAMMING - Counts the number of dimensions where the vectors differ. |
Vector field | Field name of the vector in the record. See Adding records to your index for details. |
Mode | Indicates if the initial index should be built standalone. |
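from aerospike_vector_search import types

# avs_client is an existing aerospike_vector_search Client instance.
avs_client.index_create(
    namespace="avs-index",
    name="search-space",
    vector_field="img_vector",
    dimensions=768,
    vector_distance_metric=types.VectorDistanceMetric.COSINE,
    sets="index-set",
    mode=types.IndexMode.STANDALONE,
)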
asvec index create \
--index-name search-space \
--namespace avs-index \
--set index-set \
--vector-field img_vector \
--dimension 768 \
--distance-metric COSINE \
--index-mode STANDALONE
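To make the distance metrics described in the table above concrete, the following is a minimal pure-Python sketch of each calculation. It only illustrates the math. How AVS scales, signs, or optimizes the value it reports for each metric is not covered here, and the function names are made up for this example.

```python
import math


def squared_euclidean(a, b):
    """Sum of squared component differences (no square root taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot_product(a, b):
    """Sum of component products; sensitive to both angle and magnitude."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; ignores magnitude."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


def manhattan(a, b):
    """Sum of absolute component differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions where the two vectors differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


a = [1.0, 0.0, 2.0]
b = [2.0, 0.0, 4.0]  # same direction as a, twice the magnitude

print(squared_euclidean(a, b))             # 5.0
print(math.sqrt(squared_euclidean(a, b)))  # exact Euclidean distance
print(dot_product(a, b))                   # 10.0
print(manhattan(a, b))                     # 3.0
print(hamming(a, b))                       # 2
```

Because `b` points in the same direction as `a`, the cosine similarity is 1 (up to floating-point rounding) even though every other metric reports a nonzero difference.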
The vector field name is limited to 15 characters. Using a name longer than 15 characters causes errors.
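One way to catch this before calling the server is a small client-side check. The sketch below is an illustration only: `check_vector_field` and `MAX_VECTOR_FIELD_LENGTH` are hypothetical names, not part of the AVS client, and the check counts characters as stated above. If the limit is enforced in bytes, non-ASCII names would need a stricter check.

```python
MAX_VECTOR_FIELD_LENGTH = 15  # limit stated in this guide


def check_vector_field(name):
    """Return name unchanged if it is a usable vector field name.

    Hypothetical helper: raises ValueError for empty or over-long names.
    """
    if not name:
        raise ValueError("vector field name must not be empty")
    if len(name) > MAX_VECTOR_FIELD_LENGTH:
        raise ValueError(
            f"vector field name {name!r} has {len(name)} characters; "
            f"the limit is {MAX_VECTOR_FIELD_LENGTH}"
        )
    return name


vector_field = check_vector_field("img_vector")  # passes: 10 characters
```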
HNSW parameters
Adjusting HNSW parameters is the primary way to improve the recall of your searches. Keep in mind that improving recall comes at the expense of higher latency and more storage. Because these parameters affect the graph, you cannot update them after creating your index.
- Description
- Python
- asvec
Parameter | Description | Default |
---|---|---|
M (max edges) | Number of bidirectional links created per level during construction. Larger values lead to higher recall but slower construction and larger storage requirements. | 16 |
EF | Size of the dynamic list for the nearest neighbors (candidates) during the search phase. Larger values lead to higher recall but slower searches and higher resource utilization. | 100 |
EF construction | Size of the dynamic list for the nearest neighbors (candidates) during index construction. Larger values lead to higher recall but slower index build and higher resource utilization. | 100 |
avs_client.index_create(
    namespace="avs-index",
    name="search-space",
    vector_field="img_vector",
    dimensions=768,
    vector_distance_metric=types.VectorDistanceMetric.COSINE,
    sets="index-set",
    mode=types.IndexMode.STANDALONE,
    index_params=types.HnswParams(
        m=32,
        ef_construction=200,
        ef=400,
    ),
)
asvec index create \
--index-name search-space \
--namespace avs-index \
--set index-set \
--vector-field img_vector \
--dimension 768 \
--distance-metric COSINE \
--index-mode STANDALONE \
--hnsw-m 16 \
--hnsw-ef 100 \
--hnsw-ef-construction 10000
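Since these parameters are tuned against recall, it helps to measure recall on a sample of queries before and after changing them. A common approach, not specific to AVS, is to compare the IDs returned by the index with the exact nearest neighbors found by brute force. The sketch below assumes squared Euclidean distance and uses a hard-coded list as a stand-in for the IDs an index search would return. All names are illustrative.

```python
def brute_force_neighbors(query, vectors, k):
    """Exact k nearest neighbors by squared Euclidean distance.

    vectors maps a record ID to its vector.
    """
    def distance(record_id):
        return sum((x - y) ** 2 for x, y in zip(query, vectors[record_id]))

    return sorted(vectors, key=distance)[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors that the approximate search returned."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact & set(approx_ids)) / len(exact)


vectors = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [0.0, 2.0],
    "d": [5.0, 5.0],
}
query = [0.1, 0.0]

exact = brute_force_neighbors(query, vectors, k=3)
approx = ["a", "b", "d"]  # stand-in for IDs returned by an index search
recall = recall_at_k(approx, exact)
print(exact, recall)
```

Here the exact neighbors are `a`, `b`, and `c`, so a result set that returns `d` instead of `c` has a recall of two thirds.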
Dynamic Configurations
The following settings do not affect index construction or storage, so you can update them after an index is created. Adjusting the cache and healer settings changes the performance of your index.
Caching
The primary way to reduce the latency of your searches is to adjust the cache settings of your index.
- Description
- Python
- asvec
Parameter | Description | Default |
---|---|---|
Index cache max entries | Maximum number of index records held in the cache. | 2,000,000 |
Index cache expiry | Cache expiration time in milliseconds. Set to -1 to never expire. | -1 (no expiry) |
Record cache max entries | Maximum number of vector records held in the cache. | 0 (off) |
Record cache expiry | Seconds to keep record values in the cache. Set to -1 to never expire. | 0 (off) |
avs_client.index_update(
    namespace="avs-index",
    name="search-space",
    hnsw_update_params=types.HnswIndexUpdate(
        index_caching_params=types.HnswCachingParams(
            max_entries=2000000,
            expiry=-1,
        ),
        record_caching_params=types.HnswCachingParams(
            max_entries=2000000,
            expiry=-1,
        ),
    ),
)
asvec index update \
--index-name search-space \
--namespace avs-index \
--hnsw-index-cache-expiry -1 \
--hnsw-index-cache-max-entries 2000000 \
--hnsw-record-cache-expiry -1 \
--hnsw-record-cache-max-entries 2000000
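When choosing a record cache size, a rough lower bound on the memory needed for the raw vector data can help. The sketch below assumes each vector component is a 32-bit float and ignores keys, metadata, and any per-entry overhead, so treat the result as a floor rather than a prediction of actual AVS memory use. The function name is made up for this example.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vector components are 32-bit floats


def estimate_vector_bytes(max_entries, dimensions,
                          bytes_per_component=BYTES_PER_FLOAT32):
    """Raw vector payload, in bytes, for a cache holding max_entries vectors."""
    return max_entries * dimensions * bytes_per_component


# Values taken from the examples in this guide: 2,000,000 entries, 768 dimensions.
estimate = estimate_vector_bytes(max_entries=2_000_000, dimensions=768)
print(f"{estimate} bytes (~{estimate / 1024 ** 3:.2f} GiB)")
```

With these inputs the raw vector data alone comes to 6,144,000,000 bytes.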
Healer
These values control the healing process associated with your index on indexer or mixed nodes.
The values tell the cluster how aggressively to manage the healing process for a particular index.
By default these values have minimal impact (<20%) on throughput and query performance.
- Description
- Python
- asvec
Parameter | Description | Default |
---|---|---|
Parallelism | Specifies additional threads used by the healer on a single node. Increasing this will increase the amount of CPU spent towards healing. | 1 |
Schedule | A cron expression for running the healer process, set to run every fifteen minutes by default. | Every 15 minutes |
avs_client.index_update(
    namespace="avs-index",
    name="search-space",
    hnsw_update_params=types.HnswIndexUpdate(
        healer_params=types.HnswHealerParams(
            parallelism=1,
            schedule="*/15 * * * *",
        ),
    ),
)
asvec index update \
--index-name search-space \
--namespace avs-index \
--hnsw-healer-parallelism 1 \
--hnsw-healer-schedule "*/15 * * * *"
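To sanity-check a schedule before applying it, you can expand the minute field and see when the healer would fire within each hour. The sketch below assumes the five-field cron syntax used in the examples above and handles only the common forms (`*`, single values, ranges, steps, and comma lists). Whether AVS accepts other cron dialects is not covered here, and `expand_cron_field` is a hypothetical helper, not part of asvec or the Python client.

```python
def expand_cron_field(field, low=0, high=59):
    """Expand one cron field into the sorted list of values it matches.

    Handles "*", single values, "A-B" ranges, "/N" steps, and comma lists.
    """
    values = set()
    for part in field.split(","):
        span, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if step < 1:
            raise ValueError(f"step must be positive: {part!r}")
        if span == "*":
            start, end = low, high
        elif "-" in span:
            start_text, end_text = span.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(span)
        if not (low <= start <= end <= high):
            raise ValueError(f"value out of range {low}-{high}: {part!r}")
        values.update(range(start, end + 1, step))
    return sorted(values)


schedule = "*/15 * * * *"  # default healer schedule from the examples above
minute_field = schedule.split()[0]
minutes = expand_cron_field(minute_field)
print(minutes)  # [0, 15, 30, 45]
```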
Additional configuration
The following are additional configurations that can be helpful.
- Description
- Python
- asvec
Parameter | Description | Default |
---|---|---|
Labels | Stores information about the index (for example, the model used to create the vector embedding). It is not relevant to search behavior. | N/A |
avs_client.index_update(
    namespace="avs-index",
    name="search-space",
    index_labels={"model-used": "CLIP"},
)
asvec index update \
--index-name search-space \
--namespace avs-index \
--index-labels model=CLIP
Dropping an index
If you no longer need to search across your data, you can drop an index to free up storage. The delete process is handled by the healer based on the original configuration of the index.
You can back up index data. See backup and restore instructions.
- Python
- asvec
avs_client.index_drop(
    namespace="avs-index",
    name="search-space",
)
asvec index drop --index-name INDEX_NAME --namespace NAMESPACE
Dropping an index only frees up the space used by the index records, which the healer handles gradually. To delete the underlying vector records, you must delete the individual records or delete the set that stores your data.
Troubleshooting
We recommend that you use the asvec CLI tool for troubleshooting your index.
Standalone indexing is not finishing
Standalone indexing requires the following two things to succeed:
- A standalone indexer node role available in your cluster, and
- Sufficient resources on that node to fit the index in memory.
If either of these conditions is not met, the index status will not become READY.
- Verify that there is a STANDALONE_INDEXER node.
asvec node ls
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Nodes                                                                                                                                         │
├───┬────────────────┬────────────────────────┬─────────────────────┬─────────────────────┬─────────┬───────────────────────────────────────────┤
│   │ NODE           │ ROLES                  │ ENDPOINT            │ CLUSTER ID          │ VERSION │ VISIBLE NODES                             │
├───┼────────────────┼────────────────────────┼─────────────────────┼─────────────────────┼─────────┼───────────────────────────────────────────┤
│ 1 │ 37627287442312 │ [INDEX_QUERY]          │ 34.56.201.26:5000   │ 1685524865657087021 │ 1.1.0   │ {                                         │
│   │                │                        │                     │                     │         │     37912119677832: [34.123.26.109:5000]  │
│   │                │                        │                     │                     │         │     37956902785928: [34.133.135.181:5000] │
│   │                │                        │                     │                     │         │     39450770215816: [35.225.89.37:5000]   │
│   │                │                        │                     │                     │         │ }                                         │
├───┼────────────────┼────────────────────────┼─────────────────────┤                     ├─────────┼───────────────────────────────────────────┤
│ 2 │ 37912119677832 │ [INDEX_QUERY]          │ 34.123.26.109:5000  │                     │ 1.1.0   │ {                                         │
│   │                │                        │                     │                     │         │     37627287442312: [34.56.201.26:5000]   │
│   │                │                        │                     │                     │         │     37956902785928: [34.133.135.181:5000] │
│   │                │                        │                     │                     │         │     39450770215816: [35.225.89.37:5000]   │
│   │                │                        │                     │                     │         │ }                                         │
├───┼────────────────┼────────────────────────┼─────────────────────┤                     ├─────────┼───────────────────────────────────────────┤
│ 3 │ 37956902785928 │ [STANDALONE_INDEXER]   │ 34.133.135.181:5000 │                     │ 1.1.0   │ {                                         │
│   │                │                        │                     │                     │         │     37627287442312: [34.56.201.26:5000]   │
│   │                │                        │                     │                     │         │     37912119677832: [34.123.26.109:5000]  │
│   │                │                        │                     │                     │         │     39450770215816: [35.225.89.37:5000]   │
│   │                │                        │                     │                     │         │ }                                         │
├───┼────────────────┼────────────────────────┼─────────────────────┤                     ├─────────┼───────────────────────────────────────────┤
│ 4 │ 39450770215816 │ [INDEXER INDEX_UPDATE] │ 35.225.89.37:5000   │                     │ 1.1.0   │ {                                         │
│   │                │                        │                     │                     │         │     37627287442312: [34.56.201.26:5000]   │
│   │                │                        │                     │                     │         │     37912119677832: [34.123.26.109:5000]  │
│   │                │                        │                     │                     │         │     37956902785928: [34.133.135.181:5000] │
│   │                │                        │                     │                     │         │ }                                         │
╰───┴────────────────┴────────────────────────┴─────────────────────┴─────────────────────┴─────────┴───────────────────────────────────────────╯
- If you have a STANDALONE_INDEXER node, use the following command to confirm the status of your index. Check the STATUS column for NOT-READY.
asvec index ls
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Indexes                                                                                                                                                                                     │
├───┬────────────────────────┬───────────┬────────────────────┬───────────────┬────────────┬───────────────────┬──────────┬────────────────┬───────────┬────────────┬─────────────┬───────────┤
│   │ NAME                   │ NAMESPACE │ SET                │ FIELD         │ DIMENSIONS │ DISTANCE METRIC   │ UNMERGED │ VECTOR RECORDS │ SIZE      │ UNMERGED % │ MODE*       │ STATUS    │
├───┼────────────────────────┼───────────┼────────────────────┼───────────────┼────────────┼───────────────────┼──────────┼────────────────┼───────────┼────────────┼─────────────┼───────────┤
│ 1 │ sift-128-euclidean_Idx │ avs-data  │ sift-128-euclidean │ HDF_embedding │ 128        │ SQUARED_EUCLIDEAN │ 0        │ 0              │ 0         │ 0          │ STANDALONE  │ NOT-READY │
╰───┴────────────────────────┴───────────┴────────────────────┴───────────────┴────────────┴───────────────────┴──────────┴────────────────┴───────────┴────────────┴─────────────┴───────────╯
- When the status says READY, run the following command to switch your index mode to DISTRIBUTED to preserve your index updates.
asvec index update \
--index-name <INDEX-NAME> \
--namespace <NAMESPACE> \
--index-mode DISTRIBUTED
- After updating to distributed mode, monitor the status of your index by checking to see how much remains unmerged.
asvec index ls --seeds 35.225.89.37
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Indexes                                                                                                                                                                                  │
├───┬────────────────────────┬───────────┬────────────────────┬───────────────┬────────────┬───────────────────┬──────────┬────────────────┬───────────┬────────────┬─────────────┬────────┤
│   │ NAME                   │ NAMESPACE │ SET                │ FIELD         │ DIMENSIONS │ DISTANCE METRIC   │ UNMERGED │ VECTOR RECORDS │ SIZE      │ UNMERGED % │ MODE*       │ STATUS │
├───┼────────────────────────┼───────────┼────────────────────┼───────────────┼────────────┼───────────────────┼──────────┼────────────────┼───────────┼────────────┼─────────────┼────────┤
│ 1 │ sift-128-euclidean_Idx │ avs-data  │ sift-128-euclidean │ HDF_embedding │ 128        │ SQUARED_EUCLIDEAN │ 464334   │ 548302         │ 549.43 MB │ 81.56%     │ DISTRIBUTED │ READY  │
╰───┴────────────────────────┴───────────┴────────────────────┴───────────────┴────────────┴───────────────────┴──────────┴────────────────┴───────────┴────────────┴─────────────┴────────╯
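The check-then-switch steps above can be scripted as a polling loop that waits for the index to report READY before you change its mode. The sketch below is deliberately generic: `get_status` is a hypothetical callable that you would implement yourself, for example by reading the STATUS column of asvec index ls, and the demo uses a fake status sequence so it runs without a cluster.

```python
import time


def wait_for_status(get_status, target="READY", timeout_s=600.0,
                    interval_s=5.0):
    """Poll get_status() until it returns target or the timeout elapses.

    Returns True if the target status was seen, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == target:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Demo with a fake status source standing in for a real index status lookup.
fake_statuses = iter(["NOT-READY", "NOT-READY", "READY"])
ready = wait_for_status(lambda: next(fake_statuses), interval_s=0.0)
print(ready)  # True
```

If the function returns False, check that a standalone indexer node is present and has enough memory, as described in the steps above, before retrying.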
Switching between index modes
As shown in the previous example, switching from STANDALONE mode to DISTRIBUTED mode hands over index construction to the index healer, resulting in an eventually consistent index state. Switching from DISTRIBUTED to STANDALONE forces an entire rebuild of your index.
Switching from DISTRIBUTED to STANDALONE takes your index offline for searching and can result in a poorer quality index. Aerospike recommends creating a new index and switching over to it instead.