Skip to content

builder

Bluemi edited this page Sep 5, 2025 · 1 revision

Graph Builder

EvenRegularGraphBuilder

The class EvenRegularGraphBuilder is used to construct search graphs.

class deglib.builder.EvenRegularGraphBuilder(graph: MutableGraph, rng: Mt19937 | None = None, optimization_target: OptimizationTarget = OptimizationTarget.LowLID, extend_k: int = 0, extend_eps: float = 0.1, improve_k: int = 0, improve_eps: float = 0.001, max_path_length: int = 5, swap_tries: int = 0, additional_swap_tries: int = 0)

Bases: object

Constructs an EvenRegularGraphBuilder for building and optimizing a regular graph.

This class provides functionality to incrementally build and optimize a graph structure by adding/removing vertices and improving edge connections through various strategies.

add_entry(external_label: int | Iterable[int] | ndarray, feature: ndarray)

Provide the builder with new entries to be added to the graph during the build process.

Can add a single entry or a batch of entries. For batch processing, the feature array should have shape [N, D] where N is the number of entries and D is the dimensionality.

  • Parameters:
    • external_label – The label(s) that name the added vertex/vertices. For batch processing, should be a list or array with length N matching the number of features
    • feature – The feature vector(s) to be added. Shape [D] for single entry or [N, D] for batch
  • Raises:
    • InvalidShapeException – If feature array has invalid shape (not 1D or 2D)
    • AssertionError – If number of features doesn’t match number of labels in batch processing

remove_entry(external_label: int)

Command the builder to remove a vertex from the graph as quickly as possible.

  • Parameters: external_label (int) – The external label that identifies the graph vertex to remove

get_num_new_entries() → int

Get the number of entries that will be added to the graph.

  • Returns: Number of entries queued for addition
  • Return type: int

get_num_remove_entries() → int

Get the number of entries that will be removed from the graph.

  • Returns: Number of entries queued for removal
  • Return type: int

set_thread_count(thread_count: int)

Set the number of threads used to extend the graph during building.

When the thread count is greater than 1 and the optimization target is not StreamingData, the builder will utilize multiple threads to add elements to the graph in parallel. By default, all available CPU cores/threads are used unless specified. Note: The order in which elements are added is not guaranteed when using multiple threads.

  • Parameters: thread_count (int) – Number of threads to use for graph extension

set_batch_size(tasks_per_batch: int, task_size: int)

Set the batch size parameters for parallel graph construction.

The builder processes elements in batches to minimize synchronization between threads. The total batch size is calculated as: batch_size = thread_count * tasks_per_batch * task_size

Effects of thread count and batch size: : - thread_count = 1 and batch_size = 1: low throughput, medium latency, order of elements is guaranteed

  • thread_count > 1 and batch_size = 1: high throughput, low latency, order of elements is not guaranteed
  • thread_count > 1 and batch_size > 1: highest throughput, highest latency, order of elements is not guaranteed

Note: The optimization target StreamingData always uses a thread count of 1.

  • Parameters:
    • tasks_per_batch (int) – Number of tasks for each thread in one batch (default: 32).
    • task_size (int) – Number of elements each thread processes in one task (default: 10).

get_batch_size() → int

Get the current batch size used for parallel graph construction.

  • Returns: The current batch size
  • Return type: int

build(callback: Callable[[BuilderStatus], None] | str | None = None, infinite: bool = False)

Build the graph. This could be run on a separate thread in an infinite loop. Call stop() to end this process.

  • Parameters:
    • callback – The callback that is called after each step of the build process. A BuilderStatus is the only argument to the function. If None nothing is printed. If callback is the string “progress”, a simple progress bar is printed to stdout.
    • infinite – If set to True, blocks indefinitely, until the stop() function is called. Can be used, if build() is run in a separate thread.

stop()

Stop the build process.

The build process can only be stopped between steps, not during step execution.

build_from_data()

deglib.builder.build_from_data(data: ndarray, labels: Iterable[int] | None = None, edges_per_vertex: int = 32, capacity: int = -1, metric: Metric = Metric.L2, rng: Mt19937 | None = None, optimization_target: OptimizationTarget = OptimizationTarget.LowLID, extend_k: int = 0, extend_eps: float = 0.2, improve_k: int = 0, improve_eps: float = 0.001, max_path_length: int = 5, swap_tries: int = 0, additional_swap_tries: int = 0, callback: Callable[[BuilderStatus], None] | str | None = None) → SizeBoundedGraph

Create a new graph built from the given data using an EvenRegularGraphBuilder.

This is a convenience function that creates a graph, builder, adds all data entries, and builds the complete graph in one call. Infinite building is not supported.

  • Parameters:
    • data (np.ndarray) – Feature data with shape [N, D] where N is number of samples, D is dimensionality
    • labels (Iterable *[*int ] | None) – Labels for each data entry with shape [N]. If None, labels 0 to N-1 are created. If numpy array, should have dtype uint32
    • edges_per_vertex (int) – Number of edges per vertex in the graph
    • capacity (int) – Maximum number of vertices the graph can hold. If <= 0, defaults to data.shape[0]
    • metric (Metric) – Distance metric for measuring feature similarity
    • rng (Mt19937 | None) – Random number generator. If None, a new Mt19937 will be created
    • optimization_target (OptimizationTarget) – Optimization strategy based on data distribution characteristics
    • extend_k (int) – Number of neighbors to consider when extending the graph
    • extend_eps (float) – Epsilon value for neighbor search during graph extension
    • improve_k (int) – Number of neighbors to consider when improving the graph
    • improve_eps (float) – Epsilon value for neighbor search during graph improvement
    • max_path_length (int) – Maximum number of edge swaps in a single improvement attempt
    • swap_tries (int) – Number of improvement attempts per build step
    • additional_swap_tries (int) – Additional improvement attempts after a successful improvement
    • callback (Callable [ *[*deglib_cpp.BuilderStatus ] , None ] | str | None) – Callback function for build progress reporting. If “progress”, shows progress bar
  • Returns: The constructed and optimized graph
  • Return type: SizeBoundedGraph

Clone this wiki locally