# Graph Builder ## EvenRegularGraphBuilder The class EvenRegularGraphBuilder is used to construct search graphs. ### *class* deglib.builder.EvenRegularGraphBuilder(graph: [MutableGraph](graphs/mutable_graph.md#deglib.graph.MutableGraph), rng: Mt19937 | None = None, optimization_target: OptimizationTarget = OptimizationTarget.LowLID, extend_k: int = 0, extend_eps: float = 0.1, improve_k: int = 0, improve_eps: float = 0.001, max_path_length: int = 5, swap_tries: int = 0, additional_swap_tries: int = 0) Bases: `object` Constructs an EvenRegularGraphBuilder for building and optimizing a regular graph. This class provides functionality to incrementally build and optimize a graph structure by adding/removing vertices and improving edge connections through various strategies. #### add_entry(external_label: int | Iterable[int] | ndarray, feature: ndarray) Provide the builder with new entries to be added to the graph during the build process. Can add a single entry or a batch of entries. For batch processing, the feature array should have shape [N, D] where N is the number of entries and D is the dimensionality. * **Parameters:** * **external_label** – The label(s) that name the added vertex/vertices. For batch processing, should be a list or array with length N matching the number of features * **feature** – The feature vector(s) to be added. Shape [D] for single entry or [N, D] for batch * **Raises:** * **InvalidShapeException** – If feature array has invalid shape (not 1D or 2D) * **AssertionError** – If number of features doesn’t match number of labels in batch processing #### remove_entry(external_label: int) Command the builder to remove a vertex from the graph as quickly as possible. * **Parameters:** **external_label** (*int*) – The external label that identifies the graph vertex to remove #### get_num_new_entries() → int Get the number of entries that will be added to the graph. * **Returns:** Number of entries queued for addition * **Return type:** int #### get_num_remove_entries() → int Get the number of entries that will be removed from the graph. * **Returns:** Number of entries queued for removal * **Return type:** int #### set_thread_count(thread_count: int) Set the number of threads used to extend the graph during building. When the thread count is greater than 1 and the optimization target is not StreamingData, the builder will utilize multiple threads to add elements to the graph in parallel. By default, all available CPU cores/threads are used unless specified. Note: The order in which elements are added is not guaranteed when using multiple threads. * **Parameters:** **thread_count** (*int*) – Number of threads to use for graph extension #### set_batch_size(tasks_per_batch: int, task_size: int) Set the batch size parameters for parallel graph construction. The builder processes elements in batches to minimize synchronization between threads. The total batch size is calculated as: batch_size = thread_count \* tasks_per_batch \* task_size Effects of thread count and batch size: : - thread_count = 1 and batch_size = 1: low throughput, medium latency, order of elements is guaranteed - thread_count > 1 and batch_size = 1: high throughput, low latency, order of elements is not guaranteed - thread_count > 1 and batch_size > 1: highest throughput, highest latency, order of elements is not guaranteed Note: The optimization target StreamingData always uses a thread count of 1. * **Parameters:** * **tasks_per_batch** (*int*) – Number of tasks for each thread in one batch (default: 32). * **task_size** (*int*) – Number of elements each thread processes in one task (default: 10). #### get_batch_size() → int Get the current batch size used for parallel graph construction. * **Returns:** The current batch size * **Return type:** int #### build(callback: Callable[[BuilderStatus], None] | str | None = None, infinite: bool = False) Build the graph. This could be run on a separate thread in an infinite loop. Call stop() to end this process. * **Parameters:** * **callback** – The callback that is called after each step of the build process. A BuilderStatus is the only argument to the function. If None nothing is printed. If callback is the string “progress”, a simple progress bar is printed to stdout. * **infinite** – If set to True, blocks indefinitely, until the stop() function is called. Can be used, if build() is run in a separate thread. #### stop() Stop the build process. The build process can only be stopped between steps, not during step execution. ## build_from_data() ### deglib.builder.build_from_data(data: ndarray, labels: Iterable[int] | None = None, edges_per_vertex: int = 32, capacity: int = -1, metric: Metric = Metric.L2, rng: Mt19937 | None = None, optimization_target: OptimizationTarget = OptimizationTarget.LowLID, extend_k: int = 0, extend_eps: float = 0.2, improve_k: int = 0, improve_eps: float = 0.001, max_path_length: int = 5, swap_tries: int = 0, additional_swap_tries: int = 0, callback: Callable[[BuilderStatus], None] | str | None = None) → [SizeBoundedGraph](graphs/size_bounded_graph.md#deglib.graph.SizeBoundedGraph) Create a new graph built from the given data using an EvenRegularGraphBuilder. This is a convenience function that creates a graph, builder, adds all data entries, and builds the complete graph in one call. Infinite building is not supported. * **Parameters:** * **data** (*np.ndarray*) – Feature data with shape [N, D] where N is number of samples, D is dimensionality * **labels** (*Iterable* *[**int* *]* *|* *None*) – Labels for each data entry with shape [N]. If None, labels 0 to N-1 are created. If numpy array, should have dtype uint32 * **edges_per_vertex** (*int*) – Number of edges per vertex in the graph * **capacity** (*int*) – Maximum number of vertices the graph can hold. If <= 0, defaults to data.shape[0] * **metric** (*Metric*) – Distance metric for measuring feature similarity * **rng** (*Mt19937* *|* *None*) – Random number generator. If None, a new Mt19937 will be created * **optimization_target** (*OptimizationTarget*) – Optimization strategy based on data distribution characteristics * **extend_k** (*int*) – Number of neighbors to consider when extending the graph * **extend_eps** (*float*) – Epsilon value for neighbor search during graph extension * **improve_k** (*int*) – Number of neighbors to consider when improving the graph * **improve_eps** (*float*) – Epsilon value for neighbor search during graph improvement * **max_path_length** (*int*) – Maximum number of edge swaps in a single improvement attempt * **swap_tries** (*int*) – Number of improvement attempts per build step * **additional_swap_tries** (*int*) – Additional improvement attempts after a successful improvement * **callback** (*Callable* *[* *[**deglib_cpp.BuilderStatus* *]* *,* *None* *]* *|* *str* *|* *None*) – Callback function for build progress reporting. If “progress”, shows progress bar * **Returns:** The constructed and optimized graph * **Return type:** [SizeBoundedGraph](graphs/size_bounded_graph.md#deglib.graph.SizeBoundedGraph)