By Akash Rawat
This packet has full kd tree implementation in python programming language along with a naive search algorithm.
The Python program implements the insertion of data into the K-d tree (Kd tree creation). Then, searches nearest - k neighbors to the coordinates provides as queries. The tree creation function works recursively. While insertion data is divided into two parts for the left and right subtree of nodes, this program uses median as dividing criteria. Axis used to divide the data is selected based on higher spread along the axis. Data is kept only in the leaf nodes, internal nodes provide searching mechanics. Alpha value denotes the number of values in each node. For searching, at the first step, we reach a leaf and add estimate results from this leaf. Then, we traverse backward and use kd-tree properties to reach the final answer without going through all the values.
The packet also includes a naive search algorithm, where the program uses a coordinate geometry formula to calculate distances of all the points from the query point, and then select k data values with minimum distances. kd tree search is much faster than the naive algorithm, difference can easily be observed in the case of big dataset. In K-d tree search, we don't need to calculate the distance of each point, even we don't need to go through each data entry while searching (some subtrees are skipped using tree property).
Dataset has the first column as dataID, second as X-axis and the last column denotes y-axis.
TO RUN: python3 KDtree.py (Packet includes usage video screenshot)
Note: Alpha value is the maximum number of data points in each leaf node, and k value is the number of nearest elements to be found nearest to query coordinate.