Selection of target features for classification and cluster analysis of object relationship structures
Abstract
The article addresses the problem of assessing clustering quality for groups of objects of arbitrary (non-spherical) shape. Each object is assigned a category (high or low) according to its distribution density in a local region of the feature space. For each category, the subset of objects that are boundary objects relative to the other category is determined. The concept of a metaobject is introduced for the case of multimodal data distribution. A metaobject is described by feature stability values computed for objects from two classes corresponding to different modalities. Stability values across a set of class pairs form a unified space of new features for describing metaobjects. General constraints on the set of admissible values of these features are explained by the properties of membership functions in fuzzy logic theory. To analyze the clustering quality of metaobjects, they are divided into categories by distribution density. The uniqueness of the number and composition of groups with a non-spherical configuration is ensured by the connectivity relations between objects of the same category within a system of intersecting hyperspheres. The centers of the hyperspheres are the objects of the group, and at least one boundary object lies in the intersection of two or more hyperspheres. Intra-group proximity and inter-group difference measures are calculated from distances to the minimum coverage prototypes within the group and to a subset of boundary objects. The clustering quality estimate for each group is derived from the ratio of these measures, and a condition on the estimate values is proposed for deciding whether a group is compact. The resulting method assesses the clustering quality of objects with an arbitrary configuration shape, taking into account both distribution density and placement topology. Topological properties are expressed through the connectivity relation between objects, which depends on their density category.
Using real data, the mapping of objects into a unified feature space is demonstrated, followed by an evaluation of the cluster analysis quality for metaobjects. The technology for forming a unified feature space for describing metaobjects can be useful in problems with missing data and in the presence of multimodal distribution.
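The grouping by connectivity within a system of intersecting hyperspheres can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so everything below is an assumption made for illustration: the radius of an object's hypersphere is taken as its distance to the nearest object of the other category, the boundary objects of a category are taken as the same-category objects nearest to each object of the other category, and two objects are connected when some boundary object of their category lies inside both of their hyperspheres. The function name `connected_groups` and its signature are hypothetical, and the density categories are passed in as given labels rather than computed.

```python
import numpy as np

def connected_groups(X, cat):
    """Group same-category objects linked through intersecting hyperspheres.

    Illustrative assumptions (not taken from the article):
      * radius of object i = distance to the nearest object of another category;
      * boundary objects of a category = its objects nearest to each object
        of the other categories;
      * objects are connected if one boundary object of their category lies
        strictly inside both of their hyperspheres.
    Returns an integer group label for every object.
    """
    X = np.asarray(X, dtype=float)
    cat = np.asarray(cat)
    n = len(X)
    # Pairwise Euclidean distances between all objects.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    parent = list(range(n))  # union-find forest over objects

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    for c in np.unique(cat):
        own = np.where(cat == c)[0]
        other = np.where(cat != c)[0]
        if len(other) == 0:
            # Only one category present: nothing separates the objects.
            for i in own[1:]:
                union(own[0], i)
            continue
        # Hypersphere radius of each own object.
        radius = D[np.ix_(own, other)].min(axis=1)
        # Boundary objects: nearest own object for each foreign object.
        boundary = np.unique(own[D[np.ix_(other, own)].argmin(axis=1)])
        for b in boundary:
            # Own objects whose hypersphere contains boundary object b
            # are pairwise connected, so they are merged into one group.
            members = own[D[own, b] < radius]
            for i in members[1:]:
                union(members[0], i)

    roots = np.array([find(i) for i in range(n)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels

# Toy example: two groups of category 0 separated by a group of category 1.
X = np.array([[0, 0], [1, 0], [5, 0], [6, 0], [10, 0], [11, 0]])
cat = np.array([0, 0, 1, 1, 0, 0])
labels = connected_groups(X, cat)
```

In this toy example the two category-0 pairs end up in different groups because no boundary object falls inside hyperspheres from both pairs, so the number and composition of groups follow from the data rather than from a preset cluster count. The quality estimate based on minimum coverage prototypes is not sketched here, since the abstract does not specify it.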
ISSN: 2307-8162