TY - JOUR
TI - Decentralized online clustering for supporting autonomic management of distributed systems
DO - https://doi.org/doi:10.7282/T32J6BXR
PY - 2010
AB - Distributed computational infrastructures, as well as the applications and services that they support, are increasingly becoming an integral part of society and affecting every aspect of life. As a result, ensuring their efficient and robust operation is critical. However, the scale and overall complexity of these systems are growing at an alarming rate (current data centers contain tens to hundreds of thousands of computing and storage devices running complex applications), making the management of these systems extremely challenging and rapidly exceeding human capability. The large quantities of distributed system data, in the form of user and component interaction and status events, contain meaningful information that can be used to infer the states of different components or of the system as a whole. Accurate and timely knowledge of these states is essential for verifying the correctness and efficiency of the operation of the system, as well as for discovering specific situations of interest, such as anomalies or faults, that require the application of appropriate management actions. Autonomic systems/applications must therefore be able to effectively process the large amounts of distributed data and to characterize operational states in a robust, accurate and timely manner. Although highly accurate, centralized approaches for distributed system management are infeasible in general because of the costs of centralization in terms of infrastructure, fault tolerance, and responsiveness. Since data is naturally distributed and the collective computing power of networked elements (ranging from sensor and device networks to supercomputer clusters and multi-organization grids) is enough to be harnessed for value-added, system-level services, online and decentralized approaches for monitoring, data analysis, and self-management are not only feasible, but also quite attractive. This work is based on the premise of realizing and applying online data analysis, exploiting the collective computing resources of distributed systems for supporting autonomic management capabilities. Specifically, we propose and develop decentralized online clustering as a data analysis mechanism and infrastructure, evaluate its accuracy and performance with respect to other known clustering methods, and apply it to the following autonomic management problems: 1) system profiling and outlier detection from distributed data, 2) definition, autonomic adaptation, and application of management policies, and 3) VM provisioning and energy management in data centers.
KW - Electrical and Computer Engineering
KW - Distributed databases--Management
KW - Autonomic computing
LA - eng
ER -