Media of University of Babylon | PhD Dissertation at the College of Information Technology Discusses the Optimal K Value Estimation for the K-Means Algorithm

By: Duhaa Fadill Abbas

Date: 14/3/2025

PhD Dissertation at the College of Information Technology Discusses the Optimal K Value Estimation for the K-Means Algorithm in Data Stream Clustering

The Software Department at the College of Information Technology, University of Babylon, hosted the defense of a PhD dissertation titled "Optimal K Value Estimation for the K-Means Algorithm in Data Stream Clustering." The dissertation was presented by researcher Abeer Mahmoud Hassan Ahmed, under the supervision of Dr. Saad Talib Hassoun, on Thursday, March 13, 2025, in the College Conference Hall.

The dissertation addresses a fundamental challenge in data stream clustering, focusing on enhancing the K-Means algorithm, which is widely used due to its simplicity and efficiency. The research proposes novel models to determine the optimal K value for clustering continuous data streams, which are characterized by high velocity, large volume, heterogeneity, and real-time generation from multiple sources.

Proposed Models and Methodologies

The study introduces advanced clustering techniques, including:

- Adaptive Dynamic Diameter and Boundary Threshold (ADDBT), which uses a probability density function (PDF) in place of the conventional Gaussian distribution.
- Prototype Multi-Channel (P-M-C) Model, which estimates the optimal K value based on prototype selection and frequency distribution.

Experimental Evaluation and Real-World Data Collection

The proposed models were evaluated using eight streaming datasets, including four benchmark datasets:

- Iris Dataset
- Household Electricity Consumption Stream
- Global Traffic Signal Data Stream
- KDD Cup Data Stream

Additionally, real-time sensor-based data were collected and analyzed, including:

- Weather data recorded for 10 months in Hilla, Iraq.
- Human physiological data collected using wearable sensors over four months.
- Child health data monitored over four months across six children.
- Medical patient data gathered from 150 individuals in a pharmacy setting.

Key Findings and Contributions

The results demonstrate significant improvements in clustering quality compared to traditional algorithms such as CMFT Kernel-K-Means, CluStream, and D-Stream. The Silhouette Score metric indicated an average enhancement of 10%, with performance gains ranging from 45% to 200% when compared to KM-STREEM++, IAPKM, and other baseline approaches.
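
For readers unfamiliar with the metric, the Silhouette Score compares, for each point, the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), and averages (b - a) / max(a, b) over all points. The sketch below is a minimal plain-Python illustration of how that score can be used to compare candidate K values. It is not the dissertation's ADDBT or P-M-C model, and the function name, the toy one-dimensional data, and the two candidate labelings are assumptions made purely for illustration.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1, higher is better)."""
    # Group the points by their cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            # By convention, a point alone in its cluster contributes 0.
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance from p to itself is 0, so it does not affect the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Hypothetical toy data: three well-separated groups on a line.
points = [(1.0,), (1.2,), (0.8,), (5.0,), (5.2,), (4.8,), (9.0,), (9.2,), (8.8,)]

# Two hypothetical candidate clusterings of the same points, keyed by K.
candidates = {
    2: [0, 0, 0, 0, 0, 0, 1, 1, 1],  # first two groups merged
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],  # one cluster per group
}

scores = {k: silhouette_score(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On this toy data the K = 3 labeling scores higher than the K = 2 labeling, which is the sense in which a higher Silhouette Score indicates a better choice of K.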

This research presents a major advancement in real-time data clustering, offering robust, scalable, and adaptive techniques that enhance clustering accuracy for dynamic, high-velocity data streams. The findings contribute to various fields, including Artificial Intelligence, Internet of Things (IoT), medical data analysis, and environmental monitoring, providing a foundation for intelligent data-driven decision-making systems.

No.	Defense Committee Member	Academic Title	Specialization	Affiliation	Role
1	Dr. Nidhal Khudhair Al-Abbadi	Professor	Image Processing	Al-Mustaqbal University	Chair
2	Dr. Rafah Mohammed Kadhim	Professor	Fuzzy Computing	University of Babylon / Department of Studies and Planning	Member
3	Dr. Imad Issa Abdulkareem	Professor	Artificial Intelligence	Mustansiriyah University / College of Education	Member
4	Dr. Ahmed Habeeb Saeed	Assistant Professor	Information Systems and Web Technologies	University of Babylon / College of Information Technology	Member
5	Dr. Safa Saad Abbas	Assistant Professor	Multimedia and Information Security	University of Babylon / College of Information Technology	Member
6	Dr. Saad Talib Hassoun	Professor	Computer Simulation	University of Babylon / College of Information Technology	Member and Supervisor
