Fast Distributed Outlier Detection in Mixed-Attribute Data Sets

Otey, Matthew Eric; Ghoting, Amol; Parthasarathy, Srinivasan
May 2006
Data Mining & Knowledge Discovery; May 2006, Vol. 12 Issue 2/3, p203
Academic Journal
Efficiently detecting outliers or anomalies is an important problem in many areas of science, medicine and information technology. Applications range from data cleaning to clinical diagnosis, from detecting anomalous defects in materials to fraud and intrusion detection. Over the past decade, researchers in data mining and statistics have addressed the problem of outlier detection using both parametric and non-parametric approaches in a centralized setting. However, there are still several challenges that must be addressed. First, most approaches to date have focused on detecting outliers in a continuous attribute space. However, almost all real-world data sets contain a mixture of categorical and continuous attributes. Categorical attributes are typically ignored or incorrectly modeled by existing approaches, resulting in a significant loss of information. Second, there have not been any general-purpose distributed outlier detection algorithms. Most distributed detection algorithms are designed with a specific domain (e.g. sensor networks) in mind. Third, the data sets being analyzed may be streaming or otherwise dynamic in nature. Such data sets are prone to concept drift, and models of the data must be dynamic as well. To address these challenges, we present a tunable algorithm for distributed outlier detection in dynamic mixed-attribute data sets.


