TABLE OF CONTENTS

List of Figures........xxiii
List of Tables........xxix
Foreword........xxxvii
Preface........xxxix
Acknowledgments........xlvii

Chapter 1
A COMMON LOGIC APPROACH TO DATA MINING AND PATTERN RECOGNITION, by A. Zakrevskij........1
  1. Introduction........2
    1.1 Using Decision Functions........2
    1.2 Characteristic Features of the New Approach........4
  2. Data and Knowledge........6
    2.1 General Definitions........6
    2.2 Data and Knowledge Representation – the Case of Boolean Attributes........9
    2.3 Data and Knowledge Representation – the Case of Multi-Valued Attributes........10
  3. Data Mining – Inductive Inference........12
    3.1 Extracting Knowledge from the Boolean Space of Attributes........12
    3.2 The Screening Effect........18
    3.3 Inductive Inference from Partial Data........20
    3.4 The Case of Multi-Valued Attributes........21
  4. Knowledge Analysis and Transformations........23
    4.1 Testing for Consistency........23
    4.2 Simplification........27
  5. Pattern Recognition – Deductive Inference........28
    5.1 Recognition in the Boolean Space........28
    5.2 Appreciating the Asymmetry in Implicative Regularities........31
    5.3 Deductive Inference in Finite Predicates........34
    5.4 Pattern Recognition in the Space of Multi-Valued Attributes........36
  6. Some Applications........38
  7. Conclusions........40
  References........41
  Author’s Biographical Statement........43

Chapter 2
THE ONE CLAUSE AT A TIME (OCAT) APPROACH TO DATA MINING AND KNOWLEDGE DISCOVERY, by E. Triantaphyllou........45
  1. Introduction........46
  2. Some Background Information........49
  3. Definitions and Terminology........52
  4. The One Clause at a Time (OCAT) Approach........54
    4.1 Data Binarization........54
    4.2 The One Clause at a Time (OCAT) Concept........58
    4.3 A Branch-and-Bound Approach for Inferring Clauses........59
    4.4 Inference of the Clauses for the Illustrative Example........62
    4.5 A Polynomial Time Heuristic for Inferring Clauses........65
  5. A Guided Learning Approach........70
  6. The Rejectability Graph of Two Collections of Examples........72
    6.1 The Definition of the Rejectability Graph........72
    6.2 Properties of the Rejectability Graph........74
    6.3 On the Minimum Clique Cover of the Rejectability Graph........76
  7. Problem Decomposition........77
    7.1 Connected Components........77
    7.2 Clique Cover........78
  8. An Example of Using the Rejectability Graph........79
  9. Conclusions........82
  References........83
  Author’s Biographical Statement........87

Chapter 3
AN INCREMENTAL LEARNING ALGORITHM FOR INFERRING LOGICAL RULES FROM EXAMPLES IN THE FRAMEWORK OF THE COMMON REASONING PROCESS, by X. Naidenova........89
  1. Introduction........90
  2. A Model of Rule-Based Logical Inference........96
    2.1 Rules Acquired from Experts or Rules of the First Type........97
    2.2 Structure of the Knowledge Base........98
    2.3 Reasoning Operations for Using Logical Rules of the First Type........100
    2.4 An Example of the Reasoning Process........102
  3. Inductive Inference of Implicative Rules From Examples........103
    3.1 The Concept of a Good Classification Test........103
    3.2 The Characterization of Classification Tests........105
    3.3 An Approach for Constructing Good Irredundant Tests........106
    3.4 Structure of Data for Inferring Good Diagnostic Tests........107
    3.5 The Duality of Good Diagnostic Tests........109
    3.6 Generation of Dual Objects with the Use of Lattice Operations........110
    3.7 Inductive Rules for Constructing Elements of a Dual Lattice........111
    3.8 Special Reasoning Operations for Constructing Elements of a Dual Lattice........112
      3.8.1 The Generalization Rule........112
      3.8.2 The Diagnostic Rule........113
      3.8.3 The Concept of an Essential Example........114
  4. Algorithms for Constructing All Good Maximally Redundant Tests........115
    4.1 NIAGaRa: A Non-Incremental Algorithm for Constructing All Good Maximally Redundant Tests........115
    4.2 Decomposition of Inferring Good Classification Tests into Subtasks........122
      4.2.1 Forming the Subtasks........123
      4.2.2 Reducing the Subtasks........125
      4.2.3 Choosing Examples and Values for the Formation of Subtasks........127
      4.2.4 An Approach for Incremental Algorithms........129
    4.3 DIAGaRa: An Algorithm for Inferring All GMRTs with the Decomposition into Subtasks of the First Kind........130
      4.3.1 The Basic Recursive Algorithm for Solving a Subtask of the First Kind........130
      4.3.2 An Approach for Forming the Set STGOOD........131
      4.3.3 The Estimation of the Number of Subtasks to Be Solved........131
      4.3.4 CASCADE: Incrementally Inferring GMRTs Based on the Procedure DIAGaRa........132
    4.4 INGOMAR: An Incremental Algorithm for Inferring All GMRTs........132
  5. Conclusions........138
  Acknowledgments........138
  Appendix........139
  References........143
  Author’s Biographical Statement........147

Chapter 4
DISCOVERING RULES THAT GOVERN MONOTONE PHENOMENA, by V.I. Torvik and E. Triantaphyllou........149
  1. Introduction........150
  2. Background Information........152
    2.1 Problem Descriptions........152
    2.2 Hierarchical Decomposition of Variables........155
    2.3 Some Key Properties of Monotone Boolean Functions........157
    2.4 Existing Approaches to Problem 1........160
    2.5 An Existing Approach to Problem 2........162
    2.6 Existing Approaches to Problem 3........162
    2.7 Stochastic Models for Problem 3........162
  3. Inference Objectives and Methodology........165
    3.1 The Inference Objective for Problem 1........165
    3.2 The Inference Objective for Problem 2........166
    3.3 The Inference Objective for Problem 3........166
    3.4 Incremental Updates for the Fixed Misclassification Probability Model........167
    3.5 Selection Criteria for Problem 1........167
    3.6 Selection Criteria for Problems 2.1, 2.2, and 2.3........168
    3.7 Selection Criterion for Problem 3........169
  4. Experimental Results........174
    4.1 Experimental Results for Problem 1........174
    4.2 Experimental Results for Problem 2........176
    4.3 Experimental Results for Problem 3........179
  5. Summary and Discussion........183
    5.1 Summary of the Research Findings........183
    5.2 Significance of the Research Findings........186
    5.3 Future Research Directions........187
  6. Concluding Remarks........187
  References........188
  Authors’ Biographical Statements........191

Chapter 5
LEARNING LOGIC FORMULAS AND RELATED ERROR DISTRIBUTIONS, by G. Felici, F. Sun, and K. Truemper........193
  1. Introduction........194
  2. Logic Data and Separating Set........197
    2.1 Logic Data........197
    2.2 Separating Set........198
  3. Problem Formulation........200
    3.1 Logic Variables........201
    3.2 Separation Condition for Records in A........201
    3.3 Separation Condition for Records in B........201
    3.4 Selecting a Largest Subset........202
    3.5 Selecting a Separating Vector........203
    3.6 Simplification for 0/1 Records........204
  4. Implementation of Solution Algorithm........204
  5. Leibniz System........205
  6. Simple-Minded Control of Classification Errors........206
  7. Separations for Voting Process........207
  8. Probability Distribution of Vote-Total........208
    8.1 Mean and Variance for ZA........209
    8.2 Random Variables Yi........211
    8.3 Distribution for Y........212
    8.4 Distribution for ZA........213
    8.5 Probabilities of Classification Errors........213
    8.6 Summary of Algorithm........216
  9. Computational Results........216
    9.1 Breast Cancer Diagnosis........218
    9.2 Australian Credit Card........219
    9.3 Congressional Voting........219
    9.4 Diabetes Diagnosis........219
    9.5 Heart Disease Diagnosis........220
    9.6 Boston Housing........221
  10. Conclusions........221
  References........222
  Authors’ Biographical Statements........226

Chapter 6
FEATURE SELECTION FOR DATA MINING, by V. de Angelis, G. Felici, and G. Mancinelli........227
  1. Introduction........228
  2. The Many Routes to Feature Selection........229
    2.1 Filter Methods........232
    2.2 Wrapper Methods........234
  3. Feature Selection as a Subgraph Selection Problem........237
  4. Basic IP Formulation and Variants........238
  5. Computational Experience........241
    5.1 Test on Generated Data........242
    5.2 An Application........246
  6. Conclusions........248
  References........249
  Authors’ Biographical Statements........252

Chapter 7
TRANSFORMATION OF RATIONAL AND SET DATA TO LOGIC DATA, by S. Bartnikowski, M. Granberry, J. Mugan, and K. Truemper........253
  1. Introduction........254
    1.1 Transformation of Set Data........254
    1.2 Transformation of Rational Data........254
    1.3 Computational Results........256
    1.4 Entropy-Based Approaches........257
    1.5 Bottom-up Methods........258
    1.6 Other Approaches........258
  2. Definitions........259
    2.1 Unknown Values........259
    2.2 Records........260
    2.3 Populations........260
    2.4 DNF Formulas........260
    2.5 Clash Condition........261
  3. Overview of Transformation Process........262
  4. Set Data to Logic Data........262
    4.1 Case of Element Entries........262
    4.2 Case of Set Entries........264
  5. Rational Data to Logic Data........264
  6. Initial Markers........265
    6.1 Class Values........265
    6.2 Smoothed Class Values........266
    6.3 Selection of Standard Deviation........266
    6.4 Definition of Markers........269
    6.5 Evaluation of Markers........271
  7. Additional Markers........271
    7.1 Critical Interval........272
    7.2 Attractiveness of Pattern Change........272
    7.3 Selection of Marker........273
  8. Computational Results........274
  9. Summary........275
  References........276
  Authors’ Biographical Statements........278

Chapter 8
DATA FARMING: CONCEPTS AND METHODS, by A. Kusiak........279
  1. Introduction........280
  2. Data Farming Methods........281
    2.1 Feature Evaluation........282
    2.2 Data Transformation........282
      2.2.1 Filling in Missing Values........282
      2.2.2 Discretization........283
      2.2.3 Feature Content Modification........283
      2.2.4 Feature Transformation........286
      2.2.5 Data Evolution........289
    2.3 Knowledge Transformation........290
    2.4 Outcome Definition........295
    2.5 Feature Definition........297
  3. The Data Farming Process........298
  4. A Case Study........299
  5. Conclusions........301
  References........302
  Author’s Biographical Statement........304

Chapter 9
RULE INDUCTION THROUGH DISCRETE SUPPORT VECTOR DECISION TREES, by C. Orsenigo and C. Vercellis........305
  1. Introduction........306
  2. Linear Support Vector Machines........308
  3. Discrete Support Vector Machines with Minimum Features........312
  4. A Sequential LP-based Heuristic for Problems LDVM and FDVM........314
  5. Building a Minimum Features Discrete Support Vector Decision Tree........316
  6. Discussion and Validation of the Proposed Classifier........319
  7. Conclusions........322
  References........324
  Authors’ Biographical Statements........326

Chapter 10
MULTI-ATTRIBUTE DECISION TREES AND DECISION RULES, by J.-Y. Lee and S. Olafsson........327
  1. Introduction........328
  2. Decision Tree Induction........329
    2.1 Attribute Evaluation Rules........330
    2.2 Entropy-Based Algorithms........332
    2.3 Other Issues in Decision Tree Induction........333
  3. Multi-Attribute Decision Trees........334
    3.1 Accounting for Interactions between Attributes........334
    3.2 Second Order Decision Tree Induction........335
    3.3 The SODI Algorithm........339
  4. An Illustrative Example........334
  5. Numerical Analysis........347
  6. Conclusions........349
  Appendix: Detailed Model Comparison........351
  References........355
  Authors’ Biographical Statements........358

Chapter 11
KNOWLEDGE ACQUISITION AND UNCERTAINTY IN FAULT DIAGNOSIS: A ROUGH SETS PERSPECTIVE, by L.-Y. Zhai, L.-P. Khoo, and S.-C. Fok........359
  1. Introduction........360
  2. An Overview of Knowledge Discovery and Uncertainty........361
    2.1 Knowledge Acquisition and Machine Learning........361
      2.1.1 Knowledge Representation........361
      2.1.2 Knowledge Acquisition........362
      2.1.3 Machine Learning and Automated Knowledge Extraction........362
      2.1.4 Inductive Learning Techniques for Automated Knowledge Extraction........364
    2.2 Uncertainties in Fault Diagnosis........366
      2.2.1 Inconsistent Data........366
      2.2.2 Incomplete Data........367
      2.2.3 Noisy Data........368
    2.3 Traditional Techniques for Handling Uncertainty........369
      2.3.1 MYCIN’s Model of Certainty Factors........369
      2.3.2 Bayesian Probability Theory........370
      2.3.3 The Dempster-Shafer Theory of Belief Functions........371
      2.3.4 The Fuzzy Sets Theory........372
      2.3.5 Comparison of Traditional Approaches for Handling Uncertainty........373
    2.4 The Rough Sets Approach........374
      2.4.1 Introductory Remarks........374
      2.4.2 Rough Sets and Fuzzy Sets........375
      2.4.3 Development of Rough Set Theory........376
      2.4.4 Strengths of Rough Sets Theory and Its Applications in Fault Diagnosis........376
  3. Rough Sets Theory in Classification and Rule Induction under Uncertainty........378
    3.1 Basic Notions of Rough Sets Theory........378
      3.1.1 The Information System........378
      3.1.2 Approximations........379
    3.2 Rough Sets and Inductive Learning........381
      3.2.1 Inductive Learning, Rough Sets and the RClass........381
      3.2.2 Framework of the RClass........382
    3.3 Validation and Discussion........384
      3.3.1 Example 1: Machine Condition Monitoring........385
      3.3.2 Example 2: A Chemical Process........386
  4. Conclusions........388
  References........389
  Authors’ Biographical Statements........394

Chapter 12
DISCOVERING KNOWLEDGE NUGGETS WITH A GENETIC ALGORITHM, by E. Noda and A.A. Freitas........395
  1. Introduction........396
  2. The Motivation for Genetic Algorithm-Based Rule Discovery........399
    2.1 An Overview of Genetic Algorithms (GAs)........400
    2.2 Greedy Rule Induction........402
    2.3 The Global Search of Genetic Algorithms (GAs)........404
  3. GA-Nuggets........404
    3.1 Single-Population GA-Nuggets........404
      3.1.1 Individual Representation........405
      3.1.2 Fitness Function........406
      3.1.3 Selection Method and Genetic Operators........410
    3.2 Distributed-Population GA-Nuggets........411
      3.2.1 Individual Representation........411
      3.2.2 Distributed Population........412
      3.2.3 Fitness Function........414
      3.2.4 Selection Method and Genetic Operators........415
  4. A Greedy Rule Induction Algorithm for Dependence Modeling........415
  5. Computational Results........416
    5.1 The Data Sets Used in the Experiments........416
    5.2 Results and Discussion........417
      5.2.1 Predictive Accuracy........419
      5.2.2 Degree of Interestingness........422
      5.2.3 Summary of the Results........426
  6. Conclusions........428
  References........429
  Authors’ Biographical Statements........432

Chapter 13
DIVERSITY MECHANISMS IN PITT-STYLE EVOLUTIONARY CLASSIFIER SYSTEMS, by M. Kirley, H.A. Abbass, and R.I. McKay........433
  1. Introduction........434
  2. Background – Genetic Algorithms........436
  3. Evolutionary Classifier Systems........439
    3.1 The Michigan Style Classifier System........439
    3.2 The Pittsburgh Style Classifier System........440
  4. Diversity Mechanisms in Evolutionary Algorithms........440
    4.1 Niching........441
    4.2 Fitness Sharing........441
    4.3 Crowding........443
    4.4 Isolated Populations........444
  5. Classifier Diversity........446
  6. Experiments........448
    6.1 Architecture of the Model........448
    6.2 Data Sets........449
    6.3 Treatments........449
    6.4 Model Parameters........449
  7. Results........450
  8. Conclusions........452
  References........454
  Authors’ Biographical Statements........457

Chapter 14
FUZZY LOGIC IN DISCOVERING ASSOCIATION RULES: AN OVERVIEW, by G. Chen, Q. Wei, and E.E. Kerre........459
  1. Introduction........460
    1.1 Notions of Associations........460
    1.2 Fuzziness in Association Mining........462
    1.3 Main Streams of Discovering Associations with Fuzzy Logic........464
  2. Fuzzy Logic in Quantitative Association Rules........465
    2.1 Boolean Association Rules........465
    2.2 Quantitative Association Rules........466
    2.3 Fuzzy Extensions of Quantitative Association Rules........468
  3. Fuzzy Association Rules with Fuzzy Taxonomies........469
    3.1 Generalized Association Rules........470
    3.2 Generalized Association Rules with Fuzzy Taxonomies........471
    3.3 Fuzzy Association Rules with Linguistic Hedges........473
  4. Other Fuzzy Extensions and Considerations........474
    4.1 Fuzzy Logic in Interestingness Measures........474
    4.2 Fuzzy Extensions of Dsupport / Dconfidence........476
    4.3 Weighted Fuzzy Association Rules........478
  5. Fuzzy Implication Based Association Rules........480
  6. Mining Functional Dependencies with Uncertainties........482
    6.1 Mining Fuzzy Functional Dependencies........482
    6.2 Mining Functional Dependencies with Degrees........483
  7. Fuzzy Logic in Pattern Associations........484
  8. Conclusions........486
  References........487
  Authors’ Biographical Statements........493

Chapter 15
MINING HUMAN INTERPRETABLE KNOWLEDGE WITH FUZZY MODELING METHODS: AN OVERVIEW, by T.W. Liao........495
  1. Background........496
  2. Basic Concepts........498
  3. Generation of Fuzzy If-Then Rules........500
    3.1 Grid Partitioning........501
    3.2 Fuzzy Clustering........506
    3.3 Genetic Algorithms........509
      3.3.1 Sequential Pittsburgh Approach........510
      3.3.2 Sequential IRL+Pittsburgh Approach........511
      3.3.3 Simultaneous Pittsburgh Approach........513
    3.4 Neural Networks........517
      3.4.1 Fuzzy Neural Networks........518
      3.4.2 Neural Fuzzy Systems........519
        3.4.2.1 Starting Empty........519
        3.4.2.2 Starting Full........520
        3.4.2.3 Starting with an Initial Rule Base........524
    3.5 Hybrids........526
    3.6 Others........526
      3.6.1 From Exemplar Numeric Data........527
      3.6.2 From Exemplar Fuzzy Data........527
  4. Generation of Fuzzy Decision Trees........527
    4.1 Fuzzy Interpretation of Crisp Trees with Discretized Intervals........528
    4.2 Fuzzy ID3 Variants........529
      4.2.1 From Fuzzy Vector-Valued Examples........529
      4.2.2 From Nominal-Valued and Real-Valued Examples........530
  5. Applications........532
    5.1 Function Approximation Problems........532
    5.2 Classification Problems........532
    5.3 Control Problems........533
    5.4 Time Series Prediction Problems........534
    5.5 Other Decision-Making Problems........534
  6. Discussion........534
  7. Conclusions........537
  References........538
  Appendix 1: A Summary of Grid Partitioning Methods for Fuzzy Modeling........545
  Appendix 2: A Summary of Fuzzy Clustering Methods for Fuzzy Modeling........546
  Appendix 3: A Summary of GA Methods for Fuzzy Modeling........547
  Appendix 4: A Summary of Neural Network Methods for Fuzzy Modeling........548
  Appendix 5: A Summary of Fuzzy Decision Tree Methods for Fuzzy Modeling........549
  Author’s Biographical Statement........550

Chapter 16
DATA MINING FROM MULTIMEDIA PATIENT RECORDS, by A.S. Elmaghraby, M.M. Kantardzic, and M.P. Wachowiak........551
  1. Introduction........552
  2. The Data Mining Process........554
  3. Clinical Patient Records: A Data Mining Source........556
    3.1 Distributed Data Sources........560
    3.2 Patient Record Standards........560
  4. Data Preprocessing........563
  5. Data Transformation........567
    5.1 Types of Transformation........567
    5.2 An Independent Component Analysis: Example of an EMG/ECG Separation........571
    5.3 Text Transformation and Representation: A Rule-Based Approach........573
    5.4 Image Transformation and Representation: A Rule-Based Approach........575
  6. Dimensionality Reduction........579
    6.1 The Importance of Reduction........579
    6.2 Data Fusion........581
    6.3 Example 1: Multimodality Data Fusion........584
    6.4 Example 2: Data Fusion in Data Preprocessing........584
    6.5 Feature Selection Supported by Domain Experts........588
  7. Conclusions........589
  References........591
  Authors’ Biographical Statements........595

Chapter 17
LEARNING TO FIND CONTEXT BASED SPELLING ERRORS, by H. Al-Mubaid and K. Truemper........597
  1. Introduction........598
  2. Previous Work........600
  3. Details of Ltest........601
    3.1 Learning Step........602
    3.2 Testing Step........605
      3.2.1 Testing Regular Cases........605
      3.2.2 Testing Special Cases........606
      3.2.3 An Example........607
  4. Implementation and Computational Results........607
  5. Extensions........614
  6. Summary........616
  References........616
  Appendix A: Construction of Substitutions........619
  Appendix B: Construction of Training and History Texts........620
  Appendix C: Structure of Characteristic Vectors........621
  Appendix D: Classification of Characteristic Vectors........624
  Authors’ Biographical Statements........627

Chapter 18
INDUCTION AND INFERENCE WITH FUZZY RULES FOR TEXTUAL INFORMATION RETRIEVAL, by J. Chen, D.H. Kraft, M.J. Martin-Bautista, and M.-A. Vila........629
  1. Introduction........630
  2. Preliminaries........632
    2.1 The Vector Space Approach to Information Retrieval........632
    2.2 Fuzzy Set Theory Basics........634
    2.3 Fuzzy Hierarchical Clustering........634
    2.4 Fuzzy Clustering by the Fuzzy C-means Algorithm........634
  3. Fuzzy Clustering, Fuzzy Rule Discovery and Fuzzy Inference for Textual Retrieval........635
    3.1 The Air Force EDC Data Set........636
    3.2 Clustering Results........637
    3.3 Fuzzy Rule Extraction from Fuzzy Clusters........638
    3.4 Application of Fuzzy Inference for Improving Retrieval Performance........639
  4. Fuzzy Clustering, Fuzzy Rules and User Profiles for Web Retrieval........640
    4.1 Simple User Profile Construction........641
    4.2 Application of Simple User Profiles in Web Information Retrieval........642
      4.2.1 Retrieving Interesting Web Documents........642
      4.2.2 User Profiles for Query Expansion by Fuzzy Inference........643
    4.3 Experiments of Using User Profiles........644
    4.4 Extended Profiles and Fuzzy Clustering........646
  5. Conclusions........646
  Acknowledgements........647
  References........648
  Authors’ Biographical Statements........652

Chapter 19
STATISTICAL RULE INDUCTION IN THE PRESENCE OF PRIOR INFORMATION: THE BAYESIAN RECORD LINKAGE PROBLEM, by D.H. Judson........655
  1. Introduction........656
  2. Why is Record Linkage Challenging?........657
  3. The Fellegi-Sunter Model of Record Linkage........658
  4. How Estimating Match Weights and Setting Thresholds is Equivalent to Specifying a Decision Rule........660
  5. Dealing with Stochastic Data: A Logistic Regression Approach........661
    5.1 Estimation of the Model........665
    5.2 Finding the Implied Threshold and Interpreting Coefficients........665
  6. Dealing with Unlabeled Data in the Logistic Regression Approach........668
  7. Brief Description of the Simulated Data........669
  8. Brief Description of the CPS/NHIS to Census Record Linkage Project........670
  9. Results of the Bayesian Latent Class Method with Simulated Data........672
    9.1 Case 1: Uninformative........673
    9.2 Case 2: Informative........677
    9.3 False Link and Non-Link Rates in the Population of All Possible Pairs........678
  10. Results from the Bayesian Latent Class Method with Real Data........679
    10.1 Steps in Preparing the Data........679
    10.2 Priors and Constraints........681
    10.3 Results........682
  11. Conclusions and Future Research........690
  References........691
  Author’s Biographical Statement........694

Chapter 20
FUTURE TRENDS IN SOME DATA MINING AREAS, by X. Wang, P. Zhu, G. Felici, and E. Triantaphyllou........695
  1. Introduction........696
  2. Web Mining........696
    2.1 Web Content Mining........697
    2.2 Web Usage Mining........698
    2.3 Web Structure Mining........698
    2.4 Current Obstacles and Future Trends........699
  3. Text Mining........700
    3.1 Text Mining and Information Access........700
    3.2 A Simple Framework of Text Mining........701
    3.3 Fields of Text Mining........701
    3.4 Current Obstacles and Future Trends........702
  4. Visual Data Mining........703
    4.1 Data Visualization........704
    4.2 Visualizing Data Mining Models........705
    4.3 Current Obstacles and Future Trends........705
  5. Distributed Data Mining........706
    5.1 The Basic Principle of DDM........707
    5.2 Grid Computing........707
    5.3 Current Obstacles and Future Trends........708
  6. Summary........708
  References........710
  Authors’ Biographical Statements........715

Subject Index........717
Author Index........727
Contributor Index........739
About the Editors........747

Send suggestions / comments to Dr. E. Triantaphyllou (trianta@lsu.edu).