Data Mining and Knowledge Discovery Approaches Based on Rule
Induction Techniques
by
Evangelos Triantaphyllou
and
Giovanni Felici
(Editors)
An edited book published in June 2006 by Springer-Verlag, New York, NY, U.S.A.,
in its Massive Computing series, Vol. 6.
ISBN 0-387-34294-X
TABLE OF CONTENTS
List of Figures...............................................xxiii
List of Tables.................................................xxix
Foreword....................................................xxxvii
Preface.......................................................xxxix
Acknowledgments................................................xlvii
Chapter 1
A COMMON LOGIC APPROACH TO DATA MINING
AND PATTERN RECOGNITION, by A. Zakrevskij.........................1
1. Introduction..............................................2
1.1 Using Decision Functions..........................2
1.2 Characteristic Features of the New Approach.......4
2. Data and Knowledge........................................6
2.1 General Definitions...............................6
2.2 Data and Knowledge Representation
the Case of Boolean Attributes............9
2.3 Data and Knowledge Representation
the Case of Multi-Valued Attributes......10
3. Data Mining – Inductive Inference........................12
3.1 Extracting Knowledge from the Boolean Space
of Attributes............................12
3.2 The Screening Effect.............................18
3.3 Inductive Inference from Partial Data............20
3.4 The Case of Multi-Valued Attributes..............21
4. Knowledge Analysis and Transformations...................23
4.1 Testing for Consistency..........................23
4.2 Simplification...................................27
5. Pattern Recognition – Deductive Inference................28
5.1 Recognition in the Boolean Space.................28
5.2 Appreciating the Asymmetry in Implicative Regularities...31
5.3 Deductive Inference in Finite Predicates.........34
5.4 Pattern Recognition in the Space
of Multi-Valued Attributes.......................36
6. Some Applications........................................38
7. Conclusions..............................................40
References.......................................................41
Author’s Biographical Statement..................................43
Chapter 2
THE ONE CLAUSE AT A TIME (OCAT)
APPROACH TO DATA MINING AND
KNOWLEDGE DISCOVERY, by E. Triantaphyllou........................45
1. Introduction.............................................46
2. Some Background Information..............................49
3. Definitions and Terminology..............................52
4. The One Clause at a Time (OCAT) Approach.................54
4.1 Data Binarization................................54
4.2 The One Clause at a Time (OCAT) Concept..........58
4.3 A Branch-and-Bound Approach for
Inferring Clauses........................59
4.4 Inference of the Clauses for
the Illustrative Example.................62
4.5 A Polynomial Time Heuristic for
Inferring Clauses........................65
5. A Guided Learning Approach...............................70
6. The Rejectability Graph of Two Collections of Examples...72
6.1 The Definition of the Rejectability Graph................72
6.2 Properties of the Rejectability Graph....................74
6.3 On the Minimum Clique Cover
of the Rejectability Graph...............76
7. Problem Decomposition....................................77
7.1 Connected Components.....................................77
7.2 Clique Cover.............................................78
8. An Example of Using the Rejectability Graph..............79
9. Conclusions..............................................82
References.......................................................83
Author’s Biographical Statement..................................87
Chapter 3
AN INCREMENTAL LEARNING ALGORITHM FOR
INFERRING LOGICAL RULES FROM EXAMPLES IN
THE FRAMEWORK OF THE COMMON REASONING
PROCESS, by X. Naidenova.........................................89
1. Introduction.............................................90
2. A Model of Rule-Based Logical Inference..................96
2.1 Rules Acquired from Experts or Rules of
the First Type...........................97
2.2 Structure of the Knowledge Base..................98
2.3 Reasoning Operations for Using Logical Rules of
the First Type..........................100
2.4 An Example of the Reasoning Process.............102
3. Inductive Inference of Implicative Rules From Examples..103
3.1 The Concept of a Good Classification Test.......103
3.2 The Characterization of Classification Tests....105
3.3 An Approach for Constructing Good
Irredundant Tests.......................106
3.4 Structure of Data for Inferring Good
Diagnostic Tests........................107
3.5 The Duality of Good Diagnostic Tests............109
3.6 Generation of Dual Objects with the Use
of Lattice Operations...................110
3.7 Inductive Rules for Constructing Elements of
a Dual Lattice..........................111
3.8 Special Reasoning Operations for Constructing
Elements of a Dual Lattice..............112
3.8.1 The Generalization Rule.........................112
3.8.2 The Diagnostic Rule.............................113
3.8.3 The Concept of an Essential Example.............114
4. Algorithms for Constructing All
Good Maximally Redundant Tests..................115
4.1 NIAGaRa: A Non-Incremental Algorithm for Constructing
All Good Maximally Redundant Tests.....115
4.2 Decomposition of Inferring Good Classification
Tests into Subtasks.....................122
4.2.1 Forming the Subtasks............................123
4.2.2 Reducing the Subtasks...........................125
4.2.3 Choosing Examples and Values for the Formation
of Subtasks.............................127
4.2.4 An Approach for Incremental Algorithms..........129
4.3 DIAGaRa: An Algorithm for Inferring All GMRTs
with the Decomposition into Subtasks of
the First Kind..........................130
4.3.1 The Basic Recursive Algorithm for Solving a Subtask
of the First Kind.......................130
4.3.2 An Approach for Forming the Set STGOOD..........131
4.3.3 The Estimation of the Number of Subtasks to
Be Solved...............................131
4.3.4 CASCADE: Incrementally Inferring GMRTs
Based on the Procedure DIAGaRa..........132
4.4 INGOMAR: An Incremental Algorithm for
Inferring All GMRTs.....................132
5. Conclusions.............................................138
Acknowledgments.................................................138
Appendix........................................................139
References......................................................143
Author’s Biographical Statement.................................147
Chapter 4
DISCOVERING RULES THAT GOVERN MONOTONE
PHENOMENA, by V.I. Torvik and E. Triantaphyllou.................149
1. Introduction............................................150
2. Background Information..................................152
2.1 Problem Descriptions............................152
2.2 Hierarchical Decomposition of Variables.........155
2.3 Some Key Properties of Monotone Boolean
Functions...............................157
2.4 Existing Approaches to Problem 1................160
2.5 An Existing Approach to Problem 2...............162
2.6 Existing Approaches to Problem 3................162
2.7 Stochastic Models for Problem 3.................162
3. Inference Objectives and Methodology....................165
3.1 The Inference Objective for Problem 1...........165
3.2 The Inference Objective for Problem 2...........166
3.3 The Inference Objective for Problem 3...........166
3.4 Incremental Updates for the Fixed Misclassification
Probability Model.......................167
3.5 Selection Criteria for Problem 1................167
3.6 Selection Criteria for
Problems 2.1, 2.2, and 2.3..............168
3.7 Selection Criterion for Problem 3...............169
4. Experimental Results....................................174
4.1 Experimental Results for Problem 1..............174
4.2 Experimental Results for Problem 2..............176
4.3 Experimental Results for Problem 3..............179
5. Summary and Discussion..................................183
5.1 Summary of the Research Findings................183
5.2 Significance of the Research Findings...........186
5.3 Future Research Directions......................187
6. Concluding Remarks......................................187
References......................................................188
Authors’ Biographical Statements................................191
Chapter 5
LEARNING LOGIC FORMULAS AND RELATED ERROR
DISTRIBUTIONS, by G. Felici, F. Sun, and K. Truemper............193
1. Introduction............................................194
2. Logic Data and Separating Set...........................197
2.1 Logic Data......................................197
2.2 Separating Set..................................198
3. Problem Formulation.....................................200
3.1 Logic Variables.................................201
3.2 Separation Condition for Records in A...........201
3.3 Separation Condition for Records in B...........201
3.4 Selecting a Largest Subset......................202
3.5 Selecting a Separating Vector...................203
3.6 Simplification for 0/1 Records..................204
4. Implementation of Solution Algorithm....................204
5. Leibniz System..........................................205
6. Simple-Minded Control of Classification Errors..........206
7. Separations for Voting Process..........................207
8. Probability Distribution of Vote-Total..................208
8.1 Mean and Variance for ZA........................209
8.2 Random Variables Yi.............................211
8.3 Distribution for Y..............................212
8.4 Distribution for ZA.............................213
8.5 Probabilities of Classification Errors..........213
8.6 Summary of Algorithm............................216
9. Computational Results...................................216
9.1 Breast Cancer Diagnosis.........................218
9.2 Australian Credit Card..........................219
9.3 Congressional Voting............................219
9.4 Diabetes Diagnosis..............................219
9.5 Heart Disease Diagnosis.........................220
9.6 Boston Housing..................................221
10. Conclusions.............................................221
References......................................................222
Authors’ Biographical Statements................................226
Chapter 6
FEATURE SELECTION FOR DATA MINING
by V. de Angelis, G. Felici, and G. Mancinelli..................227
1. Introduction............................................228
2. The Many Routes to Feature Selection....................229
2.1 Filter Methods..................................232
2.2 Wrapper Methods.................................234
3. Feature Selection as a Subgraph Selection Problem.......237
4. Basic IP Formulation and Variants.......................238
5. Computational Experience................................241
5.1 Test on Generated Data..........................242
5.2 An Application..................................246
6. Conclusions.............................................248
References......................................................249
Authors’ Biographical Statements................................252
Chapter 7
TRANSFORMATION OF RATIONAL AND SET DATA
TO LOGIC DATA, by S. Bartnikowski, M. Granberry,
J. Mugan, and K. Truemper.......................................253
1. Introduction............................................254
1.1 Transformation of Set Data..............................254
1.2 Transformation of Rational Data.........................254
1.3 Computational Results...................................256
1.4 Entropy-Based Approaches................................257
1.5 Bottom-up Methods.......................................258
1.6 Other Approaches........................................258
2. Definitions.............................................259
2.1 Unknown Values..........................................259
2.2 Records.................................................260
2.3 Populations.............................................260
2.4 DNF Formulas............................................260
2.5 Clash Condition.........................................261
3. Overview of Transformation Process......................262
4. Set Data to Logic Data..................................262
4.1 Case of Element Entries.................................262
4.2 Case of Set Entries.....................................264
5. Rational Data to Logic Data.............................264
6. Initial Markers.........................................265
6.1 Class Values............................................265
6.2 Smoothed Class Values...................................266
6.3 Selection of Standard Deviation.........................266
6.4 Definition of Markers...................................269
6.5 Evaluation of Markers...................................271
7. Additional Markers......................................271
7.1 Critical Interval.......................................272
7.2 Attractiveness of Pattern Change........................272
7.3 Selection of Marker.....................................273
8. Computational Results...................................274
9. Summary.................................................275
References......................................................276
Authors’ Biographical Statements................................278
Chapter 8
DATA FARMING: CONCEPTS AND METHODS, by A. Kusiak................279
1. Introduction............................................280
2. Data Farming Methods....................................281
2.1 Feature Evaluation..............................282
2.2 Data Transformation.............................282
2.2.1 Filling in Missing Values.......................282
2.2.2 Discretization..................................283
2.2.3 Feature Content Modification....................283
2.2.4 Feature Transformation..........................286
2.2.5 Data Evolution..................................289
2.3 Knowledge Transformation........................290
2.4 Outcome Definition..............................295
2.5 Feature Definition..............................297
3. The Data Farming Process................................298
4. A Case Study............................................299
5. Conclusions.............................................301
References......................................................302
Author’s Biographical Statement.................................304
Chapter 9
RULE INDUCTION THROUGH DISCRETE SUPPORT
VECTOR DECISION TREES, by C. Orsenigo and C. Vercellis..........305
1. Introduction............................................306
2. Linear Support Vector Machines..........................308
3. Discrete Support Vector Machines with Minimum Features..312
4. A Sequential LP-based Heuristic for
Problems LDVM and FDVM..................314
5. Building a Minimum Features Discrete Support
Vector Decision Tree....................316
6. Discussion and Validation of the Proposed Classifier....319
7. Conclusions.............................................322
References......................................................324
Authors’ Biographical Statements................................326
Chapter 10
MULTI-ATTRIBUTE DECISION TREES AND
DECISION RULES, by J.-Y. Lee and S. Olafsson....................327
1. Introduction............................................328
2. Decision Tree Induction.................................329
2.1 Attribute Evaluation Rules......................330
2.2 Entropy-Based Algorithms........................332
2.3 Other Issues in Decision Tree Induction.........333
3. Multi-Attribute Decision Trees..........................334
3.1 Accounting for Interactions between Attributes..334
3.2 Second Order Decision Tree Induction............335
3.3 The SODI Algorithm..............................339
4. An Illustrative Example.................................344
5. Numerical Analysis......................................347
6. Conclusions.............................................349
Appendix: Detailed Model Comparison.............................351
References......................................................355
Authors’ Biographical Statements................................358
Chapter 11
KNOWLEDGE ACQUISITION AND UNCERTAINTY IN
FAULT DIAGNOSIS: A ROUGH SETS PERSPECTIVE,
by L.-Y. Zhai, L.-P. Khoo, and S.-C. Fok........................359
1. Introduction............................................360
2. An Overview of Knowledge Discovery and Uncertainty......361
2.1 Knowledge Acquisition and Machine Learning......361
2.1.1 Knowledge Representation........................361
2.1.2 Knowledge Acquisition...........................362
2.1.3 Machine Learning and Automated
Knowledge Extraction....................362
2.1.4 Inductive Learning Techniques for Automated
Knowledge Extraction............................364
2.2 Uncertainties in Fault Diagnosis................366
2.2.1 Inconsistent Data...............................366
2.2.2 Incomplete Data.................................367
2.2.3 Noisy Data......................................368
2.3 Traditional Techniques for Handling Uncertainty.369
2.3.1 MYCIN’s Model of Certainty Factors..............369
2.3.2 Bayesian Probability Theory.....................370
2.3.3 The Dempster-Shafer Theory of Belief Functions..371
2.3.4 The Fuzzy Sets Theory...........................372
2.3.5 Comparison of Traditional Approaches for
Handling Uncertainty....................373
2.4 The Rough Sets Approach.........................374
2.4.1 Introductory Remarks............................374
2.4.2 Rough Sets and Fuzzy Sets.......................375
2.4.3 Development of Rough Set Theory.................376
2.4.4 Strengths of Rough Sets Theory and Its
Applications in Fault Diagnosis.........376
3. Rough Sets Theory in Classification and
Rule Induction under Uncertainty........378
3.1 Basic Notions of Rough Sets Theory..............378
3.1.1 The Information System..........................378
3.1.2 Approximations..................................379
3.2 Rough Sets and Inductive Learning...............381
3.2.1 Inductive Learning, Rough Sets and the RClass...381
3.2.2 Framework of the RClass.........................382
3.3 Validation and Discussion.......................384
3.3.1 Example 1: Machine Condition Monitoring.........385
3.3.2 Example 2: A Chemical Process...................386
4. Conclusions.............................................388
References......................................................389
Authors’ Biographical Statements................................394
Chapter 12
DISCOVERING KNOWLEDGE NUGGETS WITH A GENETIC
ALGORITHM, by E. Noda and A.A. Freitas..........................395
1. Introduction............................................396
2. The Motivation for Genetic
Algorithm-Based Rule Discovery.........399
2.1 An Overview of Genetic Algorithms (GAs).........400
2.2 Greedy Rule Induction...........................402
2.3 The Global Search of Genetic Algorithms (GAs)...404
3. GA-Nuggets..............................................404
3.1 Single-Population GA-Nuggets....................404
3.1.1 Individual Representation.......................405
3.1.2 Fitness Function................................406
3.1.3 Selection Method and Genetic Operators..........410
3.2 Distributed-Population GA-Nuggets...............411
3.2.1 Individual Representation.......................411
3.2.2 Distributed Population..........................412
3.2.3 Fitness Function................................414
3.2.4 Selection Method and Genetic Operators..........415
4. A Greedy Rule Induction Algorithm
for Dependence Modeling.................415
5. Computational Results...................................416
5.1 The Data Sets Used in the Experiments...........416
5.2 Results and Discussion..........................417
5.2.1 Predictive Accuracy.............................419
5.2.2 Degree of Interestingness.......................422
5.2.3 Summary of the Results..........................426
6. Conclusions.............................................428
References......................................................429
Authors’ Biographical Statements................................432
Chapter 13
DIVERSITY MECHANISMS IN PITT-STYLE
EVOLUTIONARY CLASSIFIER SYSTEMS, by M. Kirley,
H.A. Abbass, and R.I. McKay.....................................433
1. Introduction............................................434
2. Background – Genetic Algorithms........................436
3. Evolutionary Classifier Systems........................439
3.1 The Michigan Style Classifier System............439
3.2 The Pittsburgh Style Classifier System..........440
4. Diversity Mechanisms in Evolutionary Algorithms.........440
4.1 Niching.........................................441
4.2 Fitness Sharing.................................441
4.3 Crowding........................................443
4.4 Isolated Populations....................................444
5. Classifier Diversity....................................446
6. Experiments.............................................448
6.1 Architecture of the Model.......................448
6.2 Data Sets.......................................449
6.3 Treatments......................................449
6.4 Model Parameters................................449
7. Results.................................................450
8. Conclusions.............................................452
References......................................................454
Authors’ Biographical Statements................................457
Chapter 14
FUZZY LOGIC IN DISCOVERING ASSOCIATION
RULES: AN OVERVIEW, by G. Chen, Q. Wei, and E.E. Kerre..........459
1. Introduction............................................460
1.1 Notions of Associations.........................460
1.2 Fuzziness in Association Mining.................462
1.3 Main Streams of Discovering Associations with
Fuzzy Logic.............................464
2. Fuzzy Logic in Quantitative Association Rules...........465
2.1 Boolean Association Rules.......................465
2.2 Quantitative Association Rules..................466
2.3 Fuzzy Extensions of
Quantitative Association Rules..........468
3. Fuzzy Association Rules with Fuzzy Taxonomies...........469
3.1 Generalized Association Rules...................470
3.2 Generalized Association Rules with
Fuzzy Taxonomies........................471
3.3 Fuzzy Association Rules with
Linguistic Hedges.......................473
4. Other Fuzzy Extensions and Considerations...............474
4.1 Fuzzy Logic in Interestingness Measures.........474
4.2 Fuzzy Extensions of Dsupport / Dconfidence......476
4.3 Weighted Fuzzy Association Rules................478
5. Fuzzy Implication Based Association Rules...............480
6. Mining Functional Dependencies with Uncertainties.......482
6.1 Mining Fuzzy Functional Dependencies............482
6.2 Mining Functional Dependencies with Degrees.....483
7. Fuzzy Logic in Pattern Associations.....................484
8. Conclusions.............................................486
References......................................................487
Authors’ Biographical Statements................................493
Chapter 15
MINING HUMAN INTERPRETABLE KNOWLEDGE WITH
FUZZY MODELING METHODS: AN OVERVIEW, by T.W. Liao..............495
1. Background..............................................496
2. Basic Concepts..........................................498
3. Generation of Fuzzy If-Then Rules.......................500
3.1 Grid Partitioning...............................501
3.2 Fuzzy Clustering................................506
3.3 Genetic Algorithms..............................509
3.3.1 Sequential Pittsburgh Approach..................510
3.3.2 Sequential IRL+Pittsburgh Approach..............511
3.3.3 Simultaneous Pittsburgh Approach................513
3.4 Neural Networks.................................517
3.4.1 Fuzzy Neural Networks...........................518
3.4.2 Neural Fuzzy Systems............................519
3.4.2.1 Starting Empty..........................519
3.4.2.2 Starting Full...........................520
3.4.2.3 Starting with an Initial Rule Base......524
3.5 Hybrids.........................................526
3.6 Others..........................................526
3.6.1 From Exemplar Numeric Data......................527
3.6.2 From Exemplar Fuzzy Data........................527
4. Generation of Fuzzy Decision Trees......................527
4.1 Fuzzy Interpretation of Crisp Trees with
Discretized Intervals...................528
4.2 Fuzzy ID3 Variants...............................529
4.2.1 From Fuzzy Vector-Valued Examples.......................529
4.2.2 From Nominal-Valued and Real-Valued Examples............530
5. Applications............................................532
5.1 Function Approximation Problems.................532
5.2 Classification Problems.........................532
5.3 Control Problems................................533
5.4 Time Series Prediction Problems.................534
5.5 Other Decision-Making Problems..................534
6. Discussion..............................................534
7. Conclusions.............................................537
References......................................................538
Appendix 1: A Summary of Grid Partitioning Methods
for Fuzzy Modeling......................545
Appendix 2: A Summary of Fuzzy Clustering Methods
for Fuzzy Modeling......................546
Appendix 3: A Summary of GA Methods for Fuzzy Modeling......547
Appendix 4: A Summary of Neural Network Methods for
Fuzzy Modeling..........................548
Appendix 5: A Summary of Fuzzy Decision Tree Methods for
Fuzzy Modeling..........................549
Author’s Biographical Statement.................................550
Chapter 16
DATA MINING FROM MULTIMEDIA PATIENT RECORDS,
by A.S. Elmaghraby, M.M. Kantardzic, and M.P. Wachowiak.........551
1. Introduction............................................552
2. The Data Mining Process.................................554
3. Clinical Patient Records: A Data Mining Source.........556
3.1 Distributed Data Sources........................560
3.2 Patient Record Standards........................560
4. Data Preprocessing......................................563
5. Data Transformation.....................................567
5.1 Types of Transformation.........................567
5.2 An Independent Component Analysis:
Example of an EMG/ECG Separation........571
5.3 Text Transformation and Representation:
A Rule-Based Approach...................573
5.4 Image Transformation and Representation:
A Rule-Based Approach...................575
6. Dimensionality Reduction................................579
6.1 The Importance of Reduction.....................579
6.2 Data Fusion.....................................581
6.3 Example 1: Multimodality Data Fusion............584
6.4 Example 2: Data Fusion in Data Preprocessing....584
6.5 Feature Selection Supported By Domain Experts...588
7. Conclusions.............................................589
References......................................................591
Authors’ Biographical Statements................................595
Chapter 17
LEARNING TO FIND CONTEXT BASED SPELLING
ERRORS, by H. Al-Mubaid and K. Truemper.........................597
1. Introduction............................................598
2. Previous Work...........................................600
3. Details of Ltest........................................601
3.1 Learning Step...................................602
3.2 Testing Step....................................605
3.2.1 Testing Regular Cases...........................605
3.2.2 Testing Special Cases...........................606
3.2.3 An Example......................................607
4. Implementation and Computational Results................607
5. Extensions..............................................614
6. Summary.................................................616
References......................................................616
Appendix A: Construction of Substitutions......................619
Appendix B: Construction of Training and History Texts.........620
Appendix C: Structure of Characteristic Vectors................621
Appendix D: Classification of Characteristic Vectors...........624
Authors’ Biographical Statements................................627
Chapter 18
INDUCTION AND INFERENCE WITH FUZZY RULES
FOR TEXTUAL INFORMATION RETRIEVAL, by J. Chen,
D.H. Kraft, M.J. Martin-Bautista, and M.-A. Vila................629
1. Introduction............................................630
2. Preliminaries...........................................632
2.1 The Vector Space Approach to
Information Retrieval...................632
2.2 Fuzzy Set Theory Basics.........................634
2.3 Fuzzy Hierarchical Clustering...................634
2.4 Fuzzy Clustering by the
Fuzzy C-means Algorithm.................634
3. Fuzzy Clustering, Fuzzy Rule Discovery and
Fuzzy Inference for Textual Retrieval...635
3.1 The Air Force EDC Data Set......................636
3.2 Clustering Results..............................637
3.3 Fuzzy Rule Extraction from Fuzzy Clusters.......638
3.4 Application of Fuzzy Inference for
Improving Retrieval Performance.........639
4. Fuzzy Clustering, Fuzzy Rules and User Profiles for
Web Retrieval...........................640
4.1 Simple User Profile Construction................641
4.2 Application of Simple User Profiles in
Web Information Retrieval.......642
4.2.1 Retrieving Interesting Web Documents....642
4.2.2 User Profiles for Query Expansion by
Fuzzy Inference.................643
4.3 Experiments of Using User Profiles..............644
4.4 Extended Profiles and Fuzzy Clustering..........646
5. Conclusions.....................................646
Acknowledgements................................................647
References......................................................648
Authors’ Biographical Statements................................652
Chapter 19
STATISTICAL RULE INDUCTION IN THE PRESENCE OF
PRIOR INFORMATION: THE BAYESIAN RECORD
LINKAGE PROBLEM, by D.H. Judson.................................655
1. Introduction............................................656
2. Why is Record Linkage Challenging?......................657
3. The Fellegi-Sunter Model of Record Linkage..............658
4. How Estimating Match Weights and Setting Thresholds is
Equivalent to Specifying a
Decision Rule...........................660
5. Dealing with Stochastic Data:
A Logistic Regression Approach..........661
5.1 Estimation of the Model.........................665
5.2 Finding the Implied Threshold and
Interpreting Coefficients...............665
6. Dealing with Unlabeled Data in the
Logistic Regression Approach............668
7. Brief Description of the Simulated Data.................669
8. Brief Description of the CPS/NHIS to
Census Record Linkage Project..........670
9. Results of the Bayesian Latent Class Method with
Simulated Data..........................672
9.1 Case 1: Uninformative...........................673
9.2 Case 2: Informative.............................677
9.3 False Link and Non-Link Rates in the
Population of All Possible Pairs........678
10. Results from the Bayesian Latent Class Method with
Real Data...............................................679
10.1 Steps in Preparing the Data.....................679
10.2 Priors and Constraints..........................681
10.3 Results.........................................682
11. Conclusions and Future Research.........................690
References......................................................691
Author’s Biographical Statement.................................694
Chapter 20
FUTURE TRENDS IN SOME DATA MINING AREAS,
by X. Wang, P. Zhu, G. Felici, and E. Triantaphyllou............695
1. Introduction............................................696
2. Web Mining..............................................696
2.1 Web Content Mining..............................697
2.2 Web Usage Mining................................698
2.3 Web Structure Mining............................698
2.4 Current Obstacles and Future Trends.............699
3. Text Mining.............................................700
3.1 Text Mining and Information Access..............700
3.2 A Simple Framework of Text Mining...............701
3.3 Fields of Text Mining...........................701
3.4 Current Obstacles and Future Trends.............702
4. Visual Data Mining......................................703
4.1 Data Visualization..............................704
4.2 Visualizing Data Mining Models..................705
4.3 Current Obstacles and Future Trends.............705
5. Distributed Data Mining.................................706
5.1 The Basic Principle of DDM......................707
5.2 Grid Computing..................................707
5.3 Current Obstacles and Future Trends.............708
6. Summary.................................................708
References......................................................710
Authors’ Biographical Statements................................715
Subject Index...................................................717
Author Index....................................................727
Contributor Index...............................................739
About the Editors...............................................747
Send suggestions / comments to Dr. E. Triantaphyllou (trianta@lsu.edu).