|
Recent Advances in Data Mining of Enterprise Data:
Algorithms and Applications
by
T. Warren Liao
and
Evangelos Triantaphyllou
(Editors)
An edited book published in 2007 by World Scientific Publishing Company, Singapore,
in its Computers and Operations Research series, Vol. 6.
ISBN-10: 981277985X
ISBN-13: 978-9812779854
|
Summary of the Scope of this Book:
The main goal of the new field of data mining is the analysis of large and complex datasets.
Some very important datasets may be derived from business and industrial activities.
This kind of data is known as “enterprise data.”
The common characteristic of such datasets is that the analyst wishes to analyze them for the purpose
of designing a more cost-effective strategy for optimizing some type of performance
measure, such as reducing production time, improving quality, eliminating wastes, or maximizing profit.
Data in this category may describe different scheduling scenarios in a manufacturing environment,
quality control of some process, fault diagnosis in the operation of a machine or process,
risk analysis when issuing credit to applicants, management of supply chains in a manufacturing system,
or data for business related decision-making.
Key Features:
The chapters have been written by top
researchers / practitioners in the corresponding fields.
The problems related to the mining of enterprise data
are studied comprehensively and in depth.
Each chapter is self-contained and thorough on the topic it studies.
They are also complementary of each other.
The developments are presented in a complete and intuitive manner.
At the end of each chapter there is a discussion
of challenges for future research.
Table of Contents (as part of this Webpage)
PDF files:
Foreword (as a downloadable PDF file)
Preface (as a downloadable PDF file)
Acknowledgments (as a downloadable PDF file)
Table of Contents (as a downloadable PDF file)
Subject Index (as a downloadable PDF file)
List of Contributors (as a downloadable PDF file)
About the Editors (as a downloadable PDF file)
Purchasing Information from Amazon.com:
TABLE OF CONTENTS
Foreword..........................................................xxi
Preface...........................................................xxiii
Acknowledgments...................................................xxxi
CHAPTER 1
Enterprise Data Mining – A Review and Research
Directions, by T. W. Liao.......................1
Click here for the abstract of this Chapter in PDF format
1. Introduction................................................2
2. The Basics of Data Mining and Knowledge Discovery...........6
2.1. Data mining and the knowledge discovery process.....6
2.2 Data mining algorithms/methodologies................9
2.3 Data mining system architectures...................12
2.4 Data mining software programs......................14
3. Types and Characteristics of Enterprise Data...............17
4. Overview of the Enterprise Data Mining Activities..........23
4.1 Customer related...................................23
4.2 Sales related......................................30
4.3 Product related....................................37
4.4 Production planning and control related............43
4.5 Logistics related..................................51
4.6 Process related....................................55
4.6.1 For the semi-conductor industry..........55
4.6.2 For the electronics industry.............63
4.6.3 For the process industry.................72
4.6.4 For other industries.....................79
4.7 Others.............................................83
4.8 Summary............................................87
4.8.1 Data type, size, and sources..............87
4.8.2 Data preprocessing........................88
5. Discussion.................................................90
6. Research Programs and Directions...........................91
6.1 On e-commerce and web mining........................91
6.2 On customer-related mining..........................92
6.3 On sales-related mining.............................93
6.4 On product-related mining...........................94
6.5 On process-related mining...........................94
6.6 On the use of text mining in enterprise systems.....95
References.......................................................96
Author’s Biographical Statement.................................109
CHAPTER 2
Application and Comparison of Classification
Techniques in Controlling Credit Risk,
by L. Yu, G. Chen, A. Koronios, S. Zhu,
and X. Guo....................................111
Click here for the abstract of this Chapter in PDF format
1. Credit Risk and Credit Rating.............................112
2. Data and Variables........................................115
3. Classification Techniques.................................115
3.1 Logistic regression................................116
3.2 Discriminant analysis..............................117
3.3 K-nearest neighbors................................119
3.4 Naïve Bayes........................................120
3.5 The TAN technique..................................121
3.6 Decision trees.....................................122
3.7 Associative classification.........................124
3.8 Artificial neural networks.........................126
3.9 Support vector machines............................129
4. An Empirical Study........................................131
4.1 Experimental settings..............................131
4.2 The ROC curve and the Delong-Pearson method........133
4.3 Experimental results...............................135
5. Conclusions and Future Work...............................139
References......................................................140
Authors' Biographical Statements................................144
CHAPTER 3
Predictive Classification with Imbalanced
Enterprise Data, by S. Daskalaki, I. Kopanas,
and N. M. Avouris.............................147
Click here for the abstract of this Chapter in PDF format
1. Introduction..............................................148
2. Enterprise Data and Predictive Classification.............151
3. The Process of Knowledge Discovery from Enterprise Data...154
3.1 Definition of the problem and application domain...155
3.2 Creating a target database.........................156
3.3 Data cleaning and preprocessing....................157
3.4 Data reduction and projection......................159
3.5 Defining the data mining function and
performance measures............160
3.6 Selection of data mining algorithms................163
3.7 Experimentation with data mining algorithms........164
3.8 Combining classifiers and interpretation of
the results........167
3.9 Using the discovered knowledge.....................171
4. Development of a Cost-Based Evaluation Framework..........171
5. Operationalization of the Discovered Knowledge: Design
of an Intelligent Insolvencies Management System........178
6. Summary and Conclusions...................................181
References......................................................183
Authors' Biographical Statements................................187
CHAPTER 4
Using Soft Computing Methods for Time Series
Forecasting, by P.-C. Chang and Y.-W. Wang....189
Click here for the abstract of this Chapter in PDF format
1. Introduction..............................................190
1.1 Background and motives..............................190
1.2 Objectives..........................................191
2. Literature Review..........................................191
2.1 Traditional time series forecasting research........191
2.2 Neural network based forecasting methods............192
2.3 Hybridizing a genetic algorithm (GA) with a
neural network for forecasting..........193
2.3.1 Using a GA to design the NN architecture..193
2.3.2 Using a GA to generate the NN
connection weights......................194
2.4 Review of sales forecasting search..................194
3. Problem Definition.........................................200
3.1 Scope of the research data..........................200
3.2 Characteristics of the variables considered.........200
3.2.1 Macroeconomic domain......................200
3.2.2 Downstream demand domain..................201
3.2.3 Industrial production domain..............202
3.2.4 Time series domain........................202
3.3 Performance index...................................202
4. Methodology................................................203
4.1 Data preprocessing..................................203
4.1.1 Gray relation analysis....................203
4.1.2 Winter’s exponential smoothing............207
4.2 Evolving neural networks (ENN)......................209
4.2.1 ENN modeling..............................209
4.2.2 ENN parameters design.....................214
4.3 Weighted evolving fuzzy neural networks (WEFuNN)....218
4.3.1 Building of the WEFuNN....................218
4.3.1.1 The feed-forward learning
phase...........220
4.3.1.2 The forecasting phase........226
4.3.2 WEFuNN parameters design..................227
5. Experimental Results.......................................229
5.1 Winter’s exponential smoothing......................230
5.2 The Back propagation neural network model...........230
5.3 Multiple regression analysis model..................231
5.4 Evolving fuzzy neural network model.................232
5.5 Evolving neural network.............................233
5.6 Comparisons.........................................235
6. Conclusions................................................236
References......................................................237
Appendix........................................................243
Authors’ Biographical Statements................................246
CHAPTER 5
Data Mining Applications of Process Platform
Formation for High Variety Production,
by J. Jiao and L. Zhang.......................247
Click here for the abstract of this Chapter in PDF format
1. Background................................................248
2. Methodology...............................................249
3. Routing Similarity Measure................................251
3.1 Node content similarity measure....................251
3.1.1 Material similarity measure..............252
3.1.1.1 Procedure for calculating
similarities between
primitive components....253
3.1.1.2 Procedure for calculating
similarities between
compound components.....257
3.1.2 Product similarity measure...............258
3.1.3 Resource similarity measure..............258
3.1.4 Operation similarity and node content
similarity measures.....259
3.1.5 Normalized node content similarity
matrix..................260
3.2 Tree structure similarity measure..................261
3.3 Routing similarity measure.........................265
4. Routing Clustering.........................................265
5. Routing Unification........................................267
5.1 Basic routing elements.............................267
5.2 Master and selective routing elements..............267
5.3 Basic tree structures..............................268
5.4 Tree growing.......................................269
6. A Case Study...............................................275
6.1 The routing similarity measure......................275
6.2 The routing clustering..............................281
6.3 The routing unification.............................282
7. Summary....................................................283
References......................................................284
Authors’ Biographical Statements................................286
CHAPTER 6
A Data Mining Approach to Production Control
in Dynamic Manufacturing Systems,
by C. Kwak and Y. Yih.........................287
Click here for the abstract of this Chapter in PDF format
1. Introduction...............................................288
2. Previous Approaches to Scheduling of Wafer Fabrication.....291
3. Simulation Model and Solution Methodology..................294
3.1 Simulation model...................................292
3.2 Development of a scheduler.........................298
3.2.1 Decision variables and decision rules...298
3.2.2 Evaluation criteria: system
performance and status..........300
3.2.3 Data collection: a simulation approach..300
3.2.4 Data classification: a competitive
neural network approach.........301
3.2.5 Selection of decision rules for
decision variables..............306
4. An Experimental Study......................................306
4.1 Experimental design................................306
4.2 Results and analyses...............................309
5. Related Studies............................................313
6. Conclusions................................................317
References......................................................319
Authors’ Biographical Statements................................321
CHAPTER 7
Predicting Wine Quality from Agricultural
Data with Single-Objective and
Multi-Objective Data Mining Algorithms,
by M. Last, S. Elnekave, A. Naor, and
V. Schoenfeld.................................323
Click here for the abstract of this Chapter in PDF format
1. Introduction...............................................324
2. Problem Description........................................325
3. Information Networks and the Information Graph.............329
3.1 An extended classification task.....................329
3.2 Single-objective information networks...............330
3.3 Multi-objective information networks................336
3.4 Information graphs..................................338
4. A Case Study: the Cabernet Sauvignon problem...............342
4.1 Data selection......................................342
4.2 Data preprocessing..................................344
4.2.1 Ripening data............................344
4.2.2 Meteorological measurements..............347
4.3 Design of data mining runs..........................349
4.4 Sing-objective models...............................350
4.5 Multi-objective models..............................353
4.6 Comparative evaluation..............................355
4.7 The discovered knowledge and its potential use......357
5. Related Work...............................................358
5.1 Mining of agricultural data........................358
5.2 Multi-objective classification models and
algorithms......359
6. Conclusions................................................361
References......................................................362
Authors’ Biographical Statements................................364
CHAPTER 8
Enhancing Competitive Advantages and
Operational Excellence for High-Tech
Industry through Data Mining and Digital
Management, by C.-F. Chien and S.-C. Hsu......367
Click here for the abstract of this Chapter in PDF format
1. Introduction...............................................368
2. Knowledge Discovery in Databases and Data Mining...........370
2.1 Problem types for data mining in the high-tech
industry........373
2.2 Data mining methodologies..........................374
2.2.1 Decision trees..........................374
2.2.1.1 Decision tree
construction....375
2.2.1.2 CART......................379
2.2.1.3 C4.5......................380
2.2.1.4 CHAID.....................382
2.2.2 Artificial neural networks..............383
2.2.2.1 Associate learning
networks........386
2.2.2.2 Supervised learning
networks........388
2.2.2.3 Unsupervised learning
networks........390
3. Application of Data Mining in Semiconductor Manufacturing..393
3.1 Problem definition.................................393
3.2 Types of data mining applications..................395
3.2.1 Extracting characteristics from
WAT data........396
3.2.2 Process failure diagnosis of CP and
engineering data........397
3.2.3 Process failure diagnosis of WAT and
engineering data........398
3.2.4 Extracting characteristics from
semiconductor manufacturing data........399
3.3 A Hybrid decision tree approach for CP low
yield diagnosis.........400
3.4 Key stage screening................................402
3.5 Construction of the Decision tree..................404
4. Conclusions...............................................406
References......................................................407
Authors' Biographical Statements................................411
CHAPTER 9
Multivariate Control Charts from
a Data Mining Perspective, by G. C. Porzio
and G. Ragozini...............................413
Click here for the abstract of this Chapter in PDF format
1. Introduction...............................................414
2. Control Charts and Statistical Process Control Phases......415
3. Multivariate Statistical Process Control...................419
3.1 The sequential quality control setting.............419
3.2 The Hotelling T2 Control Chart.....................421
4. Is the T2 Statistic Really Able to Tackle Data
Mining Issues?..........424
4.1 Many data, many outliers...........................424
4.2 Questioning the assumptions on shape and
distribution....430
5. Designing Nonparametric Charts When Large HDS
Are Available: the Data Depth Approach..........434
5.1 Data depth and control charts......................436
5.2 Towards a parametric setting for data depth
control charts..........438
5.3 A Shewhart chart for changes in location and
increases in scale......442
5.4 An illustrative example............................443
5.5 Average run length functions for data depth
control charts..........446
5.6 A simulation study of chart performance............448
5.7 Choosing an empirical depth function...............453
6. Final Remarks..............................................454
References......................................................456
Authors’ Biographical Statements................................462
CHAPTER 10
Data Mining of Multi-Dimensional Functional
Data for Manufacturing Fault Diagnosis,
by M. K. Jeong, S. G. Kong, and
O. A. Omitaomu................................463
Click here for the abstract of this Chapter in PDF format
1. Introduction...............................................464
2. Data Mining of Functional Data.............................465
2.1 Dimensionality reduction techniques for
functional data.........465
2.2 Multi-scale fault diagnosis........................468
2.2.1 A case study: data mining of
functional data.........469
2.3 Motor shaft misalignment prediction based on
functional data.........472
2.3.1 Techniques for predicting with high
number of predictors....474
2.3.2 A case study: motor shaft misalignment
prediction......477
3. Data Mining in Hyperspectral Imaging.......................481
3.1 A hyperspectral fluorescence imaging system........483
3.2 Hyperspectral image dimensionality reduction.......485
3.3 Spectral band selection............................490
3.4 A case study: data mining in hyperspectral
imaging.........494
4. Conclusions................................................496
References......................................................497
Authors’ Biographical Statements................................503
CHAPTER 11
Maintenance Planning Using Enterprise Data
Mining,by L. P. Khoo, Z. W. Zhong,
and H. Y. Lim.................................505
Click here for the abstract of this Chapter in PDF format
1. Introduction...............................................506
2. Rough Sets, Genetic Algorithms, and Tabu Search............508
2.1 Rough sets.........................................508
2.1.1 Overview.................................508
2.1.2 Rough sets and fuzzy sets................509
2.1.3 Applications.............................510
2.1.4 The strengths of the theory of
rough sets......511
2.1.5 Enterprise information and the
information system......512
2.2 Genetic algorithms.................................516
2.3 Tabu search........................................520
3. The Proposed Hybrid Approach...............................521
3.1 Background..........................................521
3.2 The rough set engine...............................521
3.3 The tabu-enhanced GA engine........................523
3.4 Rule organizer.....................................528
4. A Case Study...............................................528
4.1 Background.........................................528
4.1.1 Mounting bracket failures...............531
4.1.2 The alignment problem...................532
4.1.3 Sea/land inner/outer guide roller
failures........532
4.2 Analysis using the proposed hybrid approach........532
4.3 Discussion.........................................537
4.3.1 Validity of the extracted rules.........537
4.3.2 A comparative analysis of the results...538
5. Conclusions................................................540
References......................................................541
Authors’ Biographical Statements................................544
CHAPTER 12
Data Mining Techniques for Improving Workflow
Model, by D. Gunopulos and S. Subramaniam.....545
Click here for the abstract of this Chapter in PDF format
1. Introduction...............................................546
2. Workflow Models............................................549
3. Discovery of Models from Workflow Logs.....................552
4. Managing Flexible Workflow Systems.........................555
5. Workflow Optimization Through Mining of Workflow Logs......557
5.1 Repositioning decision points.....................557
5.2 Prediction of execution paths.....................560
6. Capturing the Evolution of Workflow Models.................565
7. Applications in Software Engineering.......................566
7.1 Discovering reasons for bugs in software
processes.......567
7.2 Predicting the control flow of a software
process for efficient resource management.......568
8. Conclusions................................................569
References......................................................569
Authors’ Biographical Statements................................576
CHAPTER 13
Mining Images of Cell-Based Assays,
by P. Perner..................................577
Click here for the abstract of this Chapter in PDF format
1. Introduction...............................................578
2. The Application Used for the Demonstration of the
System Capability.......580
3. Challenges and Requirements for the Systems................582
4. The Cell-Interpret’s Architecture..........................582
5. Case-Based Image Segmentation..............................584
5.1 The case-based reasoning unit.....................585
5.2 Management of case bases..........................587
6. Feature Extraction.........................................588
6.1 Our flexible texture descriptor....................589
7. The Decision-Tree Induction Unit...........................591
7.1 The basic principle................................591
7.2 Terminology of the decision tree...................592
7.3 Subtasks and decision criteria for decision-tree
induction.......594
7.4 Attribute selection criteria.......................597
7.4.1 Information gain criteria and the
gain ratio......598
7.4.2 The Gini function.......................600
7.5 Discretization of attribute values.................601
7.5.1 Binary discretization...................603
7.5.1.1 Binary discretization
based on entropy........603
7.5.1.2 Discretization based on
inter- and intra-class
variance........604
7.5.2 Multi-interval discretization...........605
7.5.2.1 The basic algorithm.......606
7.5.2.2 Determination of the
number of intervals.....606
7.5.2.3 Cluster-utility criteria..607
7.5.2.4 MLD-based criteria........607
7.5.2.5 LVQ-based discretization..608
7.5.2.6 Histogram-based
discretization..609
7.5.2.7 Chi-merge discretization..610
7.5.3 The influence of discretization
methods on the resulting decision tree..612
7.5.4 Discretization of categorical or
symbolic attributes.....614
7.5.4.1 Manual abstraction of
attribute values........614
7.5.4.2 Automatic aggregation.....615
7.6 Pruning............................................615
7.6.1 Overview of pruning methods.............617
7.6.2 Cost-complexity pruning.................617
7.7 Some general remarks...............................618
8. The Case-Based Reasoning Unit..............................621
9. Concept Clustering as Knowledge Discovery..................623
10. The Overall Image Mining Procedure..........................627
10.1 A case study........................................629
10.2 Brainstorming and image catalogue...................629
10.3 The interviewing process............................630
10.4 Collection of image descriptions into
the database....................630
10.5 The image-mining experiment.........................631
10.6 Review..............................................634
10.7 Lessons learned.....................................635
11. Conclusions and Future Work................................636
References......................................................637
Author's Biographical Statement.................................641
CHAPTER 14
Support Vector Machines and Applications,
by T. B. Trafalis and O. O. Oladunni..........643
Click here for the abstract of this Chapter in PDF format
1. Introduction..............................................644
2. Fundamentals of Support Vector Machines...................646
2.1 Linear separability...............................646
2.2 Linear inseparability.............................649
2.3 Nonlinear separability............................652
2.4 Numerical testing.................................654
2.4.1 The AND problem........................654
2.4.2 The XOR problem........................656
3. Least Squares Support Vector Machines.....................657
4. Multi-Classification Support Vector Machines..............662
4.1 The one-against-all (OAA) method..................662
4.2 The one-against-one method........................664
4.3 Pairwise multi-classification support vector
machines........665
4.4 Further techniques based on central
representation of the version space.....672
5. Some Applications.........................................674
5.1 Enterprise modeling (novelty detection)...........674
5.2 Non-enterprise modeling application
(multiphase flow).......679
6. Conclusions................................................681
References......................................................682
Authors’ Biographical Statements................................689
CHAPTER 15
A Survey of Manifold-Based Learning Methods,
by X. Huo, X. Ni, and A. K. Smith.............691
Click here for the abstract of this Chapter in PDF format
1. Introduction..............................................692
2. Survey of Existing Methods................................694
2.1 Group 1: Principal component analysis............695
2.2 Group 2: Semi-classical methods –
multidimensional scaling........697
2.2.1 Solving MDS as an eigenvalue
problem.........698
2.3 Group 3: Manifold searching methods..............699
2.3.1 Generative topographic mapping.......699
2.3.2 Locally linear embedding.............701
2.3.3 ISOMAP...............................703
2.4 Group 4: Methods from spectral theory............704
2.4.1 Laplacian eigenmaps...................704
2.4.2 Hessian eigenmaps.....................706
2.5 Group 5: Methods based on global alignment.......707
3. Unification via the Null-Space Method.....................708
3.1 LLE as a null-space based method.................709
3.2 LTSA as a null-space based method................711
3.3 Comparison between LTSA and LLE..................713
4. Principles Guiding the Methodological Developments........713
4.1 Sufficient dimension reduction...................713
4.2 Desired statistical properties...................714
4.2.1 Consistency..........................714
4.2.2 Rate of convergence..................715
4.2.3 Exhaustiveness.......................715
4.2.4 Robustness...........................716
4.3 Initial results..................................716
4.3.1 Formulation and related open
questions.......716
4.3.2 Consistency of LTSA..................718
5. Examples and Potential Applications.......................722
5.1 Successes of manifold based methods on
synthetic data..........722
5.1.1 Examples of LTSA recovering
implicit parameterization.......722
5.1.2 Examples of LLP in denoising.........724
5.2 Curve clustering..................................725
5.3 Image detection...................................728
5.3.1 Formulation...........................731
5.3.2 Distance to manifold..................732
5.3.3 SRA: the significance run algorithm...733
5.3.4 Parameter estimation..................734
5.3.4.1 Number of nearest
neighbors.......734
5.3.4.2 Local dimension........734
5.3.5 Simulations...........................736
5.3.6 Discussion............................738
5.4 Application on the localization of
sensor networks.........738
6. Conclusions...............................................740
References......................................................741
Authors’ Biographical Statements................................745
CHAPTER 16
Predictive Regression Modeling for Small
Enterprise Data Sets with Bootstrap,
Clustering, and Bagging, by C. J. Feng
and K. Erla..................................747
Click here for the abstract of this Chapter in PDF format
1. Introduction..............................................748
2. Literature Review 750
2.1 Tree-based classifiers and the
bootstrap 0.632 rule....750
2.2 Bagging..........................................751
3. Methodology...............................................753
3.1 The data modeling procedure......................753
3.2 Bootstrap sampling...............................753
3.3 Selecting the best subset regression model.......756
3.4 Evaluation of prediction errors..................758
3.4.1 Prediction error evaluation..........758
3.4.2 The 0.632 prediction error...........759
3.5 Cluster analysis.................................760
3.6 Bagging..........................................760
4. A Computational Study.....................................761
4.1 The experimental data............................761
4.2 Computational results............................761
5. Conclusions................................................770
References......................................................771
Authors’ Biographical Statements................................774
Subject Index...................................................775
List of Contributors............................................779
About the Editors...............................................785
Visit Dr. Triantaphyllou's Homepage
Dr. Triantaphyllou's Books /
Special Issues web site
Send suggestions / comments to Dr. E. Triantaphyllou (trianta@lsu.edu).
|