by Lian-Yin Zhai, Li-Pheng Khoo, and Sai-Cheong Fok (See Abstract)
A Scalable, Incremental Learning Algorithm for Classification Problems by Nong Ye and Xiangyang Li (See Abstract)
A Fuzzy Curved Search Algorithm for Neural Network Learning by Peitsang Wu (See Abstract)
Multiscale Approximation MEthods (MAME) to Locate Embedded Consecutive Subsequences - Its Applications in Statistical Data Mining and Spatial Statistics by Xiaoming Huo (See Abstract)
Simple Association Rules (SAR) and the SAR-Based Rule Discovery by Guoqing Chen, Qiang Wei, De Liu, and Geert Wets (See Abstract)
Mining Fuzzy Association Rules for Classification Problems by Y.-C. Hu, R.-S. Chen, and G.-H. Tzeng (See Abstract)
Visual Exploration of Production Data Using Small Multiples Design with Non-Uniform Color Mapping by Tien-Lung Sun and Wen-Lin Kuo (See Abstract)
A Data Mining Approach For Improving Polycythemia Vera Diagnosis by Mehmed Kantardzic, Benjamin Djulbegovic, and Hazem Hamdan (See Abstract)
Data Mining Techniques For Improved WSR-88D Rainfall Estimation by T. B. Trafalis, A. White, B. Santosa, and M. B. Richman (See Abstract)
Knowledge Discovery Techniques for Predicting Country Investment Risk by Irma Becerra-Fernandez, Stelios H. Zanakis, and Steven Walczak (See Abstract)
Customer's Time-Variant Purchase Behavior and Corresponding Marketing Strategies: An Online Retailer's Case by Sung Ho Ha, Sung Min Bae, and Sang Chan Park (See Abstract)
Data Mining Corrosion From Eddy Current Non-Destructive Tests by Donald E. Brown and John R. Brence (See Abstract)
DIVA: A Visualization System for Exploring Document Databases For Technology Forecasting by Steven Morris, Zheng Wu, Camille DeYong, Sinan Salman, and Dagmawi Yemenu (See Abstract)

ABSTRACTS:

Computers and Industrial Engineering, Vol. 43, No. 4, pp. 661-676.
by Lian-Yin Zhai, Li-Pheng Khoo, and Sai-Cheong Fok
School of Mechanical and Production Engineering Nanyang Technological University 50 Nanyang Avenue Singapore 639798
E-mail: mlyzhai@ntu.edu.sg E-mail: mlpkhoo@ntu.edu.sg E-mail: mscfok@ntu.edu.sg
Web: http://www.ntu.edu.sg/MPE/Divisions/manufacturing/Faculty/mlpkhoo.htm
ABSTRACT: Feature extraction is an important aspect in data mining and knowledge discovery. In this paper an integrated feature extraction approach, which is based on rough set theory and genetic algorithms, is proposed. Based on this approach, a prototype feature extraction system is established and illustrated in an application for the simplification of product quality evaluation. The prototype system successfully integrates the capability of rough set theory in handling uncertainty with a robust search engine, which is based on a genetic algorithm. The results show that it can remarkably reduce the cost and time consumed on product quality evaluation without compromising the overall specifications of the acceptance tests.
KEY WORDS: Feature extraction, Rough sets, Genetic algorithms, Knowledge extraction.

Computers and Industrial Engineering, Vol. 43, No. 4, pp. 677-692.
by Nong Ye, and Xiangyang Li
P. O. Box 875906 Department of Industrial Engineering Arizona State University Tempe, Arizona, 85287, USA
E-mail: nongye@asu.edu Phone: (480) 965-7812, fax: 480-965-8692
Web: http://ceaspub.eas.asu.edu/ye/
ABSTRACT: In this paper a novel data mining algorithm, Clustering and Classification Algorithm-Supervised (CCA-S), is introduced. CCA-S enables the scalable, incremental learning of a non-hierarchical cluster structure from training data. This cluster structure serves as a function to map the attribute values of new data to the target class of these data, that is, classify new data.
CCA-S utilizes both the distance and the target class of training data points to derive the cluster structure. In this paper, we first present the problems that many existing data mining algorithms for classification, such as decision trees and artificial neural networks, have with scalable and incremental learning. We then describe CCA-S and discuss its advantages in scalable, incremental learning. The testing results of applying CCA-S to several common data sets for classification problems are presented. The testing results show that the classification performance of CCA-S is comparable to that of other data mining algorithms such as decision trees, artificial neural networks and discriminant analysis.
KEY WORDS: Data mining, Classification, Incremental learning, Scalability.

Computers and Industrial Engineering, Vol. 43, No. 4, pp. 693-702.
by Peitsang Wu
Department of Industrial Engineering and Management I-Shou University, Kaohsiung County Taiwan 84008, ROC
E-mail: pwu@isu.edu.tw
Web: http://www.im.isu.edu.tw/pwu/
ABSTRACT: In this paper we develop a curved search algorithm, which uses second-order information, as the learning algorithm for a supervised neural network. To further reduce the training time, we introduce a fuzzy controller that adjusts the first- and second-order approximation parameters in the iterative method and avoids the spikes in the learning curve that sometimes occur with a fixed step length. Computational results indicate a significant reduction in training time compared with the delta learning rule.
KEY WORDS: Neural Networks, Fuzzy Control, Curved-Search Algorithm, Back Propagation Learning.

Computers and Industrial Engineering, Vol. 43, No. 4, pp. 703-720.
by Xiaoming Huo
School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, GA 30332-0205
Web: http://www.isye.gatech.edu/~xiaoming/
E-mail: xiaoming@isye.gatech.edu
ABSTRACT: In statistical data mining and spatial statistics, many problems (such as detection and clustering) can be formulated as optimization problems whose objective functions are functions of consecutive subsequences. Some examples are (1) searching for a high activity region in a Bernoulli sequence, (2) estimating an underlying boxcar function in a time series, and (3) locating a high concentration area in a point process. A comprehensive search algorithm always ends up with a high order of computational complexity. For example, if a length-$n$ sequence is considered, the total number of all possible consecutive subsequences is ${n+1 \choose 2} \approx n^2/2$. A comprehensive search algorithm requires at least $O(n^2)$ numerical operations. We present a multiscale-approximation-based approach. It is shown that most of the time, this method finds exactly the same solution as a comprehensive search algorithm does. The derived Multiscale Approximation MEthods (MAMEs) have low complexity: for a length-$n$ sequence, the computational complexity of an MAME can be as low as $O(n)$. Numerical simulations verify these improvements. The MAME approach is particularly suitable for problems with large data sets. One known drawback is that this method does not guarantee the exact optimal solution in every single run. However, simulations show that as long as the underlying subjects possess statistical significance, an MAME finds the optimal solution with probability almost equal to one.
KEY WORDS: Data mining, maximum likelihood estimate, multiscale approximation.

Computers and Industrial Engineering, Vol. 43, No. 4, pp. 721-734.
by Guoqing Chen1, Qiang Wei1, De Liu2, Geert Wets3
1: School of Economics and Management Tsinghua University Beijing 100084 CHINA E-mail: chengq@em.tsinghua.edu.cn
2: Center for Research on E-Commerce University of Texas at Austin, Austin, TX 78712
3: Limburg University Universitaire Campus Bld D 3590 Diepenbeek BELGIUM
ABSTRACT: Association rule mining is one of the most important fields in data mining and knowledge discovery in databases (KDD). Rules explosion is a problem of concern, as conventional mining algorithms often produce too many rules for decision makers to digest. Instead, this paper concentrates on a smaller set of rules, namely, a set of simple association rules (SAR) each with its consequent containing only a single attribute. Such a rule set can be used to derive all other association rules, meaning that the original rule set based on conventional algorithms can be "recovered" from the simple rules without any information loss. The number of simple rules is much smaller than the number of all rules. Moreover, corresponding algorithms are developed such that certain forms of rules can be generated in a more efficient manner based on simple rules.
KEY WORDS: Data mining, Knowledge discovery from databases (KDD), Simple association rules.

Computers and Industrial Engineering, Vol. 43, No. 4, pp. 735-750.
by Y.-C. Hu1, R.-S. Chen1, and G.-H. Tzeng2 (Corresponding Author)
1: Institute of Information Management National Chiao Tung University Hsinchu 300 Taiwan ROC E-mail: ghtzeng@cc.nctu.edu.tw
2: Institute of Management of Technology National Chiao Tung University Hsinchu 300 Taiwan ROC
ABSTRACT: The effective development of data mining techniques for the discovery of knowledge from training samples for classification in industrial engineering is necessary in applications such as group technology. This paper proposes a learning algorithm, which can be viewed as a knowledge acquisition tool, to effectively discover fuzzy associative classification rules.
The consequent part of each rule is one class label. The proposed learning algorithm consists of two phases: one to generate large fuzzy grids from training samples by fuzzy partitioning in each attribute, and the other to generate fuzzy associative classification rules from the large fuzzy grids. The proposed learning algorithm is implemented by scanning training samples stored in a database only once and applying a sequence of Boolean operations to generate fuzzy grids and fuzzy rules; therefore, it can be easily extended to discover other types of fuzzy association rules. The simulation results from the iris data demonstrate that the proposed learning algorithm can effectively derive fuzzy associative classification rules.
KEY WORDS: Data mining, Knowledge acquisition, Classification problems, Association rules.

Computers and Industrial Engineering, Vol. 43, No. 4, pp. 751-764.
by Tien-Lung Sun1, and Wen-Lin Kuo2
1: Department of Industrial Engineering and Management Yuan-Ze University Nei-Li Taiwan, R.O.C. Web: http://cadcam.iem.yzu.edu.tw/Professor/TLS/index-English.htm E-mail: tsun@saturn.yzu.edu.tw
2: Department of Business Administration Chihlee Institute of Commerce Banchiau Taiwan, R.O.C. E-mail: wenlin.kuo@msa.hinet.net
ABSTRACT: Visual data mining may overcome some of the flexibility problems often suffered by computer-centered data mining approaches. This can happen because human beings are introduced into the information discovery loop to take advantage of their natural strength in creative thinking and rapid visual pattern recognition, both to discover information not defined a priori and to perform approximate reasoning that is hard for computer algorithms. This paper presents a novel visual exploration approach for mining abstract, multidimensional data stored in tables in a relational database.
The visual image is constructed by converting each table into a visualization unit, called a table graph, and then by assembling these table graphs together to form a small multiples design. Different types of non-uniform color mappings to render this small multiples design could be automatically generated by minimizing the weight differences of colors in the visual image. These non-uniform color mappings are designed in such a way that adjacent glyphs in a table graph that have similar underlying values will be assigned the same color. As such, visual patterns that cannot be seen under the traditional uniform color mapping could be revealed. This enables the users to examine the input tables from different perspectives. The proposed flexible visualization method has been applied to generate visual images from which the users could quickly and easily compare the machine idle cost performances of alternative master production plans.
KEY WORDS: Visual data mining, Data visualization, Production management.

Computers and Industrial Engineering, Vol. 43, No. 4, pp. 765-774.
by Mehmed Kantardzic1, Benjamin Djulbegovic2, and Hazem Hamdan1
1: Computer Engineering and Computer Science Department J. B. Speed Scientific School University of Louisville Louisville, KY 40292 Phone: (502) 852-3703 E-mail: mmkant01@athena.louisville.edu
2: Division of Blood and Bone Marrow Transplant H. Lee Moffitt Cancer Center & Research Institute University of South Florida Tampa, FL
ABSTRACT: This paper presents a data mining approach to the extraction of new decision rules for Polycythemia Vera (PV) diagnosis, based on a reduced and optimized set of lab parameters. Ten laboratory and other clinical findings (8 parameters from the PVSG criteria + Sex and HCT) on 431 PV patients from the original PVSG cohort, and records on 91 patients with other myeloproliferative disorders that can be easily misdiagnosed as PV, were included in this study.
Significant differences were not found in the correctness of diagnostic classification of patients using either a trained artificial neural network (ANN) (98.1%) or a support vector machine (SVM) (95%) versus using PVSG diagnostic criteria, which are considered a "gold standard" for the diagnosis of PV. After reducing the original parameters of our dataset to only four parameters (HCT, PLAT, SPLEEN and WBC), we still obtained good classification results. New rules for improved differential diagnosis of PV are specified based on these four parameters. These rules may be used as a complement to the standard PVSG criteria, particularly in the differential diagnosis between PV and other myeloproliferative syndromes.
KEY WORDS: Polycythemia Vera, Feature Extraction, Artificial Neural Networks, Support Vector Machines, Decision Rules, N-dimensional Visualization.

Computers and Industrial Engineering, Vol. 43, No. 4, pp. 775-786.
by T. B. Trafalis1, A. White1, B. Santosa1, and M. B. Richman2
1: School of Industrial Engineering The University of Oklahoma 202 W. Boyd, Ste 124 Norman, OK 73019 E-mail: ttrafalis@ou.edu E-mail: Andy.White@noaa.gov E-mail: bsant@ou.edu
2: School of Meteorology The University of Oklahoma 100 E. Boyd, Ste 1310 Norman, OK 73019 E-mail: mrichman@ou.edu
ABSTRACT: The main objective of this paper is to utilize data mining and an intelligent system, Artificial Neural Networks (ANNs), to facilitate rainfall estimation. Ground truth rainfall data are necessary to apply intelligent systems techniques. A unique source of such data is the Oklahoma Mesonet. Recently, with the advent of a national network of advanced radars (i.e., WSR-88D), massive archived data sets have been created, generating terabytes of data. Data mining can draw attention to meaningful structures in the archives of such radar data, particularly if guided by knowledge of how the atmosphere operates in rain-producing systems.
The WSR-88D records digital database contains three native variables: velocity, reflectivity, and spectrum width. However, current rainfall detection algorithms make use of only the reflectivity variable, leaving the other two to be exploited. The primary focus of the proposed research is to capitalize on these additional radar variables at multiple elevation angles and multiple bins in the horizontal for precipitation prediction. Linear regression models and feed-forward ANNs are used for precipitation prediction. Rainfall totals from the Oklahoma Mesonet are utilized for the training and verification data. Results for the linear modeling suggest that, taken separately, reflectivity and spectrum width models are highly significant. However, when the two are combined in one linear model, they are not significantly more accurate than reflectivity alone. All linear models are prone to underprediction when heavy rainfall occurs. The ANN results for reflectivity and spectrum width inputs show that a 250-5-1 architecture is least prone to underprediction of heavy rainfall amounts. When a three-part ANN was applied to reflectivity based on light, moderate to heavy rainfall, in addition to spectrum width, it estimated rainfall amounts most accurately of all methods examined.
KEY WORDS: Back-propagation, Clustering, Data Mining Applications, Dimensionality Reduction, Exploratory Data Analysis, Feed-forward Neural Networks, Mean-Square Error, Neural Network Architectures, Pattern Recognition, Principal Component Analysis, Rainfall Estimation.

Computers and Industrial Engineering, Vol. 43, No. 4, pp. 787-800.
by Irma Becerra-Fernandez1, Stelios H.
Zanakis1 (Corresponding Author), and Steven Walczak2
1: Florida International University Decision Sciences & Information Systems Department College of Business Administration Miami, FL 33199 Phone (305)348-2830 E-mail: zanakis@fiu.edu E-mail: becferi@fiu.edu
2: University of Colorado at Denver, College of Business Denver, CO 80217-3364 E-mail: swalczak@carbon.cudenver.edu
ABSTRACT: This paper presents the insights gained from applying knowledge discovery in databases (KDD) processes to develop intelligent models that classify a country's investing risk based on a variety of factors. Inferential data mining techniques, like C5.0, as well as intelligent learning techniques, like neural networks, were applied to a dataset of fifty-two countries. The dataset included 27 variables (economic, stock market performance/risk and regulatory efficiencies) on 52 countries, whose investing risk category was assessed in a Wall Street Journal survey of international experts. The results of applying KDD techniques to the dataset are promising: the models classified most countries in agreement with the experts' classifications. Implementation details, results, and future plans are also presented.
KEY WORDS: Data mining, Knowledge discovery, Country investing risk.

Computers and Industrial Engineering, Vol. 43, No. 4, pp. 801-820.
by Sung Ho Ha, Sung Min Bae, and Sang Chan Park
Dept. of Industrial Engineering Korea Advanced Institute of Science and Technology KOREA
Web: http://captain.kaist.ac.kr
E-mail: hash@major.kaist.ac.kr E-mail: loveiris@major.kaist.ac.kr E-mail: sangpark@cais.kaist.ac.kr
ABSTRACT: Traditional customer relationship management (CRM) studies focus mainly on CRM at a specific point in time. Static CRM and the derived knowledge of customer behavior could help marketers to redirect marketing resources for profit gain at the given point in time. However, as time goes on, the static knowledge becomes obsolete.
Therefore, application of CRM to an online retailer should be done dynamically over time. Though the concept of buying-behavior-based CRM was advanced several decades ago, very little application of dynamic CRM has been reported to date. In this paper, we propose a dynamic CRM model utilizing data mining and a Monitoring Agent System (MAS) to extract longitudinal knowledge from the customer data and to analyze customer behavior patterns over time for the retailer. Furthermore, we show that longitudinal CRM could be usefully applied to solving several managerial problems that any retailer may face.
KEY WORDS: Customer Relationship Management, Data Mining, Electronic Commerce, Marketing Strategy, Markov Chains.

Computers and Industrial Engineering, Vol. 43, No. 4, pp. 821-840.
by Donald E. Brown1, and John R. Brence2
1: Department of Systems and Information Engineering University of Virginia Charlottesville, VA 22903 Phone: (804) 924-5393 E-mail: brown@virginia.edu
2: Department of Systems Engineering United States Military Academy West Point, NY 10996 Phone: (845) 938-5535 E-mail: fj672@usma.edu
ABSTRACT: Quicker, more effective methods of corrosion prediction and classification can help to ensure a safe and operational transportation system for both civilian and military sectors. This is especially critical now as transportation providers attempt to meet the increased expense of repairing aging aircraft with smaller budgets. These budget constraints make it imperative to find corrosion and to correctly determine the appropriate time to replace corroded parts. If the part is replaced too soon, the result is wasted resources. However, if the part is not replaced soon enough, it could cause a catastrophic accident. The discovery of models that limit the possibility of a costly accident while optimizing resource utilization would allow transportation providers to efficiently focus their maintenance efforts.
While our concern in this study was with aircraft, the results will also be useful to other transportation providers. This paper describes the discovery and comparison of empirical models to predict corrosion damage from non-destructive test (NDT) data. The NDT data were derived from eddy current (EC) scans of the United States Air Force's (USAF) KC-135 aircraft. While we might suspect a link between NDT results and corrosion, up until now this link has not been formally established. Instead, the NDT data have been converted into false color images that are analyzed visually by maintenance operators. The models we discovered are quite complex and suggest that, in data mining approaches, we can sometimes handle noisy data more effectively through more complex models than through simpler ones. Our results also show that while a variety of modeling techniques can predict corrosion with reasonable accuracy, regression trees are particularly effective in modeling the complex relationships between the eddy current measurements and the actual amount of corrosion.
KEY WORDS: To be filled in soon...

Computers and Industrial Engineering, Vol. 43, No. 4, pp. 841-xxx.
by Steven Morris1, Zheng Wu1, Camille DeYong2, Sinan Salman2, and Dagmawi Yemenu2
1: Dept. of Electrical and Computer Engineering 202 Engineering So. Oklahoma State University Stillwater, OK 74078 E-mail: samorri@okstate.edu FAX: (405) 744-9198
2: Industrial Engineering and Management 322F Engineering North Oklahoma State University Stillwater, OK 74078
ABSTRACT: DIVA (for Database Information Visualization and Analysis system) is a computer program which helps perform bibliometric analysis of collections of scientific literature and patents for technology forecasting.
Documents, drawn from the technological field of interest, are visualized as clusters on a two-dimensional map, permitting exploration of the relationships among the documents and document clusters and also permitting derivation of summary data about each document cluster. Such information, when provided to subject matter experts performing a technology forecast, can yield insight into trends in the technological field of interest. This paper discusses the document visualization and analysis process: acquisition of documents, mapping documents, clustering, exploration of relationships, and generation of summary and trend information. A detailed discussion of the DIVA exploration functions is presented, followed by an example of visualization and analysis of a set of documents about chemical sensors.
KEY WORDS: Technology forecasting, Information visualization, Knowledge discovery in databases (KDD), Data mining, Citation analysis, Document mapping, Bibliometrics, Scientometrics.

Send suggestions / comments to Dr. E. Triantaphyllou (trianta@lsu.edu).