Data Mining and Knowledge Discovery in Databases: Applications in Astronomy and Planetary Science
Abstract of Invited Talk
Overview of the Topic
Knowledge Discovery in Databases (KDD) is a new field of research concerned with the extraction of high-level information (knowledge) from low-level data (usually stored in large databases) [1]. It is an area of interest to researchers and practitioners from many fields, including AI, statistics, pattern recognition, databases, visualization, and high-performance and parallel computing. The basic problem is to search databases for patterns or models that can be useful in accomplishing one or more goals. Examples of such goals include prediction (e.g. regression and classification), descriptive or generative modeling (e.g. clustering), data summarization (e.g. report generation), or visualization of either data or extracted knowledge (e.g. to support decision making or exploratory data analysis).
KDD is a process that includes many steps. Among these are data preparation and cleaning, data selection and sampling, preprocessing and transformation, data mining to extract patterns and models, interpretation and evaluation of extracted information, and finally evaluation, rendering, or use of the final extracted knowledge. Under this view, data mining is only one of the steps of the overall KDD process. The other steps are essential to make the application of data mining possible and to make the results useful. Within data mining, methods for deriving patterns or extracting models originate from statistics, machine learning, statistical pattern recognition, uncertainty management, and database methods such as on-line analytical processing (OLAP) or association rules [2]. The process is typically highly interactive and may involve many iterations before useful knowledge is extracted from the underlying data.
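The steps listed above can be pictured as a chain of small functions. The following is only a minimal sketch under stated assumptions: the record layout (`x`, `label`), the toy labels, and the choice of a nearest-centroid model for the data mining step are all hypothetical illustrations and are not taken from the talk.

```python
from statistics import mean

def clean(records):
    """Data preparation and cleaning: drop records with missing values."""
    return [r for r in records if None not in r.values()]

def select(records, step=1):
    """Data selection and sampling: keep every `step`-th record."""
    return records[::step]

def transform(records):
    """Preprocessing and transformation: rescale `x` to the range [0, 1]."""
    xs = [r["x"] for r in records]
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{"x": (r["x"] - lo) / span, "label": r["label"]} for r in records]

def mine(records):
    """Data mining: fit a nearest-centroid model (one mean per label)."""
    labels = {r["label"] for r in records}
    return {lab: mean(r["x"] for r in records if r["label"] == lab) for lab in labels}

def predict(model, x):
    """Assign the label whose centroid is closest to `x`."""
    return min(model, key=lambda lab: abs(model[lab] - x))

def evaluate(model, records):
    """Interpretation and evaluation: fraction of records labeled correctly."""
    hits = sum(predict(model, r["x"]) == r["label"] for r in records)
    return hits / len(records)

def run_kdd(records):
    """Chain the steps; a real process would iterate over them interactively."""
    prepared = transform(select(clean(records)))
    model = mine(prepared)
    return model, evaluate(model, prepared)

# Hypothetical toy data; the record with a missing value is removed by clean().
raw = [
    {"x": 1.0, "label": "faint"},
    {"x": 2.0, "label": "faint"},
    {"x": None, "label": "faint"},
    {"x": 9.0, "label": "bright"},
    {"x": 11.0, "label": "bright"},
]
model, accuracy = run_kdd(raw)
```

For brevity the sketch evaluates the model on the same records it was fitted to; a realistic evaluation step would use held-out data, and an analyst would typically revisit earlier steps after inspecting the result.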
This talk will give an overview and summary of the rapidly growing field of KDD, and then focus on two specific applications in scientific data analysis to illustrate the potential, limitations, challenges, and promise of KDD. An overview of the KDD process is given in [3].
Today’s science instruments are capable of gathering huge amounts of data, making traditional human-based comprehensive analysis infeasible. This has been a primary motivation to develop tools to automate science data analysis tasks. The talk will describe efforts to develop a new generation of data mining systems where users specify what to search for simply by providing the system with training examples and letting the system automatically learn what to do. The system would then automatically sift through the data and catalog objects of interest for analysis purposes. The learn-from-example approach is a natural solution to a problem we call the query formulation problem in the exploration and analysis of image data [4]: How does one express a query for objects that are typically only recognized by visual intuition? Translating human visual intuition to pixel-level algorithmic constraints is a difficult problem. By asking the user to simply “show” the system examples of objects of interest, and then letting the system figure out how to formulate the appropriate query, we believe the problem can be surmounted in certain circumstances.
Two applications at JPL will be used to illustrate the learning techniques and their effects. The first targets automating the cataloging of sky objects in a digitized sky survey consisting of three terabytes of image data and containing on the order of two billion sky objects.
From: AAAI-96 Proceedings. Copyright © 1996, AAAI (www.aaai.org). All rights reserved.
The Sky Image Cataloging and Analysis Tool (SKICAT) [5] allows for automated and accurate classification, enabling the automated cataloging of an estimated two billion sky objects, the majority of which are too faint for visual recognition by astronomers. This represents an instance where learning algorithms solved a significant and difficult scientific analysis problem. Several new results in astronomy have been achieved based on the SKICAT catalog [6]. Recent results from applying SKICAT to help discover new objects in the Universe include the discovery of 16 new high-redshift quasars, some of the furthest and oldest objects detectable by today’s instruments [7].
The second system we describe is called JARtool (JPL Adaptive Recognition Tool) [8]. JARtool is initially being developed to detect and catalog an estimated one million small volcanoes (< 15 km in diameter) visible in a database consisting of over 30,000 images of the planet Venus. The images were collected by the Magellan spacecraft using synthetic aperture radar (SAR) to penetrate the permanent gaseous cloud cover that obscures the planet’s surface in the optical range.
Work at JPL’s Machine Learning Systems Group continues to extend data mining techniques to automate analysis in other areas of science, including cataloging of sunspots, remote-sensing detection of earthquake faults [9], spatiotemporal analysis of atmospheric data, and others (see http://www-aig.jpl.nasa.gov/mls/ for live descriptions of ongoing work).
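The learn-from-example idea described above, in which a user shows the system labeled examples and the system then catalogs the remaining objects, can be sketched as follows. This is a hedged illustration only: it uses a plain 1-nearest-neighbour rule as a stand-in and makes no claim about the learning algorithms actually used in SKICAT or JARtool. The feature pairs, object ids, and labels are hypothetical.

```python
import math

def learn(examples):
    """'Training' for 1-nearest-neighbour simply stores the labeled examples."""
    return list(examples)

def classify(model, features):
    """Return the label of the stored example closest to `features`."""
    _, label = min(model, key=lambda ex: math.dist(ex[0], features))
    return label

def catalog(model, objects):
    """Sift through unlabeled objects and group their ids by predicted label."""
    result = {}
    for obj_id, features in objects.items():
        result.setdefault(classify(model, features), []).append(obj_id)
    return result

# Hypothetical (area, brightness) measurements labeled by a user.
examples = [
    ((2.0, 9.0), "star"),
    ((2.5, 8.0), "star"),
    ((8.0, 3.0), "galaxy"),
    ((9.0, 2.5), "galaxy"),
]
model = learn(examples)

# Hypothetical unlabeled objects to be cataloged automatically.
unlabeled = {"obj-1": (2.2, 8.5), "obj-2": (8.5, 2.0), "obj-3": (3.0, 7.0)}
cataloged = catalog(model, unlabeled)
```

The point of the sketch is the division of labor: the user never writes a query in terms of pixel-level or feature-level constraints, and only supplies examples; the decision rule is derived from them.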
Cite
Text
Fayyad. "Data Mining and Knowledge Discovery in Databases: Applications in Astronomy and Planetary Science." AAAI Conference on Artificial Intelligence, 1996.
Markdown
[Fayyad. "Data Mining and Knowledge Discovery in Databases: Applications in Astronomy and Planetary Science." AAAI Conference on Artificial Intelligence, 1996.](https://mlanthology.org/aaai/1996/fayyad1996aaai-data/)
BibTeX
@inproceedings{fayyad1996aaai-data,
title = {{Data Mining and Knowledge Discovery in Databases: Applications in Astronomy and Planetary Science}},
author = {Fayyad, Usama M.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {1996},
pages = {1590-1592},
url = {https://mlanthology.org/aaai/1996/fayyad1996aaai-data/}
}