Abstract - Data mining is a broad field of data science that is used in numerous domains such as education, medicine, investigation and the environment. In the aviation industry, data mining is used to make predictions about future data based on patterns found in collected data. Using sophisticated data mining techniques, millions of records can be searched to spot patterns and detect the real causes of aviation accidents. This paper gives an insight into various data mining techniques that are effective in analyzing flight crashes.

I. INTRODUCTION

The aviation industry is the business sector dedicated to manufacturing and operating all types of aircraft. Over the last 100 years, the industry has changed drastically, advancing from the Wright Flyer to the Boeing 747 and the 787 Dreamliner.

Despite the rapid growth in new aviation technologies, there is still a need to improve safety measures to avoid accidents. As aviation accidents increase, they cause loss of human life, financial losses in the millions of dollars, and lost time and effort for the industry. Safety features have also changed over the last 100 years, and the field of aviation continues to search for new ways to improve safety.

The aviation industry collects numerous case reports containing both structured and unstructured data. Parsing all of this data by hand is impossible, which is a problem often encountered while investigating accidents. With the relatively new field of data mining, it becomes possible to parse through otherwise unmanageable amounts of data to find patterns and anomalies that indicate potential incidents before they happen.

II. THEORETICAL BACKGROUND

The most common reasons for flight crashes are pilot error, mechanical failure, human error etc. Roughly 50% of aircraft losses are due to pilot error, as pilots have many opportunities to make mistakes, from programming errors to miscalculating the required fuel.
Mechanical failure accounts for about 20% of aircraft losses. Despite having multiple electronic aids, aircraft still struggle to function properly in adverse weather such as storms, snow and fog. Human error is another common reason for flight crashes, since major roles such as air traffic controllers, dispatchers and loaders are all performed by humans.

To address these root causes, the patterns that lead to accidents need to be identified. Such patterns are also known as incidents. Finding them in aeronautical data manually is impracticable because of the sheer volume of data produced each day. With data mining, we can parse through this huge amount of data and uncover the unknown patterns that reveal the causes of aircraft accidents.

Data mining refers to the task of analysing large amounts of data with the intent of finding hidden patterns and trends that are not immediately apparent from summarized data. Data mining and knowledge extraction from raw data are becoming increasingly important and useful as the amount and complexity of data rapidly increase. Data mining commonly involves four classes of tasks:

Classification - the problem of identifying to which of a set of categories a new observation belongs, based on a training set of data containing observations whose category membership is known.
Clustering - the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar to each other than to those in other groups.
Regression - a set of statistical processes for estimating the relationships among variables.
Association rule - a rule-based machine learning method for discovering interesting relationships between variables in large databases. It is intended to identify strong rules discovered in databases using some measures of interestingness.

III. RELATED WORK

Several studies that have applied data mining methods to different applications in the aviation industry are as follows.

Shagun and Sabitha [1] aimed to analyse and investigate flight crashes using the K-means clustering technique and cosine similarity on an international flight crash dataset covering the years 1908 to 2009. The cluster model forms five clusters over both numerical and text data, and the cosine similarity measure finds the similarity among different texts. The model is analysed on factors such as operator, location and type of flight. The results show that ground fatalities are higher than aboard fatalities, and the model also identifies the operator with the maximum fatality count of 4266.

Nazeri, Donohue and Sherry [2] investigated the difference between aircraft incidents and accidents. They focus mainly on identifying all patterns that can cause an accident. Important and contributory factors were taken from a National Transportation Safety Board (NTSB) dataset covering the years 1995 to 2004. If a factor was identified as the single reason causing the problem situation, it was labelled an incident factor; if the problem situation arose from a combination of several factors, that combination became a precondition for the accident.

Christopher and Appavu [3] focused on different feature selection techniques, applied to an airline dataset to understand and clean the dataset and to improve the performance of classification methods. The dataset contains data from the years 1970 to 2011, to which various feature selection techniques are applied. After feature selection, five classifiers are used to build classification models, and these models are evaluated on the accuracy performance metric.
The results show that the decision tree performs better than the other classifiers, with an accuracy of 97.68%, and that principal component analysis performs better than the other feature selection techniques, with 99.76% accuracy.

David Pagels [4] shows the implementation of three methods on three different types of aviation data: flight data records, which contain continuous and discrete data; synthetic data containing dispersed anomalies; and incident reports, which contain non-uniform data. Multiple kernel learning is applied to find patterns in flight data records, the effectiveness of the hidden Markov model versus the hidden semi-Markov model is compared in detecting anomalies, and incident reports are used to analyse the effectiveness of text classification. With such a model, heterogeneous anomalies in the data can also be detected, where previously only either discrete or continuous anomalies were identified.

Feng and Juanjuan [5] focused only on pilot-related accidents and incidents, using the contrast set mining and attribute focusing techniques of data mining. The dataset contains data from the years 1995 to 2008 from the National Transportation Safety Board (NTSB) and the Aviation Safety Reporting System (ASRS). Contrast set mining is used to identify patterns of factors that are more likely to lead to accidents than to incidents, and attribute focusing is used to find the attributes that may lead to accidents.

For an analysis of these works, refer to Table 1.

| Title | Description |
|---|---|
| Flight Crash Investigation Using Data Mining Techniques | This research work identifies the aboard/ground fatality rate by operator and location. The work uses k-means clustering, which assumes a symmetric Gaussian shape for the density function and requires a large amount of clean data for successful clustering. Density-based clustering can be used to overcome this problem. |
| Analyzing relationships between aircraft accident and incidents | This analysis identifies the relationship between accident and incident data to find patterns of causal and contributory factors. However, the analysis does not contain any data about the quality assurance and aviation safety programs maintained by airlines. |
| Prediction of Warning level in Aircraft accidents using Data Mining Techniques | In this work, different feature selection techniques are applied to improve the performance of the classification algorithm. However, feature selection may eliminate important or useful features that could help in analysing the data. |
| Aviation Data Mining | This research work focuses on three different types of dataset and is able to model heterogeneous anomalies in the data. However, the work does not consider generalized data in incident reports, and linked reports are also not considered. |
| Analyzing Pilot-related Accidents by data Mining | This work uses contrast set mining to study pilot-related accidents and incidents. Patterns of factors that are more likely to lead to accidents than to incidents are identified, but the study contains only selected structured fields, which may not be sufficient for evaluating the overall result. |

Table 1: Analytical Study

IV. CONCLUSION

Analysis of historical data describing aviation incidents provides an important source of knowledge for preventing such events in the future and reducing their occurrence. In conducting such analysis, data mining acts as a useful tool for determining the causes of fatalities so that preventive actions can be taken.
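
As an illustration of the association rule task described in Section II, the following minimal Python sketch computes two commonly used rule measures, support and confidence, for a rule such as {pilot_error} -> {accident}. The records, factor names and function names are hypothetical and are not taken from any of the surveyed datasets or studies; the sketch only shows how such measures could be evaluated.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical incident records (assumption, not real data):
# each record is the set of factors noted in one report.
RECORDS: List[FrozenSet[str]] = [
    frozenset({"pilot_error", "bad_weather", "accident"}),
    frozenset({"pilot_error", "accident"}),
    frozenset({"mechanical_failure", "accident"}),
    frozenset({"pilot_error", "bad_weather"}),
    frozenset({"bad_weather"}),
]


def support(itemset: Iterable[str], records: List[FrozenSet[str]]) -> float:
    """Fraction of records that contain every item in `itemset`."""
    if not records:
        return 0.0
    items = frozenset(itemset)
    return sum(1 for r in records if items <= r) / len(records)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               records: List[FrozenSet[str]]) -> float:
    """Support of (antecedent and consequent) divided by support of antecedent."""
    a = frozenset(antecedent)
    sup_a = support(a, records)
    if sup_a == 0:
        return 0.0  # the rule never applies in these records
    return support(a | frozenset(consequent), records) / sup_a


if __name__ == "__main__":
    rule_support = support({"pilot_error", "accident"}, RECORDS)
    rule_confidence = confidence({"pilot_error"}, {"accident"}, RECORDS)
    print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
```

A rule would typically be treated as strong only when both measures exceed thresholds chosen by the analyst; those thresholds depend on the dataset and are not specified here.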