Feature Weighted Naïve Bayes Algorithm for Information Retrieval of Enterprise Systems
Enterprise Information Systems
Taylor & Francis
Automated information retrieval is critical for enterprise information systems to acquire knowledge from vast data sets. One challenge in information retrieval is text classification. Current practice relies heavily on the classical naïve Bayes algorithm because of its simplicity and robustness. However, the results of this algorithm are not always satisfactory. In this article, the limitations of the naïve Bayes algorithm are discussed, and the assumption that terms are mutually independent is identified as the main cause of unsatisfactory classification in many real-world applications. To overcome this limitation, dependencies among terms are accounted for by integrating a term frequency–inverse document frequency (TF-IDF) weighting algorithm into naïve Bayes classification. Moreover, the TF-IDF algorithm itself is improved so that both frequency and distribution information are taken into consideration. To illustrate the effectiveness of the proposed method, two simulation experiments were conducted; comparisons with other classification methods show that the proposed method outperforms existing algorithms in terms of precision and recall.
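The abstract describes weighting naïve Bayes features with TF-IDF but does not give the authors' improved TF-IDF formula, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a plain whitespace tokenizer, a smoothed IDF of the form log((1 + N) / (1 + df)) + 1, Laplace smoothing, and IDF-scaled term counts at both training and prediction time. The names `TfidfWeightedNB`, `tokenize`, and `alpha` are hypothetical, and the tiny training set is invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


class TfidfWeightedNB:
    """Hypothetical multinomial naive Bayes with IDF-scaled term counts.

    Assumptions (not from the paper): smoothed IDF, Laplace smoothing,
    and weighting of both training counts and test-time log-likelihoods.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        n_docs = len(tokenized)
        # Document frequency: number of training documents containing each term.
        df = Counter(t for toks in tokenized for t in set(toks))
        self.vocab = set(df)
        # Smoothed IDF; always >= 1, so no term's weight collapses to zero.
        self.idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in self.vocab}

        class_docs = Counter(labels)
        self.log_prior = {c: math.log(n / n_docs) for c, n in class_docs.items()}

        # Accumulate TF-IDF-weighted term mass per class instead of raw counts.
        weighted = defaultdict(lambda: defaultdict(float))
        for toks, c in zip(tokenized, labels):
            for t, tf in Counter(toks).items():
                weighted[c][t] += tf * self.idf[t]

        v = len(self.vocab)
        self.log_lik = {}
        for c in class_docs:
            denom = sum(weighted[c].values()) + self.alpha * v
            self.log_lik[c] = {
                t: math.log((weighted[c][t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def predict(self, doc):
        # Terms unseen during training are ignored.
        counts = Counter(t for t in tokenize(doc) if t in self.vocab)
        best, best_score = None, -math.inf
        for c, prior in self.log_prior.items():
            # Each known term contributes tf * idf * log P(t | c).
            score = prior + sum(
                tf * self.idf[t] * self.log_lik[c][t] for t, tf in counts.items()
            )
            if score > best_score:
                best, best_score = c, score
        return best


# Toy usage with invented enterprise-style snippets.
train = [
    "invoice payment overdue account",
    "payment received invoice closed",
    "server outage network down",
    "network latency server restart",
]
labels = ["finance", "finance", "it", "it"]
clf = TfidfWeightedNB().fit(train, labels)
print(clf.predict("overdue invoice"))  # finance
print(clf.predict("server down"))  # it
```

Compared with classical naïve Bayes, the only change here is that each term's contribution is scaled by its IDF weight, so terms that appear in few documents influence the class score more than ubiquitous ones. The paper's refinement, which also uses term distribution information, would replace the IDF computation in `fit`.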
enterprise information systems (EIS), information retrieval, data mining, text classification, Naïve Bayesian algorithm, term frequency–inverse document frequency (TF-IDF)
Li Wang, Ping Ji, Jing Qi, Siqing Shan, Zhuming M. Bi, Weiguo Deng, and Naijing Zhang (2014). Feature Weighted Naïve Bayes Algorithm for Information Retrieval of Enterprise Systems. Enterprise Information Systems, 8(1), 107–120. Taylor & Francis.