Title

Feature Weighted Naïve Bayes Algorithm for Information Retrieval of Enterprise Systems

Document Type

Article

Publication Date

2014

Publication Source

Enterprise Information Systems

Volume

8

Issue

1

Inclusive pages

107-120

DOI

https://doi.org/10.1080/17517575.2013.860481

Publisher

Taylor & Francis

ISBN/ISSN

1751-7583

Abstract

Automated information retrieval is critical for enterprise information systems that must acquire knowledge from vast amounts of data. One challenge in information retrieval is text classification. Current practice relies heavily on the classical naïve Bayes algorithm because of its simplicity and robustness, but its results are not always satisfactory. This article discusses the limitations of the naïve Bayes algorithm and finds that the assumption that terms are independent is the main reason for unsatisfactory classification in many real-world applications. To overcome this limitation, the dependent factors are taken into account by integrating a term frequency–inverse document frequency (TF-IDF) weighting algorithm into naïve Bayes classification. Moreover, the TF-IDF algorithm itself is improved so that both term frequencies and distribution information are taken into consideration. To illustrate the effectiveness of the proposed method, two simulation experiments were conducted; comparisons with other classification methods show that the proposed method outperforms existing algorithms in terms of precision and index recall rate.
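
The general idea of weighting naïve Bayes features with TF-IDF can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give their improved TF-IDF formula, so the sketch assumes a standard smoothed IDF, Laplace smoothing, and a multinomial likelihood. The function names (`idf_weights`, `train`, `classify`) and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def idf_weights(docs):
    """Smoothed IDF per term (assumed form): log((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0
            for t, df in doc_freq.items()}


def train(docs, labels):
    """Fit class priors and term likelihoods from TF-IDF-weighted counts."""
    idf = idf_weights(docs)
    vocab = set(idf)
    # Accumulate tf * idf per (class, term) instead of raw term counts.
    weighted = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for term, tf in Counter(doc).items():
            weighted[label][term] += tf * idf[term]
    model = {"idf": idf, "log_prior": {}, "log_like": {}}
    for label, count in Counter(labels).items():
        model["log_prior"][label] = math.log(count / len(docs))
        # Laplace (add-one) smoothing over the vocabulary.
        denom = sum(weighted[label].values()) + len(vocab)
        model["log_like"][label] = {
            t: math.log((weighted[label].get(t, 0.0) + 1.0) / denom)
            for t in vocab
        }
    return model


def classify(model, doc):
    """Return the class with the highest weighted log-posterior score."""
    scores = {}
    for label, log_prior in model["log_prior"].items():
        log_like = model["log_like"][label]
        score = log_prior
        for term, tf in Counter(doc).items():
            if term in log_like:  # terms unseen in training are ignored
                score += tf * model["idf"][term] * log_like[term]
        scores[label] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy corpus: each document is a list of tokens.
    docs = [
        ["invoice", "payment", "due"],
        ["payment", "invoice", "overdue"],
        ["server", "outage", "alert"],
        ["server", "restart", "alert"],
    ]
    labels = ["finance", "finance", "it", "it"]
    model = train(docs, labels)
    print(classify(model, ["invoice", "payment"]))
    print(classify(model, ["server", "alert"]))
```

In this sketch the TF-IDF weight acts as a per-term multiplier on the log-likelihood, so terms that are frequent in a document but rare across the corpus contribute more to the class score than they would in unweighted naïve Bayes.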

Keywords

enterprise information systems (EIS), information retrieval, data mining, text classification, naïve Bayesian algorithm, term frequency–inverse document frequency (TF-IDF)

Disciplines

Engineering

This document is currently not available here.
