2015 IPFW Student Research and Creative Endeavor Symposium




Faculty Sponsor

Dr. Jin Soung Yoo


Department of Computer Science

University Affiliation

Indiana University – Purdue University Fort Wayne


Many industries that collect digital data have difficulty scaling up their systems because of the sheer size of that data. When a collection of data sets becomes large and complex enough, it is difficult and expensive to process with available database management tools or traditional data processing applications. The challenge arises because most relational database management systems do not scale to meet these needs. With parallel systems in the cloud, running on clusters of commodity servers, big data can be analyzed much more quickly and efficiently.

Radio Frequency Identification (RFID) technology is a prevalent tool for tracking commodities in supply chain management systems. Most major retailers use RFID systems to track the movement of products from suppliers to warehouses, store backrooms, and eventually points of sale. The amount of information generated by such systems can be enormous, since each individual item (a pallet, a box, or a SKU) leaves a trail of data as it moves through different locations. Data warehousing provides architectures and tools that let business executives systematically organize, understand, and use this data to make strategic decisions.

Warehousing and mining massive RFID datasets is an essential problem with great potential benefits for inventory management, object tracking, and product procurement processes. This work presents a cloud-based data warehouse (DW) solution for storing and analyzing spatio-temporal RFID data. The design of the data warehouse is adapted to a cloud computing environment (Hadoop and Hive) to improve the performance of Extract-Transform-Load (ETL) and Online Analytical Processing (OLAP): the traditional relational schema is replaced with the Hive data model and its collection types, such as array, struct, and map. Hadoop provides a parallel-processing computing framework for data storage and processing. Hive, an open-source data warehousing solution built on top of Hadoop, provides an SQL dialect (HiveQL) whose queries are translated into MapReduce jobs, thereby exploiting the scalability of Hadoop while presenting a familiar SQL abstraction.
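
As a rough illustration of why nested collection types suit this kind of data, the following Python sketch collapses raw RFID reads into one ordered list of stays per tag, which is the shape that an array of structs can hold in a single row. The abstract does not describe the actual schema, so the field names (`epc`, `location`, `ts`, `time_in`, `time_out`), the sample reads, and the roll-up query are all assumptions made for illustration; it runs in plain Python rather than on Hadoop or Hive.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """One raw RFID read (hypothetical field names)."""
    epc: str        # tag identifier
    location: str   # where the reader is installed
    ts: int         # read time in seconds


class Stay(NamedTuple):
    """One element of the nested path, analogous to a struct."""
    location: str
    time_in: int
    time_out: int


def build_paths(reads):
    """Collapse raw reads into an ordered list of stays per tag."""
    paths = defaultdict(list)
    for r in sorted(reads, key=lambda r: (r.epc, r.ts)):
        stays = paths[r.epc]
        if stays and stays[-1].location == r.location:
            # Same location as the previous read: extend the current stay.
            stays[-1] = stays[-1]._replace(time_out=r.ts)
        else:
            # New location: start a new stay.
            stays.append(Stay(r.location, r.ts, r.ts))
    return dict(paths)


def items_per_location(paths):
    """OLAP-style roll-up: distinct tags that visited each location."""
    counts = Counter()
    for stays in paths.values():
        counts.update({s.location for s in stays})
    return counts


# Made-up sample reads, for illustration only.
reads = [
    Read("epc1", "supplier", 0), Read("epc1", "supplier", 5),
    Read("epc1", "warehouse", 20), Read("epc2", "warehouse", 22),
    Read("epc1", "store", 40),
]
paths = build_paths(reads)
print(paths["epc1"])
print(items_per_location(paths))
```

Keeping each item's whole path in one nested record means a path-oriented query can read a single row per item instead of joining many read records, which is one plausible motivation for moving from a flat relational schema to collection types.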


Computer Sciences | Physical Sciences and Mathematics

Big RFID Data Warehousing and OLAP in Cloud Computing Environment