BIG DATA MINING TOOLS FOR UNSTRUCTURED DATA: A REVIEW

Yogesh S. Kalambe, D. Pratiba, Pritam Shah

Abstract


Big data is a buzzword for very large data sets that include structured, semi-structured and unstructured data. Big data is so large that it is nearly impossible to collect, process and store it using traditional database management systems and software techniques. Big data therefore requires different approaches and tools for analysis. The process of collecting, storing and analyzing large amounts of data to find unknown patterns is called big data analytics. Large enterprises and companies use the information and patterns found by this analysis to gain deeper knowledge and to make better decisions faster, giving them an advantage over competitors. Better techniques and tools must therefore be developed to analyze and process big data. Big data mining is used to extract useful information from large datasets, most of which consist of unstructured data. Unstructured data is data that has no particular structure; it can take any form. Today, high-dimensional data is stored without a standard structure or schema, and this has given rise to problems. This paper gives an overview of big data sources, challenges, scope and the unstructured data mining techniques that can be used for big data.


Keywords


Big Data; Data Analytics; Unstructured Data; Unstructured Data Mining; Analytics as a Service






Copyright © 2012 - 2018, All rights reserved.| ijitr.com

International Journal of Innovative Technology and Research is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJITR. Permissions beyond the scope of this license may be available at http://creativecommons.org/licenses/by/3.0/deed.en_GB.