GOOGLE REFINE


“Water, water everywhere and not a drop to drink” is an apt description of the enormous amount of data available today that cannot be put to any useful purpose. The volume of data is huge, and a careful look reveals many redundancies and inconsistencies in it. While redundancy can be reduced at the design level, inconsistency in the data can be reduced to a large extent by using the Google Refine tool. Clustering and other related features, together with an elegant UI, make it a powerful tool for Business Intelligence work. Its advantages are:
· Ease of use
· Extensive functionality
· Undo/Redo is simply awesome

Some of its disadvantages are:

a) Looks more like a spreadsheet
b) Cannot perform as many operations as a spreadsheet [it is meant only for high-end purposes]
c) Difficult to handle very large amounts of data


ARCHITECTURE

Google Refine is a web application, but unlike most web applications, it is intended to be run on one's own machine and used by oneself. The server side maintains the state of the data (undo/redo history, long-running processes, etc.) while the client side maintains the state of the user interface (facets and their selections, view pagination, etc.). The client side makes GET and POST Ajax calls to change the data and to fetch data and data-related state from the server side.
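
To make the split of responsibilities concrete, here is a minimal toy model in Python. It is not Refine's actual code: the class names `ProjectServer` and `ClientView` and all of their methods are hypothetical, and plain method calls stand in for the Ajax requests. It only illustrates the idea that the data and its undo/redo history live on one side while pagination lives on the other.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectServer:
    """Hypothetical server side: owns the rows and the undo/redo history."""
    rows: list
    done: list = field(default_factory=list)    # snapshots that undo restores
    undone: list = field(default_factory=list)  # snapshots that redo restores

    def apply(self, transform):
        """Handle a change request (a POST in the article's terms)."""
        self.done.append(list(self.rows))
        self.undone.clear()  # a new change discards the redo branch
        self.rows = [transform(r) for r in self.rows]

    def undo(self):
        if self.done:
            self.undone.append(list(self.rows))
            self.rows = self.done.pop()

    def redo(self):
        if self.undone:
            self.done.append(list(self.rows))
            self.rows = self.undone.pop()

    def get_rows(self, start, limit):
        """Handle a read request (a GET in the article's terms)."""
        return self.rows[start:start + limit]


@dataclass
class ClientView:
    """Hypothetical client side: owns only UI state such as pagination."""
    start: int = 0
    page_size: int = 2

    def visible_rows(self, server):
        return server.get_rows(self.start, self.page_size)


server = ProjectServer(rows=["  flood", "storm ", "quake"])
view = ClientView()

server.apply(str.strip)                 # change the data on the server
first_page = view.visible_rows(server)  # client fetches its current page
server.undo()                           # history is kept by the server
after_undo = view.visible_rows(server)
```

Note that the client never stores the rows themselves; it asks for a page each time, so an undo on the server is visible as soon as the page is fetched again.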



HOW TO GET GOOGLE REFINE

It is from Google, so it is free and can be easily downloaded from http://code.google.com/p/google-refine/ . It comes with instructions for installing and using it on your PC.

CREATING A PROJECT

Just download the tool (a zip file) and run it. Then get some sample data for your Business Intelligence work and upload it to create a project. Clear instructions from Google at every step make creating a project a cakewalk.



TRANSFORMATION OF DATA MADE EASY

Transformation is its prime weapon. Data can be transformed easily, and Google Refine offers a lot of intuitive suggestions along the way to make our life easier.
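
To give a feel for what a typical cell transformation does, here is a small Python sketch. It is only an analogy for the kind of cleanup Refine applies to a column (trimming whitespace, collapsing repeated spaces, fixing the case), not the way Refine itself is scripted. The function name `clean_cell` and the sample values are made up for illustration.

```python
import re


def clean_cell(value: str) -> str:
    """Apply three common cleanup steps to a single cell value."""
    value = value.strip()               # drop leading/trailing whitespace
    value = re.sub(r"\s+", " ", value)  # collapse runs of whitespace
    return value.title()                # normalise to title case


# Made-up column values showing the usual kinds of inconsistency.
column = ["  united   states", "INDIA ", "china"]
cleaned = [clean_cell(v) for v in column]
```

Applying one function to every cell of a column is the same idea as a column-wide transform in Refine: the rule is written once and applied to all rows.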




TEXT FACET FEATURE TO CLUSTER DATA

Clustering data no longer has to be a complex job done by an ETL tool. It can be done easily using the "Text Facet" feature in Refine, which updates its results as soon as a change is made, so the accuracy of a transformation can be checked immediately.
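
The idea behind faceting and clustering can be sketched in a few lines of Python. A text facet is essentially a count of each distinct value in a column, and one common way to cluster near-duplicates is a "fingerprint" key: normalise each value and group the values that end up with the same key. The version below is a simplified sketch under that assumption, not Refine's exact implementation; the function names and the sample column are made up for illustration.

```python
import string
from collections import Counter, defaultdict


def text_facet(values):
    """Count how often each distinct value occurs, as a facet would list them."""
    return Counter(values)


def fingerprint(value):
    """Simplified key: trim, lowercase, drop punctuation, sort unique tokens."""
    value = value.strip().lower()
    value = value.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(value.split()))
    return " ".join(tokens)


def cluster(values):
    """Group distinct values that share a fingerprint; keep groups of 2 or more."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Made-up column with inconsistent spellings of the same disaster types.
column = ["Flood", "flood ", "Flood", "Wild Fire", "fire, wild", "Earthquake"]
facet = text_facet(column)
clusters = cluster(column)
```

Each cluster is a set of spellings that probably mean the same thing; in a tool like Refine the user would then pick one spelling and merge the rest into it.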





DATA FOR ANALYSIS

The data used for this Business Intelligence exercise is “Disasters worldwide from 1900-2008”. For a disaster to be entered into the database, it must meet at least one of the following criteria:

a) Ten (10) or more people reported killed.

b) One hundred (100) or more people reported affected.

c) Declaration of a state of emergency.

d) Call for international assistance.
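
The four criteria above amount to a simple "at least one of" rule, which can be written as a small Python check. The function name `meets_entry_criteria` and its parameter names are hypothetical and do not come from the dataset itself; they only mirror the wording of the list.

```python
def meets_entry_criteria(killed=0, affected=0,
                         emergency_declared=False,
                         assistance_requested=False):
    """Return True if at least one of the four entry criteria holds."""
    return (
        killed >= 10               # a) ten or more people reported killed
        or affected >= 100         # b) one hundred or more reported affected
        or emergency_declared      # c) state of emergency declared
        or assistance_requested    # d) call for international assistance
    )


small_event = meets_entry_criteria(killed=3, affected=40)
deadly_event = meets_entry_criteria(killed=10)
appeal_only = meets_entry_criteria(assistance_requested=True)
```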



This data is available at http://www.infochimps.com/datasets/disasters-worldwide-from-1900-2008 .


BUSINESS VALUE

This tool is used mainly to refine inconsistent data so that useful predictions can be made from it. We have pulled the disaster database for the past 100 years and hope to use the skills we have acquired in analyzing the data

a) to find patterns in the occurrence of these major disasters
b) to predict the next big disaster and its location based on the available data
c) to analyze the frequency of each type of disaster by location

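
As a rough sketch of goal (c), the frequency of each disaster type by location is just a count over (location, type) pairs once the data is clean. The records below are invented placeholders, not rows from the actual dataset, and the field names `country` and `type` are assumptions about how the cleaned data might be laid out.

```python
from collections import Counter


def frequency_by_location(records):
    """Count occurrences of each (country, disaster type) pair."""
    return Counter((r["country"], r["type"]) for r in records)


# Invented placeholder records, for illustration only.
records = [
    {"country": "Country A", "type": "Flood"},
    {"country": "Country A", "type": "Flood"},
    {"country": "Country A", "type": "Earthquake"},
    {"country": "Country B", "type": "Storm"},
]
counts = frequency_by_location(records)
most_common_pair = counts.most_common(1)[0]
```

A count like this is only meaningful after the inconsistent spellings have been merged, which is where the cleanup done in Refine pays off.
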
By,
Vijaya Prabhu (1oBM60097)
Sathishwaran (10BM60079)


