They are realising unstructured data generated from social media and mobiles is vital for business decisions.
Umesh Jain, chief information officer (CIO) of private sector bank YES Bank, is busy finding solutions to handle the growing pile of unstructured data in his organisation.
Because YES Bank is in the banking industry, regulatory requirements mandate that it store internally generated structured data. But Jain also faces a data deluge of the unstructured kind. On one hand, increasingly mobile users inside the bank are generating immense amounts of data through mobile devices and social media networks. On the other hand, YES Bank captures every transaction of its customers, starting from withdrawals of money from a branch or an ATM using debit or credit cards. He believes the volume of data generated is multiplying almost 2.5 times every year.
|* IDC-EMC study says 1.8 zettabytes (1.8 trillion gigabytes) of data is expected to be created worldwide in 2011|
|This figure is equivalent to:|
|* Every person in India tweeting 3 tweets a minute for 6,883 years non-stop|
|* 32 days of data download by the entire population of India (appx 1.21 billion)|
|* Every person in the world having over 215 million high-resolution MRI scans per day|
|* The amount of information needed to fill 57.5 billion 32GB Apple iPads|
|* With that many iPads, one can create a wall of 4,005 miles long and 61 feet high extending from Anchorage, Alaska to Miami, Florida|
|* With that many iPads one can build the Great iPad Wall of China, at twice the average height of the original|
|* Build a 20-foot high wall around South America and even build a mountain 25-times higher than Mt Fuji|
"I think unstructured data is like sugarcane, which is not required to be stored in its totality. The concept is simple. In sugarcane, if you take the juice out and store it, then the space required for storing it is much less. One needs to keep checking the data and extract the insights, and then store and archive those insights, which can be important," says Jain.
|A DECADE OF DIGITAL UNIVERSE GROWTH|
|Global data generated||Storage cost per gigabyte||Total storage investment|
|Source: IDC-EMC Digital Universe Study|
Unstructured data is data that does not have an identifiable structure. This includes images, emails, tweets, text and data generated on social media which, though not part of a company's database, is becoming increasingly important for understanding customers and serving them with better offerings.
There is a reason why Jain is already working on his company's data management strategy. According to a report by storage solutions provider EMC, almost 40,000 petabytes of data is being generated and shared in India in a year, a figure anticipated to grow about 60 times to reach 2.3 zettabytes (2.3 million petabytes) in the next 10 years. Not just that, the growth rate of data in India is twice what is being seen worldwide.
To put things in context, if one takes 16 GB iPads and aligns them side by side along the Indian Railways network, they will go around it 10 times. "If this is the amount of data we are creating and sharing in just one year, imagine what it will be 10 years from now; it is going to be 60 times bigger than this," said Manoj Chug, president, India & SAARC for EMC Corporation. And this data will be generated almost uniformly by corporates in sectors such as financial services and insurance, healthcare, pharmaceuticals, retail and manufacturing, apart from individual users.
At Essar Group, the diversified conglomerate with businesses in energy, oil, gas, steel and realty among others, the amount of data generated internally as well as externally has gone up several times over the last few years. Pharmaceutical company Elder Pharma says its data is increasing by 15-20 per cent with every passing year as the business grows.
"It will be very difficult for me to quantify the amount of data we generate because we have various lines of business. But what has really changed over a period of time is that while in the early years we were mostly storing the data which was more pertinent to that phase of operation, ever since we started our data warehousing initiative, we are archiving all the data with the intention of analyzing it later," said C N Ram, CIO of Essar Group.
Agrees Jitendra Mishra, CIO of Mumbai-headquartered Elder Pharmaceuticals, "The data which is not business driven and which is generated mostly from social media sites and video feeds also contributes to the growth to a large extent. We are worried and trying to mitigate that issue through third-party tools."
While the enormity of this deluge has created a huge challenge for corporates, it has also opened new vistas of opportunity for companies which, by analyzing and mining historical information, are able to identify new trends. They are able to do predictive analysis of what is going to happen in the future, apart from improving their efficiency.
Essar Group, for example, is pulling data from its warehouse to take operational-level decisions. Recently, the company started doing in-memory computing using SAP's HANA appliance software, which is helping it process and analyse large amounts of data quickly.
"For instance, in our project business, we did two or three concepts in just about three seconds, which traditionally used to take 15 hours of processing time. So this is a very exciting new development from a business point of view," said Ram. He says the ability to capture data relating to trends and make it available to users requires a lot of meticulous planning and execution. "It involves a lot more extraction, cleansing and summarising of data captured over a period of time."
To handle its various kinds of internally generated unstructured data, YES Bank is doing de-duplication: storing the data as one copy instead of multiple copies. "Migrate as much as you can as this can go into applications. And whatever cannot go into applications, at least do a de-duplication. Without that, a single email of one megabyte sent to, say, some five people will get duplicated in backup storage and archiving, thus ending up consuming 50 MB of storage space," says Jain of YES Bank.
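The single-copy idea Jain describes is commonly implemented as content-based deduplication: identical content is detected by hashing it and is then stored only once, however many mailboxes or backups point to it. The following is a minimal Python sketch of that general technique, not a description of YES Bank's actual system. The `DedupStore` class, its method names and the mailbox paths are hypothetical and used only for illustration.

```python
import hashlib


class DedupStore:
    """Illustrative store that keeps each distinct piece of content once."""

    def __init__(self):
        self._blobs = {}  # SHA-256 digest -> content bytes (stored once)
        self._refs = {}   # logical name (e.g. a mailbox path) -> digest

    def put(self, name, data):
        """Record `data` under `name`, storing the bytes only if unseen."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)
        self._refs[name] = digest
        return digest

    def get(self, name):
        """Return the content recorded under `name`."""
        return self._blobs[self._refs[name]]

    def logical_bytes(self):
        """Size if every logical copy were stored separately."""
        return sum(len(self._blobs[d]) for d in self._refs.values())

    def physical_bytes(self):
        """Size actually held after deduplication."""
        return sum(len(blob) for blob in self._blobs.values())


# Hypothetical example: one 1 MB email delivered to five recipients.
email = b"x" * 1_000_000
store = DedupStore()
for recipient in ["a", "b", "c", "d", "e"]:
    store.put(f"{recipient}/inbox/msg-1", email)

print(store.logical_bytes())   # bytes needed without deduplication
print(store.physical_bytes())  # bytes needed with deduplication
```

In this sketch the five mailbox entries all reference the same stored content, so the physical footprint stays at the size of one email. Production systems typically deduplicate at the block or chunk level rather than whole files, which this example does not attempt.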
Industry insiders say that, given the exciting opportunities to harness information assets from Big Data, Indian companies in 2012 are expected to prepare themselves to use those assets to serve customers better. Indian companies are also expected to turn to data scientists, a role which is still in the early phase of development. Data scientists help organizations do predictive analysis, looking at the future.
UC Dubey, executive director, IT, of IFFCO Tokio General Insurance, says the company is seeing an increase of at least 15 per cent in its data generation and sharing every year.
"I would say, 2011 was the year of awareness and 2012 is the year when organisations will begin the journey of being able to harness the vast amount of data they have gathered from internal as well as from external sources," concluded Chug of EMC.