In 2000, many corporations viewed the open-source operating system Linux askance, as an oddball creation and even as legally risky to use, since the open-source ethos favours sharing ideas over owning them. But IBM endorsed Linux and poured money and people into accelerating its adoption.
On Monday, IBM is to announce a broadly similar move in big data software. The company is placing a large investment - contributing software developers, technology and education programmes - behind an open-source project for real-time data analysis, called Apache Spark.
The commitment, according to Robert Picciano, senior vice-president for IBM's data analytics business, will amount to "hundreds of millions of dollars" a year.
In the big data software market, much of the attention and investment so far has been focused on Apache Hadoop and the companies distributing that open-source software, including Cloudera, Hortonworks and MapR. Hadoop, put simply, is the software that makes it possible to handle and analyse vast volumes of all kinds of data. The technology came out of the pure internet companies like Google and Yahoo!, and is increasingly being used by mainstream companies, which want to do similar big data analysis in their businesses.
But if Hadoop opens the door to probing vast volumes of data, Spark promises speed. Real-time processing is essential for many applications, from analysing sensor data streaming from machines to sales transactions on online marketplaces. The Spark technology was developed at the Algorithms, Machines and People Lab at the University of California, Berkeley. Two years ago, a group from the Berkeley lab founded a company, Databricks, which offers Spark software as a cloud service.
Spark, Picciano said, is crucial technology that will make it possible to "really deliver on the promise of big data." That promise, he said, is to quickly gain insights from data to save time and costs, and to spot opportunities in fields like sales and new product development.
IBM said it will put more than 3,500 of its developers and researchers to work on Spark-related projects. It will contribute machine-learning technology to the open-source project, and embed Spark in IBM's data analysis and commerce software. IBM will also offer Spark as a service on its programming platform for cloud software development, Bluemix. The company will open a Spark technology centre in San Francisco to pursue Spark-based innovations.
And IBM plans to partner with academic and private education organisations including UC Berkeley's AMPLab, DataCamp, Galvanize and Big Data University to teach Spark to as many as 1 million data engineers and data scientists.
Ion Stoica, the chief executive of Databricks, who is a Berkeley computer scientist on leave from the university, called the IBM move "a great validation for Spark." He had talked to IBM people in recent months and knew they planned to back Spark, but, he added, "the magnitude is impressive."
With its Spark initiative, analysts said, IBM wants to lend a hand to an open-source project, woo developers and strengthen its position in the fast-evolving market for big data software.
By aligning itself with a popular open-source project, IBM, they said, hopes to attract more software engineers to use its big data software tools, too. "It's first and foremost a play for the minds - and hearts - of developers," said Dan Vesset, an analyst at IDC.
IBM is investing in its own future as much as it is contributing to Spark. IBM needs a technology ecosystem in which it is a player and has influence, even if it does not immediately profit from it. IBM mainly makes its living selling applications, often tailored to individual companies, that address business challenges such as marketing, customer service, supply-chain management and developing new products and services.
"IBM makes its money higher up, building solutions for customers," said Mike Gualtieri, an analyst for Forrester Research. "That's ultimately why this makes sense for IBM."
© 2015 The New York Times News Service