Steve Lohr: IBM invests to help open-source big data software - and itself

With the Apache Spark initiative, IBM is investing in its own future too. It needs a technology ecosystem in which it is a player and has influence, even if it does not immediately profit from it.

Steve Lohr
Last Updated : Jun 15 2015 | 10:32 PM IST
The IBM "endorsement effect" has often shaped the computer industry over the years. In 1981, when IBM entered the personal computer business, the company decisively pushed an upstart technology into the mainstream.

In 2000, the open-source operating system Linux was viewed askance in many corporations as an oddball creation, and even as legally risky to use, since the open-source ethos favours sharing ideas over owning them. But IBM endorsed Linux and poured money and people into accelerating the adoption of the open-source operating system.

On Monday, IBM is to announce a broadly similar move in big data software. The company is placing a large investment - contributing software developers, technology and education programmes - behind an open-source project for real-time data analysis, called Apache Spark.

The commitment, according to Robert Picciano, senior vice-president for IBM's data analytics business, will amount to "hundreds of millions of dollars" a year.

In the big data software market, much of the attention and investment so far has been focused on Apache Hadoop and the companies distributing that open-source software, including Cloudera, Hortonworks and MapR. Hadoop, put simply, is the software that makes it possible to handle and analyse vast volumes of all kinds of data. The technology came out of the pure internet companies like Google and Yahoo!, and is increasingly being used by mainstream companies, which want to do similar big data analysis in their businesses.

But if Hadoop opens the door to probing vast volumes of data, Spark promises speed. Real-time processing is essential for many applications, from analysing sensor data streaming from machines to sales transactions on online marketplaces. The Spark technology was developed at the Algorithms, Machines and People Lab at the University of California, Berkeley. A group from the Berkeley lab founded a company two years ago, Databricks, which offers Spark software as a cloud service.

Spark, Picciano said, is crucial technology that will make it possible to "really deliver on the promise of big data." That promise, he said, is to quickly gain insights from data to save time and costs, and to spot opportunities in fields like sales and new product development.

IBM said it will put more than 3,500 of its developers and researchers to work on Spark-related projects. It will contribute machine-learning technology to the open-source project, and embed Spark in IBM's data analysis and commerce software. IBM will also offer Spark as a service on its programming platform for cloud software development, Bluemix. The company will open a Spark technology centre in San Francisco to pursue Spark-based innovations.

And IBM plans to partner with academic and private education organisations including UC Berkeley's AMPLab, DataCamp, Galvanize and Big Data University to teach Spark to as many as 1 million data engineers and data scientists.

Ion Stoica, the chief executive of Databricks, who is a Berkeley computer scientist on leave from the university, called the IBM move "a great validation for Spark." He had talked to IBM people in recent months and knew they planned to back Spark, but, he added, "the magnitude is impressive."

With its Spark initiative, analysts said, IBM wants to lend a hand to an open-source project, woo developers and strengthen its position in the fast-evolving market for big data software.

By aligning itself with a popular open-source project, IBM, they said, hopes to attract more software engineers to use its big data software tools, too. "It's first and foremost a play for the minds - and hearts - of developers," said Dan Vesset, an analyst at IDC.

IBM is investing in its own future as much as it is contributing to Spark. IBM needs a technology ecosystem, where it is a player and has influence, even if it does not immediately profit from it. IBM mainly makes its living selling applications, often tailored to individual companies, which address challenges in their business like marketing, customer service, supply-chain management and developing new products and services.

"IBM makes its money higher up, building solutions for customers," said Mike Gualtieri, an analyst for Forrester Research. "That's ultimately why this makes sense for IBM."

© 2015 The New York Times News Service

Disclaimer: These are personal views of the writer. They do not necessarily reflect the opinion of www.business-standard.com or the Business Standard newspaper

First Published: Jun 15 2015 | 9:44 PM IST
