Google’s machine learning expertise followed a long road to Go

(L to R): Jeff Dean, Senior Fellow, Google; Tom Simonite, MIT Technology Review Structure Data 2016 (Credit: Jeremy Portje Photography)

Google’s machine-learning research efforts traveled a long road before they were able to pass Go, according to Jeff Dean of Google.

Dean, a senior fellow at Google and the architect of much of its machine-learning strategy, took attendees at Structure Data 2016 through a short history of Google’s machine-learning program, which recently bested the South Korean world champion at the ancient game of Go, widely considered the most complex game computers have yet mastered. Machine learning and neural networks started off as pure research for Google back in 2012 but quickly found their way into products such as speech recognition, Dean said. Other groups started adding machine-learning capabilities to their products as they realized what the technology could do, especially in image-related areas.

With the release of TensorFlow last year, Google let people outside the exclusive community of machine-learning experts start experimenting with these technologies at different levels, depending on their familiarity with the technology. These capabilities will also become more widely available through Google’s cloud services over time, he said.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Building Google: From search engine to AI poster child from Structure on Vimeo.

Bloomberg wants to help non-profits with its unique data skills

(L to R): Gideon Mann, Head of Data Science and CTO Office, Bloomberg; Derrick Harris, Mesosphere Structure Data 2016 (Credit: Jeremy Portje Photography)

Bloomberg is a company woven directly into the fabric of capitalism, providing an information service that Wall Street considers indispensable when moving money around the world. That doesn’t mean it lacks a softer side.

One part of Bloomberg’s mission has always been philanthropy, said Gideon Mann, head of data science and the CTO office at Bloomberg, speaking at Structure Data 2016. The company regularly brings together non-profits and government organizations to see how it can apply its data to the problems faced by those groups, he said.

For example, Bloomberg’s data has helped conservationists make plans based on image-recognition analysis of zebra herds. And after Hurricane Katrina devastated New Orleans, Bloomberg helped the city’s fire department allocate a donation of smoke detectors to the neighborhoods in which that equipment was needed the most.
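The smoke-detector example is, at heart, an allocation problem. The session summary does not describe Bloomberg’s method, so what follows is only a minimal sketch under assumed inputs: each neighborhood gets a hypothetical need score, and a fixed number of detectors is split in proportion to those scores using the largest-remainder method, which keeps every count a whole number while still summing exactly to the total donated.

```python
import math


def allocate_units(total, need):
    """Split `total` whole units across areas in proportion to `need` scores.

    Largest-remainder method: take the floor of each exact share, then hand
    the leftover units to the areas with the biggest fractional parts.
    """
    weight_sum = sum(need.values())
    exact = {area: total * score / weight_sum for area, score in need.items()}
    alloc = {area: math.floor(share) for area, share in exact.items()}
    leftover = total - sum(alloc.values())
    # Areas whose exact share lost the most to rounding get the extra units.
    by_remainder = sorted(exact, key=lambda area: exact[area] - alloc[area], reverse=True)
    for area in by_remainder[:leftover]:
        alloc[area] += 1
    return alloc


# Hypothetical need scores for three neighborhoods.
print(allocate_units(100, {"north": 5, "central": 3, "east": 1}))
# {'north': 56, 'central': 33, 'east': 11}
```

Proportional shares alone would produce fractions of a smoke detector; the remainder step is what turns them into a distribution a fire department could actually carry out.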

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Using Data Science for Social Good from Structure on Vimeo.

How Thorn is using data science to fight trafficking and child porn

Sugreev Chala, Data Scientist, Thorn Structure Data 2016 (Credit: Jeremy Portje Photography)

There are some dark places on the internet, where human trafficking is organized like supply-chain logistics and child pornography is shared freely within small, isolated communities. Finding and prosecuting the people operating in these places has been a frustrating game for law enforcement officials, who are often somewhat behind the cutting edge of technology, but they are starting to get more help.

As Sugreev Chala of Thorn.org told attendees at Structure Data 2016, overworked investigators are turning to tools like Spotlight, a data science tool developed by the non-profit to link clues in human trafficking cases and identify the perpetrators of child pornography. The group, which was founded by actors Ashton Kutcher and Demi Moore five years ago, scrapes escort advertisements on sites like Backpage and lets law enforcement look for common phone numbers and/or descriptions of victims to find patterns.
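Linking ads through shared phone numbers is essentially a graph-connectivity problem. The sketch below is not Spotlight’s code (the session did not cover its internals); it assumes each scraped ad is a record with a hypothetical `id` and a list of `phones`, normalizes numbers to digits, and uses a union-find structure so that ads connected through a chain of shared numbers end up in the same group.

```python
from collections import defaultdict


def normalize_phone(raw):
    # Keep digits only, so "555-0100" and "(555) 0100" compare equal.
    return "".join(ch for ch in raw if ch.isdigit())


def cluster_ads_by_phone(ads):
    """Group ads linked, directly or through a chain, by a shared phone number.

    `ads` is a list of dicts with hypothetical keys "id" and "phones".
    """
    parent = {ad["id"]: ad["id"] for ad in ads}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_ad_with_phone = {}
    for ad in ads:
        for phone in map(normalize_phone, ad["phones"]):
            if not phone:
                continue  # ignore entries with no digits at all
            if phone in first_ad_with_phone:
                union(first_ad_with_phone[phone], ad["id"])
            else:
                first_ad_with_phone[phone] = ad["id"]

    groups = defaultdict(list)
    for ad in ads:
        groups[find(ad["id"])].append(ad["id"])
    return sorted(sorted(group) for group in groups.values())


ads = [
    {"id": "a1", "phones": ["555-0100"]},
    {"id": "a2", "phones": ["(555) 0100", "555.0199"]},
    {"id": "a3", "phones": ["5550199"]},
    {"id": "a4", "phones": ["555-0142"]},
]
print(cluster_ads_by_phone(ads))  # [['a1', 'a2', 'a3'], ['a4']]
```

Note that `a1` and `a3` share no number directly; they land together because `a2` bridges them, which is the kind of indirect link an investigator would otherwise have to spot by hand.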

Thorn is also just starting to do image recognition and analysis work on dark web chatrooms where child pornography is shared. That’s a more difficult undertaking (for a lot of reasons), and on it Thorn works in conjunction with tech companies (“there’s a lot of bad content on everybody’s platforms” that they want gone, Chala said) and the U.S. Department of Homeland Security.

This was a difficult session to moderate, but I’m glad we were able to shine a spotlight on this important work for the Structure Data audience. Chala said Thorn is actively hiring engineers for its efforts, and is also soliciting help from tech companies on cutting-edge image-recognition and analysis technology.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

How algorithms are saving lives and cleaning up the web from Structure on Vimeo.

Cloudera: We’re not just a Hadoop vendor, we’re a big-time user as well

(L to R): Tom Reilly, CEO, Cloudera; Derrick Harris, Mesosphere Structure Data 2016 (Credit: Jeremy Portje Photography)

Cloudera makes a living selling Hadoop software and services to big data customers, but CEO Tom Reilly thinks that Hadoop’s biggest role at Cloudera might actually come inside the company.

At Structure Data 2016, Reilly walked through the three types of use cases that Cloudera is seeing in 2016, now that the concept of “big data” is well established. One of those use cases is something Cloudera does itself: it uses the platform to ping its customers’ servers, detect potential problems in their clusters, and open support calls on their behalf before they have even noticed a problem.

“Our most impactful use of Hadoop is reaching out to our customers,” Reilly said. The other major Hadoop uses at the moment? Insurance companies are getting very interested in using data to deliver metered products, and financial companies are using big data for risk mitigation and fraud detection.
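The write-up does not say how Cloudera’s monitoring works, so the following is only a hypothetical sketch of the general idea of proactive support: scan health readings from customer nodes and draft support tickets when a reading crosses a threshold. The `NodeHealth` fields and the 90 percent disk and 60-second heartbeat limits are illustrative assumptions, not real vendor settings.

```python
from dataclasses import dataclass


@dataclass
class NodeHealth:
    host: str
    disk_used_pct: float    # percent of local disk in use (hypothetical metric)
    heartbeat_age_s: float  # seconds since the node last reported in (hypothetical metric)


def proactive_tickets(nodes, disk_limit=90.0, heartbeat_limit=60.0):
    """Return draft support-ticket summaries for nodes that look unhealthy."""
    tickets = []
    for node in nodes:
        if node.heartbeat_age_s > heartbeat_limit:
            tickets.append(f"{node.host}: no heartbeat for {node.heartbeat_age_s:.0f}s")
        if node.disk_used_pct > disk_limit:
            tickets.append(f"{node.host}: disk {node.disk_used_pct:.0f}% full")
    return tickets


cluster = [
    NodeHealth("node-1", 72.0, 5.0),
    NodeHealth("node-2", 97.0, 8.0),
    NodeHealth("node-3", 40.0, 300.0),
]
print(proactive_tickets(cluster))
# ['node-2: disk 97% full', 'node-3: no heartbeat for 300s']
```

A production system would learn thresholds from fleet-wide history rather than hard-coding them, which is where running the analysis on Hadoop itself would come in.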

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Are we finally entering the big money stage for big data? from Structure on Vimeo.

AI needs to focus on “human-driven data curation”

(L to R): Heather Zhuang, Investor, Two Sigma Ventures; Stacey Higginbotham, The IoT Podcast Structure Data 2016 (Credit: Jeremy Portje Photography)

Don’t worry, humans: it turns out data scientists still haven’t captured the human instincts and emotions that artificial intelligence will need before it really takes hold.

Modern AI systems are limited because they are so obviously non-human, said Heather Zhuang of Two Sigma Ventures at Structure Data 2016. If AI is going to be more widely used as a user interface for computing, it needs to understand the subtleties of human communication, and probably the only way that will happen is by training computers to understand our reactions.

Think of this as “human-driven data curation,” Zhuang said. “We need to combine the human mind and data together in a more equal way. It’s about drawing from people and finding what’s most efficient.”

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

AI Needs a Dose of Humanity from Structure on Vimeo.

How McLaren uses data to inform high-stakes product design

Geoff McGrath, CIO, McLaren Applied Technologies Structure Data 2016 (Credit: Jeremy Portje Photography)

Making sure Formula One cars can operate at peak performance while protecting the driver is an expensive but lucrative undertaking, and the lessons learned in using data to make informed decisions about those cars can be applied to a wide range of other industries.

Geoff McGrath, CIO of McLaren Applied Technologies, is one of those people responsible for taking the data analysis techniques generated by McLaren’s Formula One program — in which cars are literally redesigned for different tracks after each competition — and transferring the most relevant information to other industries. At Structure Data 2016, McGrath talked about some of those other areas, such as the design and scheduling of high-speed trains, the construction of energy-efficient datacenters, and even the planning of smart cities.

“The more data we gather, the more momentum we get on that flywheel of innovation,” McGrath said.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Analyzing data at 250 miles per hour from Structure on Vimeo.

Managing the data produced by the industrial internet is its own challenge

(L to R): Quentin Clark, Chief Business Officer, SAP; Stacey Higginbotham, The IoT Podcast Structure Data 2016 (Credit: Jeremy Portje Photography)

It now seems clear that despite a decent amount of consumer interest in smart homes, a lot of the heavy lifting when it comes to building out the internet of things will happen inside industrial operations. Those companies are better equipped to handle the deluge of data that will result, according to Quentin Clark of SAP, but they still need to prepare their networks and storage systems.

Speaking at Structure Data 2016, Clark brought up the example of an oil rig at sea. Oil companies used to send people to visually inspect rigs in very dangerous conditions to make sure everything was working properly, but they have since turned to cameras. As artificial intelligence systems get better at handling streaming video, the volume of data generated by a dozen or so 4K cameras constantly streaming back to the main system will explode.

“Machine learning is going to help us initially figure out what to be looking at,” Clark said. But industrial internet users must make sure they have the right equipment in place to feed those machine learning algorithms with data, he said.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

What it takes: How to move data in the industrial internet from Structure on Vimeo.

Data will enable the autonomous cars and homes of the future

Ashutosh Saxena, Founder and CEO, Brain of Things Structure Data 2016 (Credit: Jeremy Portje Photography)

If we’re going to have smart homes and self-driving cars, the people designing those systems are going to need a lot more data.

Ashutosh Saxena, founder and CEO of Brain of Things, built his company around the idea that data collection, curation, and analysis will be needed to enable the internet of things and autonomous vehicles. “What do we do in our lives that we want to automate?” he asked at Structure Data 2016. Whatever the answer, data and machine learning algorithms will be required.

He gave the example of a car the company designed that not only took as much data as possible from the car itself (speed, direction, performance) but also trained cameras on the driver to ascertain their attention level and reactions to events on the road. If people are going to embrace these autonomous systems, the makers of those systems are going to have to make it as easy as possible for customers to get up and running.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

How artificial intelligence can automate our lives from Structure on Vimeo.

PayPal’s fraud detection systems are powered by unique data and strong AI

(L to R): Hui Wang, Senior Director, Global Risk Sciences, PayPal; Juliette Lewis, TuringAI Structure Data 2016 (Credit: Jeremy Portje Photography)

When you’ve been operating a financial transaction platform on the web for over a decade, you’ve pretty much seen it all when it comes to fraud and criminal activity.

“We have the luxury of accumulating a lot of interesting data,” said Hui Wang, senior director of global risk sciences at PayPal, during Structure Data 2016. Chances are, somebody has already stolen your credit card number, so simply identifying that theft occurred isn’t really enough to stop fraud. Instead, PayPal has to employ a unique set of technologies to detect criminal activities on its service.

Those technologies marry the rich data Wang referenced above to machine-learning algorithms, and the whole effort is overseen by a human-driven fraud detection system PayPal built itself. “AI without strong system support is not possible,” she said.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Melding man and machine to spot fraud from Structure on Vimeo.

Intel’s Grobman on encryption: “You can’t legislate math”

(L to R): Steve Grobman, Intel Fellow and CTO, Intel Security; Stacey Higginbotham, The IoT Podcast Structure Data 2016 (Credit: Jeremy Portje Photography)

Even though the FBI’s encryption case against Apple has fizzled to a halt, it would have run into a number of technical impracticalities had it gone to trial, according to Intel’s Steve Grobman.

“Encryption is math. You can’t legislate or prevent the use of math any more than you can legislate the use of gravity,” said Grobman, CTO and Intel Fellow for Intel Security, at Structure Data 2016. One of the fundamental questions left unanswered by the sudden end of the Apple-FBI case was whether or not the federal government thinks tech companies should be allowed to encrypt their products.

Like many in the tech industry, Grobman doesn’t think that “back doors” into hardware or software can ever work as designed. “Intel has a very clear policy that we do not create back doors for any purpose. It’s practically difficult to keep a back door viable only for the party it was intended for.”

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Can we really secure our data, and do we need to? from Structure on Vimeo.

The time for real-time data analytics has arrived

John Schroeder, CEO, MapR Structure Data 2016 (Credit: Jeremy Portje Photography)

Real-time data processing really started to hit its stride last year, according to John Schroeder, CEO of MapR, as companies grew more comfortable with the concept of data analytics.

“It’s all moving to be a combination of an operational and analytical system,” Schroeder said at Structure Data 2016. He cited a project MapR did for the government of India in which it helped construct a next-generation biometric identity system that needed both a fast, responsive database and a powerful analytic tool.

“Companies were searching for their use case” a few years ago, Schroeder said, recognizing that data was important but unsure how to deploy it for maximum return. But larger companies are now starting to implement these services in larger production deployments as the benefits become clear.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Convergence: What's Next With Big Data from Structure on Vimeo.

Better security through machine learning?

(L to R): Dharmesh Thakker, General Partner, Battery Ventures; Matt Howard, Managing Partner, Norwest Venture Partners; Steve Bowsher, Managing General Partner, In-Q-Tel Structure Data 2016 (Credit: Jeremy Portje Photography)

Security concerns are never far from the minds of anyone working in data, and some of the biggest opportunities in data science might involve helping companies secure their networks with machine learning techniques, according to a panel of venture capitalists at Structure Data 2016.

“The security industry has done a great job convincing people to buy security products, and they produce data. People are drowning in that data,” said Steve Bowsher of In-Q-Tel. Machine learning is a natural life preserver in these cases; Bowsher recalled a recent story about Mastercard stopping three attacks based on information developed by machine learning.

“A lot of hackers and attacks, they don’t follow a rule-based pattern,” said Dharmesh Thakker of Battery Ventures. And many companies are still dependent on humans to detect the patterns behind these attacks, according to Matt Howard of Norwest Venture Partners, who recently spoke with a financial institution that employs “200 people just trying to triage what they go after.”

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Security Analytics, Startups and Investing from Structure on Vimeo.

Dragging health care into the big data era is still a challenge

(L to R): Chris Hogg, COO, Propeller Health; Julie Black, CTO, Evidation Health; Jonathan Hirsch, Founder and President, Syapse; Jeremy Howard, Founder and CEO, Enlitic Structure Data 2016 (Credit: Jeremy Portje Photography)

Silicon Valley has been trying to impart a little tech wisdom to the health care system for decades. But according to a panel of experts at Structure Data 2016, there is still a ton of work to be done.

Despite the widespread march of modern technology into most industries, in health care, most providers are “where banking was before using spreadsheets,” said Jeremy Howard, founder and CEO of Enlitic. Jonathan Hirsch, founder and president of Syapse, took it a step further: “The American health care industry is keeping the fax machine industry alive.” Much of our health care data is on actual paper, and if it’s digitized, it’s in things like PDFs or electronic health care records that are basically glorified PDFs, he said.

And even assuming you can get that data out of the various silos in which it is stored, putting it to use requires a lot of elbow grease, said Chris Hogg, COO of Propeller Health. “It’s not a big data problem, it’s a messy, intertwined data problem.”

So how do you get quality data scientists to work on these issues, now that you’ve outlined how messed up everything is? By appealing to their idealism, said Julie Black, CTO of Evidation Health. “Some of these problems may not be the sexiest technical challenges, but they are really meaningful to millions of people’s daily lives,” she said.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Why is it such a nightmare to access data in healthcare? from Structure on Vimeo.

At Microsoft Research, you live by the AI bot, you die by the AI bot

(L to R): Peter Lee, Corporate VP, Microsoft Research; Jack Clark, Bloomberg Structure Data 2016 (Credit: Jeremy Portje Photography)

Several weeks before Microsoft’s English-language AI-bot experiment blew up in its face, Peter Lee acknowledged that life at Microsoft Research under CEO Satya Nadella is more of a roller coaster than in the past.

“For the most part, we never went directly to consumers,” Lee said at Structure Data 2016 earlier this month. “The biggest change now is that he’s really pushed us and given us permission to take things all the way.” Lee discussed this new era at Microsoft Research in a preview of his talk at Structure Data, but the downside of that highwire act presented itself last week.

Tay, a Twitter bot meant to simulate a chatty teenager, turned into a racist demagogue after a group of Twitter users fed it offensive suggestions, and Microsoft was forced to shut down the experiment. A few weeks earlier, however, Lee had discussed a Chinese-language version of the bot that has 40 million followers, 15 million of whom converse with her several times a day.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Taking AI from the lab to millions of consumers from Structure on Vimeo.

Want to build an enterprise startup? Focus on the app layer, and sell to business users

(L to R): Mike Driscoll, CEO, Metamarkets; Ann Johnson, CEO and Co-Founder, Interana; Derrick Harris, Mesosphere Structure Data 2016 (Credit: Jeremy Portje Photography)

There have been two fundamental changes in enterprise software over the past decade, according to panelists at Structure Data 2016: the rise of open source to dominate every part of software except for the user interface, and the shift in buying from CIOs to department heads.

“Open source will dominate everything in computing except for the application layer,” said Mike Driscoll, CEO of Metamarkets. Ann Johnson, CEO of Interana, agreed, saying “I don’t see the incentives aligning for a great easy-to-use product from an open-source (project).” Open-source projects have been enormously important in creating building blocks for software development, leaving room for private companies to build businesses by adding proprietary software and/or services on top.

And the target market for those products has changed, Driscoll said. “That’s the future of enterprise software, selling to a business user” like a marketing executive or a design head, he said.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Redefining analytics for the 21st century business from Structure on Vimeo.

IBM: Systems like Watson will save us from the data deluge

(L to R): Rob High, IBM Fellow, VP, CTO IBM Watson; Signe Brewster, Freelance Writer Structure Data 2016 (Credit: Jeremy Portje Photography)

Cognitive computing systems like IBM’s Watson aren’t being built just for fun and marketing: they are vital tools that we humans will need to understand and process the torrents of data that will be thrown our way over the next few years.

That’s the view of Rob High of IBM, who is an IBM Fellow and the CTO of its Watson project, speaking at Structure Data 2016. Estimates vary, but somewhere around 2.5 exabytes (2.5 billion gigabytes) of data was being produced every day in 2015, he said. By 2020, that number is expected to grow to 44 zettabytes (44 trillion gigabytes) of data produced every single day.

“We as human beings can’t keep up with it all,” High said. “Cognitive computing is about trying to tap into that vast amount of information. All those things we do as humans, we can do better with cognitive systems.”

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Way past Jeopardy: How Watson is tackling industry after industry from Structure on Vimeo.

If you thought data was big before, wait for the internet of things

Jerome Dubreuil, Senior Director, Samsung Strategy and Innovation Center Structure Data 2016 (Credit: Jeremy Portje Photography)

The data generated by the long-awaited internet of things revolution has the potential to make the revolution created by connecting PCs and mobile devices together look quaint, according to Jerome Dubreuil of Samsung.

“Everything around us is going to be sending data, and we’re going to be able to send back data to those things,” he said at Structure Data 2016. “I think this is going to make the world wide web look like a minor event when it is really here.”

Obviously, a lot is going to have to happen before we’re ready for that world, such as settling the pending squabbles over rights to IoT data as it moves across the network and deciding how that data is organized, stored, and used. Still, as Hortonworks CEO Rob Bearden echoed earlier in the day, the IoT explosion could make the era of “big data” look rather small indeed.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

If you don't know data, you don't know the Internet of Things from Structure on Vimeo.

Can machine learning kill surge pricing at Uber?

Jeff Schneider, Engineering Lead, Uber ATC Structure Data 2016 (Credit: Jeremy Portje Photography)

Uber’s rise to prominence was built around data, but its future plans are banking heavily on advances in machine learning to keep things moving, according to Jeff Schneider, engineering lead for Uber’s Advanced Technology Center in Pittsburgh.

The rise of big data thinking allowed companies like Uber and dozens of others to thrive, because it was enough to just get the data out there and let people make more informed decisions, Schneider said at Structure Data 2016. But now that data is widely available for most businesses, the new challenge is developing sophisticated algorithms to make sense of that data and extract patterns.

Machine learning might even allow Uber to do away with its most hated introduction to the lexicon: “surge pricing.” “Nobody likes surge pricing,” Schneider said. “That’s where machine learning comes in, now we can look at all this data and use it to make decisions, to find those Tuesday nights when it’s not raining and somehow know that demand is coming, that’s machine learning.”

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

How machine learning helps Uber keep the disruptions coming from Structure on Vimeo.

Facebook hopes to break down language barriers with machine learning and huge data sets

(L to R): Alan Packer, Director of Engineering, Language Tech, Facebook; Stacey Higginbotham, IoT Podcast Structure Data 2016 (Credit: Jeremy Portje Photography)

Speech recognition is one thing, but language understanding is one of the thornier machine-learning problems of our time. While we spent much of Structure Data 2016 talking about the need for smarter data over bigger data, Facebook thinks that when it comes to training machines to understand language, massive data sets are essential.

“There are 2 trillion posts and comments on Facebook,” said Alan Packer, director of engineering for language technology at Facebook. “It’s the data that makes it different,” he said, when it comes to explaining how his team works on training Facebook’s computers to understand a poster’s intent in one language and develop enough confidence in that intent to translate it into another language.

The industry has come a long way with speech recognition, but Facebook doesn’t feel like it can depend on third-party technology or open-source technology for this tricky task, Packer said. Most current speech recognition and understanding technology was trained on publicly available formal documents, like government records or appliance manuals, but “the language on Facebook really is different; it’s not formal, it’s human to human communication (and) it’s regional.”

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Making the most of language on the world's largest social network from Structure on Vimeo.

How the GDELT Project is trying to harness data to understand the world

Kalev Leetaru, Founder, GDELT Project Structure Data 2016 (Credit: Jeremy Portje Photography)

If you were at Structure Data 2016, there’s no way you could have missed Kalev Leetaru, founder of the GDELT Project. Leetaru spent two days at the conference passionately explaining his project and debating the finer points of data science with anyone and everyone, and it was truly amazing to watch his mind work at warp speed.

Of course, he also delivered a talk about the GDELT Project to conference attendees, which was one of the highlights of the week. Leetaru is attempting to collect and analyze data from media sources around the world to assess the conditions of the people and politics of those countries from a single dashboard, in hopes of gaining a better understanding of the human condition.

He previewed the talk in a guest post for Structure here, but the full session is worth your time.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

One massive database, all the world at your fingertips from Structure on Vimeo.

The 21st century Farmer’s Almanac is being written with hard data

(L to R): Mark Young, CTO, Climate Corporation; Steve Lohr, The New York Times Structure Data 2016 (Credit: Jeremy Portje Photography)

Turns out farmers are a lot like big-company CIOs when it comes to investing in emerging technologies, according to Mark Young of the Climate Corporation.

“They tend to be risk-averse,” Young said at Structure Data 2016, describing how many farmers view the Climate Corporation’s products and services designed to help them make better decisions with aggregated data. He should know: he grew up on a farm and knows all too well the demanding work, massive costs, and sheer luck that decide whether a growing season is a boom or a bust.

But the idea behind these products is to give farmers a digital version of the tattered almanacs many have been carrying for years: previous weather conditions, history of yields on particular pieces of land, and many of the other data points farmers have assembled on their own for thousands of years. “We can get down to the data for every single seed planted on every single field,” he said.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Quantifying crops: The future of farming from Structure on Vimeo.

Forget machine learning: in the music biz, it’s “machine listening”

(L to R): Ty Roberts, Co-Founder and Chief Strategy Officer, Gracenote; Chris Martin, CTO, Pandora; Janko Roettgers, Variety Structure Data 2016 (Credit: Jeremy Portje Photography)

There are few things as personal as musical taste, which makes curating music for online listening quite a challenge. Two pioneering companies in this space have been working on the problem for quite some time, but there’s a long way to go.

After a full day of Structure Data 2016 panelists talking about machine learning, the discussion between Ty Roberts of Gracenote and Chris Martin of Pandora focused on “machine listening,” or training computers to assemble playlists based on some cues from listeners and large data sets. Pandora is sitting on a ton of data around listener preferences, which when married to the Music Genome Project allows it to customize playlists in ways other companies can’t match, Martin, the company’s CTO, argued.

Roberts agreed, but perhaps showed where the future of music personalization is headed by outlining a typical family debate over what to play in the car during road trips. If everyone in the car has different musical tastes, can machine learning find common ground? (He suggested The Eagles might be that common ground, a dangerous view that we at Structure do not endorse.)

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Is data remaking the music business? from Structure on Vimeo.

How Yahoo is using machine learning to deliver the news that’s fit to click

Suju Rajan, Director of Research, Yahoo Structure Data 2016 (Credit: Jeremy Portje Photography)

Can machine learning solve the clickbait scourge?

Yahoo’s Suju Rajan, director of research and the force behind the personalization algorithms in Yahoo’s news feed, is working on algorithms that move beyond the “click” as the basic unit of attention, serving readers content based on how long they spend reading certain types of articles. This is especially important for mobile news feeds, because people on mobile devices tend to click on fewer stories.

Of course, there are problems with this approach as well. Sometimes “people tend to read news not because they are personally interested in it, but because it’s popular” and they want to stay current on pop culture phenomena even if they don’t really care about the topic, she said. Refining these personalization algorithms could one day move us past This One Weird Trick People Use To Get You To Click On This Story, but I’m not holding my breath.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Personalizing the News Feed: A Large-Scale Recommendation Problem from Structure on Vimeo.

Why 40 petabytes is probably enough data for Netflix, for now

(L to R): Eva Tse, director of big data platform, Netflix; David Linthicum, SVP, Cloud Technology Partners Structure Data 2016 (Credit: Jeremy Portje Photography)

Count Netflix among the companies featured at Structure Data 2016 who are starting to eschew “big data” now that they are operating at a huge scale, and starting to focus more on the quality of that data.

“We’re not trying to add more petabytes and brag about it, we want to get the most out of it,” said Eva Tse, director of big data platform at Netflix, who might need to update her title. Don’t misunderstand her: Netflix is managing close to 40 petabytes of data in order to show you streaming recommendations, so it’s not like the company has rejected “big data” entirely, but more and more the focus is on getting the most out of that data, she said.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

How Netflix keeps up with big data demands from Structure on Vimeo.

How tech companies and universities formed a once-unlikely fraternity around open source

(L to R): Raghu Ramakrishnan, Technical Fellow, Microsoft; Michael Franklin, Director, AMPLab, University of California, Berkeley; Andrew Brust, Datameer Structure Data 2016 Credit: Jeremy Portje Photography

Big tech corporations and university researchers used to be on opposite sides of an intellectual divide about software development. That has changed in a big way with the rise of open-source software inside the enterprise, and as a result, those two groups are realizing they have more in common than they thought.

Big Tech and Scrappy University were represented at Structure Data 2016 by Raghu Ramakrishnan of Microsoft and Michael Franklin of the University of California, Berkeley. Franklin, whose AMPLab was responsible for some ground-breaking open-source projects in the data world like Apache Spark, said that passionate students can draw a community of developers to a project in ways that big corporations simply can’t match. However, those projects often lack the datasets generated by big corporations, which provides a natural incentive for them to work together.

“We’re in a world, especially in big data, where a lot of the marketplace is using these open-source technologies,” Ramakrishnan said of Microsoft’s relatively recent embrace of the open-source mindset. Microsoft still invests heavily in proprietary software development, obviously, but it recognizes that its customers want different things these days, and it’s willing to meet them halfway, he said.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Shifting the center of gravity in tech innovation from Structure on Vimeo.

Data science is a recruiting tool for diverse candidates at Airbnb

(L to R): Elena Grewal, Data Science Manager, Airbnb; Megan Rose Dickey, Techcrunch Structure Data 2016 Credit: Jeremy Portje Photography

Airbnb has used data science in many ways while building out its product and service, but it realized last year it could use the discipline toward a more noble goal: making sure it was building a diverse workforce.

Elena Grewal, data science manager at Airbnb, told Structure Data 2016 attendees about the company’s recruiting project, which used the principles of data science to double the share of women at the company from 15 percent to 30 percent. A data scientist now works full-time with the recruiting team, and the company is looking for ways to use data to improve its racial and age diversity in 2016, she said.

How will Airbnb know if this has worked? Grewal said the company is tracking employee satisfaction scores over time to see whether employees feel included and part of the team. It has also changed how it runs candidates’ on-site recruiting visits, putting a diverse group of employees in the room when candidates make presentations.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

How Airbnb’s Data Science Team Doubled The Ratio Of Female Employees Last Year from Structure on Vimeo.

Hortonworks CEO Rob Bearden thinks the data explosion has only just begun

(L to R): Rob Bearden, CEO, Hortonworks; Derrick Harris, Mesosphere Structure Data 2016 Credit: Jeremy Portje Photography

Hortonworks’ first year as a public company was likely more difficult than the company anticipated. But CEO Rob Bearden doesn’t seem that worried, given the explosion in data generated and used by corporate customers as we approach the end of the decade.

“The volume of data in the enterprise used to double every ten years. Today it doubles every year,” Bearden said at Structure Data 2016. “There is fundamental value (in data analysis tools like those Hortonworks sells) that’s enabling the enterprise to transform how they manage data.”

The company is hoping that demand for data tools will revive its flagging stock, which is down significantly since it first went public in December 2014. Without addressing Hortonworks’ financial situation directly, Bearden implied that investors are missing out on this coming explosion of data. “Before the next election, there will be 10x more data across the enterprise,” he said, and that data should generate a lot of value for Hortonworks’ customers — and the company itself.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

The future of data is bigger than you think from Structure on Vimeo.

Spark and Kafka helped make “big data” big, but implementing at scale is the real challenge

(L to R): Ion Stoica, CEO, Databricks; Neha Narkhede, co-founder and CTO, Confluent; Andrew Brust, Senior Director, Market Strategy and Intelligence, Datameer Structure Data 2016 Credit: Jeremy Portje Photography

Tech companies have learned how to build incredible services using open-source projects as a common foundation, but the real test comes when you put those systems through their paces.

At Structure Data 2016, Neha Narkhede, co-founder and CTO of Confluent, recalled her days at LinkedIn implementing what would become Apache Kafka. “Building distributed systems is really the easy part, but really operationalizing it, putting it in production for real use cases, is what takes it to prime time,” she said.

Still, it’s very powerful that startups and big companies alike can draw on the open-source world to build foundational products and contribute useful changes back to the community, said Ion Stoica, CEO of Databricks, which was built around the Apache Spark project originally developed at the AMPLab at the University of California, Berkeley.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Building the data framework for tomorrow's apps from Structure on Vimeo.

At GE, more data is literally more power

(L to R): Bill Ruh, CEO, GE Digital, CDO, GE; Steve Lohr, The New York Times Structure Data 2016 Credit: Jeremy Portje Photography

GE helped some of its power-plant operators earn significantly more profit by using data to determine the optimal curvature of its wind turbine blades, according to Structure Data 2016 speaker Bill Ruh.

“You get more efficiency out of data and analytics than you can even out of some new material these days. And that’s exciting,” said Ruh, CEO of GE Digital and Chief Digital Officer for GE proper. As with a lot of Structure Data speakers, this mentality only arrived at GE in the last few years, but it’s a top priority for the company these days.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

At industrial scale, data is only the beginning from Structure on Vimeo.

How Twitter realized the power of its data

Chris Moody, VP of Data Strategy, Twitter Structure Data 2016 Credit: Jeremy Portje Photography

Twitter didn’t always understand the value of the data generated by its service, but that mentality has changed in a big way.

“We do believe data is one of our strategic differentiators,” said Chris Moody, vice president of data strategy at Twitter and the former CEO of Gnip, which Twitter acquired in April 2014. When Twitter first started, its engineers were mostly focused on trying to keep the site up and running, but once it had managed to stabilize the service, Twitter executives realized that they were sitting on a treasure trove of publicly available historical data and moved to capitalize on it, he said at Structure Data 2016.

Twitter isn’t making a ton of money on data licensing, but that work helps improve its advertising business and the real-time nature of the data is a bit of a competitive advantage over rival Facebook, Moody implied without naming the social media juggernaut directly. “The big thing about Twitter is our data is public,” he said.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

How smart analytics turn tweets into dollars from Structure on Vimeo.

Why data analysis tools need to get smarter and more accessible

(L to R): Chris Neumann, DataHero; Dan Wagner, Civis Analytics; Tom Krazit, Structure Events Structure Data 2016 Credit: Jeremy Portje Photography

It seems a little weird to think about at a conference organized around the promise of “big” data, but if the data-driven revolution is ever going to expand beyond its early adopters, its tools are going to have to get easier to use and its products are going to have to get smarter.

Chris Neumann, who founded DataHero before selling it to Cloudability earlier this year, noted Thursday at Structure Data 2016 that many of his customers recognize the importance of applying data to their businesses but don’t know the right questions to ask. And businesses that do know the right questions to ask are often frustrated by how difficult it can be to drag data out of silos and get it delivered to them in real time, said Dan Wagner, founder and CEO of Civis Analytics.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Taking data science from Market Street to Main Street from Structure on Vimeo.

How the Orlando Magic are using data to keep fans happy

(L to R): John Paul, CEO and founder, VenueNext; Alex Martins, CEO, Orlando Magic; Tom Krazit, Structure Events Credit: Jeremy Portje Photography

The concept of “Moneyball” — that smart analytics could allow sports teams to build winners without spending a ton of money — changed the way players and coaches put together their strategies. Their colleagues on the business side have been slower to embrace the data revolution, according to Alex Martins, CEO of basketball’s Orlando Magic, but they’re making up for it now.

The Magic have partnered with a Bay Area company called VenueNext, which built a mobile app that offers Magic season ticket holders a number of perks, including seat upgrades, fast lanes for beer lines, and even parking. The data generated by this app (which is used by a significant percentage of the team’s season ticket holders) allows the team to offer discounts to frequent users and start planning its arena around in-seat activity, Martins said at Structure Data 2016.

The Magic were using a lot of data before hooking up with VenueNext, with a predictive analytics system already in place that they could offer to VenueNext. “I’d never heard a team say that before,” said John Paul, CEO and founder of VenueNext, which also works with the San Francisco 49ers.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

The ascent of the sentient robots may have been oversold

(L to R): Andrew Ng, Chief Scientist, Baidu; Derrick Harris, Mesosphere

A common theme Wednesday morning at Structure Data 2016 was the automation of many tasks that will probably force more than a few tech workers out of their current gigs. But don’t worry too much about our robotic overlords.

“Worrying about killer robots today is like worrying about overpopulation on Mars,” said Baidu’s Andrew Ng, as reported by Stacey Higginbotham of Fortune. But those robots are definitely going to automate tasks that require human oversight today, and that will have a profound impact on our economy, he said.

Artificial intelligence and deep learning will also reshape a lot of other areas, such as web search (Google is also shifting to this model) and self-driving cars. In a claim I hope we can revisit in a few years, Ng predicted that self-driving cars will be commercialized in three years and mass produced in five, according to Forbes.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Fueling the deep learning rocket with huge data from Structure on Vimeo.

Building data infrastructure at startup speed? Stand on the shoulders of those before you

(L to R): June Andrews, Data Scientist, Pinterest; Josh Wills, Director of Data Engineering, Slack; Tom Krazit, Structure Events

Building a data practice inside a fast-growing startup can be a little problematic, but those are the kinds of problems that companies definitely want to have, according to data scientists from Pinterest and Slack, speaking at Structure Data 2016.

For one thing, “there’s so much stuff that can get you off the ground,” said June Andrews, data scientist at Pinterest. There’s Amazon Web Services (“flip a light, and hey, you’re bigger!” as Andrews put it), but also a wide variety of open-source tools, such as Apache Kafka, that can help a young company get up and running just as well as the big kids.

“Kafka is like oxygen,” said Josh Wills, director of data engineering at Slack. But while adopting something as widely used as Kafka might be a good decision, younger companies have to be careful about the choices they make, he said. Decisions about infrastructure or tools made at warp speed, because the company is moving that fast, can come back to haunt you down the road, and then you have to find a workaround or weigh whether to rip it out and start over.

Check out the rest of our Structure Data 2016 coverage here, and a video embed of the session follows below:

Doing unicorn-scale data science from Structure on Vimeo.

Structure Data 2016 event coverage

Credit: Jeremy Portje Photography

Believing in the power of data is no longer an aspirational goal for a 21st-century business: it’s a vital one. We’re proud once again to host Structure Data 2016, what we consider to be the best event covering big data, artificial intelligence, machine learning and everything else that goes into those disciplines.

Here you’ll find our coverage of the event as it unfolded: not necessarily in real time, but as we get around to summarizing the thoughts and contributions of our awesome lineup of speakers. If you need immediate gratification, you can find a livestream of the event here, and there’s still time to join us at the UCSF Mission Bay conference center if you want to hear our lineup of speakers in person.

Day One:
How Twitter realized the power of its data
At GE, more data is literally more power
Spark and Kafka helped make “big data” big, but implementing at scale is the real challenge
Hortonworks CEO Rob Bearden thinks the data explosion has only just begun
Building data infrastructure at startup speed? Stand on the shoulders of those before you
The ascent of the sentient robots may have been oversold
Data science is a recruiting tool for diverse candidates at Airbnb
How tech companies and universities formed a once-unlikely fraternity around open source
Why 40 petabytes is probably enough data for Netflix, for now
How Yahoo is using machine learning to deliver the news that’s fit to click
Forget machine learning: in the music biz, it’s “machine listening”
How the Orlando Magic are using data to keep fans happy
The 21st century Farmer’s Almanac is being written with hard data
How the GDELT Project is trying to harness data to understand the world
Facebook hopes to break down language barriers with machine learning and huge data sets
Can machine learning kill surge pricing at Uber?
If you thought data was big before, wait for the internet of things

Day Two:
IBM: Systems like Watson will save us from the data deluge
Why data analysis tools need to get smarter and more accessible
Want to build an enterprise startup? Focus on the app layer, and sell to business users
At Microsoft Research, you live by the AI bot, you die by the AI bot
Dragging health care into the big data era is still a challenge
Better security through machine learning?
The time for real-time data analytics has arrived
Intel’s Grobman on encryption: “You can’t legislate math”
Paypal’s fraud detection systems powered by unique data and strong AI
Data will enable the autonomous cars and homes of the future
Managing the data produced by the industrial internet is its own challenge
How McLaren uses data to inform high-stakes product design
AI needs to focus on “human-driven data curation”
Cloudera: We’re not just a Hadoop vendor, we’re a big-time user as well
How Thorn is using data science to fight trafficking and child porn
Bloomberg wants to help non-profits with its unique data skills
Google’s machine learning expertise followed a long road to Go

The Structure Show #6: Countdown to Structure Data

Structure Data starts tomorrow, and we kicked off Data Week with a special guest on the Structure Show: Jack Clark of Bloomberg, who will be interviewing Microsoft’s Peter Lee on stage on Thursday. Barb Darrow of Fortune also joined us to talk about the future of Slack amid rumors of buyout interest from some heavy hitters in enterprise software, and we went all the way back to last week and the RSA Conference to talk about the future of cryptography in a world where the U.S. government is skeptical about encryption.

Where are all the cyber data scientists?

Editor’s note: This guest post was written by Steve Bowsher, Sri Chandrasekar, and Charlie Greenbacker of In-Q-Tel. I’ll be moderating a panel with Bowsher and two other venture capitalists next Wednesday at Structure Data on security startups and investment opportunities that look to use data analysis techniques and practices to fight cyber crime. More information is available here, and tickets are still available (for now) here.

Over the past several years, data science has brought disruptive innovation to everything from advertising to agriculture. Yet one domain in particular seems to be surprisingly insulated from this trend: cybersecurity. Why aren’t more data scientists working on cyber-related issues, and what can we do to get things kickstarted?

The right stuff

Data scientists come from a wide range of backgrounds. Many begin their careers as traditional software engineers, statisticians, physicists, economists, or business analysts before growing into a data science role. Often, these individuals encountered a challenging problem in their established areas of expertise that lent itself to a quantitative solution, inspiring them to develop new skills in machine learning and big data analytics.

How Microsoft and Peter Lee are discovering the future of deep learning

It’s perhaps the most exhilarating and terrifying time in Peter Lee’s career at Microsoft Research.

His company has gone through some of the most profound changes in its storied history since Lee joined a little more than five years ago at the behest of Craig Mundie, after a career in academia and government research. Since taking the top job at Microsoft in 2014, CEO Satya Nadella has made some sweeping changes to Microsoft’s culture: embracing a cross-platform strategy for its apps, getting on the open-source software bandwagon to a degree that would astonish a time traveler from 1999, and, most recently, splitting Microsoft Research into two groups. One focuses on what Lee calls “the grand areas in computer science” (networking and cryptography, to name two); the other, which he runs, is called New Experiences and Technologies and is expected to live up to specific timelines and business needs.

Lee, who will be speaking at Structure Data 2016 next week, said in an interview that he’s mostly focused these days on developing a management philosophy for research organizations. This is part of what makes his job the combination of fear and opportunity referenced above: he’s been given free rein to put Microsoft researchers on big projects using the company’s resources, but he has to deliver in much more concrete ways than traditional tech company research organizations have been asked to do.

Peter Lee, Microsoft Research

“That’s the ride that we’ve been on. Not anything in the work or people has really changed, but there’s now this mandate, a combination of permissions and expectations to deploy things,” Lee said.

Just announced: Elena Grewal of Airbnb to talk data-driven diversity at Structure Data

Structure Data is just a week away, but we were able to find some last-minute room for something we’re very excited to present. Next Wednesday, Elena Grewal of Airbnb and Megan Rose Dickey of Techcrunch will join us at Structure Data 2016 to talk about Airbnb’s efforts to use data science to address a pressing issue in Silicon Valley: diversity.

Last year Airbnb realized it had a problem sadly typical of fast-growing tech startups: women were dramatically underrepresented, making up just 15 percent of its employees. As Dickey detailed in this article for Techcrunch last month, Grewal and Airbnb’s data science team set out to rectify that situation by poring over the data generated by the company’s recruiting software. (Grewal went into more detail here on Medium.) They managed to increase the percentage of women at the company to 30 percent by the end of 2015, which still isn’t great but is at least heading in the right direction.

Grewal will share more details with us on how exactly Airbnb identified the roots of its hiring problem using data and made recommendations to address that issue. This year, the company is focusing on improving its racial diversity, and hopefully we’ll see some results from that effort as well.

Structure Data is happening next week, March 9th and 10th, at the UCSF Mission Bay conference center in San Francisco. We’ve assembled a great lineup of data scientists, deep-learning gurus, and businesspeople making their way in the data-driven world, and it should be a great conference. You can find more information here, and tickets are still available here.

Meet Michael Franklin and UC Berkeley’s AMPLab, where basic tech research still reigns

While walking across the campus of the University of California, Berkeley on an unseasonably warm and sunny Friday afternoon with Berkeley AMPLab director Michael Franklin, I was reminded of why so many Silicon Valley companies have tried to replicate this kind of setting for their own offices. Serendipitous meetings happen easily; we bumped into Structure 2015 speaker Eric Brewer of Google and UC Berkeley five minutes after leaving Franklin’s offices, and the walk-and-talk pace of university campus life provokes the kinds of conversations that stimulate thinking.

It’s clearly an environment that gives Franklin a charge. In an era in which billion-dollar tech companies are minted seemingly overnight, devoting oneself to data science research at a university seems almost quaint (although he is no stranger to the entrepreneurial side of the Valley, having founded and sold Truviso to Cisco in 2012). But Franklin’s AMPLab has made quite a contribution to the field of open-source software and data analytics, donating both Spark and Mesos, two fundamental projects in modern data analysis and cloud computing, to the Apache Software Foundation.

If you’re looking for a reason to come to Structure Data next week in San Francisco, Franklin’s panel discussion with Raghu Ramakrishnan of Microsoft on March 9th should be fascinating. They plan to discuss an interesting trend in enterprise computing over the last few years: the rise of fundamental technologies developed by universities or non-profits that are then commercialized by for-profit companies building on the work of those open-source projects.

Welcome to the age of applications in big data

Databases, file systems and streaming engines are sooooooo 2010. If a company is doing something interesting in the data space today, the chances are it doesn’t even consider itself a data company. And that’s a great thing.

Whatever you might think of the various “big data” software companies that have emerged over the past several years, there’s no denying that their technologies have been successful. Under the covers of our favorite web and mobile apps, and of our connected devices and automobiles, is an increasingly standard set of infrastructure components. Often, they include HDFS, Spark and Kafka, as well as some combination of MySQL, Postgres, Cassandra, MongoDB or HBase.

There are obviously other data-storage and data-processing technologies with plenty of merit, but they’re fighting an uphill battle to get attention. With the exception of a few hot new companies and projects, such as the memory-based Alluxio (formerly Tachyon) file system, there doesn’t appear to be a huge demand for newer, better data infrastructure.

While this might be bad news to folks trying to get new big data projects or startups off the ground, it’s great news for folks actually charged with building new types of data-driven applications. These people have heard for years how the future is in intelligent, personalized and real-time applications—and now, thanks to Hadoop, Spark, Kafka et al, they can actually build these applications.

That’s why Structure Data 2016 is such an important event. We’ll hear about the future of data infrastructure from the folks who created some of its star technologies (including Spark, Kafka and Alluxio) and who run some of its biggest companies (including Cloudera, Hortonworks and MapR). We’ll also hear from people using these technologies to build amazing products and platforms, including speakers from Slack, Pinterest, Samsung, Mattermark, Thorn, Uber, Netflix and more.

Google’s Jeff Dean is speaking at Structure Data, as well. Need we say more?

The rest of the lineup includes great speakers on topics ranging from artificial intelligence to professional sports, and from social media to cybersecurity. If you want to listen to, learn from and network with some of the smartest people in the data world, register for Structure Data today.

How GDELT is cataloging and analyzing the entire planet

(Editor’s note: This post was written by Kalev Leetaru, founder of the GDELT Project and Structure Data 2016 speaker. Leetaru will be presenting a version of this essay on March 9th at Structure Data; register here.)

What would it look like to use massive computing power to see the world through others’ eyes, to break down language and access barriers, facilitate conversation between societies, and empower local populations with the information and insights they need to live safe and productive lives? By quantitatively codifying human society’s events, dreams and fears, can we map happiness and conflict, provide insight to vulnerable populations, and even potentially forecast global conflict in ways that allow us as a society to come together to deescalate tensions, counter extremism, and break down cultural barriers?

That’s the goal of the GDELT Project.

All locations mentioned in media coverage monitored by GDELT February to July 2015 colored by the primary language of coverage mentioning that location.

The Structure Show: IBM and AI, Netflix and the cloud, and storms ahead for Silicon Valley?

On this week’s episode of The Structure Show, Derrick Harris and Barb Darrow join me to talk about a possible change in direction for IBM’s AI strategy, why Netflix is now finally all-in on the cloud, and whether or not 2016 is shaping up to be a bad year for tech, startups, and the economy in general.

Don’t forget that Structure Data 2016 is right around the corner, March 9th and 10th at the UCSF Mission Bay conference center. Derrick has put together a really cool lineup of speakers, including Rob High of IBM, who can shed some more light on the AI discussions taking place at Big Blue. You can find more information about the event here, and register here.

The Structure Show: Why AI is the future of search, Cisco’s IoT bet, and Structure Connect

On this week’s episode, Barb Darrow of Fortune and Stacey Higginbotham of The IoT Podcast joined me to talk about some interesting moves last week by two tech giants. Google’s changing of the guard in its search efforts, from the legendary Amit Singhal to AI researcher John Giannandrea, is a clear sign that artificial intelligence and machine learning are going to play an even greater role in the future of Google search than we might have thought. Cisco hopes it can keep some of its largest customers in the fold with Jasper Technologies’ internet-of-things management tools. Stacey also lays out some of the early topics and themes she’s planning for Structure Connect, this June in San Francisco.

And just as a reminder, Structure Data is coming up on March 9th and 10th in San Francisco at the UCSF Mission Bay conference center. Structure Data curator Derrick Harris shared a few thoughts on the show earlier today about this year’s event, which features one of the best lineups we’ve ever put together. More information on Structure Data is here, and you can register for the event here.

Can data play the hero during a tumultuous time in tech?

We’re just over a month into 2016, and folks who follow the tech sector have already had a wild ride. There was #RIPTwitter, Uber’s continuing legal battles (and new logo), the great unicorn devaluation, a bloodbath in the stock market for several once-darling tech stocks, and whatever the heck is happening at Yahoo. Have we seen the worst, or are we in for another one of those years that Silicon Valley won’t soon forget?

I can’t promise Structure Data will deliver the answer to that question, but it might help offer a little clarity. We’ll hear from executives and technologists at many of the companies in the news lately, who’ll share their takes on what’s going on, what the future holds, and how data analysis can play a role in turning things around. Speakers include:

  • Chris Moody, VP of Data Strategy, Twitter
  • Rob Bearden, CEO, Hortonworks
  • Josh Wills, Director of Data Engineering, Slack
  • June Andrews, Data Scientist, Pinterest
  • Eva Tse, Director of Big Data Platforms, Netflix
  • Ron Brachman, Head of Yahoo Labs, Yahoo
  • Chris Martin, CTO, Pandora
  • Jeff Schneider, Engineering Lead, Uber ATC
  • Rob High, VP and CTO of Watson, IBM
  • John Schroeder, CEO, MapR
  • Tom Reilly, CEO, Cloudera

If the conventional wisdom holds true, better use of data should lead to better products and happier customers, which should lead to more revenue and happier investors. For companies in the business of selling data software, broader investment in big data should lead directly to more revenue and happier investors. Rinse and repeat.

Structure Rewind: How McLaren uses data at Formula 1 speed

One of the highlights of Structure Data 2014 was Geoff McGrath of McLaren, who is an exception to the notion that the best minds of our generation are working on ways to get people to click on ads. McGrath is the chief innovation officer at McLaren Applied Technologies, and at the event he delivered a very entertaining and informative talk on how the data insights of a racing pioneer can be used to make breakthroughs in lots of other fields.

McGrath will be back for Structure Data 2016, and it’s one of the sessions that I’m personally looking forward to as we gear up for the event. Structure Data 2016 is scheduled for March 9th and 10th at the UCSF Mission Bay conference center, and McGrath will be joined by a who’s who of speakers in big data and artificial intelligence, including Jeff Dean of Google, Peter Lee of Microsoft, and Andrew Ng of Baidu.

Check out McGrath’s 2014 talk in the video below. You can find more information about Structure Data 2016 here, and you can register here.

Slack’s Josh Wills: “Data is coming for your industry”

It’s refreshing to meet people in the tech industry like Josh Wills of Slack.

“I think all of software engineering is terrible,” he said during a recent visit to the offices of perhaps the hottest startup in enterprise technology. “I’d like to think that data engineering is uniquely terrible.”

Wills, who has been director of data engineering at Slack for four months following stints at Cloudera and Google, isn’t talking trash about the discipline or its practitioners, mind you. It’s just that behind what appears to be the magic of new types of software there is often a lot of stopgap, thrown-together, chewing-gum-reinforced code that may not be elegant but gets the job done in the absence of an industry-standard approach. And while that’s often good enough to ship, problems can crop up later when changes need to be made. Newer engineers, raised at a time when the solutions to those original challenges have become more obvious, can’t fathom the direction the older engineers took, nor can they change things without creating chaos.

Josh Wills, Director of Data Engineering, Slack

“You end up with a hodgepodge,” he said. This is especially true for data engineering, which is a newer field with fewer established best practices. “(You’re) hiring and everyone only knows one way to build data infrastructure, so you try to rebuild the way you know it.” The problem is that what worked for Google or Facebook might not work for a company like Slack.

Wills will be a very interesting presence on stage at Structure Data, scheduled for March 9th and 10th in San Francisco at the UCSF Mission Bay conference center. He’s quick-witted and forthright about the state of his profession, never hiding behind bland talking points, and he pairs that candor with just the right amount of mild sarcasm and the self-awareness that he’s lucky to be working at such a fundamental time in data engineering for some of the world’s best companies.

The Structure Show: An AI legend, Twitter’s dilemma, Microsoft’s research

On this episode of The Structure Show, Derrick Harris and Tom Krazit discuss the life and legacy of MIT’s legendary artificial intelligence pioneer Marvin Minsky, wonder if Twitter is headed for a data-licensing future, and examine how big tech companies like Microsoft conduct basic research into the technologies of the future. (Sorry for the audio issues at the beginning; the engineer has been reprimanded.)

Show notes:

Marvin Minsky, Pioneer in Artificial Intelligence, Dies at 88

Twitter Confirms Amex’s Leslie Berland As Its New CMO

How Microsoft Plans To Beat Google and Facebook to the Next Tech Breakthrough

We’ll have representatives from both Twitter (Chris Moody) and Microsoft Research (Peter Lee) at Structure Data, March 9th and 10th in San Francisco. Check out the full schedule here, and you can buy your tickets here. Ticket prices go up after Friday, so this is a good week to sign up.

Facebook’s Alan Packer coming to Structure Data 2016

No need for translation here: we’re more than thrilled to welcome Alan Packer, Facebook’s head of Language Technology, to Structure Data 2016. Packer is responsible for the group within Facebook that allows people to comprehend Facebook posts written in something other than their primary language, using research developed by the company’s famous (within this world, anyway) artificial intelligence group led by Yann LeCun.

Packer has been at Facebook for a little over a year, after more than a decade at Microsoft working on products like Bing and Cortana. We’re particularly interested in how artificial intelligence makes meaningful translation possible, as many early language translation efforts on the part of tech companies have produced results that are either frustrating or hilarious, depending on how badly you need that information.

Packer joins a host of other speakers already confirmed for Structure Data, which will take place March 9th and 10th at the UCSF Mission Bay conference center. Event curator Derrick Harris has put together a fabulous agenda, with appearances by notable names such as Baidu’s Andrew Ng, Google’s Jeff Dean, and Microsoft’s Peter Lee.

I’d also like to take a moment to highlight some of the great moderators that are coming to Structure Data: Bloomberg’s Jack Clark, Wired’s Cade Metz, NPR’s Aarti Shahani, and The New York Times’ Steve Lohr. We want our attendees to get as much as possible out of our sessions, and one of the best ways to do that is to invite people who are good at asking our speakers good questions.

You can find more information about the agenda for Structure Data here, and you can register here.

The Structure Show: Nvidia and AI, Yahoo’s donation, Foursquare’s future

Welcome to the first edition of the Structure Show! Today we’re kicking off what should be a weekly podcast discussing all the things we hold near and dear to our hearts at Structure, including data, cloud computing, the internet of things, and a few other topics we’re planning to showcase this year.

Today’s show features Structure Data curator Derrick Harris and yours truly discussing Nvidia’s role in AI research, Yahoo’s big machine-learning giveaway, and the future of Foursquare as it transitions to a data-driven business. We also talk about some of the great speakers we have lined up for Structure Data, March 9th and 10th in San Francisco, some of whom we hope to feature on The Structure Show in the coming weeks.

You can find The Structure Show here on Soundcloud (and below), and it’s coming soon to iTunes, Stitcher, and TuneIn. Thanks for listening.

From AI to IPOs: The top 5 issues in big data today

It’s 2016, and the business of big data is finally shaping up, thanks in large part to technological advances that have occurred over the past few years. Gone are the days of applying old, slow or inflexible technologies to new problems, and of searching for needles in haystacks. Here are the days of predictive analytics and machine learning, and of building new types of applications with data-powered intelligence baked right in.

However, there are dark sides to all this data analysis. An obvious one in this era of cybercrime run amok is privacy: all the data that companies and organizations collect about consumers is a juicy target for hackers and a tempting opportunity for ethically challenged businesspeople. There’s also the question of money, and how the companies that have come to epitomize big data will fare as venture capital shrinks back and public markets come to terms with open source business models.

Here’s my take on the five biggest issues facing the big data market right now.

Structure Data Rewind: Foursquare’s Dennis Crowley on its data future

Two years before he stepped aside as Foursquare CEO amid a new funding round and a formal shift to “a new era” built around a data-oriented strategy, founder Dennis Crowley told Structure Data 2014 attendees firsthand how the company was already working to move beyond the check-in.

Dennis Crowley, founder, Foursquare speaks at Structure Data 2014 (screenshot)

“The goal of the company was not to make an awesome check-in button,” Crowley said at the conference. “The goal of the company was, like, let’s make this social, crowdsourced map of the world and let’s use that to tell every single person in this room and every single person in the world, about all the interesting things they would find around them that they would otherwise normally miss.”

Foursquare’s future is certainly up in the air, but there’s no question that the company is sitting on a mountain of data that could be very attractive to other companies working on machine learning or artificial intelligence. Watch Crowley preview this future in the video embed below, and check out the schedule for Structure Data 2016 to see which speakers will hint at the future of their own companies in two years.

The state of AI is strong at the Future of AI Symposium

Artificial intelligence is one of the major topics we plan on exploring at Structure Data in March, and a conference organized by Facebook’s Yann LeCun in New York this week reminded us just how far we’ve come in honing this technology while still having so much work ahead.

The Future of AI Symposium was a chance for leading researchers and companies in the field of artificial intelligence to try and set realistic expectations around the development of AI, which can be hard in a media landscape raised on The Terminator movies. I particularly liked this illustration used by speaker Michael Littman, a professor at Brown University, in his talk during the conference, riffing on the famous New Yorker cover:

Courtesy of Michael Littman, Brown University

This quote from Murray Shanahan, an AI professor at Imperial College, also seemed pretty typical of the discussion. As reported by the Future of Life Institute blog, Shanahan said the general public tends to make two mistakes when evaluating the state of artificial intelligence:

“The first,” (Shanahan) explained, “is that human level AI, general AI, is just around the corner, that it’s just […] a couple of years away,” while the second mistake is “to think that it will never happen, or that it will happen on a timescale that is so far away that we don’t need to think about it very much.”

This cautious mix of optimism and pragmatism is somewhat unusual for the tech industry, which tends to forget its pragmatic side when discussing the future of something exciting. Eric Schmidt, chairman of Google parent company Alphabet, was a little more effusive in his outlook.

The current set of AI researchers and developers are “a small set of people that understand collectively that when we put all this stuff together we can build platforms that can change the world,” he said, as reported by Bloomberg. Google’s Jeff Dean will be one of our speakers at Structure Data, and the investments that Alphabet and Google have put into this field are enormous, even for a company of Google’s size.

And, of course, organizer LeCun’s Facebook group is working very hard on these problems and opportunities. Mike Schroepfer of Facebook summed up his talk in (where else) a Facebook post that focused on the company’s intention to give back to the data science community: “It is doubly rewarding to know that because of the way FAIR publishes all of their work, the very same technology we develop for Facebook will be used to help improve science, health care, and make safer cars.”

We hope to be able to advance the discussion around AI with speakers like Dean, Microsoft’s Peter Lee, and RoboBrain’s Ashutosh Saxena at Structure Data 2016. You can find more information about the conference here, and you can register here.

Want to learn what real data innovation looks like? Come to the best Structure Data yet

Structure Data 2016 is approaching fast—it’s just a few months until the event kicks off on March 9—and it is looking like the best one yet. The speakers are great, the topics are interesting and, most importantly, the field of big data (or whatever you choose to call it) has never been more timely.

Structure Data 2014

Data analysis is so important today because the technologies are finally here to undertake real, meaningful data-driven business improvements. What began as a discussion about Hadoop, data volumes and cutting-edge web companies nearly a decade ago is now a discussion that touches businesses of every size and type. Artificial intelligence, predictive analytics, real-time processing and interactive business intelligence are here, and smart companies are already embracing them.

And they’re not adopting new data technologies just for the sake of doing it. They’re adopting new data technologies because customers are demanding it. The end product might be a smarter app, a connected device, a better medical diagnosis or even a safer neighborhood, but data is the fuel that drives everything.

The current list of confirmed Structure Data speakers consists of some of the smartest people from some of the world’s most innovative companies and organizations:

  • June Andrews, Pinterest
  • Maia Sciupac Arteaga, Thorn
  • Rob Bearden, Hortonworks
  • Steve Bowsher, In-Q-Tel
  • Ron Brachman, Yahoo
  • Jeff Dean, Google
  • Mike Driscoll, Metamarkets
  • Michael Franklin, UC Berkeley/AMPLab
  • Ky Harlin, Condé Nast
  • Matthew Howard, Norwest Venture Partners
  • Ann Johnson, Interana
  • Peter Lee, Microsoft
  • Kalev Leetaru, GDELT
  • Chris Martin, Pandora
  • Geoff McGrath, McLaren Applied Technologies
  • Neha Narkhede, Confluent
  • Andrew Ng, Baidu
  • Tom Reilly, Cloudera
  • Ty Roberts, Gracenote
  • Ashutosh Saxena, Brain of Things
  • John Schroeder, MapR
  • Ion Stoica, Databricks
  • Dharmesh Thakker, Battery Ventures
  • Eva Tse, Netflix
  • Dan Wagner, Civis Analytics
  • Hui Wang, PayPal
  • Josh Wills, Slack

And we’re just getting started! There will be more great speakers added in the weeks to come.

Buy your tickets today to ensure your place. Structure Data will be two very insightful days of presentations and networking, and you do not want to miss out.

NFL CIO Michelle McKenna-Doyle will make her Structure debut this November

Just as training camps kick into full swing across the country, we’re pleased to announce that Michelle McKenna-Doyle, CIO for the National Football League, is coming to Structure this November.

McKenna-Doyle, a first-time Structure speaker, oversees the technology used by NFL teams on the sidelines as well as the league’s overall technology strategy. The NFL signed a high-profile deal with Microsoft a few years ago to allow (and encourage) coaches to use Surface tablets as one of their in-game tools instead of those huge floppy laminated poster things. And given football’s fetish for statistics and data, as well as the increased scrutiny around player health, the league is investing in modern IT infrastructure.

Intel’s Diane Bryant is back to talk datacenter chips at Structure 2015

The cloud revolution has upended the server market, but Intel is determined to remain the power behind the massive datacenters that enable the cloud. That means we’re looking forward to an update from Diane Bryant, senior vice president and general manager of Intel’s Datacenter Group, at Structure 2015.

Despite the fact that the rise of cloud computing means fewer companies are actually buying their own off-the-shelf servers, Bryant leads a division of Intel that’s growing at a healthy clip. Massive cloud providers like Amazon are building their own infrastructure designed around their own needs, which means Intel has had to get flexible in order to accommodate the new buyers of server processors.

Facebook’s Jay Parikh is coming back to Structure this November

In just over ten years, Facebook has become a powerhouse company in the tech industry. And that’s not just because it lets you see which of your former high school classmates have turned into weirdos; it has unique infrastructure needs and has developed pioneering strategies to deal with those challenges. The man responsible for keeping Facebook going, Jay Parikh, is coming back to Structure 2015.

Last year at Structure, Parikh unveiled Facebook’s top-of-rack Wedge switch, the latest installment in the Open Compute Project it kicked off years ago. And after it unveiled the modular “6-pack” switch in February, it’s definitely time for an update on how Facebook’s networking strategy has evolved to make sure those baby pictures load quickly.

Five hot topics you can expect to hear about at Structure 2015

Acquiring the computing power you need to run your business has probably never been easier in the history of the information technology industry, yet it remains quite complicated in practice to make everything flow smoothly. Cloud computing has allowed thousands of companies big and small to get off the ground without having to put together their own infrastructure, but there are many ways to slice (and price) the cloud.

These are some of the things we’re thinking about as everything starts coming together for Structure 2015, the new-yet-quite-familiar tech conference that we’re bringing back after the demise of Gigaom earlier this year. Structure is scheduled for November 18th and 19th at the Julia Morgan Ballroom in San Francisco, and several familiar faces — such as Adrian Cockcroft, Urs Hölzle, and Vinod Khosla — are confirmed as speakers, with more to come.

Here are five major themes we plan to cover at Structure 2015:

Vinod Khosla of Khosla Ventures will speak at Structure 2015

Never shy and always thought-provoking, Khosla Ventures founder and longtime enterprise technology investor Vinod Khosla will be back for Structure 2015 this November in San Francisco.

I had the pleasure of interviewing Khosla last year at Structure when he caused a bit of a stir by assuring attendees focused on IT (no small number) that automation was coming for their jobs. “It’s ridiculous to have humans manage the level of complexity that they do. People are a big cost in IT. Let’s take that out,” he said, and I’m very curious to hear an update on how that’s going.

Urs Hölzle, Google’s legendary engineer, will join us at Structure 2015

It’s pretty safe to say that there would be no Google without Urs Hölzle, the man who figured out how to take Larry Page and Sergey Brin’s concept for a search engine and design the computing infrastructure that launched Google to prominence and is still the envy of the tech industry more than a decade later.

We’re proud to have Urs back for Structure 2015, set to take place this November in San Francisco. We have a lot of questions for Urs about the progress of Google Cloud Platform and how it is faring against Amazon and Microsoft, the deep-pocketed competitors that are also racing to build the public cloud infrastructure of the future.

Structure veteran Adrian Cockcroft will be back for 2015, talking the state of clouds and containers

Structure 2015 has only been live for a few days, but we’re ready to announce the first of our confirmed speakers for November: Adrian Cockcroft, technology fellow at Battery Ventures and alumnus of Silicon Valley tech stalwarts Netflix, eBay, and Sun Microsystems, will join us on stage at the Julia Morgan Ballroom to reprise his talk from last year on cloud trends, but with a very 2015 twist.

After roaring into the picture in 2014, container technology is maturing quite rapidly in 2015, and Cockcroft plans to update his talk to include discussion of the container ecosystem coming off the announcements at DockerCon in June. We hope to have other container-related discussions on tap for the event, so make sure to check back as we roll out more speakers.

Introducing Structure, a new tech events company

We’re getting (some of) the band back together.

When Gigaom abruptly shut down last March, the world lost not just a storied tech blog but a vibrant and growing series of tech events, a forum based on editorial integrity in which some of the most important issues of our world were hashed out and where visionaries and startups could mix, mingle, and learn from each other.

As several of us wondered what to do next, we realized pretty quickly in talking to our contacts in the tech industry that the demise of the Structure Series — Structure, Structure Data, and Structure Connect — left quite a void in the yearly calendar. So over the past several months, a series of quiet conversations began in hopes of bringing back this one-of-a-kind event portfolio.