Science creates knowledge via controlled experiments, and a data query isn’t an experiment. An experiment implies controlled conditions; data scientists stare at data that someone else collected, complete with any and all of its sampling biases.
Now, before you drag out the pitchforks: I’m not a query hater. You won’t see me standing outside the Oracle Open World conference with a sign that says “NO SQL” on it. Queries are fine. Smart people don’t always have the right answer, but they need to ask the right questions. Yes, building a query is like “forming a hypothesis,” but at that point we enter the realm of observational or “soft” science. Yes, by this standard, Astronomy and Social Sciences are also not sciences. I have no idea what Computer Science is, but no, it’s not a science either.
Oh, what’s that? Your kind of “Data Science” includes things such as A/B testing, and your “experiments” actually involve executing designs that affect the world? Allow me to retort: that’s not Data Science, that’s actually doing a job. You might have a job title like Product Management or Marketing. But if your job title is “Data Scientist,” you are effectively removing yourself from the actual creation of data.
I do sympathize. I appreciate that it’s no longer sexy to be a Database Administrator, and I guess the term “Business Analyst” is a bit too 1980s. Slapping “Data Warehousing” on a resume is probably not going to land you a job, and it’s way down there with “Systems Analyst” on the cool-factor scale. If you’re going to make up a cool-sounding job title for yourself, “Data Scientist” seems to fit the bill. You can go buy a lab coat from a medical-supply surplus store and maybe some thick glasses from a costume shop. And it works! When you put “Data Scientist” on your LinkedIn profile, recruiters perk up, don’t they? Go to the Strata conference and look at the jobs board: every company wants to hire Data Scientists.
OK, so we want to be “Data Scientists” when we grow up, right? Wrong. Not only is Data Science not a science, it’s not even a good job prospect. In the immortal words of Admiral Ackbar: “It’s a trap.”
These companies expect data scientists to (from a real job posting): “develop and investigate hypotheses, structure experiments, and build mathematical models to identify… optimization points.” Those scientists will help build “a unique technology platform dedicated to… operation and real-time optimization.”
Well, that sounds like a reasonable, albeit buzzword-filled, job description, no? There is going to be a ton of data in the future, certainly. And interpreting that data will determine the fate of many a business empire. And those empires will need people who can formulate key questions, in order to help surface the insights needed to manage the daily chaos. Unfortunately, the winners who will be doing this kind of work will have job titles like CEO or CMO or Founder, not “Data Scientist.” Mark my words: after the “Big Data” buzz cools a bit, it will be clear to everyone that “Data Science” is dead and the job function of “Data Scientist” has jumped the shark.
Yes, more and more companies are hoarding every single piece of data that flows through their infrastructure. As Google Chairman Eric Schmidt pointed out, every two days we now create as much information as we did from the dawn of civilization up until 2003.
Unfortunately, unless this is structured data, you will be subjected to the data equivalent of dumpster diving. And surfacing insight from a rotting pile of enterprise data is a ghastly process, at best. Sure, you might find the data equivalent of a flat-screen television, but you’ll need to clean off the rotting banana peels. If you’re lucky you can take it home, and oh man, it works! Despite that unappetizing prospect, companies continue to burn millions of dollars to collect and gamely pick through the data under their respective roofs. What’s the time-to-value of the average “Big Data” project? How about “Never”?
If the data does happen to be structured data, you will probably be given a job title like Database Administrator, or Data Warehouse Analyst.
When it comes to sorting through data, true salvation may lie in automation and other next-generation processes, such as machine learning and evolutionary algorithms. Converging transactional and analytic systems also looks promising, because that approach delivers real-time analytic insight while it’s still actionable (the longer data sits in your store, the less interesting it becomes). These systems will require a lot of new architecture, but they will eventually produce actionable results, which is more than you can say for “data dumpster diving.” That doesn’t give “Data Scientists” a lot of job security: as in many industries, you will be replaced by a placid and friendly automaton.
So go ahead: put “Data Scientist” on your resume. It may get you additional calls from recruiters, and maybe even a spiffy new job, where you’ll be the King or Queen of a rotting whale-carcass of data. And when you talk to Master Data Management and Data Integration vendors about ways to, er, dispose of that corpse, you’ll realize that the “Big Data” vendors have filled your executives’ heads with sky-high expectations (and filled their inboxes with invoices worth significant amounts of money). Don’t be the data scientist tasked with the crime-scene cleanup of most companies’ “Big Data.” Be the developer, programmer, or entrepreneur who can think, code, and create the future.
Miko Matsumura is a Vice President at Hazelcast, an open source in-memory data grid company. He is a 20-year veteran of Silicon Valley.
It’s that time of year when pundits prognosticate about the upcoming year. I’ll bite: MMX (that’s the Roman numeral for 2010) is shaping up to be a doozy of a year (although I prefer 7DA, which is 2010 in hex). Last night I decided to re-watch “2010: The Year We Make Contact”. It’s still an incredible movie and a fascinating way to examine people’s assumptions and predictions about the future. The book 2010 was published by Arthur C. Clarke in January of 1982. Some of the striking differences between today and the 2010 imagined by Arthur C. Clarke in his book include:
The radical advancement of Artificial Intelligence (AI) in the form of the HAL 9000 computer
Substantial investment in Space Exploration, including a second manned trip to Jupiter in 2010, nine years after the first trip in 2001
Nuclear conflict between the United States and the Soviet Union, a nation that no longer exists
Contact between a super powerful alien race and humankind (this might yet happen but time is short)
Computer User Interfaces are barely better than a dumb terminal
It’s interesting how predictions often say more about the time of publication and about the author than about the future. In the 1980s, the threat of nuclear war with the Soviet Union loomed large, as portrayed in the movie WarGames, which came out in 1983. Of course, the advancement of the space program was a fond hope of Arthur C. Clarke, who is certainly one of my childhood heroes for his message of technology transcendentalism and his pioneering science-based fiction.
So with this backdrop, I will venture to make my own technology predictions for 2010, focused on Enterprise Software, Cloud Computing and related topics.
Prediction 1: nothing will happen in 2010
A bold prediction. I’ve already read my share of predictions for 2010 including those from:
I’ve listed (in parentheses) some of the predictions made, above. First of all, the predictions I highlighted were the ones I found the most interesting. Aside from the unlikely (Steve Ballmer will leave Microsoft) and the just-plain-crazy (Supercomputers will achieve the same raw processing power as human brains), I can’t say that any of these predictions gets my blood moving. I mean no disrespect to those pundits and prognosticators, many of whom I consider my friends and colleagues.
So why won’t anything happen in 2010?
The short version is that big changes that you’d notice take a long time. It also happens that such changes take a very short time.
If you find the previous statements irritating or contradictory, you are not alone. Big changes in technology and society are frequently driven by exponential functions, and Albert A. Bartlett, Professor Emeritus at the University of Colorado at Boulder (many thanks to my friend @avh for tweeting this video), makes a solid case that “The Greatest Shortcoming of the Human Race is the Inability to Understand the Exponential Function”. If you feel challenged by my previous statements, please take the time to have a look at this video:
As you can see, the exponential function is just a fixed percentage of growth that compounds. Albert Einstein never said “Compound Interest is the most powerful force in the Universe”, but he should have. The exponential function drives many of the forces that shape our world, along with their human impact. These include:
human population growth (overcrowding)
energy consumption (oil prices)
pollution emissions (global warming)
transistor density on a chip (computer industry)
DNA sequencing rate (Human Genome project)
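To make “a fixed percentage of growth that compounds” concrete, here is a minimal Python sketch. The 7% rate and the starting value of 100 are hypothetical numbers chosen only for illustration. It grows a quantity period by period, checks that the result matches the closed-form exponential V0 × (1 + r)^n, and prints the absolute increments, which keep getting larger even though the percentage never changes.

```python
import math

def compound(start: float, rate: float, periods: int) -> list[float]:
    """Grow `start` by a fixed fractional `rate` each period; return every value."""
    values = [start]
    for _ in range(periods):
        values.append(values[-1] * (1 + rate))
    return values

def closed_form(start: float, rate: float, periods: int) -> float:
    """The same growth written as an exponential function of the period count."""
    return start * (1 + rate) ** periods

if __name__ == "__main__":
    rate = 0.07  # hypothetical 7% growth per period, for illustration only
    series = compound(100.0, rate, 10)
    # Step-by-step compounding agrees with the exponential closed form.
    assert math.isclose(series[-1], closed_form(100.0, rate, 10))
    # A fixed rate still produces a bigger absolute increase every period.
    increments = [b - a for a, b in zip(series, series[1:])]
    print([round(d, 2) for d in increments])
```

The growing increments are the point Bartlett makes: a steady rate feels modest early on, and that intuition breaks down later.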
Almost all of the hugely transformational items on any technologist’s list for the Enterprise, such as Service Oriented Architecture (SOA), Business Process Management (BPM) and Cloud Computing, are going to grow slowly next year. According to IDC Chief Analyst Frank Gens (@fgens), “2010 will be a year of modest recovery for the IT and telecommunications industries. But the recovery will not mean a return to the pre-recession status quo. Rather, we’ll see a radically transforming marketplace — driven by surging demand in emerging markets, growing impact from the cloud services model, an explosion of mobile devices and applications, and the continuing rollout of higher-speed networks. These transformational forces will drive key players to redefine themselves and their offerings and will spark lots of M&A activity.”
But many of the core transformational topics in Enterprise Computing will be growing at single- and double-digit (but not triple-digit) rates.
OK, nothing is going to happen. Now what?
We’ve established some very Twitter-friendly names for 2010, such as MMX (the Roman) and 7DA (the Geek). But to peer farther into the future, we should take a look at the upcoming decade. Every decade develops a bit of a “theme” that you can use when you throw nostalgia parties in future years. Here are some examples:
The Psychedelic Sixties 1960-1969
The Disco Seventies 1970-1979
The Yuppie Eighties 1980-1989
The Internet Nineties 1990-1999
The Miserable Naughties 2000-2009
Yes, we are good and ready for the Naughties to be over. Bad Naughties, no Krispy Kreme donut (NYSE:KKD)! Let’s look back to January 2000.
Just a short (almost) ten years later, Time Warner is divesting AOL, the Dow Jones Industrials are lower, Unemployment is at 10%, and we’re fighting global warming, economic collapse and Al Qaeda. Can I just say that we are all SICK and TIRED of the Naughties, the nothings, the zero decade, the lost decade, the decade from hell?
Predictions for the Teenies
Technically, if the 2020s will be called the “twenties”, perhaps the next decade should be called the “tens”. I’m not keen to focus on the early part of the decade, so I am going to point to 2013 and beyond, which we can refer to as the “Teenies” (2013-2019). If we absolutely must have a name for the interim period, let’s call it the “Tweenies” (kids aged 10-12 are referred to as such). A few other reasons why I like Teenies as a name for this upcoming decade are:
We are an adolescent species, mere teenagers–more on this later
Growth in this decade will come from “teeny” things
As many forecasters will tell you, it will take a good long time to build our way out of what’s now called The Great Recession. Though we are seeing “green shoots” now, we will be well into the decade before we start to see significant effects. So, to be fair, the prediction made earlier that “nothing will happen in 2010” can be recast as a prediction about the decade as a whole, and in this spirit, let’s carry on making some predictions about the Teenies.
Prediction 2: Very Teeny Things Become Very Big
Here’s the short list of very small things that will become very big in this upcoming decade:
The Higgs Boson
The Carbon Dioxide Molecule
Genomics & Personal Medicine
Chlorophyll and Artificial Photosynthesis
Dehalococcoides ethenogenes and other pollution-eating microbes
Among many others. This prediction is, of course, very general, but it is intended to provide an impressionistic view of some of the leading advances approaching the boundary of industrial exploitation. In computing in particular, quantum computing is beginning to show promise, as is nanoassembly, which is the more bottom-up approach to extremely small circuit design. At the 45nm chip design scale, fabrication costs are already growing prohibitive. Nanotechnology is also showing tremendous promise in transforming the storage industry.
Prediction 3: Storage and Persistence are transformed
Naturally, storage capacity experiences a doubling interval similar to Moore’s law. But we are reaching a significant inflection point, both in the persistence requirements of applications and in persistence technologies themselves. Companies like Fusion-io (where Steve Wozniak is chief scientist) are pioneering solid state technologies, and distributed caching technologies are radically improving performance across traditional APIs, according to researchers like Forrester’s @JohnRRymer and @MGualtieri. Companies like Terracotta, RNA Networks and others are leading the charge. The exciting thing about these technologies is that they are completely disruptive yet backwards compatible with today’s technology APIs, so they can be inserted into everyday applications. Unlike the radical wave of “Complex Event Processing” (CEP) vendors such as StreamBase, which require completely rewritten applications (even as they use familiar SQL-like query languages), these solutions offer a theoretical performance advantage of up to six orders of magnitude (millisecond disk access vs. nanosecond RAM access) through interfaces such as filesystem mount points.
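The “six orders of magnitude” figure is simple arithmetic on the two latencies named above. The short Python sketch below assumes the round numbers from the text (one millisecond per disk access, one nanosecond per RAM access); these are illustrative assumptions, not measurements of any particular device.

```python
import math

# Round latency figures taken from the text (assumptions, not measurements):
# a disk access on the order of 1 ms, a RAM access on the order of 1 ns.
DISK_ACCESS_NS = 1_000_000  # 1 millisecond expressed in nanoseconds
RAM_ACCESS_NS = 1           # 1 nanosecond

def orders_of_magnitude(slow_ns: int, fast_ns: int) -> float:
    """How many powers of ten separate two latencies."""
    return math.log10(slow_ns / fast_ns)

ratio = DISK_ACCESS_NS // RAM_ACCESS_NS
print(f"RAM is ~{ratio:,}x faster: "
      f"{orders_of_magnitude(DISK_ACCESS_NS, RAM_ACCESS_NS):.0f} orders of magnitude")
```

Plugging in less idealized figures, such as a few milliseconds for a disk seek and on the order of 100 ns for DRAM, narrows the gap to roughly four to five orders of magnitude, which is why six is best read as a theoretical ceiling.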
Beyond these advances in software, we see hardware advances such as bottom up storage using nanoscale self-assembly. Ting Xu, a UC Berkeley assistant professor with joint appointments in the Department of Material Sciences and Engineering and the Department of Chemistry, says in the February issue of the journal Science: “The density achievable with the technology we’ve developed could potentially enable the contents of 250 DVDs to fit onto a surface the size of a quarter”. “The challenge with photolithography is that it is rapidly approaching the resolution limits of light,” added Xu. “In our approach, we shifted away from this ‘top down’ method of producing smaller features and instead utilized advantages of a ‘bottom up’ approach. The beauty of the method we developed is that it takes from processes already in use in industry, so it will be very easy to incorporate into the production line with little cost.”
Prediction 4: Just like teenagers, we have trouble getting over ourselves
Despite utopian visions like Star Trek, the “Enterprise” struggles with its scale. The Star Trek universe is based on the concept of “Federation”. Daryl Plummer, VP and Research Fellow at Gartner defined Federation as “what autonomy you have to give up in order to be part of something bigger.” This is a great definition as it speaks to organizational silos as well as down to individuals in the Enterprise. I wrote about this challenge both in my blog post “There is no “I” in IT–oh yes there is” and a rational response at the Enterprise IT level in the InfoQ article “SOA Governance Revitalized” (thanks @FloydMarinescu and Ryan Slobojan @straxus)
The Shift Index 2009, published by @JHagel, shows how poor we are at scaling organizations. Since 1965, return on assets has declined 75% across US public corporations.
We’re not good at federation and scaling organizations.
Even Order-To-Cash is going to require collaboration across Enterprise technology silos and Organizational tribes. The problem of Great-Idea-In-The-Shower-To-Cash requires Enterprise collaboration and continuous measurement, alignment and accountability across organizational boundaries. The problem is, the Enterprise may not be the best place for this kind of innovation. Recall that Enterprises are defined (yeah defined by me in this blog post: Top 5 Definitions of Enterprise) as organizations that require size and longevity in order to pursue their mission. The problem with size and longevity is the production of organizational and technology silos.
This results in a complex IT supply…
What remains to be seen is whether organizations of size and longevity (read: fat and old) can collaborate at a rate competitive with smaller (perhaps Dunbar-number-sized) organizations. Christopher Allen (@ChristopherA) has an excellent blog post on organizational size as it relates to Dunbar’s number (commonly approximated as 150 people). These smaller organizations can have simpler IT (such as Cloud Powered) while being able to integrate and meet complex business requirements and form complex value chains. Large organizations will not be able to retain talent during the economic upturn and growth of the Teenies, nor will they be able to acquire and consolidate innovators, due to the reopening of the IPO markets and the expansion of Market Capitalization proportional to the growth potential of these innovators.
Prediction 5: Trust will take time to heal
One of the reasons for Prediction 1 is how slowly trust can be restored to the economy. The principle of exponential growth can be seen as a simple restatement of the financial principle of Compound Annual Growth Rate (CAGR). However, exponential growth can also be a hiding place for charlatanism and multibillion-dollar fraud schemes such as the one perpetrated by former NASDAQ Chairman Bernard Madoff.
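For readers who want the link between CAGR and exponential growth spelled out: CAGR is conventionally defined from a starting value $V_0$, an ending value $V_n$ and a number of years $n$, and inverting that definition gives back an exponential function of time with base $1 + r$.

$$
r_{\text{CAGR}} = \left(\frac{V_n}{V_0}\right)^{1/n} - 1
\quad\Longleftrightarrow\quad
V_n = V_0\,\bigl(1 + r_{\text{CAGR}}\bigr)^{n}
$$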
The ripple effect is both cause and effect: the collapse of the pillars of the economy produces large-scale job losses, which in turn put fear and mistrust into the economy. Let’s take a look at an animation that graphically depicts this blow to our economy, region by region, across the United States:
Thanks to Super VC David Hornik (@DavidHornik) for tweeting this video.
Speaking of Venture Capital–these are the people who are investing in exponential growth. Trust is returning to those markets as well with Benchmark’s amazing day thanks to Peter Fenton (@PeterFenton) and RedPoint’s successful IPO of Fortinet. Since the greatest failing of humankind is the inability to understand the exponential function, it is hard to understand how to combine trust with transformation and the unique chemistry that is Silicon Valley.
But at a Compound Annual Growth Rate of only 14%, we have a doubling interval of roughly 5 years. And interestingly enough, we are experiencing a much shorter cycle time for technology adoption. It took 38 years for radio to attract its first 50 million users. It took 13 years for television to reach a similar number of users, 4 years for the Internet, 3 years for the iPod and less than 2 years for Facebook. So we are very bad at understanding the exponential function and also terrible at federation and scaling organizations. But the good news is that we have a tailwind.
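The doubling interval for a given growth rate can be checked directly. This minimal Python sketch computes the exact doubling time, ln 2 / ln(1 + r), alongside the common “rule of 72” mental-math shortcut; the only input taken from the text is the 14% CAGR. At that rate the exact answer is a little over 5.2 years, so “roughly 5 years” is a fair rounding.

```python
import math

def doubling_time(rate: float) -> float:
    """Exact number of periods needed to double at a fixed fractional rate per period."""
    return math.log(2) / math.log(1 + rate)

def rule_of_72(rate: float) -> float:
    """Mental-math approximation: 72 divided by the rate expressed as a percentage."""
    return 72 / (rate * 100)

if __name__ == "__main__":
    cagr = 0.14  # the 14% CAGR mentioned in the text
    print(f"exact: {doubling_time(cagr):.2f} years, "
          f"rule of 72: {rule_of_72(cagr):.2f} years")
```

The rule of 72 (Bartlett uses the closely related rule of 70) is handy precisely because most people cannot do the logarithm in their heads, which is part of why exponential growth keeps surprising us.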
It’s a cool sunny day in San Francisco and there’s some bustle at the Moscone Center for the Enterprise 2.0 conference.
You can tell it’s an Enterprise conference because, unlike the Web 2.0 Conference, there’s no free pass, not even to the show floor. Also, the full pass is about $2,500. One way to define Enterprise is:
5. Stuff I wouldn’t do unless you paid me.
This definition puts Enterprise squarely in the camp of crime scene janitorial services. It adds a concept of “professional” to the discussion and establishes the Enterprise as the realm of uncomfortable clothing. I recall reconnecting with Arthur Van Hoff after our adventures in Java and having him laugh at me because I was wearing (in his words) an “IQ Restrictor”, his parlance for a necktie. This definition also puts a dynamic tension between the “Suits” at the Enterprise 2.0 conference and the boho hipsters wearing the Emo Hair.
4. Software that sucks.
This was the definition I invoked in my post “The Human Enterprise.” To be honest, I introduced the idea of “The Human Enterprise” as a direct counterproposal to “Enterprise 2.0”. I think the piece that was missing from The Human Enterprise is the extent to which fragmentation plays a role in the essential nature of the Enterprise, a theme I’ve been addressing more lately in terms of the effect of sheer size on the Enterprise.
3. A venture requiring industriousness or courage.
This definition deserves some attention because in some ways it captures exactly what’s missing from the current debate around the Enterprise. The way courage has been slowly sapped by the ravages of the Great Recession and “job security” is disheartening. In particular, efforts to rejuvenate complex IT System Architecture and to mitigate the effects of Entropy and the “Heat Death of IT” have been met with cries of “SOA is Dead”. So here’s a call for the restoration of courage in IT, to boldly go. Set phasers to “frappe”.
2. Dead stuff that used to matter
Rumors of the death of Enterprise Software have been greatly exaggerated (nice post by David Hornik). What people find hard to understand about the longevity of most Enterprise IT is that “dead” software actually lives a long time. In fact, dead software (nice post by James Governor) runs 90% of the economy. Another word for “legacy” is IT projects that worked. The word for IT projects that didn’t work is “consolidation”. This should especially resonate with folks at the Enterprise 2.0 conference, since 99% of the projects spawned by “Enterprise 2.0” will fall into the latter category. We will have won when there are “Legacy Enterprise 2.0” apps out there.
1. An organization whose mission requires significant size, growth and longevity
I present this as the number one definition in an attempt to extract the most salient feature of the Enterprise for casual observers. The definition is designed to be inclusive of Government organizations. I don’t want to open a can of worms (big government vs small government) but arguably some “missions” such as the regulation of interstate commerce and providing for the common defense would require a degree of size, scale and longevity. But what’s more interesting about this definition are the implications.
At this scale, the organization struggles with whether it’s “too big to fail” or “too big to succeed”.
The implications of size include fragmentation of organization into tribes.
The implications of growth include fragmentation of markets into niches.
The implications of longevity include fragmentation of technology into silos.
These forms of fragmentation are the key challenge of the Enterprise, and the point that some E2.0 companies seem to miss. Trying to repackage consumer apps and peddle them to Enterprises misses the unique pain of the Enterprise. I’ve spoken and written extensively about the effect of technological and organizational silos, for example in my book SOA Adoption for Dummies. But lately I’ve been thinking about the effects of market fragmentation.
There comes a tipping point in any large commercial-sector Enterprise where the market for the flagship product or service becomes saturated. At this juncture, the revenue growth challenge becomes less about attracting and delighting new customers and more about sucking as much money out of existing customers as possible. The example I will offer is the Apple iPod. At the risk of offending fanboys, the iPod market is saturated. I must own a half dozen iPods. I go running with my iPod nano 3g, and when it failed, I went to the Apple store to buy a new iPod. The way Apple segmented its products, there was a low-end model at $59 with no screen (the clip), a mid-range but portable option (the nano) at $150, and then the “platform” model, the iPod Touch, at $199.
The nano costs only 50 bucks less than the Touch, but for users who want to run with an iPod, the Touch is too big. Since Apple overloaded the nano with features I don’t want (accelerometer, video camera, FM radio), they were able to jack up the price.
This kind of behavior exists in many mature markets, including cell phone plans. The cell phone companies have “package designers” who specifically design packages including SMS and email that rack up a maximum number of overcharges and fees. They design packages that exploit the gap between what users think they will use and what they actually use, based on data mining of their customer demographics. This type of behavior makes the Enterprise essentially the “enemy” of the consumer. Of course, we want successful companies to have profits so they can fuel the next generation of investment. I certainly want Apple to succeed, and I bought their product even though I found it mildly distasteful (it was still the best player for my purpose).
I wrote this post in the hopes that it would stimulate discussion about how people define the “Enterprise” in “Enterprise 2.0”.
So why dust off this word? I suppose I enjoy collecting antiques.
It’s after all a perfectly good word, and can be repurposed as a pot holder or maybe a tea cozy. What I’d like to have is a word that signifies the following:
An organization that has grown in size to the point where the old tricks don’t work anymore.
* Its organization has shattered into factions
* Its technology has separated into silos
* Its market has fragmented into niches
The big challenge: how does one maintain the advantages of size and scale and still retain agility?
I think it’s possible:
So how does fragmentation affect the use of cloud?
Well, in terms of complex demand, cloud principles are very exciting.
If your market is fragmented, you will be happy to offer a platform of reusable services that can be customized by channel partners or even by end users into thousands of possible use cases. Think iPhone App Store. So for complex demand, the cloud is a good thing.
The challenge for the Enterprise and cloud is the concept of “Complex Supply”. Since technology in the Enterprise is already siloed, adding cloud just adds another silo. Legacy Mainframe apps, Web Application Servers, Enterprise Applications, you name it: Cloud is yet another technology silo to maintain, integrate, secure and govern. Since large organizations are fragmented into smaller organizations, this problem is compounded when one organization creates a dependency on cloud services without a systematic enabling architecture.
Size matters. People try to apply architectural patterns and software solutions as if they were one-size-fits-all.