Harvesting Big Data to Enhance Supply Chain Innovation Capabilities: An Analytic Infrastructure Based on Deduction Graph
Abstract
Today, firms can access big data (tweets, videos, click streams, and other unstructured sources) to extract new ideas or understanding about their products, customers, and markets. Thus, managers increasingly view data as an important driver of innovation and a significant source of value creation and competitive advantage. To get the most out of big data (in combination with a firm’s existing data), a more sophisticated way of handling, managing, analysing and interpreting data is necessary. However, there is a lack of data analytics techniques to help firms capture the innovation potential afforded by data and gain competitive advantage. This research aims to address this gap by developing and testing an analytic infrastructure based on the deduction graph technique. The proposed approach provides an analytic infrastructure for firms to incorporate their own competence sets with those of other firms. Case study results indicate that the proposed data analytic approach enables firms to utilise big data to gain competitive advantage by enhancing their supply chain innovation capabilities.
Many countries are now pushing for a digital economy, and big data has become an increasingly fashionable term. Wong (2012) states that the key factor in gaining competitive advantage in today’s rapidly changing business environment is the ability to extract helpful business insights from big data. Being able to use big data allows firms to outperform their competitors (Oh, 2012). For example, retailers can potentially increase their operating margins by 60 percent by tapping into hidden values in big data (Werdigier, 2009). Although substantial capital and time must be invested in building a big data platform and technologies, the long-term benefits of big data for creating competitive advantage are vast (Terziovski, 2010). Many researchers point out that firms can better understand customers’ preferences and needs by leveraging data available in loyalty cards and social media (Bozarth et al., 1998; Tsai et al., 2013).
Huge potential value remains uncovered in big data. As Manyika et al. (2013) indicate, 300 billion dollars of potential annual value can be generated in US healthcare if organisations or governments can capture big data’s value. Moreover, the commercial value of personal location data around the world is estimated at 600 billion dollars annually (Davenport and Harris, 2007; LaValle et al., 2010). Big data offers different benefits to different industries, but it can also generate value across sectors (Mishra et al., 2013). The White House’s announcement in 2010 of big data as a national priority in supporting healthcare and national security further emphasises the essential role of big data as a national weapon (Mervis, 2012).
Currently, a variety of analytics techniques, including predictive analytics, data mining, case-based reasoning, exploratory data analysis, business intelligence, and machine learning, could help firms mine unstructured data, for example to understand customers’ preferences and needs. However, the applications of existing techniques are limited (Tsikriktsis, 2005; Cohen et al., 2009). Wong (2012) points out that the existing techniques for big data analytics are, in general, likely to be mechanistic. Additionally, many researchers point out that big data analytic techniques to aid the development of new products are relatively underemphasised (Ozer, 2011; Cheng et al., 2013; Manyika et al., 2013).
Clearly, there is a lack of analytical tools and techniques to help firms generate useful insights from data to drive strategy or improve performance (Yiu, 2012; Manyika et al., 2013). Thus, how could operations managers harvest big data to enhance supply chain innovation as well as to deliver better fact-based strategic decisions? Arlbjørn et al. (2011) state that supply chain innovation is a change within a supply chain network, supply chain technology, or supply chain process (or a combination of these) that can take place in a company function, within a company, in an industry or in a supply chain in order to enhance new value creation for the stakeholder. Many researchers have pointed out that supply chain innovation is a vital instrument for improving the performance of a supply chain and that it can provide firms with great benefits (Flint et al., 2005; Krabbe, 2007). For example, it can significantly improve customer response times, lower inventories, shorten time to market for new products, improve the decision-making process, and enable full supply chain visibility. Wong (2012) and Manyika et al. (2013) state that big data provides an avenue for firms to improve their supply chain operations and innovation. With big data, firms can extract new ideas or understanding about their products, customers, and markets, which are crucial to innovation. However, the main challenge for managers is to identify an analytic infrastructure that could harvest big data to support firms’ innovation capabilities.
Analytics is the practice of using data to generate useful insights that can help firms make better fact-based decisions with the ultimate aim of driving strategy and improving performance (Wong, 2012). This paper seeks to develop and test an analytic infrastructure for a firm to incorporate its own competence sets with those of other firms. A firm’s competence set (i.e. an accumulation of ideas, knowledge, information, and skills) is vital to its innovation capabilities (Yu and Zhang, 1993; Li, 1997; Chen, 2001; Schmenner and Vastag, 2006; Mishra and Shah, 2009). This research addresses the situation in which a firm is willing to harvest (i.e. from big data) and incorporate the competence sets of others so that its innovation capabilities can be expanded.
To assist our understanding of harvesting big data to enhance innovation, this study proposes an analytic infrastructure for managing supply chain competence sets. Further, it demonstrates how the proposed approach could be applied in the fast-moving consumer fashion industry to help managers generate new product ideas and identify the competence sets required to produce products in the most cost-effective ways. Finally, the strengths of the proposed approach, its limitations, and the research implications of this work are examined.
2.0 CHALLENGES IN BIG DATA HARVEST
Ohlhorst (2012) describes big data as data of immeasurable size, where the scale of the data is too varied and its growth too rapid for conventional information technologies to deal with efficiently. In the year 2000, only 800,000 petabytes (PB) of data were stored in the world (IBM, 2013). This number is expected to reach 35 zettabytes (ZB) by 2020 (Wong, 2012; Yiu, 2012). This explosion of data makes it difficult for traditional systems to store and analyse it (Huddar and Ramannavar, 2013; Zhan et al., 2014).
Furthermore, there are many different types of data, such as texts, weblogs, GPS location information, sensor data, graphs, videos, audio and other online data (Forsyth, 2012). These varieties of data require different equipment and technology to handle and store (Bughin et al., 2010). Moreover, data has become more complex because the variety has shifted from traditional structured data to more semi-structured and unstructured data from search indexes, emails, log files, social media forums, sensor data from systems, and so on (Mohanty et al., 2013). The challenge is that traditional analytic technologies cannot deal with this variety (Zikopoulos and Eaton, 2012; Zhan et al., 2014). Eighty percent of data is now unstructured or semi-structured and almost impossible to analyse (Syed et al., 2013). However, in the digital economy, a firm’s success will rely on its ability to draw insights from the various kinds of data available to it, both traditional and non-traditional. The ability to analyse all types of data will create more opportunity and more value for an enterprise (Dijcks, 2013; IBM, 2013).
On top of the variety, huge amounts of data are generated every second, and an increasing share of that data has a very short life (Xu et al., 2013). This situation increases the demand on businesses to respond and make decisions in real time (Minelli, 2012). A review of the literature (Cohen et al., 2009; Zikopoulos and Eaton, 2011; Huddar and Ramannavar, 2013) shows that various existing techniques, e.g. Hadoop and MapReduce, are available to managers to harvest big data. Apache Hadoop is an open-source software framework that allows users to easily use a distributed computing platform. It is capable of dealing with large amounts of data in a reliable, efficient and scalable manner. Its reliability is enhanced by maintaining multiple working copies of data and redistributing the work of failed nodes. Hadoop processes data in parallel to increase speed, and it is highly scalable because it can handle PB-level data (Lam, 2010). Moreover, large-scale data processing applications can be run on Apache Hadoop, which provides them with high reliability and high fault tolerance (Vance, 2009). MapReduce is a programming model for dealing with large-scale data sets. It supports parallel computing and can be run on Hadoop. It is used for distributing the processing of large data sets across multiple servers (Dean and Ghemawat, 2008).
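To make the MapReduce programming model more concrete, the following is a minimal single-machine sketch in Python of its three stages (map, shuffle, reduce), applied to counting words in short text posts. It does not use Hadoop. The function names (map_phase, shuffle, reduce_phase, run_job) and the sample posts are illustrative assumptions, and a real deployment would distribute the map and reduce calls across multiple servers.

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key before reduction."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle and reduce in sequence on a list of records."""
    emitted = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())

# Hypothetical sample input: three short posts.
posts = ["red dress sale", "Red coat", "dress sale today"]
counts = run_job(posts)
print(counts)
```

Because each call to map_phase depends only on its own record, and each call to reduce_phase only on its own key, both stages could in principle run in parallel, which is the property that lets a framework such as Hadoop spread the work over many nodes.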
However, it is extremely hard for existing analytics to analyse a high volume (and variety) of data in real time and produce useful information (Bisson et al., 2010). Although such techniques might help managers to produce a lot of information, the output is unfocused and hence inefficient to use (McAfee and Brynjolfsson, 2012). A lot of effort and time is needed to sort through the information generated and to identify the items that are relevant and viable. What is required is an analytic infrastructure that can structure and relate various bits of information to the objectives being pursued.
Therefore, instead of just generating vast amounts of information using existing software, managers need techniques to structure and link various streams of data to create a coherent picture of a particular problem, so that better insight into the issue being analysed can be gained. There are several sophisticated analytic techniques, such as the connectance concept (TAPS), influence diagrams, cognitive mapping, and induction graphs, that managers could apply to make a visual representation of the problem being analysed (see Figure 1).
Technique: Burbidge’s connectance concept
Description: Generates a network picture between variables and objectives to provide an action plan process in order to help organisations to make decisions
Strengths: Makes the problem easy to understand; provides more options for decision making
Limitations: Cannot figure out the optimal choice
Technique: Influence diagram
Description: Represents all causal relationships of a phenomenon in a manner that is non-ambiguous and probabilistic
Limitations: May not be suitable for analysing complex problems that involve relationships that are qualitative in nature
Technique: Cognitive mapping
Description: Uses statements to build complex networks for a problem. Allows multiple foci.
Strengths: Simple to apply; could build a network from any focus
Limitations: Could result in a very complex model; no structured approach for constructing the network
Technique: Induction graph
Description: Shows links between different level nodes and thus composes a graph structure
Strengths: Simple to understand and interpret; possible scenarios can be added
Limitations: Calculations can be overly complex
Software: A variety of commercial software such as LINGO, etc.
Figure 1: Comparison of Causal Analytic Techniques
Burbidge’s connectance concept (Burbidge, 1984) enables managers to create a network of variables based on ‘cause and effect’ relationships. Recently, Burbidge’s vast database has been computerised via the Tool for Action Plan Selection (TAPS) by a team of researchers at Cambridge University (Tan and Platts, 2003; Tan and Platts, 2004). It has two basic functions. The first is to connect different variables, tools or objectives and to show the relationships between them clearly (Tan and Platts, 2004). The second is to create a whole view of the action plan: once the different sequences for achieving the target are known, it can help managers choose a suitable action. This tool has been adopted by many companies to solve manufacturing problems. In the big data environment, where data and information are exploding, big data analytics can identify the relevant variables or competence sets and classify them into different groups to enrich the TAPS network. However, although TAPS indicates how actions can affect objectives, it is a qualitative technique that is unable to quantify the potential impact of each connectance.
The influence diagram is one of the most widely known and used cause-effect diagrams in operations management (Shachter, 1986; Smith, 1989; Guezguez et al., 2009). It is a systematic technique for identifying the possible root causes of a problem by breaking it down into components, together with the direction of each effect. An influence diagram attempts to represent all causal relationships in a manner that is non-ambiguous and probabilistic (Cobb and Shenoy, 2008). Cognitive mapping is used to explore and structure problems (Buzan, 1982). It allows an individual to acquire, store, recall, and decode information about the relative locations and attributes of phenomena in their everyday environment. It uses only text to build complex networks, which may have several foci (Fransoo and Wiers, 2006; Georgiou, 2009). Both influence diagrams and cognitive mapping are useful techniques for helping managers to understand ‘as is’ problems visually. However, both techniques lack the analytical capabilities to process vast volumes of data.
Induction graphs are a generalisation of decision trees (Zighed and Rakotomalala, 2000). In a decision tree, the classification decision is made from the root towards the leaves, with no possibility of returning from a node to a lower or higher level node in the tree. Induction graphs enable users to introduce links between nodes at different levels and thus compose a graph structure. This method is now widely used in data exploration methods such as knowledge retrieval from data, which is also called data mining (Huyet and Paris, 2004).
Overall, because they are general-purpose, these analytic mapping infrastructures are not necessarily optimised for the decision-making task. For example, Burbidge’s connectance concept and the influence diagram focus only on qualitative relationships, while the induction graph might lead to a complicated decision problem that is difficult to solve. In addition, cognitive mapping might result in overly complex models since it allows the development of multiple foci (see Figure 1).
3.0 THE PROPOSED BIG DATA ANALYTIC INFRASTRUCTURE
Thus, a much better analytic infrastructure is needed to help managers make better use of the available big data to gain competitive advantage. Instead of just generating vast amounts of information using existing software, what managers need are techniques to structure and link various streams of data to create a coherent picture of a particular problem, so that better insight into the issue being analysed can be gained. For example, having identified from big data analysis the products that could meet future markets, how could managers then identify the competence sets required to develop the new products? What managers need is an analytic infrastructure that uses big data as input to make more informed strategic decisions.
Li (1997) proposed an analytic technique called the deduction graph model that allows firms to incorporate their own competence sets with those of other firms. It presents an optimised sequence for the expansion process visually by linking different competence sets from various sources (Li et al., 2000). Although this approach has not been adopted in the area of big data analytics, we believe it provides the right analytic capabilities to help firms harvest big data to enhance supply chain innovation.
The deduction graph model proposed by Li (1997) illustrates the competence set expansion process vividly. It is an optimisation model for incorporating other competence sets (Yu and Zhang, 1992). For example, let E be the problem to be solved, Tr the truly needed competence set, and Sk the already acquired competence set for solving the specific problem; an intermediate skill (I) can increase the learning speed or connect Sk and Tr. The model is dedicated to helping the decision-maker obtain Tr from Sk. It can deal with multiple decision-makers and also considers intermediate skills and cyclical relationships between skills (Li, 1997; Li et al., 2000). This analytic infrastructure builds a deduction graph from the starting node (Sk) to the ending node (Tr) through the intermediate nodes (I). It then uses 0-1 integer programming to obtain the optimised solution. Li’s deduction graph is an efficient mathematical method. It provides a learning network by connecting the related competence sets, and then uses optimisation programming to find optimal solutions for acquiring the needed skills. It can also provide alternative process sequences for solving a problem.
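The expansion from Sk to Tr can be illustrated with a small sketch. The Python code below is not Li's (1997) formulation. It is a simplified illustration that assumes single skills only (no compound skills), a hypothetical set of arcs with made-up learning costs, and brute-force enumeration of every 0-1 arc selection in place of an integer programming solver. It selects the cheapest set of arcs that makes every skill in Tr reachable from Sk.

```python
from itertools import product

def reachable(start, chosen_arcs):
    """Return every skill that can be learned from `start` using the chosen arcs."""
    known = set(start)
    changed = True
    while changed:
        changed = False
        for src, dst in chosen_arcs:
            if src in known and dst not in known:
                known.add(dst)
                changed = True
    return known

def cheapest_expansion(sk, tr, arcs):
    """Enumerate all 0-1 arc selections and keep the cheapest feasible one."""
    arc_list = list(arcs.items())
    best_cost, best_arcs = None, None
    for bits in product((0, 1), repeat=len(arc_list)):
        chosen = [arc for (arc, _), bit in zip(arc_list, bits) if bit]
        cost = sum(c for (_, c), bit in zip(arc_list, bits) if bit)
        if best_cost is not None and cost >= best_cost:
            continue  # cannot improve on the best selection found so far
        if set(tr) <= reachable(sk, chosen):
            best_cost, best_arcs = cost, chosen
    return best_cost, best_arcs

# Hypothetical data: one existing skill, one intermediate skill, two needed skills.
sk = {"s1"}
tr = {"t1", "t2"}
arcs = {  # (from skill, to skill): learning cost (made-up values)
    ("s1", "t1"): 6,
    ("s1", "t2"): 5,
    ("s1", "i1"): 2,
    ("i1", "t1"): 1,
    ("i1", "t2"): 2,
}

cost, plan = cheapest_expansion(sk, tr, arcs)
print(cost, plan)
```

In this toy example, learning the intermediate skill i1 first gives a total cost of 5, whereas acquiring t1 and t2 directly from s1 would cost 11. Enumeration is only practical for a handful of arcs, which is why a realistic model would be handed to an integer programming solver.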
Although Li’s deduction graph is a sophisticated mathematical model for dealing with optimisation and information connectivity problems, its limitations are significant. Because of its powerful information processing capability, it requires a large amount of varied information to solve a particular problem. In addition, previous research is mainly based on theoretical assumptions (e.g. that managers can list all required information accurately, and that managers can freely purchase required competencies at listed prices from different sources). The proposed analytic infrastructure combines big data techniques with the deduction graph to overcome these limitations. Instead of relying on theoretical assumptions, the proposed infrastructure is capable of harvesting potential value from different sources of real company data. For example, existing data mining approaches can help firms discover previously unknown single skills or compound skills needed for new product development. Also, data transparency makes it easier to access other companies’ competence sets. In this way, the proposed analytic infrastructure can overcome the limitations of Li’s (1997) deduction graph model and offer considerable potential value to companies.
Figure 2 shows an analytic infrastructure framework based on the deduction graph model that managers could use to enhance supply chain innovation capabilities. Operationalising the proposed framework involves a two-step process: data management and data analytics. This paper focuses mainly on data analytics.
Step one: Data management
First of all, it is essential for organisations to understand what information they need in order to create as much value as possible. This is because some valuable company data are created and captured at high cost, but most of them are ultimately ignored. Thus, in the big data management stage it is important to meet the bulk storage requirements for experimental databases, array storage for large-scale scientific computations, and large output files (Sakr et al., 2011). Data requirements may differ according to each organisation’s needs and problems. Then, a number of data pre-processing techniques, including data cleaning, data integration, data transformation and data reduction, can be applied to remove noise and correct inconsistencies in the data sets. After that, data mining techniques can be used to help managers generate useful information for a specific issue, including intermediate skills (I), existing competence sets (Sk), needed competence sets (Tr), the relevant skills, and the learning cost data. All of this captured information is important for the development of the deduction graph model in step two.
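The four pre-processing techniques named above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in this study: the record layout (a skill name and a learning cost), the two data sources, the conversion factor of 8 hours per day, and all function names are assumptions made for the example.

```python
def clean(records):
    """Data cleaning: drop incomplete rows and normalise skill names."""
    cleaned = []
    for row in records:
        skill = (row.get("skill") or "").strip().lower()
        cost = row.get("cost")
        if not skill or cost is None:
            continue  # noise: missing skill name or missing cost
        cleaned.append({"skill": skill, "cost": float(cost)})
    return cleaned

def transform(records, factor):
    """Data transformation: rescale costs into a common unit."""
    return [{"skill": r["skill"], "cost": r["cost"] * factor} for r in records]

def integrate(*sources):
    """Data integration: merge records from several sources into one list."""
    return [row for source in sources for row in source]

def reduce_records(records):
    """Data reduction: keep only the lowest learning cost per skill."""
    cheapest = {}
    for row in records:
        skill, cost = row["skill"], row["cost"]
        if skill not in cheapest or cost < cheapest[skill]:
            cheapest[skill] = cost
    return cheapest

# Hypothetical sources: internal costs in hours, supplier costs in days.
internal = [{"skill": " Pattern Cutting ", "cost": "36"},
            {"skill": "dyeing", "cost": None}]
supplier = [{"skill": "pattern cutting", "cost": 5},
            {"skill": "Embroidery", "cost": 2},
            {"skill": "", "cost": 1}]

catalogue = reduce_records(
    integrate(clean(internal), transform(clean(supplier), 8))
)
print(catalogue)
```

The resulting dictionary of skills and learning costs has the shape that a deduction graph needs as input, although real company data would call for far more careful cleaning rules than the two checks shown here.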
Step two: Data analytics
Data analytics involves data interpretation and decision making. We use the deduction graph model in this step, which illustrates the competence set expansion process vividly (Li, 1999). The intermediate skills (I), existing competence sets (Sk), needed competence sets (Tr), the relevant skills, and the learning cost data can all be acquired from step one via data mining. The harvested data serve as inputs to the deduction graph, a mathematical model that can be built to address a particular problem. Managers can then apply the deduction graph to visualise the expansion process and use LINGO software to obtain the optimal solution. Moreover, a knowledge network (which we call a competence network) is developed, allowing managers to see the various options for achieving their goals. Optimisation programming can then be used to help managers find the optimal solution. The competence network also provides alternative paths for achieving a set goal. Thus, if the owner has more options for expanding its manufacturing process, it will be easier to make optimal decisions.
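The idea that a competence network exposes alternative paths to a goal can be sketched as a simple path enumeration. The Python code below is an illustrative assumption, not the LINGO model used in this study: the skill names, arcs and learning costs are hypothetical, and the function all_paths is introduced only for this example. It lists every cycle-free route from one existing skill to one needed skill, ordered by total learning cost.

```python
def all_paths(arcs, start, goal, path=None, cost=0):
    """Yield (total cost, path) for every cycle-free route from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        yield cost, path
        return
    for (src, dst), arc_cost in arcs.items():
        # Skip skills already on the path so that cyclical links do not loop forever.
        if src == start and dst not in path:
            yield from all_paths(arcs, dst, goal, path, cost + arc_cost)

# Hypothetical competence network: (from skill, to skill) -> learning cost.
arcs = {
    ("s1", "t1"): 6,
    ("s1", "i1"): 2,
    ("i1", "t1"): 1,
    ("i1", "i2"): 1,
    ("i2", "t1"): 4,
    ("i2", "i1"): 1,  # cyclical relationship between two intermediate skills
}

options = sorted(all_paths(arcs, "s1", "t1"))
for total, route in options:
    print(total, " -> ".join(route))
```

With these made-up costs, the cheapest route passes through the intermediate skill i1 at a total cost of 3, while the direct route costs 6. Listing the routes in this way shows a manager the fallback options as well as the optimum.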