4. Training and Getting Started
For prospective data scientists looking to add IoT to their expertise, the usual data science skillset remains highly relevant. Industrial data scientists should be strong in math and statistics, adept at rigorous cross-validation, proficient at developing software in the core languages R and Python, and able to communicate analyses effectively to many audiences, whether mechanics or executives.
Following our recommendations in Section 3, prospective data scientists will need an additional focus on interacting with experts. Industrial analytics is not the only field where this matters; data scientists working on medical applications, for example, may communicate directly with medical doctors. However, given the data challenges described in Section 3, and especially for predictive maintenance problems, it can be critical for data scientists to work closely with an industry expert. Importantly, they must be able to do this while maintaining overall control of how a problem is being solved. To borrow a phrase from Meng, data scientists should strive to be "Proactive co-investigators/partners, not passive consultants". For the practicing data scientist, this means bringing data and plots to conversations with specific research questions in mind. Conversely, data scientists should avoid statements like 'the expert said X, so I did X' or questions like 'does the expert want Y in the model?' Questions such as 'was X justified by the data and our understanding of the problem?' or 'does Y lead to any substantive improvement in the model?' will help data scientists create better solutions.
Given the importance of interaction with subject matter experts, data scientists who also have experience with machines and mechanics can significantly speed up model building. We have seen many cases where simply knowing the relative locations of components on a machine may have saved weeks of model-building time. To help data scientists build this knowledge, we have sent them to formal training events intended for mechanics and other heavy equipment analysts.
For those looking to get their hands on sample data, NASA maintains a number of public data sets that track devices as they fail in either simulations or lab experiments; examples include turbofans, bearings, and batteries. These data sets are well suited to practicing cross-validation and experimenting with methods for detecting early patterns of failure. However, many of the data issues we mentioned in Section 3 may not be present in lab experiments. Practitioners getting started with these data sets should keep this in mind so that methods developed on them do not prove ineffective on real-world data.
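As one illustration of what careful cross-validation can look like on run-to-failure data, the sketch below splits observations by device rather than by row, so that no device contributes readings to both the training and the test fold. This is a minimal sketch under our own assumptions, not a prescribed procedure: the function name `grouped_kfold`, the unit labels, and the cycle counts are all hypothetical, and the data are synthetic stand-ins rather than any of the NASA data sets.

```python
import random


def grouped_kfold(unit_ids, n_splits=3, seed=0):
    """Yield (train_idx, test_idx) pairs in which no unit appears on both sides.

    unit_ids holds one device label per observation (hypothetical labels here).
    """
    units = sorted(set(unit_ids))
    if n_splits < 2 or n_splits > len(units):
        raise ValueError("n_splits must be between 2 and the number of units")
    rng = random.Random(seed)
    rng.shuffle(units)
    # Deal the shuffled units into folds round-robin.
    fold_of = {unit: i % n_splits for i, unit in enumerate(units)}
    for fold in range(n_splits):
        test_idx = [i for i, u in enumerate(unit_ids) if fold_of[u] == fold]
        train_idx = [i for i, u in enumerate(unit_ids) if fold_of[u] != fold]
        yield train_idx, test_idx


# Synthetic stand-in for run-to-failure data: six hypothetical units, each
# observed for a different number of cycles before failure.
cycles_per_unit = {
    "unit_1": 5, "unit_2": 7, "unit_3": 4,
    "unit_4": 6, "unit_5": 5, "unit_6": 8,
}
unit_ids = [u for u, n in cycles_per_unit.items() for _ in range(n)]

splits = list(grouped_kfold(unit_ids, n_splits=3))
for train_idx, test_idx in splits:
    test_units = sorted({unit_ids[i] for i in test_idx})
    print(test_units, "held out;", len(train_idx), "training rows")
```

Splitting by device is a design choice that reflects how such models are typically used: readings from a single device are strongly correlated over time, so a random row-level split can make a model look better than it would on a machine it has never seen. Libraries such as scikit-learn offer a comparable splitter (`GroupKFold`) for those who prefer not to write their own.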
Industry analysts have high hopes that IoT will transform many traditional industries, and using IoT data to change how heavy equipment is operated and maintained is part of this expected transformation. Heavy equipment may not be a traditional area of application for many data scientists, but a passive approach to solving problems in this area may ultimately fall short of creating that transformation. Data scientists will succeed in helping realize this future if they play proactive roles in defining the right problems, gathering the right data, and taking the lead in communication.