4. Success Factors and Challenges

4.1. Success Factors

Successful implementation of a big data solution that provides data analysis for marketing and business processes requires the following design considerations:

i. Recognizing the elements of Gartner's "Vs" model (volume, velocity, and variety) by identifying the characteristics of big data.

ii. Considering solutions from a number of major vendors such as Cloudera, Hortonworks, IBM, and MapR, and choosing the solution that best supports the environment in meeting business objectives. The culture of big data giants such as Amazon, Google, and Facebook should be considered as well.

iii. Identifying the risks of open source software and evaluating competing solutions against criteria such as their development, deployment, and response times. Knowing which NoSQL database works best with which data type is therefore essential (a candidate mapping is sketched after this list).

iv. Recognizing Hadoop cluster elements and their functions (see the daemon sketch after this list).

v. Creating a secure analytics platform that delivers data-driven insights to business users across the organization.

vi. Developing a set of core requirements for the analytics platform; for example, requirements focused on function, cost, and time.

vii. Conducting a proof of concept to reduce risk in the implementation process. This exercise confirms that the performance and scalability of the chosen solution will meet the targets set at the beginning of the big data project, and it is carried out jointly by the vendor's solution experts and the organization's IT team (a minimal timing harness is sketched below).
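As a rough illustration of point iii, the sketch below pairs common data shapes with the NoSQL families that typically fit them. The store names and typical uses are indicative examples only, not vendor recommendations; a real evaluation would also weigh the development, deployment, and response-time criteria above.

```python
# Indicative mapping of data shapes to NoSQL families; illustrative only.
NOSQL_FIT = {
    "key-value":   ("Redis, Riak",      "caches, sessions, simple lookups by key"),
    "document":    ("MongoDB, CouchDB", "semi-structured, JSON-like records"),
    "wide-column": ("Cassandra, HBase", "time series and very large sparse tables"),
    "graph":       ("Neo4j",            "highly connected data and relationship queries"),
}

def shortlist(data_shape: str) -> str:
    """Return candidate stores for a given data shape."""
    stores, typical_use = NOSQL_FIT[data_shape]
    return f"{data_shape}: consider {stores} (typical use: {typical_use})"

print(shortlist("document"))
```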
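For point iv, the following minimal sketch lists the core HDFS and YARN daemons with their functions and checks which are running on the local host. It assumes a stock Hadoop deployment and that the JDK's jps tool is on the PATH; distribution-specific services are omitted.

```python
import subprocess

# Core Hadoop daemons (HDFS + YARN) and their functions.
EXPECTED_DAEMONS = {
    "NameNode":          "HDFS master: holds the filesystem namespace and block map",
    "DataNode":          "HDFS worker: stores and serves data blocks",
    "SecondaryNameNode": "merges the HDFS edit log into the fsimage checkpoint",
    "ResourceManager":   "YARN master: schedules cluster resources across jobs",
    "NodeManager":       "YARN worker: launches and monitors containers on one node",
}

def running_daemons() -> set:
    """Names of Hadoop daemons reported by `jps` on this host (requires a JDK)."""
    out = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
    names = set()
    for line in out.splitlines():
        parts = line.split()
        if len(parts) > 1:
            names.add(parts[1])
    return names & set(EXPECTED_DAEMONS)

if __name__ == "__main__":
    up = running_daemons()
    for daemon, role in EXPECTED_DAEMONS.items():
        status = "running" if daemon in up else "absent"
        print(f"{daemon:18s} [{status:7s}] {role}")
```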
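Finally, for point vii, much of a proof of concept reduces to timing the candidate solution at increasing data sizes and comparing throughput against the agreed targets. The harness below is a minimal sketch of that idea: run_job is a hypothetical callable wrapping whatever job-submission mechanism the evaluated solution exposes, and the sizes and target rate are placeholders to be replaced by the project's own figures.

```python
import time

def poc_throughput(run_job, sizes_gb, target_gb_per_min):
    """Time `run_job` at increasing input sizes and flag whether throughput
    stays above the target agreed at project kickoff."""
    results = []
    for size in sizes_gb:
        start = time.perf_counter()
        run_job(size)                      # hypothetical job-submission wrapper
        elapsed_min = (time.perf_counter() - start) / 60
        rate = size / elapsed_min
        results.append((size, rate, rate >= target_gb_per_min))
    return results

if __name__ == "__main__":
    # Stand-in job that just sleeps in proportion to the input size.
    demo = lambda gb: time.sleep(gb * 0.01)
    for size, rate, ok in poc_throughput(demo, [1, 10, 100], target_gb_per_min=50):
        print(f"{size:>4} GB -> {rate:8.1f} GB/min {'PASS' if ok else 'FAIL'}")
```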

Moreover, big data and business analytics teams play a vital role in the success of a big data analytics project. The size and complexity of big data technology require highly motivated team members who are smart and determined, and a successful implementation also requires a team united behind the right mission statement. Because of the complexity of the supercomputing platform, all levels of the team must have an in-depth mastery of the big data ecosystem. The team members and their roles are briefly described below:

(a) The business team: It comprises

i. Executives/Stakeholders: Executives are leaders in their business and industry; they set business strategy and goals, find opportunity in crises, and lead through incidents. They also have the flexibility to run pilots and maintain a strong overview of the big data ecosystem.

ii. Product Managers/Data Stewards: They provide leadership to achieve business goals and understand data, its value and limitations. Furthermore, they identify and define risk, are open to new opportunities, and maintain a working knowledge of the big data ecosystem.

(b) The analytics team

i. Data Scientist: The data scientist should have an academic research background, be a subject-matter expert in their area of business, and possess advanced skills in mathematics and statistical modeling. Moreover, a data scientist should be focused on research and analytic approaches and should be skilled in statistical programming languages.

ii. Business Analysts/Data Analysts: They should be aligned with business goals and directions. They produce detailed analyses for the business, report on data quality, and are skilled in a wide range of data modeling and data analytics tools. They have a working knowledge of the big data ecosystem.

(c) The Big Data architects team

i. Global architects/platform engineers: They are subject-matter experts in supercomputing platforms and are skilled in data architecture. They are specialists in applicable use cases, outstanding at root-cause analysis, and exceptional at performance tuning. A global architect has a broader knowledge of the big data ecosystem, while a platform engineer has a deeper understanding of the software running the supercomputing platform. Both need a good understanding of the data being ingested and digested by the distributed computing environment.

ii. Data architects/data wranglers: They possess industry knowledge, strong skills in mathematics and statistics, and specialize in applicable use cases. They are also subject-matter experts in data analytics, data visualization, NoSQL, and ETL.

(d) The Big Data Hadoop operators team
The real frontline troops in managing and operating a Hadoop cluster are:

i. Hadoop engineers: They are subject-matter experts in supercomputing platforms and experts in Java and Python. They can write and deploy Hadoop jobs, are knowledgeable about Hadoop cluster performance and implementation, and are proficient in debugging and troubleshooting.

ii. Hadoop operators: They are subject-matter experts in the Hadoop cluster, Linux systems, and networking. They are also skilled in Kerberos, experts in troubleshooting, proficient in performance tuning, and knowledgeable about data center hardware.

In addition, the big data Hadoop operator team must have in-depth knowledge and experience working with the supercomputing platform.

Beyond the aforementioned considerations, the organization should also consider switching to a global architecture, in which the supercomputing platform is operated as a single entity with tightly coupled components, rather than the usual enterprise architecture, in which each layer of the stack is a component with clearly defined boundaries. Furthermore, a single team should be responsible for both the development and the operation of the supercomputing platform. Management, platform engineers, software developers, and operators should work as one team, frequently in a single location; this proximity helps the team build working knowledge. This working style is the secret of big data giants such as Yahoo, Google, and Facebook.

Moreover, mastery of the big data ecosystem is another key criterion. Big data is built on the principles of supercomputing, and the complexity of this platform demands sophisticated knowledge of the big data ecosystem at all levels of the team. The operators, the engineers, the architects, and the business managers must be well versed in the big data ecosystem, and everyone involved in the project, including the stakeholders and executive management, must be a generalist with a solid understanding of how a supercomputing platform works. Organizations must invest the time and money to develop their own expertise in their supercomputing platform.