How to build, run and manage scalable data analytics and AI for enterprise

In Part 2 of our mini-series on building a data analytics workflow, we examine how data pipeline development can unlock real business value.


Artificial Intelligence (AI) and advanced analytics are helping to transform the Financial Services Industry (FSI) in the digital era. Many organisations are already making use of these technologies across a wide range of applications from risk management, fraud prevention and high-frequency trading to robotic process automation and personalising the customer experience.

Financial organisations have always been at the leading edge of technology due to the nature of their business, and they are known for adopting new technology much earlier than other industries. However, progress can bring barriers, and the financial world has run into the challenge of scaling AI and advanced analytics much earlier than other sectors.

"The key challenge is how to provide a scalable platform that can provide AI or advanced analytics across the enterprise," said Parviz Peiravi, Global CTO/Principal Engineer, Financial Services Industry at Intel. "The value of AI and analytics can only be unlocked if organisations work out how to operationalise data at scale."

 

Data preparation and mining

The first step is data discovery, but once the data is in place, what comes next? Businesses must decide how to develop the data pipeline in order to gain real business insights from AI and analytics.

After data discovery, the second step in the process is deciding what type of data to use, and what features are important for each AI or analytics model. Based on the use case and the model needed, relevant features of the data can be chosen. It's then necessary to work out how to extract these features from the raw data and how to split it into usable training and test datasets.
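As a rough illustration of this second step, the sketch below picks a handful of features from raw records and splits the result into training and test sets. It uses only the Python standard library. The record fields (`amount`, `country`, `is_fraud`), the 25 per cent test fraction and the helper names are all assumptions made for the example, not part of any particular product or of the workflow Intel describes.

```python
import random

# Hypothetical raw records; every field name here is illustrative only.
RAW = [
    {
        "customer_id": i,
        "amount": 10.0 * i,
        "country": "GB" if i % 2 else "DE",
        "notes": "free text",
        "is_fraud": int(i % 5 == 0),
    }
    for i in range(1, 21)
]

FEATURES = ["amount", "country"]  # features assumed relevant to the use case
LABEL = "is_fraud"


def extract(record):
    """Keep only the selected features and return them with the label."""
    return {name: record[name] for name in FEATURES}, record[LABEL]


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows, then split into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


examples = [extract(r) for r in RAW]
train, test = train_test_split(examples)
print(len(train), len(test))  # 15 5
```

Fixing the random seed keeps the split reproducible, which matters once the same pipeline is rerun by different teams.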

The third step is feature selection, feature engineering and pattern mining. Data silos are common, and if the datasets come from different sources some of the features may be missing, so it is necessary to work out how to fill those gaps. Pattern mining is then carried out to identify characteristic patterns within the data and determine what the model is going to explore. In step four, the data is transformed into a working set that can feed directly into the algorithm to generate the model. All of this is part of the data preparation process: to develop a usable AI or analytics model, these extra steps are vital after the initial data gathering and preprocessing.
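One simple way to fill gaps left by siloed sources is median imputation, sketched below with the standard library only. The article does not say which gap-filling technique is used in practice, so treat this as one possible approach; the `income` and `age` columns, the use of `None` for missing values and the `_was_missing` flag are all assumptions for the example.

```python
from statistics import median

# Hypothetical records merged from two silos; None marks a missing feature.
rows = [
    {"income": 42000.0, "age": 31},
    {"income": None, "age": 45},
    {"income": 58000.0, "age": None},
    {"income": 50000.0, "age": 52},
]


def fit_medians(data, columns):
    """Learn one median per column from the observed (non-missing) values."""
    return {
        col: median(r[col] for r in data if r[col] is not None)
        for col in columns
    }


def impute(data, medians):
    """Return new rows with gaps filled, plus a flag recording each fill."""
    filled = []
    for r in data:
        new = {}
        for col, fallback in medians.items():
            missing = r[col] is None
            new[col] = fallback if missing else r[col]
            new[col + "_was_missing"] = missing
        filled.append(new)
    return filled


medians = fit_medians(rows, ["income", "age"])
working_set = impute(rows, medians)
```

Keeping the fitted medians separate from the fill step means they can be learned on the training data alone and then reused unchanged on test or production data.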

These data preparation steps are typically manual. Today, there are some aspects of the process that have been automated, but the challenge of how to fully automate the feature selection process is a hot topic. This is especially challenging as the models often require thousands of features. As a result, building an infrastructure that can support automated data preparation for model development is a key focus right now. This process of data preparation is all part of the DataOps workflow approach.

 

Analysis and model development

Once the data preparation stage is complete, we move on to step five – model development, where the model is trained using the dataset. After development, each model must then be tested and evaluated. The next challenge is how to put it into production. For an enterprise with hundreds of these models, this means hundreds of data pipelines.

Step six looks at how to operationalise hundreds of models. Businesses have to ensure not only that the models are accurate, but also that they are explainable and meet compliance requirements. The models must be monitored constantly so that their performance and accuracy can be assessed against the requirements set for them.

"As the data changes over time, the model behaviour may change as well, potentially producing inaccurate output," said Peiravi. "This is known as model drifting, another aspect of the process which must be managed. And with thousands of models to manage, it's vital that businesses know how to find them. Repositories and registries are needed so that a certified, registered model can be run in a production environment, similar to the 'gold image' concept for virtual machines that has been used in IT for years."
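To make the idea of drift monitoring concrete, the sketch below computes a Population Stability Index (PSI) between a baseline sample of one feature and a more recent sample. PSI is a widely used drift measure in credit risk, but the article does not name a specific metric, so its use here is an assumption, as are the bin count and the alert threshold of 0.2, which is only a rule of thumb.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline and a recent sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the baseline is constant

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range fall into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps avoids log(0) when a bin is empty.
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


DRIFT_THRESHOLD = 0.2  # hypothetical alert level, not a standard

baseline = [i / 100 for i in range(100)]   # feature values seen at training time
recent = [v + 0.5 for v in baseline]       # the same feature, shifted in production

score = psi(baseline, recent)
if score > DRIFT_THRESHOLD:
    print("drift suspected: review or retrain the model")
```

A check like this would typically run on a schedule for each monitored feature, with alerts feeding back into the retraining pipeline.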

This ModelOps approach looks at how businesses can operationalise a model from the stage where it is developed by data scientists to the point where it is handed over to the IT operational team to run it in production. ModelOps and DataOps are key to building, running and managing scalable AI and analytics for enterprise, using a CI/CD pipeline.
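The handover from data science to IT operations can be pictured as a registry with a certification gate, in the spirit of the 'gold image' comparison above. The sketch below is a toy in-memory version, not the API of any real registry product: the class names, the accuracy-based promotion policy and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    name: str
    version: int
    accuracy: float
    certified: bool = False


@dataclass
class ModelRegistry:
    min_accuracy: float = 0.9  # hypothetical promotion policy
    _models: dict = field(default_factory=dict)

    def register(self, name, accuracy):
        """Store a new version of a model and return its version number."""
        latest = max((v for n, v in self._models if n == name), default=0)
        version = latest + 1
        self._models[(name, version)] = ModelVersion(name, version, accuracy)
        return version

    def certify(self, name, version):
        """Mark a version as fit for production if it meets the policy."""
        model = self._models[(name, version)]
        if model.accuracy < self.min_accuracy:
            raise ValueError(f"{name} v{version} is below the accuracy policy")
        model.certified = True

    def production_model(self, name):
        """Return the newest certified version, which is what operations runs."""
        certified = [
            m for m in self._models.values() if m.name == name and m.certified
        ]
        if not certified:
            raise LookupError(f"no certified version of {name}")
        return max(certified, key=lambda m: m.version)


registry = ModelRegistry()
v1 = registry.register("fraud-detector", accuracy=0.93)
v2 = registry.register("fraud-detector", accuracy=0.85)
registry.certify("fraud-detector", v1)
live = registry.production_model("fraud-detector")
```

In a CI/CD pipeline, the `certify` step is where automated tests, explainability checks and compliance sign-off would sit, so that only models passing every gate reach production.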

"What we are trying to do here is applying a software development philosophy, tooling and process to model development and data pipeline development," said Peiravi.

The challenges of operationalising AI and analytics at scale

This process is still far from smooth, with organisations encountering a number of obstacles that prevent them from successfully operationalising AI projects. Some 47 per cent of respondents cite difficulty deploying into business processes and applications as the main barrier to delivering business value, according to a Gartner report [1]. The other most common barriers included cultural resistance, a lack of DevOps process, a lack of relevant skills and the inability to adequately secure or govern data and analytics inputs or outputs. Other key obstacles include poor planning, a lack of executive support and funding, and the inability to address data quality and integrity issues, says the report.

Organisations also seem to have trouble working collaboratively on the important steps needed to develop analytics and AI models, according to TDWI research [2]. Participants were asked to rate how well their organisation's stakeholders collaborate to complete key steps in the life cycles of AI and analytics projects. Results showed that they were strongest at identifying relevant data sources and identifying opportunities to achieve business benefits. However, only a very small percentage rated their organisations as 'excellent', while fewer than half (42 per cent) gave high ratings for collaboration on development and testing of analytics models.

Overcoming these challenges will be vital for managing scalable data and AI in future. Organisations must also have suitable underlying infrastructure in place. Intel is providing fundamental technology to enable FSI businesses to build a solid, scalable platform, including everything from networking and storage technologies to computing infrastructure and memory technologies. Intel also offers specific technology for AI and advanced analytics, such as optimised frameworks, libraries and tools for open source and commercially available solutions.

"Intel technology covers all four of the crucial layers needed for AI and advanced analytics: hardware infrastructure, system software, frameworks and applications," said Peiravi. "This enables FSI businesses to build a scalable platform to offer different capabilities needed for different projects." In addition, Intel is continuously working with a large number of ecosystem partners to launch new technologies which address the ever-increasing challenges of advanced analytics and AI for the financial services industry.

To unlock the full potential of AI and advanced analytics, organisations may well need to move away from the processes they have used in the past. The challenge of operationalising advanced analytics and AI at scale demands a new approach. Taking a DataOps and ModelOps approach will enable FSI businesses to create agile, automated analytics and AI models that can be deployed cost-effectively in a scalable, managed process that follows internal policy and regulation. This will enable organisations to improve their data-driven decision making, resulting in real business value. While this approach is still relatively new, it is likely to be adopted on a much wider scale by businesses in the financial sector and beyond in the coming months and years.

In case you missed it, check out Part 1, 'Unified data: Your business needs one beautifully connected house, not four random buildings', to find out more about the initial data discovery phase and the challenge of unifying data.

Product and Performance Information

[1] How to Operationalize Machine Learning and Data Science Projects: https://www.gartner.com/en/documents/3880054/how-to-operationalize-machine-learning-and-data-science-
[2] TDWI Best Practices Report: Faster Insights from Faster Data: https://tdwi.org/whitepapers/2020/04/data-all-sas-tdwi-bpr-faster-insights-from-faster-data.aspx