Cerved improves data quality and reduces costs with serverless machine learning on AWS
Cerved is a leading provider of business information in Italy and a leading rating agency in Europe. The company helps businesses, banks, institutions and individuals protect themselves from risk and achieve sustainable growth. Thanks to its unique repository of data and analytics, Cerved offers clients services, advice and digital platforms to manage risks and support data-driven growth.
The Challenge
Cerved wanted to improve accuracy, make maintenance easier, and gain the ability to quickly extend the functionality of its media monitoring service. Another key reason for moving to AWS was cost savings: moving to an operational expenditure (opex) approach to IT spending would eliminate the need for expensive on-premises infrastructure that is underutilised outside of peak periods. “Managing completely predefined environments simplifies development,” says Gabriele Sotto, data scientist at Cerved. “This approach allows us to be flexible and independent.”
After starting the project in mid-2020, Cerved initially focused on building and implementing the new machine learning models for three main components of its media monitoring service for Italian companies, which categorise business articles by types of business events, recognise companies with different economic and financial activities in Italy, and recognise geographic locations across Italy.
Simplify Machine Learning Model Development
With the support of AWS Partner Claranet Consulting Services, Cerved was able to move beyond its expensive and inflexible on-premises, rules-based solution for tagging and categorising news articles. It now uses a serverless AWS infrastructure that simplifies the development, training, deployment, and maintenance of machine learning models for real-time automated media monitoring in the production environment.
Amazon Kinesis Data Firehose then collects information from the classification steps and ingests the results into an Amazon OpenSearch Service index. The results and article classifications are then presented to Cerved's editorial team for manual review through a customised user interface.
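As a rough illustration of the hand-off described above, the sketch below serialises one classification result and passes it to a Firehose delivery stream. The stream name, the result fields, and the `send_result` and `FakeFirehose` helpers are assumptions made for illustration, not Cerved's actual implementation. The client is injected so the example runs without AWS credentials; in a real deployment it would be a boto3 Firehose client, whose `put_record` call takes `DeliveryStreamName` and a `Record` with a `Data` field.

```python
import json
from typing import Any, Dict, List


def to_firehose_record(result: Dict[str, Any]) -> Dict[str, bytes]:
    """Serialise one classification result as a newline-terminated JSON record."""
    line = json.dumps(result, sort_keys=True) + "\n"
    return {"Data": line.encode("utf-8")}


def send_result(client: Any, stream_name: str, result: Dict[str, Any]) -> None:
    """Forward a result to a delivery stream.

    `client` is expected to behave like boto3.client("firehose").
    """
    client.put_record(
        DeliveryStreamName=stream_name,
        Record=to_firehose_record(result),
    )


class FakeFirehose:
    """In-memory stand-in for the Firehose client, used for local runs only."""

    def __init__(self) -> None:
        self.sent: List[Dict[str, Any]] = []

    def put_record(self, DeliveryStreamName: str, Record: Dict[str, bytes]) -> Dict[str, str]:
        self.sent.append({"stream": DeliveryStreamName, "record": Record})
        return {"RecordId": str(len(self.sent))}


if __name__ == "__main__":
    fake = FakeFirehose()
    # Hypothetical stream name and result fields.
    send_result(fake, "media-monitoring-results",
                {"article_id": "a1", "labels": ["bankruptcy"]})
    print(fake.sent[0]["record"]["Data"].decode("utf-8"), end="")
```

Terminating each record with a newline keeps the JSON documents separable when a delivery stream batches several of them together.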
Building MLOps Skills
One challenge Cerved faced was that, while it had strong in-house data science and data engineering skills, it lacked the DevOps skills needed for MLOps. This is where Claranet’s experience and expertise in DevOps and MLOps helped, supporting the project with advice on everything from API implementation to solution architecture.
Claranet helped Cerved design and automate the deployment of the machine learning models it developed, using serverless infrastructure as code. Claranet is also helping Cerved plan and design the monitoring and retraining pipelines for the machine learning models.
Claranet used a training operations approach, providing a learning path to develop Cerved’s internal AWS skills and expertise in these areas. “We provided some courses on big data and machine learning,” says Gianluigi Mucciolo, Senior Solutions Architect at Claranet.
A Purpose-Built Ecosystem for Machine Learning
The main AWS services that Cerved uses in this project are AWS Lambda and Amazon Kinesis. It also uses Amazon Kinesis Data Streams for the different components of the media monitoring service that collect news articles from its many sources. Amazon SageMaker supports the machine learning tasks, with a training pipeline for many independent binary classification models.
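To make the idea of "many independent binary classification models" concrete, the sketch below splits a multi-label set of articles into one binary training set per category, so that each category can be trained on its own. The article texts, the label names, and the `binary_datasets` helper are hypothetical. The code does not use the SageMaker API and is not Cerved's pipeline; it only shows the data preparation step that such a pipeline would plausibly start from.

```python
from typing import Dict, List, Set, Tuple

# (article text, set of labels assigned to it); hypothetical structure.
Article = Tuple[str, Set[str]]


def binary_datasets(articles: List[Article]) -> Dict[str, List[Tuple[str, int]]]:
    """Build one binary training set per label (1 = has the label, 0 = does not)."""
    labels = sorted({label for _, tags in articles for label in tags})
    return {
        label: [(text, int(label in tags)) for text, tags in articles]
        for label in labels
    }


if __name__ == "__main__":
    sample = [
        ("A buys B", {"merger"}),
        ("C files for bankruptcy", {"bankruptcy"}),
        ("D buys E, then fails", {"merger", "bankruptcy"}),
    ]
    for label, rows in binary_datasets(sample).items():
        print(label, rows)
```

Because every label gets its own dataset, a single category can be retrained or added without touching the models for the others.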
The trained models are then deployed as AWS Lambda layers. The different AWS Lambda functions then classify the news using multi-label classification, based on different categories of news topics. The core part of the system also matches and recognises companies and business entities based on custom neural networks and Cerved's business information ecosystem, the largest in Italy, which includes more than six million active Italian companies. Through another custom model for named entity recognition (NER), the system recognises the locations mentioned in the articles, drawing on external sources such as the National Institute of Statistics.
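A minimal sketch of how a Lambda-style function might combine independent binary models into a multi-label result is shown below. The keyword-based scorers are stand-ins for trained models, and the category names, the 0.5 threshold, and the event shape are all assumptions for illustration; the source does not describe these details.

```python
import json
import math
from typing import Any, Callable, Dict, List


def keyword_model(keywords: List[str]) -> Callable[[str], float]:
    """Hypothetical stand-in for a trained binary model: text -> score in [0, 1)."""
    def predict(text: str) -> float:
        lowered = text.lower()
        hits = sum(1 for keyword in keywords if keyword in lowered)
        # Squash the hit count into [0, 1); a real model would be learned from data.
        return 1.0 - math.exp(-hits)
    return predict


# One independent binary model per category (hypothetical categories).
MODELS: Dict[str, Callable[[str], float]] = {
    "merger_acquisition": keyword_model(["acquires", "merger", "acquisition"]),
    "bankruptcy": keyword_model(["bankruptcy", "insolvency"]),
    "new_plant": keyword_model(["opens", "plant", "factory"]),
}

THRESHOLD = 0.5  # assumed decision threshold, applied to every model


def classify(text: str) -> List[str]:
    """Return every label whose binary model scores at or above the threshold."""
    return sorted(label for label, model in MODELS.items() if model(text) >= THRESHOLD)


def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Lambda-style entry point; expects an event with a "text" field."""
    labels = classify(event["text"])
    return {"statusCode": 200, "body": json.dumps({"labels": labels})}


if __name__ == "__main__":
    print(handler({"text": "Acme acquires Beta and opens a new factory in Turin"}))
```

Each model votes independently, so an article can receive several labels or none at all.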
“The big difference in using AWS services versus our previous on-premises systems is that the AWS ecosystem provides us with machine learning models, integrates those processes into our broader system, and manages every part of our pipeline from training to deployment,” explains Daniele Tavolaro, Data Engineer at Cerved. “It’s very easy to do. We’re challenging the standard approach to MLOps today by using serverless to give our teams better cost management and faster delivery of the artifact.”
“The AWS ecosystem is making our system flexible and easier to maintain, as well as providing better quality for our customers and creating cost savings for Cerved.”
In Summary
With Claranet’s help, Cerved reduced infrastructure costs and improved news article categorisation accuracy for its media monitoring service by 25% using machine learning models. This was achieved by moving from on-premises systems to a serverless AWS environment for machine learning development.
Since implementing the redesigned machine learning models using the AWS serverless development environment, Cerved has achieved an average improvement of 25 percent in how accurately and precisely it automatically labels and categorises articles before they are sent to a team of editors for manual review. “This translates into time savings for the editorial team because fewer articles that have been mislabeled need to be removed,” says Divna Djordjevic, Data Scientist at Cerved. “And in the long run, this translates into cost savings and also allows the editorial team to focus on more difficult tasks.”
Another major benefit of using AWS is the cost savings on infrastructure, compared to the previous on-premises system. “Now, in the cloud with a serverless AWS solution, we can use the system only when we need it during the two to three hour period when the news breaks,” says Daniele Tavolaro, Data Engineer at Cerved. “So we only pay for the actual use during that period.” All of this helps Cerved provide better data quality, which helps customers make better decisions and ensure more sustainable growth.
Based on the success of the project to date, Cerved plans to expand its use of the MLOps serverless environment to add more machine learning models to other components of its media monitoring service. It also plans to expose these capabilities through APIs to offer new product lines for customers.