Codescoop is a Finnish startup with Spanish roots that provides an open-source tool for analyzing, improving, and managing Big Software.
The problem was collecting data on hundreds of thousands of open-source components from various sources such as GitHub, GitLab, Hacker News, and npm. The parsers Codescoop was using did not collect metadata fast enough for the service to run at full productivity, which prompted the search for better solutions.
Codescoop customers typically kept the code components of their software products up to date by searching open sources for fresh releases. As a result, key problems for corporate clients were license compliance conflicts between code components and delayed software releases.
Practice showed that general-purpose databases such as MongoDB could not cope with storing and quickly searching among 100 million components. It was therefore necessary to find and implement more specialized solutions for storing time series.
When working with the variety of open-source components used by Codescoop clients, finding the relevant components by specific criteria took a long time. An automation tool was needed to optimize this process, since it would allow flexible searches for the necessary components across all open sources.
Because the software product was constantly being improved and tested, the test environment had to be deployed regularly. Keeping it running permanently would have required always-on cloud infrastructure, which would have significantly increased costs. The Codescoop team therefore needed a solution that would launch the systems only when testing was actually taking place.
Implementing the tool made it possible to automate the search for conflicts between the licenses of open-source components and corporate requirements. As a result, software products could be released faster, the risk of license non-compliance was minimized, and corporate customers saved a significant part of their budget.
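The case study does not describe how the license check works internally. As a minimal sketch, assuming each component reports an SPDX license identifier and the corporate policy is expressed as hypothetical allow and deny lists, the core comparison could look like this:

```python
from dataclasses import dataclass

# Hypothetical corporate policy, expressed with SPDX license identifiers.
ALLOWED = {"MIT", "Apache-2.0", "BSD-3-Clause"}
DENIED = {"GPL-3.0-only", "AGPL-3.0-only"}


@dataclass(frozen=True)
class Component:
    name: str
    version: str
    license: str  # SPDX identifier reported for the component


def check_licenses(components):
    """Split components into allowed, denied and needs-review buckets."""
    report = {"allowed": [], "denied": [], "review": []}
    for component in components:
        if component.license in DENIED:
            report["denied"].append(component.name)
        elif component.license in ALLOWED:
            report["allowed"].append(component.name)
        else:
            # Unknown or unlisted licenses are routed to a human reviewer.
            report["review"].append(component.name)
    return report


if __name__ == "__main__":
    deps = [
        Component("lib-a", "1.0.0", "MIT"),
        Component("lib-b", "2.0.0", "AGPL-3.0-only"),
        Component("lib-c", "0.1.0", "LicenseRef-custom"),
    ]
    print(check_licenses(deps))
```

Sending unlisted licenses to review rather than silently allowing them keeps the check conservative; a production tool would also need to handle compound SPDX expressions such as `MIT OR Apache-2.0`.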
All the infrastructure needed to run the service was described in Terraform, which made it easy to install on-premises solutions for large companies. As a result, the project team accelerated work on corporate projects several-fold, thanks to individual, flexible infrastructure management for urgent tasks.
To store the development history of the components, we used daily time series covering the last 3 years of data. An array of 100 thousand components represented about 100 million time series, which were later used by various ML algorithms. To speed up work with large amounts of data and improve performance under high load, we chose Google Bigtable and, later, HBase.
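Both Bigtable and HBase store rows sorted lexicographically by row key, so key design largely determines how fast a time series can be read back. The project's actual schema is not described; the sketch below assumes a hypothetical `<component_id>#<YYYYMMDD>` layout, which keeps one component's daily points contiguous and in date order (Bigtable's schema guidance advises against leading row keys with a timestamp, which concentrates writes on one node):

```python
from datetime import date, timedelta

HISTORY_DAYS = 3 * 365  # roughly three years of daily points per series


def row_key(component_id: str, day: date) -> bytes:
    """Hypothetical key layout: component first, then the day, so all rows
    for one component sort together in chronological order."""
    return f"{component_id}#{day:%Y%m%d}".encode("utf-8")


def daily_keys(component_id: str, end: date, days: int = HISTORY_DAYS) -> list[bytes]:
    """Row keys for the `days` consecutive days ending at `end` (inclusive)."""
    start = end - timedelta(days=days - 1)
    return [row_key(component_id, start + timedelta(days=i)) for i in range(days)]


def prefix_range(component_id: str) -> tuple[bytes, bytes]:
    """Start (inclusive) and end (exclusive) keys for a range scan covering one
    component's whole series. '$' (0x24) sorts right after '#' (0x23)."""
    return f"{component_id}#".encode("utf-8"), f"{component_id}$".encode("utf-8")


if __name__ == "__main__":
    keys = daily_keys("npm/react", date(2020, 1, 3), days=3)
    print(keys)
    print(prefix_range("npm/react"))
```

With this layout, reading a component's full history is a single range scan between the two keys from `prefix_range`, rather than a filter over the whole table.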
FTL implemented Elasticsearch, which enables scalable, multi-threaded searches based on various metrics and component characteristics. As a result, developers could find relevant components in a couple of clicks while improving the software product.
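The index mapping used in the project is not given. As a minimal sketch, assuming hypothetical fields `name`, `description`, `language`, `stars`, and `license`, a search that combines free-text relevance with exact metric filters can be expressed as an Elasticsearch `bool` query body:

```python
import json


def build_component_query(text, language=None, min_stars=None, licenses=None, size=20):
    """Build an Elasticsearch bool query: full-text relevance plus exact filters.

    Field names are hypothetical; `language` and `license` are assumed to be
    mapped as keyword fields so that term/terms filters match exactly.
    """
    # Matches in the component name are boosted 3x over the description.
    must = [{"multi_match": {"query": text, "fields": ["name^3", "description"]}}]
    filters = []
    if language is not None:
        filters.append({"term": {"language": language}})
    if min_stars is not None:
        filters.append({"range": {"stars": {"gte": min_stars}}})
    if licenses:
        filters.append({"terms": {"license": list(licenses)}})
    # Clauses in "filter" narrow the result set without affecting relevance scores.
    return {"size": size, "query": {"bool": {"must": must, "filter": filters}}}


if __name__ == "__main__":
    body = build_component_query("http client", language="Go", min_stars=1000,
                                 licenses=["MIT", "Apache-2.0"])
    print(json.dumps(body, indent=2))  # request body for POST /<index>/_search
```

Keeping the metric conditions in filter context lets Elasticsearch cache them, which matters when the same criteria are reused across many searches.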
To minimize costs, specialists developed an algorithm that automated the deployment of about 20 microservices and the necessary infrastructure. The test environment was therefore active only while the team was actually working with the components and architecture, which significantly reduced cloud infrastructure costs.
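The deployment algorithm itself is not described. One plausible core, sketched here under the assumption of a hypothetical dependency map for a few of the ~20 services, is to start services in dependency order for the duration of a test run and stop them in reverse order afterwards, using the standard library's `graphlib` (Python 3.9+):

```python
from contextlib import contextmanager
from graphlib import TopologicalSorter

# Hypothetical subset of the microservices: each service maps to the
# services that must already be running before it can start.
SERVICES = {
    "postgres": set(),
    "elasticsearch": set(),
    "hbase": set(),
    "crawler": {"postgres", "hbase"},
    "search-api": {"elasticsearch"},
    "web-ui": {"search-api", "crawler"},
}


def startup_order(deps):
    """Dependencies first; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(deps).static_order())


@contextmanager
def ephemeral_environment(deps, start, stop):
    """Bring services up only for the duration of a test run, then tear them down.

    `start` and `stop` are caller-supplied callables (for example, wrappers
    around whatever provisioning tool is in use). Only services that actually
    started are stopped, in reverse order, even if a start or a test fails.
    """
    started = []
    try:
        for service in startup_order(deps):
            start(service)
            started.append(service)
        yield started
    finally:
        for service in reversed(started):
            stop(service)


if __name__ == "__main__":
    with ephemeral_environment(SERVICES, lambda s: print("start", s),
                               lambda s: print("stop", s)):
        print("running tests...")
```

The `finally` block is what keeps cloud costs bounded in this sketch: the environment is torn down whether the tests pass, fail, or a service fails to start.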
Since the project's technical and software infrastructure was designed for a throughput of 1 million components, specialists developed Go crawlers that could withstand the required load. This made it possible to significantly simplify and speed up the analysis of software product vulnerabilities.
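The crawlers were written in Go, and their code is not shown. The sketch below illustrates the same general pattern in Python rather than reproducing the Go implementation: fan out over many components while capping the number of requests in flight, so that sources such as GitHub or npm are not overwhelmed. The `fetch` coroutine is a hypothetical stand-in for the real metadata request:

```python
import asyncio


async def crawl(component_ids, fetch, concurrency=50):
    """Fetch metadata for many components with at most `concurrency` requests in flight.

    `fetch` is a hypothetical coroutine (component_id -> metadata). A failure for
    one component is recorded as the exception instead of aborting the whole crawl.
    """
    semaphore = asyncio.Semaphore(concurrency)
    results = {}

    async def worker(component_id):
        async with semaphore:
            try:
                results[component_id] = await fetch(component_id)
            except Exception as exc:  # keep going if one source misbehaves
                results[component_id] = exc

    await asyncio.gather(*(worker(cid) for cid in component_ids))
    return results


if __name__ == "__main__":
    async def fake_fetch(component_id):
        await asyncio.sleep(0.01)  # stands in for an HTTP call to a source API
        return {"id": component_id}

    print(asyncio.run(crawl(range(5), fake_fetch, concurrency=2)))
```

In Go, the equivalent bound is typically a buffered channel used as a semaphore around a pool of goroutines.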
Development of a convenient UI/UX system
The project team created a unique visualization. With it, even in the early stages of the software product life cycle, you can get a comprehensive analysis of the Big Software technical stack as diagrams and metrics. This saves a significant share of the budget and time otherwise spent on managing critical operations, fixing failures, and implementing changes in later stages of the product life cycle.
Definition of key metrics and clustering of components into groups
Software Intelligence not only includes cross-system data collection algorithms but also allows you to analyze vulnerabilities and propose solutions to improve the software product. The Codescoop team, together with FTL specialists, developed a unique tool based on Python, machine learning, and artificial intelligence methods to forecast how software components will develop over time and across technical dimensions. With its help, Big Software developers can make timely tactical and strategic decisions to improve and change the software product as requirements evolve.
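The specific ML methods are not named in the case study. As a deliberately simple stand-in, the sketch below computes two hypothetical key metrics per component from its daily series (least-squares trend slope and mean level), forecasts by extrapolating the trend, and groups components with plain k-means. A real pipeline would likely use richer features, normalize them, and rely on an ML library:

```python
from statistics import fmean


def linear_trend(series):
    """Least-squares slope and intercept of a series against its index (needs len >= 2)."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = fmean(series)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def forecast(series, steps):
    """Extrapolate the fitted line `steps` points past the end of the series."""
    slope, intercept = linear_trend(series)
    n = len(series)
    return [intercept + slope * (n + i) for i in range(steps)]


def _sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic init from the first k points (assumes len >= k)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous centroid if a cluster ends up empty.
            new.append(tuple(fmean(d) for d in zip(*members)) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels


def group_components(history, k=2):
    """history: {component_name: daily values}. Features are (trend slope, mean level)."""
    names = list(history)
    features = [(linear_trend(history[n])[0], fmean(history[n])) for n in names]
    groups = {}
    for name, label in zip(names, kmeans(features, k)):
        groups.setdefault(label, []).append(name)
    return groups


if __name__ == "__main__":
    history = {"a": [1, 2, 3, 4], "b": [2, 3, 4, 5], "c": [5, 5, 5, 5], "d": [6, 6, 6, 6]}
    print(forecast(history["a"], 2))
    print(group_components(history))
```

Here the growing components `a` and `b` land in one group and the flat components `c` and `d` in another; with unscaled features, the larger-valued metric would dominate the distance on real data.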