Founded in 2018, we focus exclusively on enabling Enterprise AI @ scale. In a very short span of time, we have become one of the fastest growing companies in America.
We focus on questions that matter to businesses with big ambitions, empowering them to elevate outcomes across their value chain.
The client is the world's largest repository of standard reference microorganisms, cell lines, and other materials for research and development.
As a leading provider of biological materials and standards, the client wanted to collect citation details for its million-plus products from different sources and link them in the organization’s portal.
The client needed a solution that would consolidate the citation details for its products and make them available on its portal for easy access. The objective was to develop a web crawling solution that would extract these citation details from sources such as PubMed and EuropePMC on a weekly basis. The solution would also extract author profiles and related hyperlinks, and download and store research articles where applicable.
AiRo proposed combining multiple AI technologies into a single solution that would deliver the greatest benefit to the client. The proposed architecture consisted of several layers that would, for each source, download the data, extract the relevant information, convert it to a common format, and insert it into a database.
The layers would be implemented with bespoke Python scripts, Robotic Process Automation (RPA), and SQL Server database operations.
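The conversion and database layers described above might look roughly like the following sketch. It is illustrative only: the `Citation` fields, the per-source field names in `FIELD_MAPS`, and the table layout are assumptions rather than the actual schema, and Python's built-in `sqlite3` stands in for SQL Server so the example stays self-contained.

```python
import sqlite3
from dataclasses import astuple, dataclass
from typing import Any, Iterable, Mapping


@dataclass(frozen=True)
class Citation:
    """Hypothetical common format shared by all sources."""
    source: str
    source_id: str
    title: str
    year: int


# Hypothetical mapping from common field names to each source's raw field names.
FIELD_MAPS = {
    "pubmed": {"source_id": "uid", "title": "title", "year": "pubyear"},
    "europepmc": {"source_id": "id", "title": "title", "year": "pubYear"},
}


def to_common(source: str, raw: Mapping[str, Any]) -> Citation:
    """Convert one raw record from a given source into the common format."""
    fields = FIELD_MAPS[source]
    return Citation(
        source=source,
        source_id=str(raw[fields["source_id"]]),
        title=str(raw[fields["title"]]).strip(),
        year=int(raw[fields["year"]]),
    )


def store(conn: sqlite3.Connection, citations: Iterable[Citation]) -> None:
    """Insert citations, skipping any that are already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS citations ("
        "source TEXT, source_id TEXT, title TEXT, year INTEGER, "
        "PRIMARY KEY (source, source_id))"
    )
    # INSERT OR IGNORE keeps repeated weekly runs from duplicating rows.
    conn.executemany(
        "INSERT OR IGNORE INTO citations VALUES (?, ?, ?, ?)",
        [astuple(c) for c in citations],
    )
    conn.commit()
```

Keying the table on the source and its identifier is one way to make a weekly job safe to re-run; a production SQL Server schema would likely use its own upsert logic instead.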
The solution would use regex-based search to extract the information from the data and custom logic to convert it to a common format, with pre-processing and cleaning of the data where required. The web crawler would call each source's API with an input query and a date range, split the date range into months, and download the data iteratively, leaving a gap of 3 seconds between API calls as suggested by sources such as PubMed and EuropePMC.
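The month-by-month download loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the crawler itself: `month_ranges` and `crawl` are hypothetical names, and `fetch` is a caller-supplied function standing in for the real API request to a source such as PubMed or EuropePMC.

```python
import time
from datetime import date, timedelta
from typing import Any, Callable, Iterator, List, Tuple


def month_ranges(start: date, end: date) -> Iterator[Tuple[date, date]]:
    """Yield (first_day, last_day) pairs covering start..end, one per calendar month."""
    current = start
    while current <= end:
        # Compute the first day of the following month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        yield current, min(next_month - timedelta(days=1), end)
        current = next_month


def crawl(
    query: str,
    start: date,
    end: date,
    fetch: Callable[[str, date, date], List[Any]],
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Any]:
    """Call fetch once per month in the range, pausing between calls."""
    results: List[Any] = []
    for index, (chunk_start, chunk_end) in enumerate(month_ranges(start, end)):
        if index > 0:
            sleep(delay_seconds)  # 3-second gap between consecutive API calls
        results.extend(fetch(query, chunk_start, chunk_end))
    return results
```

Passing `fetch` and `sleep` in as parameters keeps the chunking and throttling logic independent of any one source's API and makes it easy to test without network access or real delays.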
AiRo managed quality, risks and issues, change management, governance, reporting, and the communication plan. The web crawling solution was implemented in three environments: Development, Test, and Production.
The web crawling solution successfully fetched the citation details for the client’s products from the various sources on a weekly basis, consolidated them, and uploaded them to the client’s portal for easy access by users.
Extracted 2.4 million citations within 2 months
Data integration with internal databases and E-Commerce CMS