Dr. Ricardo Cunha, Institut für Umwelt & Energie, Technik & Analytik e. V. (IUTA)
As instrumental analysis data grow more complex, current vendor processing software faces user adoption challenges: user interfaces are complicated, data must be transferred manually between software packages for specific purposes, and actual data processing algorithms are lacking. The open-source community constantly strives to fill these gaps with innovative data processing algorithms, but the result is often an overwhelming selection of loose scripts, or processing concepts described only in scientific publications, that are of little practical use to the power user. Non-target analysis (NTA) in environmental studies is one example: proprietary software lacks flexibility for different use cases, while the open-source community has developed innovative algorithms to compensate for it. We therefore present the StreamFind R library, which addresses these challenges by encouraging the development and integration of open-source algorithms into a harmonized platform for assembling interoperable analytical data processing workflows. At the same time, StreamFind aims to improve users' data literacy by providing a flexible, transparent and standardized solution for gaining better insights from data. StreamFind is designed to be data agnostic, meaning that users can process different types of data within the same interface, e.g., liquid chromatography (LC) coupled to UV, high-resolution mass spectrometry (HRMS) and Raman spectroscopy (RS) data, as well as tabular data and sensor data acquired via open communication protocols such as LADS OPC UA. We have developed a standardized, data-agnostic workflow representation that follows the FAIR principles. The workflow standardization will be demonstrated for NTA based on LC-HRMS and for the identification and quantification of monoclonal antibodies using LC-UV and LC-HRMS, respectively.
In addition, quality assessment of (bio)pharmaceuticals using an innovative LC-RS dataset combined with machine learning will be presented to illustrate the interoperability and potential of the workflows in StreamFind.
The StreamFind R library can be installed from the ODEA project's GitHub repository (https://github.com/odea-project/StreamFind), which also provides extensive documentation, tutorials and examples. Contributions to the project and the integration of additional open-source algorithms are encouraged, fostering a collective effort to advance data processing.
