Talks and Seminars

20/09/2023

14:30

Fernão Stella de Rodrigues Germano Auditorium - ICMC

Speaker: Genoveva Vargas Solar

Organizer: Cristina Aguiar

Format: In person


Data science pipelines from head to toes: a formal & executable tool for all
Status, problems, challenges, and open issues
 
Vast collections of heterogeneous data have become the backbone of scientific, analytic, and
forecasting processes. It is possible to compute mathematical models to understand and
predict phenomena by combining simulation techniques, artificial vision, and artificial
learning with data science techniques. Data must go through complex and repetitive
processing and analysis pipelines, namely data science pipelines, to achieve this ambitious
objective.
A data science pipeline is a set of processes that convert raw data into actionable answers to
research/business questions, providing insights and solutions to [experimental sciences]
problems and enabling data-driven decisions. The objective is to automate the process of extracting
data from multiple sources, cleaning and transforming it, analyzing it, and presenting the results
in an understandable format. Data science pipelines can include machine learning, statistical
and numerical models, and data visualization and interpretation tools. Data scientists use
pipelines to automate the repetitive tasks that lead from raw data to
[scientific]/business insights, to make results reproducible, and to share workflows
with other communities.
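As a rough illustration of the stages named above (extract, clean, analyze, present), a pipeline can be sketched as a chain of plain functions. This is a minimal sketch under stated assumptions, not the tool presented in the talk: the sample records, the function names, and the run_pipeline helper are all hypothetical and use only the Python standard library.

```python
from statistics import mean

# Hypothetical raw readings from two sources; None marks a missing value.
SOURCE_A = [{"sensor": "a", "value": "21.5"}, {"sensor": "a", "value": None}]
SOURCE_B = [{"sensor": "b", "value": "19.0"}, {"sensor": "b", "value": "20.0"}]


def extract(*sources):
    """Gather the records of every source into one list."""
    return [record for source in sources for record in source]


def clean(records):
    """Drop records with missing values and convert the rest to floats."""
    return [
        {"sensor": r["sensor"], "value": float(r["value"])}
        for r in records
        if r["value"] is not None
    ]


def analyze(records):
    """Compute the mean value per sensor."""
    by_sensor = {}
    for r in records:
        by_sensor.setdefault(r["sensor"], []).append(r["value"])
    return {sensor: mean(values) for sensor, values in by_sensor.items()}


def present(summary):
    """Render the per-sensor means as readable text, one sensor per line."""
    return "\n".join(f"{s}: {m:.2f}" for s, m in sorted(summary.items()))


def run_pipeline(data, steps):
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        data = step(data)
    return data


raw = extract(SOURCE_A, SOURCE_B)
report = run_pipeline(raw, [clean, analyze, present])
print(report)
```

Because every stage is an ordinary function, the same list of steps can be rerun on new data, which is one simple way to get the reproducibility and sharing the paragraph mentions.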
Various frameworks are available for enacting data science pipelines, depending on the
project's specific needs. Some popular frameworks include Apache Airflow, Prefect,
Kubeflow, and MLflow. The enactment of data science pipelines must balance the delivery of
different types of services, such as (i) hardware (computing, storage, and memory), (ii)
communication (bandwidth and reliability) and scheduling, and (iii) greedy analytics and mining
with high in-memory and computing-cycle requirements.
This talk introduces critical challenges and current results regarding the development of
data science pipelines. It emphasizes efficient enactment strategies for exploring
experimental sciences problems that go beyond currently available analytics scales, and that
contribute to performing continuous online data-centric science experiments.
Speaker:
Genoveva Vargas Solar (http://www.vargas-solar.com) is a principal researcher at the
French National Centre for Scientific Research (CNRS). She is a member of the Database
group of the Laboratory on Informatics on Image and Information Systems (LIRIS). She
is a regular member of the Mexican Academy of Computing (AMEXCOMP). Her
education includes two PhDs and two master's degrees, in Computer Science and
Comparative Literature (mythocriticism and mythanalysis) respectively, from the
University of Grenoble, and several certificates in feminist and gender studies from
the National Autonomous University of Mexico (UNAM).
Genoveva Vargas-Solar is a gender equity officer of the Gender Equity Commission at the
LIRIS lab. She represents the EDBT Endowment (a major European conference in databases) in
the D&I database interconference initiative. She is a member of the Tierra Común activist
group and participates in the European project Gender STI as part of the CNRS partner
group.
She contributes to the construction of service-based database/data science
management systems. The objective is to design data science workflows, new
queries, and enactment services guided by Service Level Objectives (SLOs). Her work
mainly addresses data science queries exploiting graphs. She proposes query
evaluation methodologies, algorithms, and tools for composing, deploying, and
executing data science functions on just-in-time architectures (disaggregated data
centres). She conducts fundamental and applied research to address these challenges
on different architectures: ARM, Raspberry Pi, cluster, cloud, and HPC.

 

© 2024 Instituto de Ciências Matemáticas e de Computação