diiP Seminars

The diiP Distinguished Lectures, as well as the Seminars + Hands-On Workshops, are held on the first Wednesday of each month at 4 PM (CET, Paris time). The seminars cover topics in data analytics, data science and data intelligence (including data management, machine learning, and deep learning).

In the Distinguished Lectures series, diiP hosts invited speakers who are internationally recognized for their research and/or applied work, and who will talk about their latest results.

The Seminars + Hands-On Workshops are led by the diiP associate researchers, as well as by international experts in areas related to diiP. Several of the seminars include hands-on workshops, where participants will have the chance to learn how to use the techniques described in the seminar.

Please see below for the detailed agenda. The material related to the talks will appear below, as well.

Agenda

If you’re interested in attending a lecture or a workshop, please register (for free) by emailing us at diip[at]math-info.univ-paris5.fr with the date and title of the seminar you are interested in. Recordings and materials from previous seminars are also available below.

Distinguished Lectures

June 5, 2024: Synergy of Graph Data Management and Machine Learning in Explainability and Query Answering

Synergy of Graph Data Management and Machine Learning in Explainability and Query Answering

Who: Prof Arijit Khan (Aalborg University, Denmark)
When: June 5, 2024, at 4 PM (Central European time)
Where: Online (Zoom)

Abstract: Graph data, e.g., social and biological networks, financial transactions, knowledge graphs, and transportation systems, are pervasive in the natural world, where nodes are entities with features, and edges denote relations among them. Machine learning and, more recently, graph neural networks have become ubiquitous, e.g., in cheminformatics, bioinformatics, fraud detection, question answering, and recommendation over knowledge graphs. In this talk, I shall introduce our ongoing work on the synergy of graph data management and graph machine learning in the context of graph neural network explainability and query answering. In the first direction, I shall discuss how data management techniques can assist in generating user‐friendly, configurable, queryable, and robust explanations for graph neural networks. In the second direction, I shall provide an overview of our user‐friendly, deep learning‐based, scalable techniques and systems for querying knowledge graphs.

Bio: Arijit Khan is an Associate Professor at Aalborg University, Denmark. He received his PhD from the University of California, Santa Barbara, USA, and did a post-doc in the Systems group at ETH Zurich, Switzerland. He has been an assistant professor in the School of Computer Science and Engineering, Nanyang Technological University, Singapore. His research is on data management and machine learning for the emerging problems in large graphs. He is an IEEE senior member and an ACM distinguished speaker. Arijit is the recipient of the IBM Ph.D. Fellowship (2012-13) and a VLDB Distinguished Reviewer award (2022). He is the author of a book on uncertain graphs and over 80 publications in top venues including ACM SIGMOD, VLDB, IEEE TKDE, IEEE ICDE, SIAM SDM, USENIX ATC, EDBT, The Web Conference (WWW), ACM WSDM, ACM CIKM, ACM TKDD, and ACM SIGMOD Record. Dr Khan is serving as an associate editor of IEEE TKDE 2019-2024 and ACM TKDD 2023-now, proceedings chair of EDBT 2020, IEEE ICDE TKDE poster track co-chair 2023, and ACM CIKM short paper track co-chair 2024.

Logistics:
Please contact diip[a]math-info.univ-paris5.fr to register and get access to the Zoom link.

Material:
Recording.

April 3, 2024: Retrieval Augmented Generative Question Answering for Personal Assistants

Retrieval Augmented Generative Question Answering for Personal Assistants

Who: Dr. Alessandro Moschitti (Amazon, USA)
When: April 3, 2024, at 4 PM (Central European time)
Where: Online (Zoom)

Abstract: Recent work has shown that Large Language Models (LLMs) can potentially answer any question with high accuracy, while also providing justifications for their output. At the same time, other research has shown that even the most powerful and accurate models, such as ChatGPT 4, generate hallucinations, which often invalidate their answers. Retrieval-augmented LLMs are currently a practical solution that can effectively address this problem. However, the quality of grounding is essential in order to improve the model, since noisy context deteriorates the overall performance. In this talk, after introducing LLMs and Question Answering (QA), we will present our experience with Generative QA, which uses basic search engines and accurate passage rerankers to augment relatively small language models. We will provide a different but more direct interpretation of retrieval-augmented LLMs and contextual grounding. Finally, we will show our latest techniques for Reinforcement Learning from Human Feedback for fine-tuning LLMs, which we developed concurrently with the mainstream effort of OpenAI.

Bio: Alessandro Moschitti is a Principal Research Scientist at Amazon Alexa, where he has been leading the science of the Alexa information service since 2018. He designed the Alexa Question Answering (QA) system based on unstructured text and, more recently, the first Generative QA system, which extends the answering skills of Alexa. He obtained his Ph.D. in CS from the University of Rome in 2003, and then did his postdoc at The University of Texas at Dallas for two years. He was a professor in the CS Dept. of the University of Trento, Italy, from 2007 to 2021. He participated in the Jeopardy! Grand Challenge with the IBM Watson Research center (2009 to 2011), and collaborated with them until 2015. He was a Principal Scientist of the Qatar Computing Research Institute (QCRI) for five years (2013-2018). His expertise concerns theoretical and applied machine learning in the areas of NLP, IR and Data Mining. He is well-known for his work on structural kernels and neural networks for syntactic/semantic inference over text, documented by around 350 scientific articles. He has received four IBM Faculty Awards, one Google Faculty Award, and five best paper awards. He was the General Chair of EACL 2023 and EMNLP 2014, a PC co-Chair of CoNLL 2015, and has had a chair role in more than 70 conferences and workshops. He is currently a senior action/associate editor of ACM Computing Surveys and JAIR. He has led ~30 research projects, e.g., a 5-year research project with MIT CSAIL.

Logistics:
Please contact diip[a]math-info.univ-paris5.fr to register and get access to the Zoom link.

Material:
Recording.

March 6, 2024: On Data Ecology, Data Markets, the Value of Data, and Dataflow Governance

On Data Ecology, Data Markets, the Value of Data, and Dataflow Governance

Who: Prof. Raul Castro Fernandez (University of Chicago)
When: March 6, 2024, at 4 PM (Central European time)
Where: Room Turing Conseil (7th floor, Université Paris Cité, 45 rue des Saints-Pères, Paris 75006), and Online (Zoom)

Abstract: Data shapes our social, economic, cultural, and technological environments. Data is valuable, so people seek it, inducing data to flow. The resulting dataflows distribute data and thus value. For example, large Internet companies profit from accessing data from their users, and engineers of large language models seek large and diverse data sources to train powerful models. It is possible to judge the impact of data in an environment by analyzing how the dataflows in that environment impact the participating agents. My research hypothesizes that it is also possible to design (better) data environments by controlling what dataflows materialize; not only can we analyze environments but also synthesize them. In this talk, I present the research agenda on “data ecology,” which seeks to build the principles, theory, algorithms, and systems to design beneficial data environments. I will also present examples of data environments my group has designed, including data markets for machine learning, data-sharing, and data integration. I will conclude by discussing the impact of dataflows in data governance and how the ideas are interwoven with the concepts of trust, privacy, and the elusive notion of “data value.” As part of the technical discussion, I will complement the data market designs with the design of a data escrow system that permits controlling dataflows.

Bio: In my research, I ask what is the value of data and explore the potential of data markets to unlock that value. My group collaborates with economists, legal scholars, statisticians, and domain scientists. We build systems to share, discover, prepare, integrate, and process data. I have traditionally worked on distributed query processing systems and continue to do so. I have received a SIGMOD’23 Test-of-time-Award. I am an assistant professor in the Department of Computer Science and on the Committee of Data Science at The University of Chicago. Before UChicago, I did a postdoc at MIT with Sam Madden and Mike Stonebraker. And before that, I completed a PhD at Imperial College London with Peter Pietzuch.

Logistics:
Please contact diip[a]math-info.univ-paris5.fr to register and get access to the Zoom link.

Material:
Recording.

February 7, 2024: Responsible AI

Responsible AI

Who: Prof. Ricardo Baeza-Yates (Institute for Experiential AI @ Northeastern University)
When: February 7, 2024, at 4 PM (Central European time)
Where: Online (Zoom)

Abstract: In the first part, to set the stage, we cover irresponsible AI: (1) discrimination (e.g., facial recognition, justice); (2) phrenology (e.g., biometric based predictions); (3) limitations (e.g., human incompetence, minimal adversarial AI) and (4) indiscriminate use of computing resources (e.g., large language models). These examples do have a personal bias but set the context for the second part where we address three challenges: (1) principles & governance, (2) regulation and (3) our cognitive biases. We finish by discussing our responsible AI initiatives and the near future.

Bio: Ricardo Baeza-Yates is Director of Research at the Institute for Experiential AI of Northeastern University. Before, he was VP of Research at Yahoo Labs, based in Barcelona, Spain, and later in Sunnyvale, California, from 2006 to 2016. He is co-author of the best-selling textbook Modern Information Retrieval, published by Addison-Wesley in 1999 and 2011 (2nd ed), which won the ASIST 2012 Book of the Year award. From 2002 to 2004 he was elected to the Board of Governors of the IEEE Computer Society and between 2012 and 2016 was elected to the ACM Council. In 2009 he was named ACM Fellow and in 2011 IEEE Fellow, among other awards and distinctions. He obtained a Ph.D. in CS from the University of Waterloo, Canada, in 1989, and his areas of expertise are web search and data mining, information retrieval, bias in AI, data science and algorithms in general.

Logistics:
Please contact diip[a]math-info.univ-paris5.fr to register and get access to the Zoom link.

Material:
Recording.

December 6, 2023: Testing System Intelligence

Testing System Intelligence

Who: Prof. Joseph Sifakis (Verimag, CNRS)
When: December 6, 2023, at 4 PM (Central European time)
Where: Online (Zoom)

Abstract: We discuss the adequacy of system intelligence tests and practical problems raised by their implementation. We propose the replacement test as the ability of a system to replace successfully another system performing a task in a given context. We show how this test can be used to compare aspects of human and machine intelligence that cannot be taken into account by the Turing test. We argue that building systems passing the replacement test involves a series of technical problems that are outside the scope of current AI. We present a framework for implementing the proposed test and validating system properties. We discuss the inherent limitations of AI system validation and advocate new theoretical foundations for extending existing rigorous test methods. We suggest that the replacement test, based on the complementarity of skills between human and machine, can lead to a multitude of intelligence concepts reflecting the ability to combine data-based and symbolic knowledge to varying degrees.

Bio: Professor Joseph Sifakis is Emeritus Research Director at Verimag. He was a full professor at Ecole Polytechnique Fédérale de Lausanne (EPFL) from 2011 to 2016. He is the founder of the Verimag laboratory in Grenoble, a leading laboratory in the area of safety critical systems that he directed for 13 years.
Joseph Sifakis has made significant and internationally recognized contributions to the design of trustworthy systems in many application areas, including avionics and space systems, telecommunications, and production systems. His current research focuses on autonomous systems, in particular self-driving cars and autonomous telecommunication systems. In 2007, he received the Turing Award, recognized as the “highest distinction in computer science”, for his contribution to the theory and application of model checking, the most widely used system verification technique.
Joseph Sifakis is a member of six academies and a frequent speaker at international scientific, technical and public forums.

Logistics:
Please contact diip[a]math-info.univ-paris5.fr to register and get access to the Zoom link.

Material:
Recording.

October 4, 2023: Deep Learning of Seismograms

Deep Learning of Seismograms

Who: Dr. S. Mostafa Mousavi (Google, Stanford University)
When: October 4, 2023, at 4 PM (Central European time)
Where: Online (Zoom)

Abstract: Seismology is the study of seismic waves to understand the sources of those waves – such as earthquakes, explosions, volcanic eruptions, glaciers, landslides, ocean waves, thunderstorms, etc. – and to infer the structure and properties of planetary interiors. The availability of large-scale labeled datasets and the suitability of deep neural networks for seismic data processing have pushed deep learning to the forefront of fundamental, long-standing research investigations in seismology. However, some aspects of applying deep learning to seismology are likely to prove instructive for the geosciences more broadly. In my talk, I will present some of the recent progress in AI-based seismic monitoring and how it improves our understanding of Earth’s physical processes.

Bio: Mostafa Mousavi is a research scientist at Google and an Adjunct Professor at Stanford University. His research focuses on extracting insights about Earth and its physical processes from weak seismic signals through innovative methodological solutions. He is interested in pattern recognition in large sensor datasets and data-driven scientific discovery in earthquake seismology. He develops domain-aware algorithms, incorporating state-of-the-art techniques from signal processing, artificial intelligence, and statistics to extract scientifically valuable information from large-scale seismic datasets. His goal is to reach a deeper understanding of the information carried by high-dimensional seismic data and use it to characterize source/path/site effects and understand the dynamics of seismicity at local and regional scales. He received his Ph.D. from the University of Memphis in 2017 and was a postdoctoral fellow at Stanford University from 2017 to 2019.

Logistics:
Please contact diip[a]math-info.univ-paris5.fr to register and get access to the Zoom link.

Materials:
Recording

June 7, 2023: Data Science Trends and Non-Trends: A Confluence of Complexity

Data Science Trends and Non-Trends: A Confluence of Complexity

Who: Prof. Michael J. Franklin (University of Chicago)
When: June 7, 2023, at 9:30 AM (Central European time)
Where: Room 580F, Halle aux Farines, 10 rue Françoise Dolto (registration is mandatory)

Abstract: The field of Data Science continues to expand in scope, scale and importance. New flexible data architectures are reducing the friction for collecting, managing and accessing data. Open data standards and cloud computing are enabling rapid innovation leading to new products and entire new product categories. ML and AI are finally showing their ability to solve problems for non-expert users across many fields. This expansion brings the potential for increased value, but also brings with it an increase in the intricacy of data environments in terms of performance, accuracy, explainability, and management. This talk will survey several of these trends and discuss their implications for users as well as for those of us who are developing educational and research programs in this fast moving landscape.

Bio: Michael J. Franklin is the Liew Family Chair of Computer Science and Sr. Advisor to the Provost for Computation and Data Science at the University of Chicago where he also serves as Faculty Co-Director of the Data Science Institute. Previously he was Thomas M. Siebel Professor of Computer Science at the University of California, Berkeley and served a term as Chair of the Computer Science Division. He was Co-Director of the Algorithms, Machines and People Laboratory (AMPLab) and is one of the original creators of Apache Spark, a leading open source platform for advanced data analytics and machine learning that was initially developed at the lab. He is a Member of the American Academy of Arts and Sciences and is a Fellow of the ACM and the American Association for the Advancement of Science. He received the 2022 ACM SIGMOD Systems Award with the team that developed Spark, and is a two-time recipient of the ACM SIGMOD “Test of Time” award. He holds a Ph.D. from the Univ. of Wisconsin (1993).

Logistics:
Please register here for the diiP workshop in order to attend.

April 5, 2023: The promise of language models for language sciences? Let's chat!

The promise of language models for language sciences? Let’s chat!

Who: Prof. Benoît Crabbé (Université Paris Cité)
When: April 5, 2023, at 4 PM (Central European time)
Where: Online (Zoom)

Abstract:

The field of computational linguistics is currently going through a paradigm shift.
Large foundational language models are now ubiquitous, with ChatGPT creating the latest buzz.

If you ask ChatGPT about its promise for the future of language sciences, you get the somewhat confident reply: “Large language models like myself hold great promise for the field of linguistics.
They offer improved language understanding, access to vast amounts of data, automatic language analysis, and the ability to test linguistic theories.
These tools can help linguists to gain new insights into how language works, identify patterns in language usage, and refine their linguistic theories.”

In this talk I will put these claims in perspective with some key modeling directions in computational linguistics: modeling language structure and modeling language in relation to world knowledge. I will explain how we eventually ended up with the current language models. We will show that, given what they are, current language models sometimes achieve surprising results with respect to the modeling of language structure, and we will highlight some potential research perspectives in language sciences and some of their current limitations.

Bio: Benoît Crabbé is professor of computational linguistics at the Université Paris Cité. He is head of the UFR Linguistics and a research affiliate of the LLF lab (CNRS and Université Paris Cité).
His research interests are in computational linguistics and more specifically in natural language understanding, natural language parsing and deep learning.
He is also involved in empirical and experimental issues in linguistics and in cognitive science related to modelling the structure of natural languages.

Logistics:
Please contact diip[a]math-info.univ-paris5.fr to register and get access to the Zoom link.

February 1, 2023: From Bounded Rationality to Ecological Rationality

From Bounded Rationality to Ecological Rationality

Who: Dr. Gerd Gigerenzer (Max Planck Institute for Human Development)
When: February 1, 2023, at 4 PM (Central European time)
Where: Online (Zoom)

Abstract:

Herbert Simon’s bounded rationality stands for a research program based on three principles: (i) to study the process of actual decision making (as opposed to as-if models of expected utility maximization), (ii) to study how decisions are made in situations of uncertainty and intractability (as opposed to risk and ambiguity alone), and (iii) to study how minds adapt to environments (as opposed to modeling solely the mind or the environment). These principles clashed with the doctrine of expected utility maximization. Bounded rationality was hijacked by economists who reinterpreted it to mean optimization under constraints, and by psychologists who reinterpreted it as the study of cognitive biases, that is, deviations from optimization. Thanks to this contradictory double-takeover, Simon’s revolutionary program was silenced. My colleagues and I have revived and extended Simon’s lost program, using the term ecological rationality to avoid any confusion. The study of ecological rationality is both descriptive and prescriptive. It investigates the repertoire of heuristics individuals or institutions have at their disposal (their adaptive toolbox) as well as the conditions under which each heuristic is successful and thus should be used, as measured by real-world criteria (the ecological rationality of heuristics). The study of the adaptive toolbox relies on observation and experimentation; the study of the conditions under which various heuristics should be used relies on mathematical analysis and computer simulation. This combination of descriptive and prescriptive analysis offers a novel perspective for decision making and cognitive science in general. It provides the proper tools for individuals and institutions to deal with everyday situations of uncertainty rather than risk.

Bio: Gerd Gigerenzer is Director of the Harding Center for Risk Literacy at the University of Potsdam, Faculty of Health Sciences Brandenburg and partner of Simply Rational – The Institute for Decisions. He is former Director of the Center for Adaptive Behavior and Cognition (ABC) at the Max Planck Institute for Human Development and at the Max Planck Institute for Psychological Research in Munich, Professor of Psychology at the University of Chicago and John M. Olin Distinguished Visiting Professor, School of Law at the University of Virginia. In addition, he is Member of the Berlin-Brandenburg Academy of Sciences, the German Academy of Sciences and Honorary Member of the American Academy of Arts and Sciences and the American Philosophical Society. He was awarded honorary doctorates from the University of Basel and the Open University of the Netherlands, and is Batten Fellow at the Darden Business School, University of Virginia. Awards for his work include the AAAS Prize for the best article in the behavioral sciences, the Association of American Publishers Prize for the best book in the social and behavioral sciences, the German Psychology Award, and the Communicator Award of the German Research Foundation. His award-winning popular books Calculated Risks, Gut Feelings: The Intelligence of the Unconscious, and Risk Savvy: How to Make Good Decisions have been translated into 21 languages. His academic books include Simple Heuristics That Make Us Smart, Rationality for Mortals, Simply Rational, and Bounded Rationality (with Reinhard Selten, a Nobel Laureate in economics). In Better Doctors, Better Patients, Better Decisions (with Sir Muir Gray) he shows how better informed doctors and patients can improve healthcare while reducing costs. Together with the Bank of England, he is working on the project “Simple heuristics for a safer world.” Gigerenzer has trained U.S. federal judges, German physicians, and top managers in decision making and understanding risks and uncertainties.

Materials:
Recording

December 7, 2022: Self-designing Data Systems for the AI Era

Self-designing Data Systems for the AI Era

Who: Prof. Stratos Idreos (Harvard University)
When: December 7, 2022, at 4 PM (Central European time)
Where: Online (Zoom)

Abstract:

Data systems are everywhere. A data system is a collection of data structures and algorithms working together to achieve complex data processing tasks. For example, with data systems that utilize the correct data structure design for the problem at hand, we can reduce the monthly bill of large-scale data applications on the cloud by hundreds of thousands of dollars. We can accelerate data science tasks by dramatically speeding up the computation of statistics over large amounts of data. We can train drastically more neural networks within a given time budget, improving accuracy. However, knowing the right data system design for any given scenario is a notoriously hard problem; there is a massive space of possible designs, while no single design is perfect across all data, AI models, and hardware contexts. In addition, building a new system may take several years for any given (fixed) design.

We will discuss our quest for the first principles of AI system design. We will show that it is possible to reason about this massive design space. This allows us to create a self-designing system that can take drastically different shapes to optimize for the workload, hardware, and available cloud budget using a grammar for systems. These shapes include designs that are discovered automatically and do not (always) exist in the literature or industry, yet they can be more than 10x faster for modern AI and big data applications. We will discuss examples from diverse AI areas, including image storage and classification, neural networks, statistics, and big data systems.

Bio: Stratos Idreos is an associate professor of Computer Science at Harvard University, where he leads the Data Systems Laboratory. For his Ph.D. thesis on adaptive indexing, Stratos was awarded the 2011 ACM SIGMOD Jim Gray Doctoral Dissertation award and the 2011 ERCIM Cor Baayen award from the European Research Consortium for Informatics and Mathematics. In 2015 he was awarded the IEEE TCDE Rising Star Award from the IEEE Technical Committee on Data Engineering for his work on adaptive data systems, and in 2022 he received the ACM SIGMOD Test of Time award for the NoDB concept. Stratos is also a recipient of the National Science Foundation Career award and the Department of Energy Early Career award. Stratos was PC Chair of ACM SIGMOD 2021 and IEEE ICDE 2022; he is the founding editor of the ACM/IMS Journal of Data Science and the chair of the ACM SoCC Steering Committee. Finally, Stratos received the 2020 ACM SIGMOD Contributions award for his work on reproducible research.

Materials:
Recording

November 2, 2022: Computational design of enzyme repertoires

Computational design of enzyme repertoires

Who: Dr. Sarel Fleishman (Weizmann Institute of Science)
When: November 2, 2022, at 4 PM (Central European time)
Where: Online (Zoom)

Abstract: We recently developed methods that combine phylogenetic analysis and Rosetta atomistic design calculations to design highly optimized variants of natural proteins. Our methods have been used by thousands of users worldwide to generate stable therapeutic enzymes, vaccine immunogens, and highly active enzymes for a range of needs in basic and applied research. We now present a machine-learning strategy to design and economically synthesize millions of active-site variants that are likely to be stable, foldable and active. We applied this approach to the chromophore-binding pocket of GFP to generate more than 16,000 active designs that comprise as many as eight mutations in the active site. The designs exhibit extensive and potentially useful changes in every experimentally measured parameter, including brightness, stability and pH sensitivity. We also applied this strategy to design millions of glycoside hydrolases that exhibit significant backbone changes in the active site. Here too, we isolated more than 10,000 catalytically active and very diverse designs. Contrasting active and inactive designs illuminates areas for improving enzyme design methodology. This new approach to high-throughput design allows the systematic exploration of sequence and structure spaces of enzymes, binders and other functional proteins.

Bio: Sarel Fleishman is an associate professor at the Weizmann Institute of Science. His research team develops a computational protein-design methodology to address both fundamental and “real-world” challenges in biochemistry and protein engineering. As a postdoc with David Baker in Seattle (2007-2011), Sarel developed the first accurate methods for designing protein binders, culminating in the design of broad-specificity influenza inhibitors. At the Weizmann Institute (2011-), his team developed protein design methods to the level of accuracy and reliability required to design large and complex proteins such as enzymes, antibodies, and vaccine immunogens; a protein that was designed in the Fleishman lab has recently been approved for mass production as a vaccine for malaria. Sarel’s academic awards include the Clore Ph.D. Fellowship (2003-2006), the Science Magazine award for a young molecular biologist (2008), a postdoctoral fellowship (2006-2009) and a career-development award (2012-2015) from the Human Frontier Science Program, European Research Council Starting and Consolidator Grants (ongoing), the Alon Fellowship, the Henri Gutwirth Prize, and the Weizmann Scientific Council Award.

Materials:
Recording

May 4, 2022: Surface enhanced Raman scattering (SERS) sensors: combining machine learning and nanosciences

Surface enhanced Raman scattering (SERS) sensors: combining machine learning and nanosciences

Who: Prof. Jean-François Masson (Université de Montréal)
When: May 4, 2022, at 4 PM (Central European time)
Where: Online (Zoom)

Abstract: SERS and Raman spectroscopy yield large data sets with information-rich spectra. Classical linear methods are limited, especially for large data sets and for SERS spectra of single molecules, where the spectra are highly dependent on the orientation of molecules on surfaces. Methods from data science are increasingly used to classify spectra into categories and predict SERS spectra for new data based on trained algorithms. For example, we recently introduced the concept of SERS optophysiology, which combines a SERS nanosensor on the tip of a pulled fiber to provide spatially and temporally specific molecular information near or inside biological material. To accomplish this, a SERS nanofiber decorated with a dense and well dispersed array of Au NP has been developed for the measurement of neurotransmitters and other metabolites in proximity of cells. The nanosensors are thus highly compatible with current physiology experiments also relying on similar nanosensors based on electrochemistry and electrophysiology. Specifically, we will show that the SERS optophysiology nanosensor can measure a panel of metabolites near cells in a single experiment. The SERS spectra of these neurotransmitters were identified with a barcoding data processing method, processed with TensorFlow using a convolutional neural network architecture. This machine-learning driven data processing significantly improved the positive assignment rates for a series of metabolites and allows for complex measurements of the cell’s biochemistry. In addition to these untargeted SERS nanosensors, we also designed molecularly specific sensors to measure pH, H2O2 and heavy metals inside cells using the same nanosensor architecture. This suite of SERS nanosensors will open the door to surveying molecular changes in proximity of healthy and diseased cells.

Bio: Jean-François Masson is a full professor of Chemistry at the Université de Montréal. He studied chemistry at the Université de Sherbrooke (BSc), Arizona State University (PhD) and Georgia Tech (postdoc). His laboratory develops new plasmonic materials, instruments, and surface chemistry for the detection of a broad range of molecules directly in crude samples, which are then translated to functional sensors for a series of biological, environmental and industrial applications. He has published more than 125 research articles and his research has led to filing more than 10 patents on various instrumental, materials or surface chemistry innovations for biosensing. He is an Associate Editor for ACS Sensors. In 2015, he co-founded Affinité Instruments, a Canadian start-up company commercializing surface plasmon resonance (SPR) instrumentation. Jean-François has received several awards including the Tomas Hirschfeld award (2005), an NSERC discovery accelerator (2011), the Fred Beamish award (2013) and the McBryde Medal (2019) of the Canadian Society for Chemistry, and an Alexander von Humboldt fellowship, Germany (2013-2014) for research at the Max-Planck Institute. In 2017, he was named Fellow of the Royal Society of Chemistry – UK and more recently, he was named in the 2018 power list of the top 40 under 40 analytical scientists and the 2019 power list of the top 100 most influential analytical scientists from The Analytical Scientist – UK.

Materials:
slides
recording

April 6, 2022: Outsourcing astrophysics data analysis to the real experts

Outsourcing astrophysics data analysis to the real experts

Who: Christopher Messenger (University of Glasgow, UK)
When: April 6, 2022, at 4 PM (Central European time)
Where: Online (Zoom)

Abstract:
As gravitational wave astrophysicists we find ourselves on the rising wave of machine learning sweeping through the physical sciences. In the past ~5 years many of us have embraced the world of machine learning and tried to apply it to our most challenging problems, including the detection of gravitational wave signals buried deep in our detector noise. However, we are domain experts in astrophysics and statistical analysis, not necessarily experts in machine learning. In this talk I will walk through the steps we took to outsource our problem to a collection of the world’s best machine learning experts – for free.

There is a large and growing community of data analysis experts across all sectors – academic, industrial and commercial. Specifically, one such group of experts that focuses on the application of machine learning can be found at Kaggle, a Google-owned data-science company. In addition to providing a host of online resources and support for anyone interested in data science, Kaggle hosts competitions in which individuals or teams are invited to solve data analysis problems and winners can earn “medals” and in some cases large cash prizes. We turned to Kaggle, who helped us set up our first data challenge – the task of finding gravitational wave signals from binary black hole mergers.

Bio:
Chris Messenger is a senior lecturer at the University of Glasgow, UK. He has been a long-time member of the global gravitational wave collaboration (currently) known as the LIGO-Virgo-KAGRA Collaboration (LVK). He obtained his undergraduate degree from the University of Birmingham, UK, where he also completed his PhD on the topic of gravitational wave detection for continuously emitting sources. He then worked as a postdoctoral researcher at the University of Glasgow, UK, at the Albert Einstein Institute in Hannover, Germany, and at Cardiff University, UK, before returning to Glasgow University as a Lord Kelvin Adam Smith Fellow. His current research interests lie in the field of gravitational wave cosmology and the introduction of machine learning and quantum computation to the problems of detection and Bayesian parameter estimation for any and all types of gravitational wave signal.

Materials:
video recording

PowerPoint presentation

February 2, 2022: Finding Approximately Repeated Patterns in Time Series: The most Useful, and yet most Underutilized Primitive in Time Series Analytics

Finding Approximately Repeated Patterns in Time Series: The most Useful, and yet most Underutilized Primitive in Time Series Analytics

Who: Prof. Eamonn Keogh (University of California Riverside)
When: February 2, 2022, at 4 PM (Central European time)
Where: Online (Zoom)

Abstract: Time series data mining is the task of finding patterns, regularities, and outliers in massive datasets. Given the ubiquity of time series in medicine, science, and industry, time series data mining is of increasing importance. In this talk I shall argue that the simple primitive of time series motif discovery, the task of finding approximately repeated patterns within a dataset, is the most useful core operation in all of time series data mining. In particular, it can be used as a primitive to enable many other useful tasks, such as summarization, segmentation, classification, clustering and anomaly detection. I will argue my case with examples of motif discovery in datasets as diverse as penguin behavior, cardiology, and astronomy.
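
To make the motif discovery primitive concrete, here is a minimal brute-force sketch. It is not the speaker's method: it simply compares every pair of non-overlapping windows under z-normalized Euclidean distance, which is only practical for tiny inputs. The function names (znorm, find_motif) and the toy series are hypothetical, chosen for this illustration.

```python
import math

def znorm(window):
    """Z-normalize a window so matches depend on shape, not offset or scale."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    if std == 0:
        return [0.0] * len(window)  # flat window: no shape information
    return [(v - mean) / std for v in window]

def find_motif(series, m):
    """Return (distance, i, j) for the closest pair of non-overlapping length-m windows."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (math.inf, -1, -1)
    for i in range(len(windows)):
        # Start j at i + m so the two windows never overlap (no trivial matches).
        for j in range(i + m, len(windows)):
            d = math.dist(windows[i], windows[j])
            if d < best[0]:
                best = (d, i, j)
    return best

# Toy series (hypothetical): the same bump planted at positions 3 and 14.
pattern = [0.0, 2.0, 5.0, 2.0, 0.0]
series = ([1.0, 1.3, 0.8] + pattern
          + [1.1, 0.7, 1.4, 0.9, 1.2, 0.6] + pattern
          + [1.0, 1.5, 0.9])
distance, first, second = find_motif(series, m=5)
print(first, second, distance)
```

The search costs on the order of n² window comparisons, which is why dedicated motif discovery algorithms matter for the massive datasets the abstract refers to.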

Bio: Eamonn Keogh is a distinguished professor and Ross Family Chair in the Department of Computer Science and Engineering. He specializes in time series data mining, finding patterns, regularities, and outliers in massive datasets. He developed some of the most commonly used definitions, algorithms and data representations used in this area. These contributions include SAX, PAA, Time Series Shapelets, Time Series Motifs, the LB_Keogh lower bound, and the Matrix Profile. These ideas have been used by thousands of academic, industrial, and scientific researchers worldwide, including NASA’s Jet Propulsion Laboratory, which uses Keogh’s ideas to find anomalies in observations of the magnetosphere collected by the Cassini spacecraft in orbit around Saturn. In the week following this talk, he will be presented with the 2021 IEEE ICDM Research Contributions Award.

Materials:
slides
video recording

November 3, 2021: Promises and challenges of massive-scale AI - the case of large language models

Promises and challenges of massive-scale AI – the case of large language models

Who: Laurent Daudet, CTO and co-founder at LightOn, Professor (on leave) of physics (Université Paris Cité)
When: November 3, 4pm (Paris time).
Where: hybrid: online (zoom) and at Room Turing Conseil, 45 rue des Saints Pères 75006 Paris

title:
Promises and challenges of massive-scale AI – the case of large language models

abstract:
OpenAI’s GPT-3 language model has triggered a new generation of Machine Learning models. Leveraging Transformer architectures with billions of parameters trained on massive unlabeled datasets, these language models achieve new capabilities such as text generation, question answering, or even zero-shot learning – tasks the model has not been explicitly trained for. However, training these models represents a massive computing task, now done on dedicated supercomputers. Scaling up these models will require new hardware and optimized training algorithms.

At LightOn, a spinoff of university research, we develop a set of technologies to address these challenges. The Optical Processing Unit (OPU) technology performs some matrix-vector multiplications in a massively parallel fashion, at record-low power consumption. Now accessible on-premises or through the cloud, the OPU technology has been used by engineers and researchers worldwide in a variety of applications, for Machine Learning and scientific computing. We also efficiently train large language models, such as PAGnol (demo at https://pagnol.lighton.ai ), the largest language model in French, which can be used for various research and business applications.

short bio:
Laurent Daudet is currently employed as CTO at LightOn, a startup he co-founded in 2016, where he manages cross-disciplinary R&D projects, involving machine learning, optics, signal processing, electronics, and software engineering. Laurent is a recognized expert in signal processing and wave physics, and is currently on leave from his position of Professor of Physics at the Université Paris Cité. Prior to that or in parallel, he has held various academic positions: fellow of the Institut Universitaire de France, associate professor at Université Pierre et Marie Curie, Visiting Senior Lecturer at Queen Mary University of London, UK, Visiting Professor at the National Institute for Informatics in Tokyo, Japan. Laurent has authored or co-authored more than 200 scientific publications, has been a consultant to various small and large companies, and is a co-inventor in several patents. He is a graduate in physics from Ecole Normale Supérieure in Paris, and holds a PhD in Applied Mathematics from Marseille University.

logistics:
Please send an email to diip[at]math-info.univ-paris5.fr with the name and date of the seminar you’re interested in to register and receive the zoom link.
Participants are invited to follow the seminar either online or in person at: Room Turing Conseil, 7th floor (make a right on exiting the elevators and left at the end of the corridor), 45 rue des Saints Pères; visitors should have an ID with them.

material

Video recording

Presentation slides

October 6, 2021: Building Data Equity Systems

Building Data Equity Systems

Who: Prof Julia Stoyanovich (New York University)
When: October 6, 4pm (Paris time).
Where: online (zoom)

title:
Building Data Equity Systems

abstract:
Equity as a social concept — treating people differently depending on their endowments and needs to provide equality of outcome rather than equality of treatment — lends a unifying vision for ongoing work to operationalize ethical considerations across technology, law, and society. In my talk I will present a vision for designing, developing, deploying, and overseeing data-intensive systems that consider equity as an essential requirement. I will discuss ongoing technical work in scope of the “Data, Responsibly” project, and will place this work into the broader context of policy, education, and public outreach activities.

short bio:
Julia Stoyanovich is an Institute Associate Professor of Computer Science & Engineering at the Tandon School of Engineering, Associate Professor of Data Science at the Center for Data Science, and Director of the Center for Responsible AI at New York University (NYU). Her research focuses on responsible data management and analysis: on operationalizing fairness, diversity, transparency, and data protection in all stages of the data science lifecycle. She established the “Data, Responsibly” consortium and served on the New York City Automated Decision Systems Task Force, by appointment from Mayor de Blasio. Julia developed and has been teaching courses on Responsible Data Science at NYU, and is a co-creator of an award-winning comic book series on this topic. In addition to data ethics, Julia works on the management and analysis of preference and voting data, and on querying large evolving graphs. She holds M.S. and Ph.D. degrees in Computer Science from Columbia University, and a B.S. in Computer Science and in Mathematics & Statistics from the University of Massachusetts at Amherst. She is a recipient of an NSF CAREER award and a Senior Member of the ACM.

material:

presentation slides

video recording

May 5, 2021: Deploying a Data-Driven COVID-19 Screening Policy at the Greek Border

Deploying a Data-Driven COVID-19 Screening Policy at the Greek Border

Who: Prof Kimon Drakopoulos (University of Southern California)
When: May 5, 4pm (Paris time).
Where: online (zoom)

title:
Deploying a Data-Driven COVID-19 Screening Policy at the Greek Border

abstract:
In collaboration with the Greek government, we designed and deployed a nation-wide COVID-19 screening protocol for travelers to Greece. The goals of the protocol were to combine limited demographic information about arriving travelers with screening results from recently tested travelers to i) judiciously allocate Greece’s limited testing budget to identify asymptomatic, infected travelers and ii) quickly identify hotspots and spikes in other nations to inform immigration/border policies in real-time. This talk details i) the operations of our designed system (including border screening, database management, closed-loop feedback, and liaising with contact-tracing teams), ii) a novel, batched, contextual bandit algorithm tailored to the unique features of this problem and iii) an empirical assessment of the benefits of the deployed system from the summer/fall 2020.

short bio:
Kimon Drakopoulos is an Assistant Professor of Data Sciences and Operations at USC Marshall School of Business, where he researches complex networked systems, control of contagion, information design and information economics. He completed his Ph.D. in the Laboratory for Information and Decision Systems at MIT, focusing on the analysis and control of epidemics within networks. His current research revolves around controlling contagion, whether epidemic or informational, as well as the use of information as a lever to improve operational outcomes in the context of testing allocation, fake news propagation and belief polarization.

lecture material:
video recording

Seminars + Hands-On Workshops

November 16, 2022: Deep Domain Adaptation and Generalization

Deep Domain Adaptation and Generalization

Who: Dr Shen Liang (Université Paris Cité, diiP)
When: October 19, 4 PM (Central European Time)
Where: online (zoom)

title:
Deep Domain Adaptation and Generalization

abstract:
In real-world applications, deep learning models are often faced with challenges from multi-source data with heterogeneous features. For example, in biomedicine, electrocardiography (ECG) signals of different patients can differ drastically even if they suffer from the same heart condition, so a computer-aided diagnosis model that works well for one patient may work poorly for another; in astrophysics, simulation is widely used for neutrino event reconstruction, yet the distribution of simulated data often fails to align with that of real data, so an event reconstruction model trained on simulated data may not be trustworthy on real data. Two effective solutions to the problem with multi-source data are domain adaptation and domain generalization. Domain adaptation attempts to transfer a model trained on one or multiple data sources to a data source where some data is already available, while domain generalization attempts to generalize a model trained on multiple data sources to unknown future data. In this seminar, I will introduce some of the most commonly used methodologies for domain adaptation and generalization, and provide suggestions on when and when not to apply these techniques in the face of multi-source data. Note that this seminar requires the audience to have basic knowledge of transfer learning and multi-task learning, which can be found in the seminar on June 15.

short bio:
Shen Liang is a research associate at the Data Intelligence Institute of Paris (diiP) and affiliated with the Université Paris Cité. He has worked on a variety of data management and mining problems including time series analysis, semi-supervised learning, knowledge-guided deep learning and GPU-accelerated computation within various fields such as healthcare, manufacturing, geosciences and astrophysics. He holds a PhD in software engineering from Fudan University, China.

materials
recording

June 15, 2022: Deep Transfer Learning and Multi-task Learning

Deep Transfer Learning and Multi-task Learning

Who: Dr Shen Liang (Université Paris Cité, diiP)
When: June 15, 4 PM (Central European Time)
Where: online (zoom)

title:
Deep Transfer Learning and Multi-task Learning

abstract:
This tutorial provides an overview of two important and correlated (in many cases intersectional) topics in deep learning: transfer learning, and multi-task learning. Transfer learning focuses on transferring knowledge learned in one problem to a different yet related problem, while multi-task learning attempts to solve multiple tasks in one stroke by exploiting the commonalities across the multiple tasks. In this tutorial, I will present an overview of the motivations, methodologies, as well as applications of these two learning paradigms in fields such as natural language processing and computer vision. I will also provide two concrete examples of these paradigms in a hands-on workshop.

materials:
video recording
slides
example code

May 18, 2022: Knowledge-guided Data Science

Knowledge-guided Data Science

Who: Dr Shen Liang (Université Paris Cité, diiP)
When: May 18, 4 PM (Central European Time)
Where: online (zoom)

title:
Knowledge-guided Data Science

abstract:
This tutorial presents an overview of knowledge-guided data science, a rising methodology in machine learning which fuses data with domain knowledge. We will present numerous case studies on this methodology to showcase how to unleash its potential in real-world data science applications.

materials
recording (not available)
slides
workshop data with Jupyter notebook

April 13, 2022: Generative Models

Generative Models + Hands-On Workshop

Who: Dr Foula Vagena (Université Paris Cité, diiP)
When: April 13, 4 PM (Central European Time)
Where: online (zoom)

title:
Generative Models + Hands-On Workshop

abstract:
Generative modeling is a field in machine learning that involves automatically discovering and learning regularities or patterns in the data in such a way that the ML model can generate or output new examples that plausibly could have been drawn from the original dataset. In statistical machine learning, a generative model explicitly describes the joint probability distribution of input (X) and output (Y) variables, i.e. P(Y, X). Various such models are commonly used in practice, such as Naive Bayes, HMMs and MRFs. Deep generative models (DGMs) are neural networks with many hidden layers trained to approximate complicated, high-dimensional and at times unknown probability distributions using a large number of samples. The literature on DGMs is growing rapidly and some advances have reached the public sphere, for example the recent successes in generating realistic-looking images, voices, or movies (so-called deep fakes), usually employing variational autoencoders (VAEs) or generative adversarial networks (GANs). In this tutorial we give an overview of generative modeling with the aim (1) to provide a broad overview of the field and (2) to the extent possible, to identify the common ground as well as the main differences of the two approaches. The tutorial will conclude with a code example of (1) a statistical generative model (Naive Bayes) and (2) a simple DL generative model (GAN).

Hands-On Workshop on “Generative Naïve Bayes + GAN examples”.
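
As a small illustration of what "describing the joint distribution P(Y, X)" means for Naive Bayes, the sketch below estimates P(Y) and P(X_i | Y) by counting, evaluates the joint probability, and generates a new example. It is not the workshop code; the toy weather data and the function names are hypothetical, and smoothing is omitted for brevity.

```python
import random
from collections import Counter

def fit_naive_bayes(rows, labels):
    """Estimate P(Y) and P(X_i | Y) by relative frequencies (no smoothing)."""
    label_counts = Counter(labels)
    prior = {y: c / len(labels) for y, c in label_counts.items()}
    counts = {y: [Counter() for _ in rows[0]] for y in label_counts}
    for x, y in zip(rows, labels):
        for i, value in enumerate(x):
            counts[y][i][value] += 1
    cond = {
        y: [{v: c / label_counts[y] for v, c in feature.items()} for feature in features]
        for y, features in counts.items()
    }
    return prior, cond

def joint_probability(prior, cond, x, y):
    """P(Y = y, X = x) = P(y) * prod_i P(x_i | y) under the Naive Bayes assumption."""
    p = prior[y]
    for i, value in enumerate(x):
        p *= cond[y][i].get(value, 0.0)
    return p

def sample(prior, cond, rng):
    """Generate a new (x, y): draw y from P(Y), then each x_i from P(X_i | y)."""
    y = rng.choices(list(prior), weights=list(prior.values()))[0]
    x = tuple(
        rng.choices(list(feature), weights=list(feature.values()))[0]
        for feature in cond[y]
    )
    return x, y

# Hypothetical toy data: (outlook, wind) -> whether a game was played.
rows = [("sunny", "calm"), ("sunny", "windy"), ("rainy", "windy"), ("rainy", "calm")]
labels = ["yes", "yes", "no", "yes"]
prior, cond = fit_naive_bayes(rows, labels)
print(joint_probability(prior, cond, ("sunny", "calm"), "yes"))
print(sample(prior, cond, random.Random(0)))
```

Because the model stores the full joint distribution, the same fitted tables serve both for scoring an observation and for generating new ones.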

short bio:
Zografoula Vagena is a research associate at the Data Intelligence Institute of Paris (diiP) and affiliated with the Université Paris Cité. She has been a data science researcher and practitioner for over ten years. She has worked on different analytics problems including forecasting, image processing, graph analytics, multidimensional data analysis, text processing, recommendation systems, sequential data analysis and optimization within various fields such as transportation, healthcare, retail, finance/insurance and accounting. She has also performed research in the intersection of data management and analytics, and was a primary contributor of the MCDB/SimSQL systems that blended data management with Bayesian statistics. She holds a PhD in data management from the University of California, Riverside.

materials
example code

PowerPoint presentation

recording

March 16, 2022: Statistical Machine Learning: Inference and Learning

Statistical Machine Learning: Inference and Learning

Who: Dr Foula Vagena (Université Paris Cité, diiP)
When: March 16, 4 PM (Central European Time)
Where: online (zoom)

title:
Statistical Machine Learning: Inference and Learning

abstract:
Machine learning is a branch of artificial intelligence (AI) focused on building applications that learn from data and improve their accuracy over time without being programmed to do so. When combined, statistical techniques and machine learning create a powerful tool for analysing various kinds of data in many computer science/engineering areas, including image processing, speech processing, natural language processing, and robot control, as well as in fundamental sciences such as biology, medicine, astronomy, physics, and materials. In this second part of the tutorial we will focus on inference and learning for Probabilistic Graphical Models (PGMs). Inference enables us to “query” a trained PGM to obtain relevant information based on given evidence (i.e. observations). For example, one can perform inference on a manufacturing PGM to find the probability of large delivery delays due to a hurricane. Most inference algorithms strive to attain satisfactory computing performance while maintaining accuracy guarantees. Finally, learning PGMs from training data is discussed, with the purpose of conveying the main challenges of the task and touching upon the most popular solutions. The tutorial will conclude with computing examples of (1) a Bayesian Network and (2) a Markov Random Field. (Please be advised that this tutorial builds upon material presented in the first tutorial on PGMs.)
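
The hurricane example in the abstract can be sketched as inference by enumeration on a two-node Bayesian network, Hurricane -> LargeDelay. All probabilities below are made-up numbers for illustration only, and the variable and function names are hypothetical; this is not the tutorial's example code.

```python
# Hypothetical parameters for a two-node network: Hurricane -> LargeDelay.
p_hurricane = 0.1                          # P(Hurricane = true)
p_delay_given = {True: 0.8, False: 0.05}   # P(LargeDelay = true | Hurricane)

def joint(hurricane, delay):
    """P(Hurricane = hurricane, LargeDelay = delay) from the network factorization."""
    p_h = p_hurricane if hurricane else 1 - p_hurricane
    p_d = p_delay_given[hurricane] if delay else 1 - p_delay_given[hurricane]
    return p_h * p_d

def posterior_hurricane(delay):
    """P(Hurricane = true | LargeDelay = delay), obtained by summing out the joint."""
    evidence = sum(joint(h, delay) for h in (True, False))
    return joint(True, delay) / evidence

# Marginal query: probability of a large delay, with or without a hurricane.
p_delay = sum(joint(h, True) for h in (True, False))
print(round(p_delay, 3), round(posterior_hurricane(True), 3))
```

Enumeration sums over every assignment of the unobserved variables, so its cost grows exponentially with network size, which is one reason the trade-off between computing performance and accuracy matters for inference algorithms.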

logistics:

Please send an email to diip[at]math-info.univ-paris5.fr with the name and date of the seminar you’re interested in to register and receive the Zoom link.

material

Source code for example

Recording

Slides

February 16, 2022: Statistical Machine Learning: Overview and Applications

Statistical Machine Learning: Overview and Applications

Who: Dr Foula Vagena (Université Paris Cité, diiP)
When: February 16, 4pm (Central European Time)
Where: online (zoom)

title:
Statistical Machine Learning: Overview and Applications

abstract:
Machine learning is a branch of artificial intelligence (AI) focused on building applications that learn from data and improve their accuracy over time without being programmed to do so. When combined, statistical techniques and machine learning create a powerful tool for analysing various kinds of data in many computer science/engineering areas, including image processing, speech processing, natural language processing, and robot control, as well as in fundamental sciences such as biology, medicine, astronomy, physics, and materials. In this tutorial, which consists of two parts, we will explain the basics of statistical modeling in machine learning and introduce Probabilistic Graphical Models (PGMs), which blend ideas from statistics and computer science to produce powerful modeling techniques. In this first part of the tutorial, we will first motivate the need for PGMs and discuss applications where they have been very successful. We will then provide details on modeling using PGMs and in the process describe some of the most illustrative examples of them.

materials:
recording
slides

January 19, 2022: Graph Based Data Science: Opportunities Challenges and Techniques

Graph Based Data Science: Opportunities Challenges and Techniques

Who: Dr Foula Vagena (Université Paris Cité, diiP)
When: January 19, 4pm (Central European Time)
Where: online (zoom)

title:
Graph Based Data Science: Opportunities Challenges and Techniques

abstract:
Graph based data science lets us leverage the power of relationships and structure in data to improve model prediction and answer previously intractable questions. In this tutorial we will first introduce the graph as a versatile data representation and summarize the different analytics tasks that can be performed over graph structured data. We will go on to detail the different ML/AI tasks that become possible by leveraging the graph structure of data and describe recent relevant algorithms and techniques. The tutorial will conclude with a demonstration of exploratory analysis over graph data followed by an illustrative link prediction example.

material:
Recording of the seminar

Source code for example

PowerPoint presentation

December 15, 2021: Ensemble Learning: Theory and Techniques

Ensemble Learning: Theory and Techniques

Who: Dr Foula Vagena (Université Paris Cité, diiP)
When: December 15, 4pm (Paris time)
Where: online (zoom)

title:
Ensemble Learning: Theory and Techniques + Hands-on workshop

abstract:
Ensemble learning is the process by which multiple models, such as classifiers or experts, are combined to solve a particular computational intelligence problem. Ensemble learning is primarily used to improve the (classification, prediction, function approximation, etc.) performance of a model, or to reduce the likelihood of an unfortunate selection of a poor one. By strategically combining multiple models one can produce a new predictive model with reduced variance and bias and improved predictions. In this tutorial we will explain the bias-variance tradeoff and describe how popular ensemble techniques (such as bagging, boosting, stacking, etc.) handle it. We will conclude the tutorial with an illustrative prediction task using various ensemble models.

The Hands-On Workshop will focus on examples of ensemble models.
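
One way to see why combining models can help is the arithmetic of majority voting. The sketch below is not the workshop code and is not bagging, boosting or stacking themselves; it only computes how often a majority of models is right under the idealized assumption that every model is correct with the same probability and that their errors are independent. The function name is hypothetical.

```python
from math import comb

def majority_vote_accuracy(n_models, p_correct):
    """Probability that a strict majority of n independent models is correct.

    Assumes each model is right with probability p_correct and that errors
    are independent, an idealization real ensembles only approximate.
    """
    needed = n_models // 2 + 1
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )

# Ensembles of increasing (odd) size built from 70%-accurate models.
for n in (1, 3, 11, 51):
    print(n, round(majority_vote_accuracy(n, 0.7), 4))
```

In practice the errors of models trained on the same data are correlated, so the gains are smaller than this calculation suggests; ensemble techniques differ largely in how they encourage diversity among the combined models.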

material

Recording of the seminar

Source code for example

November 17, 2021: Deep Learning for Sequential Data: Models and Applications

Deep Learning for Sequential Data: Models and Applications

Who: Dr Foula Vagena (Université Paris Cité, diiP)
When: November 17, 4pm (Paris time)
Where: online (zoom) and at Room Turing Conseil, 45 rue des Saints-Pères, 75006 Paris

title:
Deep Learning for Sequential Data: Models and Applications + Hands-on workshop

abstract:
Recurrent neural networks (RNNs) are a family of specialized neural networks for processing sequential data. They can scale to much longer sequences than would be practical for networks without sequence-based specialization and most of them can also process sequences of variable length. In this tutorial we will first describe the high level RNN architecture and outline its most popular variations. We will then explain the main challenge that handling sequential data presents, namely long-term dependencies, and summarize the different mechanisms that are employed to tackle it (i.e. gated architectures, attention mechanisms). We will go on to describe applications where RNNs have been successfully employed and we will conclude the tutorial with an illustrative RNN-supported timeseries prediction example.

The Hands-On Workshop will focus on RNN supported timeseries prediction.
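
The recurrence at the heart of a vanilla RNN can be sketched in a few lines. This is not the workshop code: the weights below are arbitrary, untrained values chosen for illustration, and the function name is hypothetical. The point is that the same weights are reused at every step, which is what lets one network process sequences of different lengths.

```python
import math

def rnn_forward(sequence, w_x, w_h, b):
    """Run a vanilla RNN over a sequence of scalars; return the final hidden state.

    h_t = tanh(w_x * x_t + W_h h_{t-1} + b), starting from h_0 = 0.
    """
    hidden = [0.0] * len(b)
    for x_t in sequence:
        hidden = [
            math.tanh(
                w_x[i] * x_t
                + sum(w_h[i][j] * hidden[j] for j in range(len(hidden)))
                + b[i]
            )
            for i in range(len(b))
        ]
    return hidden

# Arbitrary (untrained) parameters for a hidden state of size 2.
w_x = [0.5, -0.3]
w_h = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.1]

# The same parameters handle sequences of different lengths.
print(rnn_forward([1.0, 0.5, -0.2], w_x, w_h, b))
print(rnn_forward([0.3, 0.1, 0.4, 0.1, 0.5, 0.9, 0.2], w_x, w_h, b))
```

Because information from early steps reaches the final state only through repeated application of the same update, it tends to fade over long sequences, which is the long-term dependency problem that gated architectures and attention are designed to ease.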

logistics: Please send an email to diip[at]math-info.univ-paris5.fr with the name and date of the seminar you’re interested in to register and receive the zoom link.

Participants are invited to follow the seminar either online or in person at: Room Turing Conseil, 7th floor (make a right on exiting the elevators and left at the end of the corridor), 45 rue des Saints Pères; visitors should have an ID with them.

materials:

Video recording

Presentation slides

July 7, 2021: Convolutional Neural Networks: An Overview and Applications

Convolutional Neural Networks: An Overview and Applications

Who: Dr Foula Vagena (Université Paris Cité, diiP)
When: July 7, 4pm (Paris time)
Where: online (zoom)

title:
Convolutional Neural Networks: An Overview and Applications + Hands-on workshop

abstract:
Convolutional neural networks (CNNs) are a specialized kind of neural network for processing data that has a known, grid-like topology. Examples include time-series data, which can be thought of as a 1D grid taking samples at regular time intervals, and image data, which can be thought of as a 2D grid of pixels. Such networks have been tremendously successful in practical applications. They employ a mathematical operation called convolution, a specialized kind of linear operation. In this tutorial we will first describe the convolution operation and explain how it is leveraged to form CNN architectures. We will then describe applications where CNNs have been very successful and provide a summary of well known CNN architectures. The tutorial will conclude with an illustrative hands-on example of a CNN-supported image classification task.

The Hands-On Workshop will focus on CNN supported image classification.
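
For readers new to the operation, here is a minimal sketch of a 1D convolution over a time-series-like grid, written in the form CNN layers typically use (a sliding dot product without flipping the kernel). It is not the workshop code; the function name, the toy signal and the hand-picked kernel are assumptions for illustration, whereas a CNN would learn its kernels from data.

```python
def conv1d(signal, kernel):
    """'Valid' 1D convolution as used in CNN layers (sliding dot product, no kernel flip)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

signal = [0, 0, 1, 1, 1, 0, 0]
edge_kernel = [-1, 1]  # responds to changes between neighbouring samples
print(conv1d(signal, edge_kernel))  # [0, 1, 0, 0, -1, 0]
```

The same small kernel is applied at every position, so the layer has few parameters and detects a pattern (here, a rising or falling edge) wherever it occurs in the input.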

material:

Video recording

presentation slides

source code for examples

June 2, 2021: Deep Learning: An overview using Multi Layer Perceptrons (MLPs) + Hands-On Workshop

Deep Learning: An overview using Multi Layer Perceptrons (MLPs) + Hands-On Workshop

Who: Dr Foula Vagena (Université Paris Cité, diiP)
When: June 2, 4pm (Paris time)
Where: online (zoom)

title:
An overview using Multi Layer Perceptrons (MLPs)

abstract:
Deep Learning (DL for short) is a field of machine learning that is concerned with algorithms based on (artificial) neural networks and representation learning. The quintessential example of a deep learning model is the feedforward deep network or multilayer perceptron (MLP). In this tutorial we will provide an overview of DL and present its main components (e.g. tensors, layered composition of models/functions). We will focus on MLPs and use them to delineate and explain the main steps/concepts of DL-based modeling. The tutorial will conclude with an illustrative hands-on example of MLP-supported regression and classification models.
The hands-on workshop will focus on MLP supported regression + classification examples.

seminar material:
video recording

presentation slides

source code for examples

April 7, 2021: Data Science: A high level overview + Hands-On Workshop

Data Science: A high level overview + Hands-On Workshop

Who: Dr Foula Vagena (Université Paris Cité, diiP)
When: April 7, 4pm (Paris time).
Where: online (zoom)

title:
Data Science: A high level overview

abstract:
Data science is the area of study which involves extracting insights from data using various scientific methods, algorithms, and processes. In this tutorial we will explain the need for data science and provide an overview of the field, its main components, the opportunities that it creates as well as its major challenges. We will then describe the main steps of performing data science, starting from an analytics problem up to the point of communicating the results. We will then summarize applications where data science has traditionally been employed and provide examples of popular data science tools. The tutorial will conclude with an illustrative hands-on example of the data science process.
The hands-on workshop will focus on Image Analysis + Segmentation using a pre-trained Mask R-CNN model.

seminar material:
video recording
presentation slides
source code for examples
