The purpose of the ML4DH journal club is to keep abreast of the latest developments at the intersection of machine learning and digital humanities and to critically evaluate published research. At each meeting, participants summarize an article and then discuss its strengths, weaknesses and implications for future research.
The club meets every two weeks to discuss recent scientific articles related to these topics. It is organized with the support of SCAI, CERES and ObTIC.
In addition to discussing the papers, the club will organize other activities to promote exchange and collaboration. These activities could include, for example:
- Presentations by club members or guests on their current research projects;
- Workshops or training sessions to learn new skills or techniques related to machine learning and digital humanities;
- Digital projects or hackathons to explore new applications of machine learning in the humanities;
- Lectures by invited experts.
Each session may include up to three readings, each lasting no more than 30 minutes (including discussion). A club member proposes, in advance, a scientific article to present. The presenter may then address the following points:
- Position of the work relative to the state of the art;
- Methodology, tools and resources used;
- Results obtained.
The discussion may then take a critical look at the research presented.
February 9, 2023: Ljudmila Petkovic "Detection of intertextual phenomena"
Abstract: Techniques inspired by plagiarism detection now make it possible to automatically detect textual fragments whose similarities suggest quotation or reuse. However, when the corpus is large, so many similarities are detected that the results become hard to interpret. Moreover, fixed expressions and clichés bury the most interesting repetitions. Similarly, writers' hard drives often contain very similar files that are either duplicates or different states of the same text. Here again, the number of similar files can be overwhelming. To overcome these difficulties, we propose to represent large masses of textual similarities as graphs and to take advantage of graph operators, in particular community detection and minimum spanning trees, to group them in a meaningful way.
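To give a rough idea of the graph-based grouping described in the abstract, here is a minimal Python sketch. It is not the speaker's method: the similarity measure (Jaccard overlap of word trigrams), the threshold of 0.2, the toy texts and all function names are assumptions chosen for illustration. Connected components stand in for a real community-detection algorithm, and the spanning forest is built with Kruskal's algorithm on the distance 1 - similarity.

```python
from itertools import combinations


def shingles(text, n=3):
    """Return the set of word n-grams in a text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(docs, threshold=0.2):
    """Return (similarity, i, j) for every document pair at or above threshold."""
    sets = [shingles(d) for d in docs]
    edges = []
    for i, j in combinations(range(len(docs)), 2):
        s = jaccard(sets[i], sets[j])
        if s >= threshold:
            edges.append((s, i, j))
    return edges


def spanning_forest(n, edges):
    """Kruskal's algorithm on distance = 1 - similarity.

    Returns (tree_edges, groups): the edges of a minimum spanning forest
    and the connected components that the forest spans.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    # Highest similarity first is the same as lowest distance first.
    for s, i, j in sorted(edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:  # keep the edge only if it joins two separate groups
            parent[ri] = rj
            tree.append((i, j, s))

    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return tree, sorted(groups.values())


# Hypothetical toy corpus: three variants of one sentence, two of another.
docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the sleepy dog",
    "a quick brown fox jumps over the lazy dog today",
    "to be or not to be that is the question",
    "to be or not to be that is the real question",
]
tree, groups = spanning_forest(len(docs), similarity_edges(docs))
print(groups)
print(tree)
```

On this toy corpus, the four pairwise similarities above the threshold reduce to three tree edges, and the five texts fall into two groups. On a real corpus, the same idea keeps only the strongest link needed to connect each text to its group, which is one way to make a dense web of similarities readable.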
To find the next topics and, if you wish, to register as a contributor, go to the shared document.
Free entry for those who just want to listen.