An introduction to modern tools for collaborative science
Best practices in co-developing and co-authoring
- Luca Heltai (luca.heltai@sissa.it)
The official course page, with the schedule and up-to-date information, is
https://www.math.sissa.it/course/phd-course/collaborative-science-2022-2023
Course information
Collaboration (the action of working with someone to produce something) is one of the fundamental pillars of science.
While some collaborative efforts happen synchronously (either "in person", during coffee breaks, at blackboards and whiteboards, in workshops, etc., or "remotely", via phone calls, Skype/Zoom/Meet meetings, email exchanges, etc.), most of the time they require separate efforts from the various participants, which then need to be synchronized.
Let's take the action of writing a paper (using LaTeX) with a co-author as an example. John and Jane have discussed a new and interesting topic at the blackboard, and now they want to write it down.
Jane starts, and prepares a draft with title, sections, and so on. John waits for Jane to finish, then starts working on the same document. Wait. How do Jane and John exchange information (in this case, a LaTeX document)?
A workflow could look like the following:
- Jane works on a document, then sends it by email to John.
- John receives the document. He changes it. John sends it back to Jane. Jane receives it and replaces the old document with the new one, or saves it as a new file (with a meaningful name, like `document_v2.txt`).
- Repeat.
In a simple and linear world, this would work fine. However:
- Working in parallel is difficult. With more than two authors, life becomes cumbersome, to say the least.
- Overwriting the wrong file is always a risk, and agreeing on the right naming convention is often more difficult than it looks.
- We'll end up with our directories full of files like `document_final_final_2_really_final.tex`.
- Often we end up with files that don’t compile, and we need to start “chasing errors” before we can even start working on our task.
In this short course I'll discuss modern tools used to automate the above process, and to introduce fail-safe habits that will make your life as a researcher much easier. In particular, the main topics of this course will be:
- Version control systems
  - A historical overview (RCS -> CVS -> SVN -> Git)
  - Using Git for single-user projects
  - Using Git for small collaborative projects
  - Using Git for large collaborative projects
  - GitHub examples
- Testing systems
  - Unit tests
  - Functional tests
  - Integration tests
- Continuous integration systems
  - Combining version control systems and test systems
  - GitHub Actions
- Container systems
  - Docker
  - Singularity
  - Integrating Docker or Singularity with GitHub Actions
  - Integrating Docker with Visual Studio Code
At the end of the course, you will be able to set up a collaborative environment that will make your day-to-day work much safer and much more effective, whether you work alone or collaborate with hundreds of other people.
We'll work through some examples based on the LaTeX typesetting language and on the Python programming language. You will end up with two template repositories (one for a LaTeX paper, and one for a Python project) that will allow you to bootstrap your day-to-day research with state-of-the-art tools for collaborative science development.
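To give a flavor of the unit tests listed among the course topics, here is a minimal sketch of what a test file in a Python project might look like. The function `mean` and the two test functions are hypothetical names chosen only for illustration; they are not taken from the course material.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    Hypothetical example function, not part of the course repository.
    """
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)


def test_mean_of_known_values():
    # A unit test checks one small piece of code against a known answer.
    assert mean([1.0, 2.0, 3.0]) == 2.0


def test_mean_rejects_empty_input():
    # Error handling is part of the behavior, so it is tested as well.
    try:
        mean([])
    except ValueError:
        return
    raise AssertionError("mean([]) should raise ValueError")
```

Assuming the file is saved as `test_mean.py` (a hypothetical name) and pytest is installed, `python -m pytest test_mean.py` would collect and run both tests. A continuous integration system can run the same kind of command automatically on every change.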
Particular attention will be dedicated to reproducibility: we will see how containers provide a safe way to share programming environments with colleagues, with automated testing systems (GitHub Actions, Travis CI, etc.), and with cluster programming environments (Ulysses), including ways to guarantee the reproducibility of your results.
References and Text Books:
- Pro Git, Scott Chacon, [E-Book-ENG]
- Git Immersion [Interactive Site]
- Learn git branching [Interactive Site]
A final note
This repository contains assignments, workspaces, and other material for the course "An introduction to modern tools for collaborative science".
New material will be uploaded frequently.
Remember to set a second remote pointing to our seed repository, either with
git remote add teacher https://github.com/luca-heltai/collaborative-science-2022-2023.git
or (if you use SSH keys in your GitHub account)
git remote add teacher git@github.com:luca-heltai/collaborative-science-2022-2023.git
and to update before the lectures:
git pull teacher main
Please consider contributing pull requests to correct typos or to better document the material in this repository!