Molecule Retrieval with text query
Contrastive learning
This project was developed for the course “Advanced Learning for Text and Graph Data” taught by Prof. Michalis Vazirgiannis in the MVA Master's program at ENS Paris-Saclay. The code targets a Kaggle challenge whose objective is to retrieve the molecule that matches a given text query from a list of molecules represented as graphs. No reference or textual information about the molecules is provided, which makes the task challenging. The code is available at Molecule retrieval
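To make the retrieval setting concrete, here is a minimal sketch of how candidates could be ranked once the text query and the molecule graphs have been embedded in a shared space. It assumes cosine similarity as the scoring function and uses toy vectors; the function names `cosine_similarity` and `rank_molecules` are illustrative and are not taken from the repository.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_molecules(text_embedding, graph_embeddings):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(text_embedding, g) for g in graph_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy 3-dimensional embeddings (assumption: real ones would come from
# the trained text and graph encoders).
query = [1.0, 0.0, 1.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [2.0, 0.0, 2.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially aligned with the query
]

ranking = rank_molecules(query, candidates)
print(ranking)
```

In this toy case the second candidate is ranked first because it points in the same direction as the query, regardless of its norm.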
Files
- f_during_train.py:
- Python script that trains the model while progressively increasing the batch size during training. Several contrastive losses are implemented.
- main.py:
- Python script for training the model with a fixed batch size.
- test_models.ipynb:
- Jupyter notebook for testing the models and analyzing the dataset.
- Model.py:
- Contains the implementation of the model with different architectures (a graph encoder and a text encoder).
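As an illustration of what a contrastive loss over paired text and graph embeddings can look like, the sketch below implements a generic symmetric cross-entropy objective in plain Python. This is an assumption for illustration only: it is not necessarily one of the losses implemented in f_during_train.py, and the names `symmetric_contrastive_loss`, `log_softmax` and the `temperature` parameter are hypothetical.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax of a list of floats."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def symmetric_contrastive_loss(text_emb, graph_emb, temperature=1.0):
    """Average of text-to-graph and graph-to-text cross-entropy.

    text_emb[i] and graph_emb[i] are assumed to describe the same molecule,
    so the matching pairs lie on the diagonal of the similarity matrix.
    """
    n = len(text_emb)
    # logits[i][j] = scaled dot product between text i and graph j
    logits = [
        [sum(a * b for a, b in zip(t, g)) / temperature for g in graph_emb]
        for t in text_emb
    ]
    # Text -> graph: each row should put its mass on the diagonal entry.
    loss_t2g = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Graph -> text: same computation on the columns.
    columns = [list(col) for col in zip(*logits)]
    loss_g2t = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_t2g + loss_g2t)


# Toy batch of two pairs with 2-dimensional embeddings.
aligned = [[1.0, 0.0], [0.0, 1.0]]
loss_matched = symmetric_contrastive_loss(aligned, aligned)
loss_swapped = symmetric_contrastive_loss(aligned, aligned[::-1])
print(loss_matched < loss_swapped)
```

The loss is lower when each text embedding is closest to its own graph embedding than when the pairs are swapped. Every other item in the batch acts as a negative, which is one common motivation for using larger batches with contrastive objectives.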
Usage
- Run f_during_train.py to train the model (all arguments are listed in the parser function):
    python f_during_train.py -loss_type lifted_structured_loss -init_bs 80 -final_bs 210 -n_ep_update 2 -conv_layer ChebConv -lr 3e-5
Acknowledgments
This project is developed as part of the “Advanced Learning for Text and Graph Data” course. Special thanks to the instructors who designed the dataset and provided several starting files such as the dataloader.
For any inquiries or issues, please contact Lucas Gascon, Hippolyte Pilchen or Pierre Fihey at (forename).(name)@polytechnique.edu