Scene Graph Generation
in Operating Rooms

Demo

4D-OR Dataset (Özsoy et al.)

4D-OR Dataset

We gratefully acknowledge the contributions of the 4D-OR benchmark (Özsoy et al.), which has significantly facilitated our work.

Based on 4D-OR, we have reformatted the dataset to be more user-friendly.

Reformatting of SGG Annotations

Unlike the original 4D-OR dataset, we standardized the SGG annotations to align with popular open-world SGG datasets such as Visual Genome (Krishna et al., 2017) and Open Images (Kuznetsova et al., 2020). This makes the data easier to process and to integrate with existing SGG pipelines.
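As a rough illustration, a Visual Genome-style SGG record stores per-frame object boxes plus (subject, predicate, object) triplets that index into the object list. The field names and values below are hypothetical, chosen to mirror common VG-format conventions rather than the exact schema of the released JSON files.

```python
import json

# Hypothetical VG-style record: field names are assumptions, not the verified
# schema of the reformatted 4D-OR annotations.
sample = {
    "image_id": "001_00000",
    "objects": [
        {"bbox": [120, 80, 340, 400], "category": "head_surgeon"},
        {"bbox": [300, 150, 520, 420], "category": "patient"},
    ],
    # Triplets reference objects by their index in the "objects" list.
    "relationships": [
        {"subject_id": 0, "predicate": "operating", "object_id": 1},
    ],
}

# Round-trip through JSON to confirm the structure is plain-JSON serializable.
restored = json.loads(json.dumps(sample))
print(restored["relationships"][0]["predicate"])  # → operating
```

Keeping relations as index pairs into a flat object list is what lets VG-style loaders and evaluation code consume the annotations without OR-specific parsing.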

Image 1

Reformatted annotations for object detections in ORs

Image 2

Reformatted annotations for triplet relations in ORs

Reformatting of Overall Structure

The original 4D-OR dataset was organized by surgical segment, mixing data from different modalities within each segment folder. To simplify processing, we reorganized the dataset by modality, separating the input data into distinct modality folders (e.g., 2D multi-view images, 3D point clouds, textual annotations). This reorganization enables more efficient multimodal processing.

Image 3

Reformatted multi-view 2D image inputs

Image 4

Reformatted 3D point cloud inputs

Training and Inference

Download the reformatted 4D-OR data provided below. The data folder should be organized as follows:

S2Former-OR/data/: 
    /images/: unzip 4d_or_images_multiview_reltrformat.zip
    /points/: unzip points.zip
    /infer/: unzip infer.zip
    /train.json: from reltr_annotations_8.3.zip
    /val.json: from reltr_annotations_8.3.zip
    /test.json: from reltr_annotations_8.3.zip
    /rel.json: from reltr_annotations_8.3.zip
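The layout above can be assembled with a short script. This is a hedged sketch: the archive names are taken from the listing, but the assumption that the archives sit in the current directory (and that the annotation zip contains the four JSON files at its top level) is mine.

```python
import zipfile
from pathlib import Path

data_root = Path("S2Former-OR/data")

# Archive -> target subdirectory; "." means the JSON files land in data/ itself.
layout = {
    "4d_or_images_multiview_reltrformat.zip": "images",
    "points.zip": "points",
    "infer.zip": "infer",
    "reltr_annotations_8.3.zip": ".",  # train/val/test/rel.json (assumed top-level)
}

for archive, subdir in layout.items():
    target = data_root / subdir
    target.mkdir(parents=True, exist_ok=True)
    # Skip archives that have not been downloaded yet.
    if Path(archive).exists():
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
```

After extraction, the loaders expect e.g. `S2Former-OR/data/train.json` and `S2Former-OR/data/images/` to exist.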
    

After that, you can use the scripts in S2Former-OR and TriTemp-OR for training and inference. Note that for evaluation on the test set, you need to first run the provided inference script and then upload your predictions here.

Methods

...

Version 1: S2Former-OR (TMI 2024)

Overview of the proposed single-stage multi-view bi-modal S2Former-OR for scene graph generation in operating rooms.

...

Version 2: TriTemp-OR (MICCAI 2024)

Overview of the proposed TriTemp-OR for scene graph generation in ORs.

Comparison of OR-SGG Models

Qualitative results of S2Former-OR and existing OR-SGG models on 4D-OR test set.

Comparison of OR-SGG Models

Qualitative results of TriTemp-OR and existing OR-SGG models on 4D-OR test set.

Results

Comparison of OR-SGG Models

Detailed comparisons of S2Former-OR with existing OR-SGG models on 4D-OR test set.

Comparison of OR-SGG Models

Detailed comparisons of TriTemp-OR with existing OR-SGG models on 4D-OR test set.

Citation

Jialun Pei, Diandian Guo, Jingyang Zhang, Manxi Lin, Yueming Jin and Pheng Ann Heng. S2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR. TMI, 2024. [arXiv] [GitHub] [Reformatted 4D-OR Dataset]

Diandian Guo, Manxi Lin, Jialun Pei, He Tang, Yueming Jin and Pheng Ann Heng. Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms. MICCAI, 2024. [arXiv] [GitHub]

Bibtex

@article{s2former2024,
               author={Jialun Pei and Diandian Guo and Jingyang Zhang and Manxi Lin and Yueming Jin and Pheng Ann Heng},
               title={S2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR},
               journal={IEEE Transactions on Medical Imaging},
               year={2024}
}

@inproceedings{tritemp2024,
               author={Diandian Guo and Manxi Lin and Jialun Pei and He Tang and Yueming Jin and Pheng Ann Heng},
               title={Tri-modal Confluence with Temporal Dynamics for Scene Graph Generation in Operating Rooms},
               booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
               year={2024}
}
