Download

Resources

CompGuessWhat?! extends the original GuessWhat?! dataset with rich semantic representations in the form of scene graphs associated with every image used as a reference scene for the guessing games. The code associated with this work can be found at the following link: GitHub.

Below we provide links to all the material developed to implement the CompGuessWhat?! dataset following the GROLLA framework:

  • CompGuessWhat?! games: download link.

  • CompGuessWhat?! scene graphs: download link. The scene graphs can be manipulated using a dedicated Python API.

  • Attribute features for every target object in the games: download link.

  • Dialogue state representations for all the models presented in the paper: download link. The code to reproduce the experiments for the attribute prediction task can be found in the comp_probing repository.

  • Reference set of games for the zero-shot scenario: download link.

  • The images used in the zero-shot scenario belong to the nocaps dataset.

Integration with 🤗 nlp

For ease of use, CompGuessWhat?! is now part of 🤗 nlp. Both the original CompGuessWhat?! split and the zero-shot split based on NOCAPS are available at the following link. In this release, we do not integrate the VisualGenome data, but for every image in the original split we provide the VisualGenome ID that can be used to access the associated scene graph. Scene graphs can be downloaded from here.

Data format

The CompGuessWhat?! games are stored as GZIP-compressed JSONL files. Each line of a file represents a game in the same format as the original GuessWhat?! dataset:

{
   "category":"dog",
   "id":109093,
   "image":{
      "file_name":"COCO_val2014_000000252745.jpg",
      "coco_url":"http://mscoco.org/images/252745",
      "height":640,
      "width":632,
      "flickr_url":"http://farm4.staticflickr.com/3265/2628245811_251596cc6c_z.jpg",
      "id":252745,
      "vg_id": 2405073,
      "vg_url": "https://cs.stanford.edu/people/rak248/VG_100K_2/2405073.jpg"
   },
   "object_id":6118,
   "objects":[
      {
         "category":"person",
         "area":26691.59765,
         "bbox":[
            296.86,
            175.72,
            265.45,
            203.49
         ],
         "category_id":1,
         "segment":[
            ... SEGMENT IN MSCOCO FORMAT ...
         ],
         "id":190053
      },
      {
         "category":"dog",
         "area":21977.06995,
         "bbox":[
            138.34,
            460.07,
            292.21,
            164.2
         ],
         "category_id":18,
         "segment":[
            ... SEGMENT IN MSCOCO FORMAT ...
         ],
         "id":6118
      },
      {
         "category":"person",
         "area":31238.60215,
         "bbox":[
            490.43,
            40.37,
            122.24,
            540.76
         ],
         "category_id":1,
         "segment":[
            ... SEGMENT IN MSCOCO FORMAT ...
         ],
         "id":217169
      }
   ],
   "qas":[
      {
         "answer":"Yes",
         "question":"is it alive?",
         "id":459381
      },
      {
         "answer":"No",
         "question":"is it human?",
         "id":459385
      }
   ],
   "questioner_id":44249,
   "status":"success",
   "timestamp":"2016-07-28 13:34:41"
}
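Games can be read with the Python standard library alone. The sketch below writes a toy game record to a temporary GZIP-compressed JSONL file and reads it back; with the real data, pass the path of a downloaded `*.jsonl.gz` split instead (the toy record and temporary file are only for illustration).

```python
import gzip
import json
import tempfile

def load_games(path):
    """Yield one game dictionary per line of a GZIP-compressed JSONL file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as in_file:
        for line in in_file:
            yield json.loads(line)

# Toy record mirroring the fields shown above; real files have many lines.
sample_game = {"id": 109093, "status": "success", "object_id": 6118,
               "objects": [{"id": 6118, "category": "dog"}]}
with tempfile.NamedTemporaryFile(suffix=".jsonl.gz") as tmp:
    with gzip.open(tmp.name, mode="wt", encoding="utf-8") as out_file:
        out_file.write(json.dumps(sample_game) + "\n")
    games = list(load_games(tmp.name))

# The target object is the entry of "objects" whose id matches "object_id".
target = next(o for o in games[0]["objects"] if o["id"] == games[0]["object_id"])
print(target["category"])  # -> dog
```

Note that `object_id` identifies the target of the game among the candidates listed in `objects`.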

For the zero-shot split, we use a slightly different format for the games due to the different image source, namely Open Images. In addition, the zero-shot split does not include gold dialogues, because models are supposed to play the games on their own and generate dialogues; therefore, the qas field is missing.

{
   "id":0,
   "image":{
      "coco_url":"https://s3.amazonaws.com/nocaps/test/0003d84e0165d630.jpg",
      "date_captured":"2018-11-06 11:04:33",
      "file_name":"0003d84e0165d630.jpg",
      "height":768,
      "id":4500,
      "license":0,
      "open_images_id":"0003d84e0165d630",
      "width":1024
   },
   "objects":[
      {
         "id":"0003d84e0165d630_2",
         "category":"plant",
         "category_id":0,
         "bbox":[
            540.8,
            153.60000000000002,
            104.0,
            56.000002559999956
         ],
         "segment":[
            {
               "MaskPath":"0003d84e0165d630_m018xm_3263473c.png",
               "LabelName":"/m/018xm",
               "BoxID":"3263473c",
               "BoxXMin":"0.000000",
               "BoxXMax":"0.198437",
               "BoxYMin":"0.666667",
               "BoxYMax":"0.941667",
               "PredictedIoU":"0.00000",
               "Clicks":""
            }
         ],
         "area":5824.000266239997,
         "IsOccluded":0,
         "IsTruncated":0
      },
      {
         "id":"0003d84e0165d630_3",
         "category":"plant",
         "category_id":0,
         "bbox":[
            640.0,
            137.59999488,
            86.39999999999998,
            72.00000767999998
         ],
         "segment":[
            {
               "MaskPath":"0003d84e0165d630_m018xm_3263473c.png",
               "LabelName":"/m/018xm",
               "BoxID":"3263473c",
               "BoxXMin":"0.000000",
               "BoxXMax":"0.198437",
               "BoxYMin":"0.666667",
               "BoxYMax":"0.941667",
               "PredictedIoU":"0.00000",
               "Clicks":""
            }
         ],
         "area":6220.800663551997,
         "IsOccluded":0,
         "IsTruncated":0
      },
      {
         "id":"0003d84e0165d630_8",
         "category":"person",
         "category_id":1,
         "bbox":[
            232.0,
            153.60000000000002,
            70.39999999999998,
            179.20000512000001
         ],
         "segment":[
            {
               "MaskPath":"0003d84e0165d630_m018xm_3263473c.png",
               "LabelName":"/m/018xm",
               "BoxID":"3263473c",
               "BoxXMin":"0.000000",
               "BoxXMax":"0.198437",
               "BoxYMin":"0.666667",
               "BoxYMax":"0.941667",
               "PredictedIoU":"0.00000",
               "Clicks":""
            }
         ],
         "area":12615.680360447997,
         "IsOccluded":0,
         "IsTruncated":0
      },
      {
         "id":"0003d84e0165d630_9",
         "category":"person",
         "category_id":1,
         "bbox":[
            235.2,
            20.800000512,
            65.60000000000002,
            184.000009728
         ],
         "segment":[
            {
               "MaskPath":"0003d84e0165d630_m018xm_3263473c.png",
               "LabelName":"/m/018xm",
               "BoxID":"3263473c",
               "BoxXMin":"0.000000",
               "BoxXMax":"0.198437",
               "BoxYMin":"0.666667",
               "BoxYMax":"0.941667",
               "PredictedIoU":"0.00000",
               "Clicks":""
            }
         ],
         "area":12070.400638156803,
         "IsOccluded":0,
         "IsTruncated":0
      },
      {
         "id":"0003d84e0165d630_10",
         "category":"person",
         "category_id":1,
         "bbox":[
            452.8,
            27.199999488,
            71.99999999999994,
            201.600010752
         ],
         "segment":[
            {
               "MaskPath":"0003d84e0165d630_m018xm_3263473c.png",
               "LabelName":"/m/018xm",
               "BoxID":"3263473c",
               "BoxXMin":"0.000000",
               "BoxXMax":"0.198437",
               "BoxYMin":"0.666667",
               "BoxYMax":"0.941667",
               "PredictedIoU":"0.00000",
               "Clicks":""
            }
         ],
         "area":14515.200774143988,
         "IsOccluded":0,
         "IsTruncated":0
      },
      {
         "id":"0003d84e0165d630_11",
         "category":"person",
         "category_id":1,
         "bbox":[
            753.6,
            11.199999744,
            104.0,
            284.799995136
         ],
         "segment":[
            {
               "MaskPath":"0003d84e0165d630_m018xm_3263473c.png",
               "LabelName":"/m/018xm",
               "BoxID":"3263473c",
               "BoxXMin":"0.000000",
               "BoxXMax":"0.198437",
               "BoxYMin":"0.666667",
               "BoxYMax":"0.941667",
               "PredictedIoU":"0.00000",
               "Clicks":""
            }
         ],
         "area":29619.199494144,
         "IsOccluded":0,
         "IsTruncated":0
      },
      {
         "id":"0003d84e0165d630_12",
         "category":"ball",
         "category_id":0,
         "bbox":[
            0.0,
            512.0000256,
            203.2,
            211.19996928
         ],
         "segment":[
            {
               "MaskPath":"0003d84e0165d630_m018xm_3263473c.png",
               "LabelName":"/m/018xm",
               "BoxID":"3263473c",
               "BoxXMin":"0.000000",
               "BoxXMax":"0.198437",
               "BoxYMin":"0.666667",
               "BoxYMax":"0.941667",
               "PredictedIoU":"0.00000",
               "Clicks":""
            }
         ],
         "area":42915.833757696004,
         "IsOccluded":0,
         "IsTruncated":0
      },
      {
         "id":"0003d84e0165d630_13",
         "category":"mammal",
         "category_id":0,
         "bbox":[
            227.2,
            153.60000000000002,
            68.80000000000001,
            182.39999999999998
         ],
         "segment":[
            {
               "MaskPath":"0003d84e0165d630_m018xm_3263473c.png",
               "LabelName":"/m/018xm",
               "BoxID":"3263473c",
               "BoxXMin":"0.000000",
               "BoxXMax":"0.198437",
               "BoxYMin":"0.666667",
               "BoxYMax":"0.941667",
               "PredictedIoU":"0.00000",
               "Clicks":""
            }
         ],
         "area":12549.12,
         "IsOccluded":0,
         "IsTruncated":0
      },
      {
         "id":"0003d84e0165d630_16",
         "category":"mammal",
         "category_id":0,
         "bbox":[
            763.2,
            11.199999744,
            94.39999999999998,
            279.99999513599994
         ],
         "segment":[
            {
               "MaskPath":"0003d84e0165d630_m018xm_3263473c.png",
               "LabelName":"/m/018xm",
               "BoxID":"3263473c",
               "BoxXMin":"0.000000",
               "BoxXMax":"0.198437",
               "BoxYMin":"0.666667",
               "BoxYMax":"0.941667",
               "PredictedIoU":"0.00000",
               "Clicks":""
            }
         ],
         "area":26431.999540838395,
         "IsOccluded":0,
         "IsTruncated":0
      }
   ],
   "object_id":"0003d84e0165d630_12",
   "status":"incomplete"
}
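In the zero-shot format, `bbox` is expressed in absolute pixels as [x, y, width, height], while each `segment` entry stores the mask's bounding box in normalized Open Images coordinates (fractions of the image width and height, serialized as strings). A minimal sketch of the conversion, using the values of the "ball" object above; `mask_box_to_pixels` is a hypothetical helper, not part of our code:

```python
def mask_box_to_pixels(mask_entry, image_width, image_height):
    """Convert a normalized Open Images mask box to a pixel [x, y, w, h] box."""
    x_min = float(mask_entry["BoxXMin"]) * image_width
    x_max = float(mask_entry["BoxXMax"]) * image_width
    y_min = float(mask_entry["BoxYMin"]) * image_height
    y_max = float(mask_entry["BoxYMax"]) * image_height
    return [x_min, y_min, x_max - x_min, y_max - y_min]

# Segment values from the "ball" object in the example above (image is 1024x768).
mask = {"BoxXMin": "0.000000", "BoxXMax": "0.198437",
        "BoxYMin": "0.666667", "BoxYMax": "0.941667"}
box = mask_box_to_pixels(mask, image_width=1024, image_height=768)
print(box)  # close (up to rounding of the normalized strings) to the object's
            # bbox [0.0, 512.0000256, 203.2, 211.19996928]
```

The small discrepancies come from the normalized coordinates being serialized with six decimal digits.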

Citation

Please cite our work if you use our data or code:

@inproceedings{suglia-etal-2020-compguesswhat,
    title = "{C}omp{G}uess{W}hat?!: A Multi-task Evaluation Framework for Grounded Language Learning",
    author = "Suglia, Alessandro  and
      Konstas, Ioannis  and
      Vanzo, Andrea  and
      Bastianelli, Emanuele  and
      Elliott, Desmond  and
      Frank, Stella  and
      Lemon, Oliver",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.682",
    pages = "7625--7641",
    abstract = "Approaches to Grounded Language Learning are commonly focused on a single task-based final performance measure which may not depend on desirable properties of the learned hidden representations, such as their ability to predict object attributes or generalize to unseen situations. To remedy this, we present GroLLA, an evaluation framework for Grounded Language Learning with Attributes based on three sub-tasks: 1) Goal-oriented evaluation; 2) Object attribute prediction evaluation; and 3) Zero-shot evaluation. We also propose a new dataset CompGuessWhat?! as an instance of this framework for evaluating the quality of learned neural representations, in particular with respect to attribute grounding. To this end, we extend the original GuessWhat?! dataset by including a semantic layer on top of the perceptual one. Specifically, we enrich the VisualGenome scene graphs associated with the GuessWhat?! images with several attributes from resources such as VISA and ImSitu. We then compare several hidden state representations from current state-of-the-art approaches to Grounded Language Learning. By using diagnostic classifiers, we show that current models{'} learned representations are not expressive enough to encode object attributes (average F1 of 44.27). In addition, they do not learn strategies nor representations that are robust enough to perform well when novel scenes or objects are involved in gameplay (zero-shot best accuracy 50.06{\%}).",
}