Statistical Dataset Search

  • March 23, 2021
  1. We proposed a method to identify the statistical dataset that a text cites. For example, given a sentence such as “The number of recognized criminal offenses in 2017 was about 910k,” the words “criminal offenses” indicate that the sentence cites a dataset from the Metropolitan Police Department’s crime statistics.
  2. We created an evaluation benchmark for statistical dataset retrieval.
  3. We created an evaluation benchmark for identifying the cells of statistical datasets that are cited in text.
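
To make the idea in item 1 concrete, the following is a minimal sketch of matching a sentence to a dataset by word overlap. It is not the method proposed in the publications listed on this page. The catalog `DATASETS`, its descriptions, the stopword list, and the function names are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical catalog: dataset id -> short textual description (assumption).
DATASETS = {
    "crime_statistics": "Metropolitan Police Department crime statistics: "
                        "recognized criminal offenses by year",
    "population_census": "Population census: total population by prefecture and year",
    "labour_force_survey": "Labour force survey: employed persons and unemployment rate",
}

# Small illustrative stopword list (assumption, not a standard resource).
STOPWORDS = {"the", "of", "in", "was", "about", "and", "by", "a", "number"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def rank_datasets(sentence, datasets=DATASETS):
    """Rank datasets by the number of tokens shared with the sentence."""
    query = Counter(tokenize(sentence))
    scores = {}
    for name, description in datasets.items():
        doc = Counter(tokenize(description))
        # Count each shared token at most as often as it occurs on both sides.
        scores[name] = sum(min(query[t], doc[t]) for t in query)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_datasets(
    "The number of recognized criminal offenses in 2017 was about 910k"
)
print(ranking[0][0])
```

In this toy setting the crime statistics entry ranks first because it shares the words “recognized,” “criminal,” and “offenses” with the sentence. A realistic system would need a proper retrieval model and a real dataset catalog.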


Makoto P. Kato

Associate Professor


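  • 1. Yu Nakano (中野 優), Makoto P. Kato. Retrieval of Cited Statistical Data for Miscitation Verification (in Japanese). The 13th Forum on Data Engineering and Information Management (DEIM 2021).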
  • 2. Makoto P. Kato, Hiroaki Ohshima, Ying-Hsang Liu, Hsin-Liang Chen. Overview of the NTCIR-15 Data Search Task. Proceedings of the 15th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-15), pp. 267-273, 2020.
  • 3. Yu Nakano (中野 優), Makoto P. Kato. Retrieval of Cited Statistical Data Considering Query and Document Fields (in Japanese). IPSJ Transactions on Databases 14, 2021.
  • 4. Makoto P. Kato, Hiroaki Ohshima, Ying-Hsang Liu, Hsin-Liang Chen. A Test Collection for Ad-hoc Dataset Retrieval. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), 2021.
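  • 5. Yu Nakano (中野 優), Makoto P. Kato. Construction of a Dataset for Identifying Cells of Cited Statistical Data (in Japanese). The 14th Forum on Data Engineering and Information Management (DEIM 2022).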