Statistical Dataset Search

  • March 23, 2021

We proposed a method for identifying the statistical dataset that a text cites. For example, given a sentence such as “The number of recognized criminal offenses in 2017 was about 910k,” the words “criminal offenses” indicate that the sentence cites a dataset from the Metropolitan Police Department’s crime statistics.
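The idea in the example above can be illustrated with a minimal sketch. This is not the method from the publications listed below; it is a simple word-overlap ranking, and the catalog entries, dataset names, and the `rank_datasets` function are all hypothetical, invented only for illustration.

```python
import re

# Hypothetical catalog: names and descriptions are invented for illustration.
CATALOG = {
    "crime_statistics": "crime statistics recognized criminal offenses arrests by year",
    "population_census": "population census households residents by prefecture",
    "labour_force_survey": "labour force survey employment unemployment rate",
}

# A tiny, assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "of", "in", "was", "about", "by", "a", "an", "and"}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic, non-stopword tokens."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def rank_datasets(sentence, catalog=CATALOG):
    """Rank datasets by how many tokens they share with the sentence."""
    query = tokenize(sentence)
    scores = {name: len(query & tokenize(desc)) for name, desc in catalog.items()}
    # Highest overlap first; ties are broken alphabetically by dataset name.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


sentence = "The number of recognized criminal offenses in 2017 was about 910k"
ranking = rank_datasets(sentence)
print(ranking[0])  # the top-ranked (dataset name, overlap score) pair
```

Here the words "recognized", "criminal", and "offenses" are what link the sentence to the crime statistics entry. Plain word overlap is the simplest possible scoring choice and would likely be replaced by a proper retrieval model in practice.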


Yu Nakano

Doctoral Course Student

Makoto P. Kato

Associate Professor


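  • 1. Yu Nakano, Makoto P. Kato. Retrieval of Cited Statistical Data for Miscitation Verification. The 13th Forum on Data Engineering and Information Management (DEIM 2021), 2021. (In Japanese.)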
  • 2. Makoto P. Kato, Hiroaki Ohshima, Ying-Hsang Liu, Hsin-Liang Chen. Overview of the NTCIR-15 Data Search Task. Proceedings of the 15th NTCIR Conference on Evaluation of Information Access Technologies (NTCIR-15), pp. 267-273, 2020.
  • 3. Yu Nakano, Makoto P. Kato. Retrieval of Cited Statistical Data Considering Query and Document Fields. IPSJ Transactions on Databases 14, 2021. (In Japanese.)
  • 4. Makoto P. Kato, Hiroaki Ohshima, Ying-Hsang Liu, Hsin-Liang Chen. A Test Collection for Ad-hoc Dataset Retrieval. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), 2021.