Lecture by D. Frolov on "Annotation of a Document Collection by Finding Thematic Fuzzy Clusters and Parsimoniously Lifting Them in a Domain Taxonomy"
On Wednesday, May 16, the all-Russian seminar "Mathematical methods of decision analysis in economics, finance and politics" was held. D. Frolov gave a lecture on "Annotation of a Document Collection by Finding Thematic Fuzzy Clusters and Parsimoniously Lifting Them in a Domain Taxonomy".
Authors: D. Frolov (FCS HSE Moscow), S. Nascimento (New University of Lisbon), T. Fenner (University of London), B. G. Mirkin (FCS HSE Moscow)
Information retrieval currently relies heavily on expert judgement of how relevant the retrieved documents are to the query. This paper attempts to do without this manual component by shifting the emphasis from the relevance of the retrieved documents to the interpretation of the retrieved set as a whole. We propose a multistage approach consisting of the following steps:
1. Building a domain taxonomy.
2. Preparing table T of "taxonomy topic - document" relevance estimates.
3. Building fuzzy clusters of topics to correspond to the structure of the document collection.
4. Parsimoniously lifting the thematic clusters to higher ranks of the taxonomy.
We apply this method to the analysis of about 18,000 papers published in 17 Springer Computer Science journals between 1998 and 2017, using a taxonomy of the data science domain developed on the basis of the ACM Computing Classification System (2012).
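To make step 4 more concrete, the following is a minimal sketch of what "parsimonious lifting" could look like for a crisp (non-fuzzy) cluster of leaf topics. It is an illustration under stated assumptions, not the algorithm presented in the lecture: the penalty scheme (1 per head subject, a hypothetical `gap_penalty` per uncovered subtree, a hypothetical `offshoot_penalty` per cluster leaf kept as its own head), the names `Node`, `Lift` and `lift_cluster`, and the toy taxonomy are all invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A taxonomy topic; a node without children is a leaf topic."""
    name: str
    children: list = field(default_factory=list)


@dataclass
class Lift:
    penalty: float  # cost of the cheapest annotation found for this subtree
    heads: list     # names of the head subjects chosen in this subtree
    gaps: int       # number of maximal cluster-free subtrees in this subtree
    covered: bool   # True if at least one cluster topic lies in this subtree


def lift_cluster(node, cluster, gap_penalty=0.4, offshoot_penalty=0.9):
    """Bottom-up choice between keeping heads lower down or lifting to `node`.

    Assumed (hypothetical) penalties: 1 per lifted head subject,
    `gap_penalty` per gap, `offshoot_penalty` per cluster leaf left unlifted.
    """
    if not node.children:
        if node.name in cluster:
            return Lift(offshoot_penalty, [node.name], 0, True)
        return Lift(0.0, [], 1, False)

    parts = [lift_cluster(c, cluster, gap_penalty, offshoot_penalty)
             for c in node.children]
    if not any(p.covered for p in parts):
        # The whole subtree is cluster-free: it counts as a single gap.
        return Lift(0.0, [], 1, False)

    gaps = sum(p.gaps for p in parts)
    keep_cost = sum(p.penalty for p in parts)   # leave the heads where they are
    lift_cost = 1.0 + gap_penalty * gaps        # make `node` the head subject
    if lift_cost < keep_cost:
        return Lift(lift_cost, [node.name], gaps, True)
    return Lift(keep_cost, [h for p in parts for h in p.heads], gaps, True)


# Toy taxonomy with illustrative topic names (not the taxonomy from the talk).
taxonomy = Node("data science", [
    Node("machine learning",
         [Node("clustering"), Node("classification"), Node("regression")]),
    Node("information retrieval",
         [Node("indexing"), Node("query processing")]),
])

result = lift_cluster(taxonomy, {"clustering", "classification"})
print(result.heads, result.penalty)
```

With these assumed penalties, a cluster covering two of the three "machine learning" leaves is lifted to "machine learning" (one head plus one gap is cheaper than two offshoots), but not further to the root, where the untouched "information retrieval" branch would add another gap. The fuzzy-membership weighting implied by step 3 is deliberately left out of this sketch.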