Automatically Generating a Legal Thesaurus
Hugo de Vos
(Radboud University)
Saturday (April 22nd), 13:45-14:10
Lipsius 147

Thesauri are a useful tool for improving the quality of search engines. With a thesaurus, a search engine can return more results and suppress results that are less relevant. A problem with thesauri, however, is that they are labor-intensive to make and maintain, and are therefore expensive. For this reason I study different methods for automatically generating a thesaurus from a large corpus of text. As a case study I chose the legal domain, because of the availability of large amounts of text and the particular need for automatic thesaurus generation in this field. The research consists of two stages: 1) selecting candidate terms for the thesaurus and 2) finding synonym relations and hypernym (parent-child) relations between them. For stage one I use a classic method based on the log-likelihood of a term, and for stage two I compare bootstrapping algorithms with word embedding methods.
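
To make the first stage concrete, here is a minimal sketch of log-likelihood term selection. The abstract does not say which formulation is used, so this assumes the common corpus-comparison variant, in which a term's frequency in a domain corpus is compared with its frequency in a general reference corpus. The function names, the whitespace tokenisation, and the toy sentences are all hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood score for one term.

    a: term frequency in the domain corpus
    b: term frequency in the reference corpus
    c: total number of tokens in the domain corpus
    d: total number of tokens in the reference corpus
    """
    total = a + b
    # Expected frequencies if the term were equally common in both corpora.
    expected_domain = c * total / (c + d)
    expected_reference = d * total / (c + d)
    score = 0.0
    # A zero count contributes nothing (0 * log 0 is treated as 0).
    if a > 0:
        score += a * math.log(a / expected_domain)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2.0 * score


def candidate_terms(domain_tokens, reference_tokens, top_n=10):
    """Rank terms that are over-represented in the domain corpus."""
    domain_counts = Counter(domain_tokens)
    reference_counts = Counter(reference_tokens)
    c = sum(domain_counts.values())
    d = sum(reference_counts.values())
    scored = []
    for term, a in domain_counts.items():
        b = reference_counts[term]  # 0 if the term never occurs
        # Keep only terms relatively more frequent in the domain corpus.
        if a / c > b / d:
            scored.append((term, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Toy data; a real run would use large legal and general corpora.
    domain = "the court ruled that the defendant breached the contract".split()
    reference = "the cat sat on the mat and the dog ran".split()
    for term, score in candidate_terms(domain, reference):
        print(f"{term}\t{score:.3f}")
```

In this sketch, words that occur at similar rates in both corpora score close to zero, while words concentrated in the domain corpus rise to the top of the candidate list. A realistic pipeline would presumably also handle multi-word terms and apply a score threshold, neither of which is shown here.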

I will present the results of the first stage and look ahead to the second stage of the project.