A supervised approach to quantifying sentence similarity: with application to evidence based medicine
In Evidence Based Medicine (EBM) practice, practitioners use existing evidence to make therapeutic decisions. This evidence, in the form of scientific statements, is usually found in scholarly publications such as randomised controlled trials and systematic reviews. However, finding such information in the overwhelming amount of published material is particularly challenging. Approaches have been proposed to automatically extract scientific artefacts in EBM using standardised schemas. Our work takes this stream a step further and looks into consolidating extracted artefacts, i.e., quantifying their degree of similarity based on the assumption that they carry the same rhetorical role. Semantically connecting key statements in the EBM literature lets practitioners not only find available evidence more easily, but also track the effects of different treatments/outcomes across a number of related studies. We devise a regression model based on a varied set of features and evaluate it on both a general English corpus (the SICK corpus) and an EBM corpus (the NICTA-PIBOSO corpus). Experimental results show that our approach performs on par with the state of the art on general English text and achieves encouraging results on biomedical text when compared against human judgement.
|ISSN||1932-6203 (Electronic) 1932-6203 (Linking)|
|Authors||Hassanzadeh, H.; Groza, T.; Nguyen, A.; Hunter, J.|
|Responsible Garvan Author|
|Publisher Name||PLoS One|
|URL link to publisher's version||http://www.ncbi.nlm.nih.gov/pubmed/26039310|
|OpenAccess link to author's accepted manuscript version||https://publications.gimr.garvan.org.au/open-access/13182|