Understanding and exploiting user intent in community question answering

Chen, Long (2014) Understanding and exploiting user intent in community question answering. PhD thesis, Birkbeck, University of London.

PDF: cp_phd_thesis_longchen.pdf - Full Version (3MB)
Print Copy Information: http://vufind.lib.bbk.ac.uk/vufind/Record/486914

Abstract

A number of Community Question Answering (CQA) services have emerged and proliferated in the last decade. Typical examples include Yahoo! Answers, WikiAnswers, and domain-specific forums such as StackOverflow. These services help users obtain information from a community: a user posts a question, which other users may then answer. This paradigm of information seeking is particularly appealing when Web search engines cannot answer the question directly because no relevant content is available online. However, questions submitted to a CQA service are often colloquial and ambiguous. An accurate understanding of the intent behind a question is important for satisfying the user's information need more effectively and efficiently. In this thesis, we analyse the intent of each question in CQA by classifying it along five dimensions: subjectivity, locality, navigationality, procedurality, and causality. By making use of advanced machine learning techniques, such as Co-Training and PU-Learning, we attain consistent and significant classification improvements over the state of the art in this area. In addition to textual features, a variety of metadata features (such as the category in which the question was posted) are used to model a user's intent, which in turn helps the CQA service to perform better in finding similar questions, identifying relevant answers, and recommending the most relevant answerers. We validate the usefulness of user intent in two different CQA tasks. Our first application is question retrieval, where we present a hybrid approach that blends several language modelling techniques: the classic (query-likelihood) language model, the state-of-the-art translation-based language model, and our proposed intent-based language model. Our second application is answer validation, where we present a two-stage model that first ranks similar questions using our proposed hybrid approach, and then validates whether the answer to the top candidate can serve as an answer to a new question by leveraging sentiment analysis, query quality assessment, and search lists validation.
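
The abstract does not state how the three language models in the hybrid retrieval approach are combined. As an illustration only, the sketch below assumes a per-word linear interpolation, which is a common way to blend language models and may differ from the method used in the thesis. All function names, the Jelinek-Mercer smoothing choice, and the weights are hypothetical; the translation-based and intent-based components are represented simply as callables that return a word probability.

```python
import math
from collections import Counter


def query_likelihood_prob(word, doc_tokens, collection_counts, collection_size, lam=0.2):
    """Jelinek-Mercer smoothed P(word | document).

    Assumption: JM smoothing with weight `lam` on the collection model;
    the thesis may use a different smoothing scheme.
    """
    doc_counts = Counter(doc_tokens)
    p_doc = doc_counts[word] / len(doc_tokens) if doc_tokens else 0.0
    p_coll = collection_counts[word] / collection_size
    return (1 - lam) * p_doc + lam * p_coll


def blend(probabilities, weights):
    """Linearly interpolate component probabilities (weights must sum to 1)."""
    if len(probabilities) != len(weights):
        raise ValueError("one weight per component is required")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * p for w, p in zip(weights, probabilities))


def hybrid_log_score(query_tokens, component_models, weights):
    """Log-score of a candidate question under the blended model.

    `component_models` is a list of callables mapping a word to a
    probability (hypothetical stand-ins for the query-likelihood,
    translation-based, and intent-based models).
    """
    score = 0.0
    for word in query_tokens:
        p = blend([model(word) for model in component_models], weights)
        score += math.log(p) if p > 0 else float("-inf")
    return score
```

Under this assumption, candidate questions would be ranked by `hybrid_log_score` in descending order, and the interpolation weights would be tuned on held-out data.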

Item Type: Thesis (PhD)
Copyright Holders: The copyright of this thesis rests with the author, who asserts his/her right to be known as such according to the Copyright Designs and Patents Act 1988. No dealing with the thesis contrary to the copyright or moral rights of the author is permitted.
School/Department: School of Business, Economics & Informatics > Computer Science & Information Systems
Depositing User: ORBIT Editor
Date Deposited: 17 Sep 2014 11:34
Last Modified: 17 Sep 2014 11:43
URI: http://bbktheses.da.ulcc.ac.uk/id/eprint/77
