My work in the area of opinion analysis explores how honesty and sentiment are expressed in online social media text, with an emphasis on detecting deceptive opinion spam: fictitious opinions that have been deliberately written to sound authentic in order to deceive the reader. This work has produced the first publicly available gold-standard corpus of deceptive opinion spam, shown that supervised text classifiers can be trained to detect deception effectively, and introduced a Bayesian framework for obtaining community-level estimates of the rates of deception in online review portals such as TripAdvisor and Yelp. A demo of our deception research is available at ReviewSkeptic.com.
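To make the idea of a supervised deception classifier concrete, here is a minimal sketch of a multinomial naive Bayes classifier over unigram counts. The choice of naive Bayes, the unigram features, and the names `tokenize`, `train`, and `predict` are assumptions made for illustration, not a description of the models used in this work. The four toy reviews are invented for the example and are not drawn from the corpus.

```python
import math
from collections import Counter

PUNCT = ".,!?;:\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (w.strip(PUNCT) for w in text.lower().split())
    return [t for t in tokens if t]

def train(examples):
    """Collect per-label document and word counts from (text, label) pairs."""
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return {"labels": label_counts, "words": word_counts, "vocab": vocab}

def predict(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    total_docs = sum(model["labels"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["labels"].items():
        counts = model["words"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokenize(text):
            if tok in model["vocab"]:  # skip words never seen in training
                score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy data (hypothetical, for illustration only).
TOY_REVIEWS = [
    ("My husband and I had an amazing luxurious experience at this hotel", "deceptive"),
    ("I will definitely stay here again with my family, amazing vacation", "deceptive"),
    ("The room was small and the bathroom floor was cold", "truthful"),
    ("Check in took ten minutes and the elevator was near the room", "truthful"),
]

model = train(TOY_REVIEWS)
print(predict(model, "Amazing experience with my husband"))
print(predict(model, "The bathroom was cold"))
```

With only four training examples the output says nothing about real reviews; the sketch only shows the shape of the train-then-predict pipeline.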
My work in the area of information retrieval focuses on improving search within collections of short documents, e.g., tweets, by exploiting term co-occurrence ("topic modeling") in conjunction with explicit topic indicators, e.g., hashtags, to improve topic identification and relevance assessment. This work was applied at IBM Research in an entry to NIST's Text REtrieval Conference (TREC 2012).
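As a rough illustration of how hashtags and term co-occurrence might be combined for short-document retrieval, the sketch below counts how often each term appears alongside each hashtag, infers the hashtag most associated with the query terms, and adds a bonus to tweets carrying it. Simple co-occurrence counts stand in for a topic model here. The scoring rule, the `tag_weight` parameter, the function names, and the toy tweets are all assumptions made for the example, not the system applied at TREC.

```python
from collections import Counter, defaultdict

def split_tweet(text):
    """Return (terms, hashtags) for one tweet, lowercased, as two sets."""
    terms, tags = set(), set()
    for tok in text.lower().split():
        tok = tok.strip(".,!?;:")
        if tok.startswith("#") and len(tok) > 1:
            tags.add(tok)
        elif tok:
            terms.add(tok)
    return terms, tags

def build_term_tag_counts(tweets):
    """Count how often each term co-occurs with each hashtag in a tweet."""
    counts = defaultdict(Counter)
    for text in tweets:
        terms, tags = split_tweet(text)
        for term in terms:
            for tag in tags:
                counts[term][tag] += 1
    return counts

def rank(query, tweets, tag_weight=0.5):
    """Score tweets by query-term overlap plus a bonus for the inferred hashtag."""
    counts = build_term_tag_counts(tweets)
    q_terms, q_tags = split_tweet(query)
    # Let each query term vote for the hashtags it co-occurs with.
    tag_votes = Counter()
    for term in q_terms:
        tag_votes.update(counts.get(term, {}))
    topic_tags = set(q_tags)
    if tag_votes:
        topic_tags.add(tag_votes.most_common(1)[0][0])
    scored = []
    for text in tweets:
        terms, tags = split_tweet(text)
        score = len(q_terms & terms) + tag_weight * len(topic_tags & tags)
        scored.append((score, text))
    # Stable sort: tweets with equal scores keep their original order.
    scored.sort(key=lambda pair: -pair[0])
    return scored

# Invented toy tweets (hypothetical, for illustration only).
TWEETS = [
    "earthquake hits the coast #quake",
    "strong earthquake felt downtown #quake",
    "aftershocks continue tonight #quake",
    "new coffee shop downtown #food",
]

for score, text in rank("earthquake", TWEETS):
    print(score, text)
```

In this toy collection, the third tweet never mentions the query term but still ranks above the unrelated fourth tweet, because it shares the hashtag that the query term most often co-occurs with.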