Academic Talk by Dr. Shengli Sheng, University of Central Arkansas

    Time: July 11, 2012, 9:00-10:30 a.m.
    Venue: Small Lecture Hall, 2nd Floor, Academic Activity Center
    Speaker: Dr. Shengli Sheng
    Affiliation: University of Central Arkansas, USA
    Title: Get Another Label? Improving Data Quality and Data Mining Using Multiple, Noisy Labelers
    Abstract: This talk presents the repeated acquisition of labels for data items when the labeling is imperfect. We examine the improvement (or lack thereof) in data quality via repeated labeling, and focus especially on the improvement of training labels for supervised induction. With the outsourcing of small tasks becoming easier, for example via Rent-A-Coder or Amazon’s Mechanical Turk, it is often possible to obtain less-than-expert labeling at low cost. With low-cost labeling, preparing the unlabeled part of the data can become considerably more expensive than labeling. We present repeated-labeling strategies of increasing complexity, and show several main results. (i) Repeated labeling can improve label quality and model quality, but not always. (ii) When labels are noisy, repeated labeling can be preferable to single labeling even in the traditional setting where labels are not particularly cheap. (iii) As soon as the cost of processing the unlabeled data is not free, even the simple strategy of labeling everything multiple times can give considerable advantage. (iv) Repeatedly labeling a carefully chosen set of points is generally preferable, and we present a robust technique that combines different notions of uncertainty to select data points for which quality should be improved. The bottom line: the results show clearly that when labeling is not perfect, selective acquisition of multiple labels is a strategy that data miners should have in their repertoire; for certain label-quality/cost regimes, the benefit is substantial.
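    About the speaker:
Dr. Shengli Sheng is an Assistant Professor at the University of Central Arkansas, USA. He received a master's degree from Soochow University in July 1999, a master's degree from the University of New Brunswick, Canada, in December 2003, and a Ph.D. from the University of Western Ontario, Canada, in August 2007. From September 2007 to August 2009 he was a postdoctoral researcher at the Stern School of Business, New York University, USA. His research areas are data mining, machine learning, artificial intelligence, data security, and decision support, together with their applications in business, industry, bioinformatics, medical informatics, software engineering, and other fields.
He currently serves as a board member of the International Journal of Information Systems in the Service Sector (IJISSS), an editor of the Central European Journal of Computer Science, and publicity chair of CCSC, and he has served many times as a reviewer or program committee member for leading international conferences and journals. The journals include IEEE TKDE, IEEE TSMC, JML, ACM TIST, ACM TKDD, DMKD, IJITDM, JCST, IJISSS, and INFORMS JOC. The conferences include Ubicomp, KDD, ICML, ICDM, IJCAI, BIBM, WI, PKDD, PAKDD, CCSC, DMIN, HCOMP, NAACL, and INFORM. During more than ten years in Canada and the United States, Dr. Sheng has participated in and led multiple projects funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the U.S. National Science Foundation (NSF), and has published more than 30 papers in international conferences and journals.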

              All faculty and students are welcome to attend!