Brief Bio

  • Ph.D. candidate at Nanjing University of Science & Technology
  • Apply Natural Language Processing (NLP), Machine Learning (ML) and Deep Learning (DL) to solve research problems
  • Areas of Research: Document clustering and classification, Text embedding and graph embedding, Scientific software recommendation, Citation recommendation (Dissertation Topic)

Education

  • Ph.D. in Information Science
    Nanjing University of Science & Technology, China (2014 - 2020)
    Supervisor: Chengzhi Zhang
  • B.M. in Information Management and System
    Nanjing University of Science & Technology, China (2010 - 2014)
  • Exchange Student in Economic Management
    Ajou University, South Korea (Spring 2012)

Experience

Data Scientist Intern, Tencent, Shenzhen (07/2020 - Present)

  • Customer Research & User Experience Design Center
  • Data preprocessing, Named entity extraction, Knowledge graph

Algorithm Engineer Intern, Aegis, Nanjing (03/2020 - 04/2020)

  • Data preprocessing, Named entity extraction

Algorithm Engineer Intern, Bytedance, Beijing (11/2018 - 01/2019)

  • Department of Data and NLP
  • Data preprocessing, model training, and construction of a query classification service

Visiting Scholar, Indiana University Bloomington, United States (10/2017 - 10/2018)

  • Employed at Cyberinfrastructure for Network Science (CNS) Center, Department of Information and Library Science, School of Informatics, Computing, and Engineering
  • Supervisors: Katy Börner and Xiaozhong Liu

Publication

  • Shutian Ma, Chengzhi Zhang*, and Xiaozhong Liu. Chronological Citation Recommendation with Time Preference. Scientometrics. 2020. (under review)
  • Shutian Ma, Chengzhi Zhang*, and Xiaozhong Liu. A review of citation recommendation: from textual content to enriched context. Scientometrics. 2020, 122(3): 1445-1472. (SSCI)
  • Katy Börner, Olga Scrivner, Leonard E. Cross, Michael Gallant, Shutian Ma, Adam S. Martin, Elizabeth Record, Haici Yang, and Jonathan M. Dilger. Mapping the co-evolution of artificial intelligence, robotics, and the internet of things over 20 years (1998-2017). arXiv preprint arXiv:2006.02366 (2020).
  • Shutian Ma, Heng Zhang, Tianxiang Xu, Jin Xu, Shaohu Hu, and Chengzhi Zhang. IR&TM-NJUST@ CLSciSumm-19. (BIRNDL2019), July 2019, Paris, France.
  • Jin Xu, Chengzhi Zhang and Shutian Ma. Ensemble System for Identification of Cited Text Spans: Based on Two Steps of Feature Selection, CCIR 2019
  • Zheng Gao, Vincent Malic, Shutian Ma, and Patrick Shih. How to Make a Successful Movie: Factor Analysis from both Financial and Critical Perspectives [C]. In: Proceedings of the International Conference on Information, Springer, Cham, 2019: 669-678. (EI)
  • Heng Zhang, Shutian Ma, and Chengzhi Zhang. Using Full-text of Academic Articles to Find Software Clusters [C]. In: Proceedings of the 17th International Conference on Scientometrics and Informetrics (ISSI 2019), Rome, Italy, 2019(small)
  • Shutian Ma, Heng Zhang, Jin Xu, Chengzhi Zhang*. NJUST @ CLSciSumm-18. In: Proceedings of the 3rd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2018), July 2018, Michigan, USA. (EI)
  • Shutian Ma, Yingyi Zhang, Chengzhi Zhang*. Using multiple Web resources and inference rules to classify Chinese word semantic relation. Information Discovery and Delivery 46.2 (2018): 120-126. (SSCI)
  • Shutian Ma, and Chengzhi Zhang. Using Full-text Academic Articles and Wikipedia to Find Alternative Free Bioinformatics Software. SIGMET 2018.
  • Shutian Ma, Jin Xu, Chengzhi Zhang*. Automatic identification of cited text spans: a multi-classifier approach over imbalanced dataset [J]. Scientometrics, 2018, 116(2): 1303-1330. (SSCI)
  • Katy Börner, Olga Scrivner, Mike Gallant, Shutian Ma, Xiaozhong Liu, Keith Chewning, Lingfei Wu, and James A. Evans. “Skill discrepancies between research, education, and jobs reveal the critical need to supply soft skills for the data economy.” Proceedings of the National Academy of Sciences 115, no. 50 (2018): 12630-12637. (SCI)
  • Shutian Ma, Jin Xu, Jie Wang and Chengzhi Zhang*. NJUST @ CLSciSumm-17. In: Proceedings of the 2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2017), Aug, 2017, Tokyo, Japan. (Winner of CL-SciSumm 2017) (EI)
  • Shutian Ma, Chengzhi Zhang*. Document Representation and Clustering Models for Bilingual Documents Clustering. In: Proceedings of 2017 Annual Meeting of the Association for Information Science and Technology (ASIST’2017), Washington, DC, USA, 2017.
  • Shutian Ma, Chengzhi Zhang*. Using Full-text to Evaluate Impact of Different Software Groups. In: Proceedings of the 16th International Conference on Scientometrics and Informetrics (ISSI 2017), Wuhan, China, 2017. (EI)
  • Shutian Ma, Chengzhi Zhang*. Documents Representation for Comparable Corpora Clustering: A Preliminary Study. In: Proceedings of iConference2017, March 22-25, Wuhan, China, 2017.
  • Qiangbing Wang, Shutian Ma, Chengzhi Zhang*. Predicting Users’ Demographic Characteristics in a Chinese Social Media Network. The Electronic Library. 2017, 35(4): 758-769. (SSCI)
  • Jie Wang, Shutian Ma, Chengzhi Zhang*. CitationAS: A Summary Generation Tool Based on Clustering of Retrieved Citation Content. In: Proceedings of Second Workshop on Mining Scientific Papers: Computational Linguistics and Bibliometrics (CLBib-2017), Wuhan, China, 2017. (EI)
  • Yingyi Zhang, Guo Chen, Chengzhi Zhang*, Shutian Ma. Analyzing scientific user tagging behavior on academic blogs according to tag’s content characteristics - a preliminary study. In: Proceedings of iConference2017, March 22-25, Wuhan, China, 2017.
  • Shutian Ma, Xiaoyong Zhang, Chengzhi Zhang*. NLPCC 2016 Shared Task: Chinese Words Similarity Measure via Ensemble Learning based on Multiple Resources. In: Proceedings of the Fifth Conference on Natural Language Processing and Chinese Computing & The Twenty Fourth International Conference on Computer Processing of Oriental Languages (NLPCC-ICCPOL 2016). Kunming, China, 2016: 862–869. (EI)
  • Shutian Ma, Chengzhi Zhang*, Daqing He. Document Representation Methods for Clustering Bilingual Documents. In: Proceedings of the 2016 Annual Meeting of the Association for Information Science and Technology (ASIST’2016), Copenhagen, Denmark, 2016.
  • Shutian Ma, Chengzhi Zhang*. Automatic Collection of the Parallel Corpus with Little Prior Knowledge. In: Proceedings of the 13th China National Conference on Computational Linguistics (CCL2014), Wuhan, China, 2014: 95-106. (EI)
  • Chengzhi Zhang, Jin Xu, Shutian Ma. (2019). Automatic Identification of Cited Spans in Academic Articles (in Chinese). Information Studies: Theory & Application: 1-11.
  • Chengzhi Zhang, Shutian Ma, Chunyu Kit, & Xuchen Yao. (2018). Detection of Parallel Web Pages Based on the Automatically Discovered Bilingual URL Pairing Patterns (in Chinese). Journal of Chinese Information Processing, 32(3), 91-100.

Competition

Scientific Document Summarization, CL-SciSumm, 2017-2019

  • Collaborated with two colleagues (Leader)
  • Placed 1st among 9 systems in Task 1A (2017) and placed 2nd among 9 systems in Task 1B (2018)
  • Shared Task presented at BIRNDL 2017/2018/2019, co-located with the ACM SIGIR Conference

The Evaluation of Text Sourcing Technology, SMP-ETST, 2018

  • Collaborated with two colleagues (Leader), placed 3rd among 13 systems
  • Shared Tasks in Conference on Social Media Processing in 2018 (SMP CUP 2018)

Chinese Word Semantic Relation Classification, NLPCC, 2017

  • Collaborated with one colleague (Leader), placed 8th among 17 systems
  • Shared Tasks in the 6th Conference on Natural Language Processing and Chinese Computing (NLPCC 2017)

Chinese Word Similarity Measurement, NLPCC-ICCPOL, 2016

  • Collaborated with one colleague (Leader), placed 11th among 24 systems
  • Shared Tasks in Conference on Natural Language Processing and Chinese Computing & International Conference on Computer Processing of Oriental Languages (NLPCC-ICCPOL 2016)

Award

  • National Scholarship for Postgraduates, First Prize, 2018
  • Innovation and Entrepreneurship Scholarship of the Ministry of Industry and Information Technology, First Prize, 2018
  • The 6th National Information Science Doctoral Forum, Second Prize, 2016
  • The National Information Science Doctoral Forum held by Beijing University, Excellent Paper, First Prize, 2015
  • The 5th National Information Science Doctoral Forum and 2015 China Information Resource Management Forum, Excellent Paper, Third Prize, 2015