Readings

Some of these are hand-picked, but in general good sources for papers come from:

Challenge Papers

These are papers that respond to the MSR Mining Challenge. These are short papers.

MSR Papers

Longer MSR full conference technical papers:

ICSE Papers

FSE Papers

ICSME

EMSE: Empirical Software Engineering Journal

https://link.springer.com/journal/10664/articles

Transactions on Software Engineering Journal

TSE Search

TOSEM: ACM Transactions on Software Engineering and Methodology

https://dl.acm.org/loi/tosem

1 Papers

1.1 Full Papers

1.1.1 A Contextual Approach towards More Accurate Duplicate Bug Report Detection

Anahita Alipour, Abram Hindle, and Eleni Stroulia (University of Alberta, Canada)

1.1.2 An Empirical Study of End-user Programmers in the Computer Music Community

Gregory Burlet and Abram Hindle (University of Alberta, Canada)
http://webdocs.cs.ualberta.ca/~gburlet/files/MSR2015_musiccoders.pdf

1.1.3 An empirical study on the evolution of design patterns, Aversano, L., Canfora, G., Cerulo, L., Del Grosso, C., and Di Penta, M. In Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering

1.1.4 Analysing Software Repositories to Understand Software Evolution, Marco D’Ambros, Harald Gall, Michele Lanza, and Martin Pinzger

1.1.5 Automatic identification of bug-introducing changes by Sunghun Kim, Thomas Zimmermann, Kai Pan, and E. James Whitehead, Jr.

1.1.6 Beyond Lines of Code: Do We Need More Complexity Metrics?, by Israel Herraiz and Ahmed E. Hassan

1.1.7 BugCache for Inspections : Hit or Miss?

1.1.8 Bugs as Inconsistent Behavior: A General Approach to Inferring Errors in Systems Code.

Dawson R. Engler, David Yu Chen, Andy Chou
SOSP 2001: 57-72
http://doi.acm.org/10.1145/502034.502042

1.1.9 Change Impact Graphs: Determining the Impact of Prior Code Changes, German, D.M., Robles, G., and Hassan, A., Journal of Information and Software Technology (INFSOF), Volume 51, Number 10, pages 1394–1408, Oct 2009.

1.1.10 Characteristics of Useful Code Reviews: An Empirical Study at Microsoft

Amiangshu Bosu, Michaela Greiler and Christian Bird
(University of Alabama, United States, Microsoft Research, United States)
http://www.amiangshu.com/papers/CodeReview-MSR-2015.pdf

1.1.11 Clones: What is that smell?

Foyzur Rahman, Christian Bird, Premkumar T. Devanbu
MSR 2010:72-81
http://dx.doi.org/10.1109/MSR.2010.5463343

1.1.12 Copy-Paste as a Principled Engineering Tool, by Michael Godfrey and Cory Kapser + ‘Cloning Considered Harmful’ Considered Harmful, by Cory J. Kapser and Michael W. Godfrey. Proc. of the 2006 Working Conference on Reverse Engineering (WCRE-06), 23-28 October, Benevento, Italy.

1.1.13 Cross versus Within-Company Cost Estimation Studies: A Systematic Review.

Barbara A. Kitchenham, Emilia Mendes, Guilherme Horta Travassos
IEEE Trans. Software Eng. 33(5): 316-329 (2007)
http://dx.doi.org/10.1109/TSE.2007.1001

1.1.14 Evidence-Based Failure Prediction, by Nachi Nagappan and Thomas Ball + A Validation of Object-Oriented Design Metrics as Quality Indicators, by Victor R. Basili, Lionel C. Briand, and Walcelio L. Melo, IEEE Trans. on Software Engineering, 22(10), October 1996.

1.1.15 Gerrit Software Code Review Data from Android

Murtuza Mukadam, Christian Bird, and Peter C. Rigby
(Concordia University, Canada; Microsoft Research, USA)

1.1.16 GreenMiner: A Hardware Based Mining Software Repositories Software Energy Consumption Framework

Abram Hindle, Alex Wilson, Kent Rasmussen, Jed Barlow, Joshua Campbell and Stephen Romansky
(University of Alberta, Canada)
http://webdocs.cs.ualberta.ca/~hindle1/2014/gm.pdf

1.1.17 Hipikat: recommending pertinent software development artifacts, by Davor Cubranic and Gail C. Murphy

1.1.18 How Well do Experienced Software Developers Predict Software Change?, by Mikael Lindvall and Kristian Sandahl, Journal of Systems and Software, 43(1), Jan 1998.

1.1.19 Identifying Changed Source Code Lines from Version Repositories by Gerardo Canfora, Luigi Cerulo, Massimiliano Di Penta. Proceedings of the Fourth International Workshop on Mining Software Repositories, 2007 (best paper award).

1.1.20 Identifying reasons for software change using historic databases by Audris Mockus and Larry G. Votta

1.1.21 Improving the Effectiveness of Test Suite Through Mining Historical Data

Jeff Anderson, Saeed Salem and Hyunsook Do
(Microsoft, USA; North Dakota State University, USA)

1.1.22 Macro-level software evolution: a case study of a large software compilation, Jesus M. Gonzalez-Barahona, Gregorio Robles, et al., Journal of Empirical Software Engineering, Volume 14, Number 3 / June, 2009. Extended version of best paper award.

1.1.23 Measuring the Progress of Projects Using the Time Dependence of Code Changes, by Omar Alam, Bram Adams and Ahmed E. Hassan.

1.1.24 Mining Android App Usages for Generating Actionable GUI-based Execution Scenarios

Mario Linares-Vásquez, Martin White, Carlos Eduardo Bernal Cardenas, Kevin Moran and Denys Poshyvanyk
(The College of William and Mary, United States)
http://www.cs.wm.edu/~denys/pubs/MSR'15-MonkeyLab-CRC.pdf

1.1.25 Mining Email Social Networks

Christian Bird, Alex Gourley, Premkumar T. Devanbu, Michael Gertz, Anand Swaminathan
MSR 2006:137-143
http://doi.acm.org/10.1145/1137983.1138016

1.1.26 Mining Energy-Aware Commits

Irineu Moura, Gustavo Pinto, Felipe Ebert and Fernando Castor
(Federal University of Pernambuco, Brazil)
http://gustavopinto.org/lost+found/msr2015.pdf

1.1.27 Mining Energy-Greedy API Usage Patterns in Android Apps: an Empirical Study

Mario Linares-Vásquez, Gabriele Bavota, Carlos Eduardo Bernal Cardenas, Rocco Oliveto, Massimiliano Di Penta and Denys Poshyvanyk
(College of William and Mary, USA; University of Sannio, Italy; Universidad Nacional de Colombia, Colombia; University of Molise, Italy)
http://www.cs.wm.edu/~denys/pubs/MSR14-Android-energy-CRC.pdf

1.1.28 Mining Questions About Software Energy Consumption

Gustavo Pinto, Fernando Castor and Yu David Liu
(Federal University of Pernambuco, Brazil; SUNY Binghamton, USA)
http://gustavopinto.github.io/lost+found/msr.pdf

1.1.30 Mining version histories to guide software changes, Thomas Zimmermann, Peter Weißgerber, Stephan Diehl, Andreas Zeller

1.1.31 Novel applications of Machine Learning in Testing by Lionel Briand

1.1.32 Open Borders? Immigration in Open Source Projects.

Christian Bird, Alex Gourley, Premkumar T. Devanbu, Anand Swaminathan, Greta Hsu
MSR 2007:6
http://doi.ieeecomputersociety.org/10.1109/MSR.2007.23

1.1.33 Scalable statistical bug isolation by Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan

1.1.35 Software Bertillonage: Finding the provenance of an entity, by Julius Davies, Abram J. Hindle, Daniel M. German, Michael W. Godfrey.

1.1.36 Studying Developers Copy and Paste Behavior

Tarek Ahmed, Weiyi Shang and Ahmed Hassan
(Queen’s University, Canada)

1.1.37 Syntax Errors Just Aren’t Natural: Improving Error Reporting with Language Models

Joshua Campbell, Abram Hindle and José Nelson Amaral
(University of Alberta, Canada)
http://webdocs.cs.ualberta.ca/~joshua2/syntax.pdf

1.1.38 The Evidence for Design Patterns, by Walter Tichy + Design Pattern Detection Using Similarity Scoring, N. Tsantalis, A. Chatzigeorgiou, G. Stephanides, and S. T. Halkidis, IEEE Trans. on Software Engineering, November 2006.

1.1.39 The Impact of Code Review Coverage and Code Review Participation on Software Quality: A Case Study of the Qt, VTK, and ITK Projects

Shane McIntosh, Yasutaka Kamei, Bram Adams and Ahmed E. Hassan
(Queen’s University, Canada; Kyushu University, Japan; Polytechnique Montréal, Canada)
http://sail.cs.queensu.ca/publications/pubs/msr2014-mcintosh.pdf

1.1.40 The Past, Present, and Future of Software Evolution, Michael W. Godfrey and Daniel M. German. Invited paper in Proc. of Frontiers of Software Maintenance track at the 2008 IEEE Intl. Conf. on Software Maintenance (ICSM-08), October 2008, Beijing, China.

1.1.41 The Promises and Perils of Mining Git. In Proceedings of the Sixth Working Conference on Mining Software Repositories (MSR 09), Vancouver, Canada, 2009.

Christian Bird, Peter C. Rigby, Earl T. Barr, David J. Hamilton, Daniel M. Germán, Premkumar T. Devanbu
MSR 2009:1-10
http://dx.doi.org/10.1109/MSR.2009.5069475

1.1.42 The Promises and Perils of Mining GitHub

Eirini Kalliamvakou, Georgios Gousios, Kelly Blincoe, Leif Singer, Daniel German and Daniela Damian
(University of Victoria, Canada; Delft University of Technology, Netherlands)

1.1.43 The secret life of bugs: Going past the errors and omissions in software repositories, by Jorge Aranda and Gina Venolia, Proc. of the 2009 Intl. Conf. on Software Engineering (ICSE-09), Vancouver, May 2009.

1.1.44 The Top Ten List: Dynamic Fault Prediction, by Ahmed E. Hassan and Richard C. Holt, Proc. of the 2005 IEEE Intl. Conf. on Software Maintenance (ICSM-05), Budapest, Hungary, Sept. 2005.

1.1.45 Toward Deep Learning Software Repositories

Martin White, Christopher Vendome, Mario Linares-Vásquez and Denys Poshyvanyk
(College of William and Mary, United States)
http://www.cs.wm.edu/~denys/pubs/MSR'15-DeepLearning-CRC.pdf

1.1.46 Towards Building a Universal Defect Prediction Model

Feng Zhang, Audris Mockus, Iman Keivanloo and Ying Zou
(Queen’s University, Canada; Avaya Labs Research, USA)
http://post.queensu.ca/~zouy/files/msr2014.pdf

1.1.47 Understanding the impact of code and process metrics on post-release defects: A case study on the Eclipse project, Emad Shihab, Zhen Ming Jiang, Walid M. Ibrahim, Bram Adams, Ahmed E. Hassan, Proc. of the 2010 ACM-IEEE Intl. Symposium on Empirical Software Engineering and Measurement (ESEM-10), Bolzano-Bozen, Italy, Sept 2010.

1.1.48 Using information fragments to answer the questions developers ask.

Thomas Fritz, Gail C. Murphy
ICSE 2010: 175-184
http://doi.acm.org/10.1145/1806799.1806828

1.1.49 Using Software Dependencies and Churn Metrics to Predict Field Failures: An Empirical Case Study, Nachiappan Nagappan, Thomas Ball

1.1.50 Visualizing software changes, Stephen G. Eick, Todd L. Graves, Alan F. Karr, Audris Mockus, and Paul Schuster

1.1.51 What is the Gist? Understanding the Use of Public Gists on GitHub

Weiliang Wang, Germán Poo-Caamaño, Evan Wilde and Daniel German
(University of Victoria, Canada)

1.1.52 What’s Hot and What’s Not: Windowing Developer Topic Analysis, by Abram J. Hindle, Michael W. Godfrey, Richard C. Holt.

1.1.53 When do changes induce fixes?

Jacek Śliwerski (International Max Planck Research School, Saarbrücken, Germany), Thomas Zimmermann (Saarland University, Saarbrücken, Germany), and Andreas Zeller (Saarland University, Saarbrücken, Germany)

1.1.54 Who Should Fix This Bug?, by John Anvik, Lyndon Hiew and Gail C. Murphy, Proc. of the 2006 Intl. Conference on Software Engineering (ICSE-06), Shanghai, May 2006.

1.1.55 Will My Patch Make It? And How Fast?: Case Study on the Linux Kernel

Yujuan Jiang, Bram Adams, and Daniel M. German (Polytechnique Montréal, Canada; University of Victoria, Canada)
http://mcis.polymtl.ca/publications/2013/msr_jojo.pdf

1.1.56 Yesterday’s Weather: Guiding Early Reverse Engineering Efforts by Summarizing the Evolution of Changes, Tudor Girba, Stephane Ducasse, Michele Lanza, Proc. 20th IEEE Int’l Conference on Software Maintenance (ICSM'04), September 2004, pp. 40-49.

1.1.57 An Evaluation of Open-Source Software Microbenchmark Suites for Continuous Performance Assessment

Christoph Laaber and Philipp Leitner.
MSR 2018

1.1.58 SOTorrent: Reconstructing and Analyzing the Evolution of Stack Overflow Posts

Sebastian Baltes, Lorik Dumani, Christoph Treude and Stephan Diehl.
MSR 2018

1.1.59 Data-Driven Search-based Software Engineering

Vivek Nair, Amritanshu Agrawal, Jianfeng Chen, Wei Fu, George Mathew, Tim Menzies, Leandro Minku, Markus Wagner and Zhe Yu.
MSR 2018

1.1.60 CLEVER: Combining Code Metrics with Clone Detection for Just-In-Time Fault Prevention and Resolution in Large Industrial Projects

Mathieu Nayrolles and Abdelwahab Hamou-Lhadj.
MSR 2018

Mario Linares-Vasquez, Gabriele Bavota and Camilo Escobar-Velasquez
Universidad de los Andes, Università della Svizzera italiana (USI)
MSR 2017

1.1.62 Extracting Code Segments and Their Descriptions from Research Articles [preprint]

Preetha Chatterjee, Benjamin Gause, Hunter Hedinger and Lori Pollock
University of Delaware

1.1.63 Structure and Evolution of Package Dependency Networks [preprint]

Riivo Kikas, Georgios Gousios, Marlon Dumas and Dietmar Pfahl
University of Tartu, Delft University of Technology
MSR 2017

1.1.64 The Impact Of Using Regression Models to Build Defect Classifiers [preprint]

Gopi Krishnan Rajbahadur, Shaowei Wang, Yasutaka Kamei and Ahmed E. Hassan
Queen’s University, Kyushu University
MSR 2017

1.1.65 Choosing an NLP Library for Analyzing Software Documentation: A Systematic Literature Review and a Series of Experiments [preprint]

Fouad Nasser A. Al Omran and Christoph Treude
University of Adelaide
MSR 2017

1.1.66 GreenOracle: Estimating Software Energy Consumption with Energy Measurement Corpora

Shaiful Chowdhury and Abram Hindle
University of Alberta
MSR 2016

1.1.67 Mining Performance Regression Inducing Code Changes in Evolving Software

Qi Luo, Denys Poshyvanyk and Mark Grechanik
The College of William and Mary, University of Illinois at Chicago
MSR 2016

1.1.68 An Empirical Study on the Practice of Maintaining Object-Relational Mapping Code in Java Systems

Tse-Hsun Chen, Weiyi Shang, Jinqiu Yang, Ahmed E. Hassan, Michael W. Godfrey, Mohamed Nasser and Parminder Flora
Queen’s University, Concordia University, University of Waterloo, BlackBerry

1.1.69 Software Ingredients: Detection of Third-party Component Reuse in Java Software Release

Takashi Ishio, Raula Gaikovina Kula, Tetsuya Kanda, Daniel German and Katsuro Inoue
Osaka University, University of Victoria

1.1.70 A Look at the Dynamics of the JavaScript Package Ecosystem

Erik Wittern, Philippe Suter and Shriram Rajagopalan
IBM T.J. Watson Research Center

1.1.71 A Large-Scale Study On Repetitiveness, Containment, and Composability of Routines in Source Code

Anh Nguyen, Hoan Nguyen and Tien Nguyen
Iowa State University

1.1.72 A survey of machine learning for big code and naturalness

M Allamanis, ET Barr, P Devanbu, C Sutton
ACM Computing Surveys (CSUR) 51 (4), 81

1.1.73 Are deep neural networks the best choice for modeling source code?

VJ Hellendoorn, P Devanbu
Proceedings of the 2017 11th Joint Meeting on Foundations of Software …

1.1.74 Towards Accurate Duplicate Bug Retrieval Using Deep Learning Techniques

DOI: 10.1109/ICSME.2017.69
Conference: 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME)

1.1.75 An Evaluation of Open-Source Software Microbenchmark Suites for Continuous Performance Assessment

Christoph Laaber (University of Zurich), Philipp Leitner (Chalmers | University of Gothenburg)
http://www.ifi.uzh.ch/dam/jcr:ccf1399a-2d57-4ff9-a3b0-59d69616d5d3/msr18-author-version.pdf

1.1.76 CLEVER: Combining Code Metrics with Clone Detection for Just-In-Time Fault Prevention and Resolution in Large Industrial Projects

Mathieu Nayrolles, Abdelwahab Hamou-Lhadj
http://users.encs.concordia.ca/~abdelw/papers/MSR18.pdf

1.1.77 Leveraging Historical Versions of Android Apps for Efficient and Precise Taint Analysis

John Jenkins (Washington State University), Haipeng Cai (Washington State University Pullman)
http://chapering.github.io/pubs/msr18.pdf

1.1.78 Understanding the Usage, Impact, and Adoption of Non-OSI Approved Licenses

Rômulo Manciola Meloca - UFRGS, Gustavo Pinto - UFPA, Leonardo Pontes Baiser, Marco Mattos, Ivanilton Polato, Igor Wiese - Federal University of Technology - Paraná (UTFPR), Daniel M. German
http://gustavopinto.org/lost+found/msr2018b.pdf

1.1.79 How Swift Developers Handle Errors

Nathan Cassee, Gustavo Pinto, Fernando Castor, Alexander Serebrenik
http://gustavopinto.org/lost+found/msr2018a.pdf

1.1.80 What are your Programming Language’s Energy-Delay Implications?

Stefanos Georgiou, Maria Kechagia, Panos Louridas, Diomidis Spinellis
https://doi.org/10.1145/3196398.3196414 https://stefanos1316.github.io/my_curriculum_vitae/GKLS18.pdf

1.1.81 Automatically Assessing Code Understandability Reanalyzed: Combined Metrics Matter

Asher Trockman, Keenen Cates, Mark Mozina, Tuan Nguyen, Christian Kästner, Bogdan Vasilescu
https://cmustrudel.github.io/papers/msr18understandability.pdf https://2018.msrconf.org/details/msr-2018-papers/36/Automatically-Assessing-Code-Understandability-Reanalyzed-Combined-Metrics-Matter

1.1.82 Data-Driven Search-based Software Engineering

Vivek Nair, Amritanshu Agrawal, Jianfeng Chen, Wei Fu, George Mathew, Tim Menzies, Leandro Minku, Markus Wagner, Zhe Yu

1.1.83 The Open-Closed Principle of Modern Machine Learning Frameworks

Houssem Ben Braiek, Foutse Khomh, Bram Adams
http://swat.polymtl.ca/~foutsekh/docs/MSR-Houssem.pdf

1.1.84 A Benchmark Study on Sentiment Analysis for Software Engineering Research

Nicole Novielli, Daniela Girardi, Filippo Lanubile
https://doi.org/10.1145/3196398.3196403 https://arxiv.org/abs/1803.06525

1.1.85 Natural Language or Not (NLoN) - package for Software Engineering Text Analysis Pipeline

Mika Mäntylä, Fabio Calefato, Maëlick Claes
https://arxiv.org/pdf/1803.07292.pdf

1.1.86 Deep Learning Similarities from Different Representations of Source Code

Michele Tufano, Cody Watson, Gabriele Bavota, Massimiliano Di Penta, Martin White, Denys Poshyvanyk
http://www.cs.wm.edu/~mtufano/publications/C9.pdf

1.1.87 Bayesian Hierarchical Modelling for Tailoring Metric Thresholds

1.1.88 SCOR: Source Code Retrieval With Semantics and Order

1.1.89 PathMiner: A Library for Mining of Path-Based Representations of Code

Vladimir Kovalenko, Egor Bogomolov, Timofey Bryksin, Alberto Bacchelli
https://doi.org/10.5281/zenodo.2595271 https://zenodo.org/record/2595271#.XI0rPdiEZ-E https://2019.msrconf.org/details/msr-2019-papers/38/PathMiner-A-Library-for-Mining-of-Path-Based-Representations-of-Code

1.1.90 Import2vec: learning embeddings for software libraries

Bart Theeten, Frederik Vandeputte, Tom Van Cutsem
https://arxiv.org/abs/1904.03990

1.1.91 Semantic Source Code Models Using Identifier Embeddings

Vasiliki Efstathiou, Diomidis Spinellis
https://github.com/vefstathiou/scode-ft-embeddings/blob/master/MSR_2019_preprint.pdf

1.1.92 Exploring Word Embedding Techniques to Improve Sentiment Analysis of Software Engineering Texts

Eeshita Biswas, K. Vijay-Shanker, Lori Pollock
https://www.researchgate.net/publication/333389939_Exploring_Word_Embedding_Techniques_to_Improve_Sentiment_Analysis_of_Software_Engineering_Texts

1.1.93 Cleaning StackOverflow for Machine Translation

Masfiqur Rahman, Peter Rigby, Dharani Palani, Tien N. Nguyen

1.1.94 Predicting Good Configurations for GitHub and Stack Overflow Topic Models

Christoph Treude, Markus Wagner
https://cs.adelaide.edu.au/~christoph/msr19a.pdf

1.1.95 Time Present and Time Past: Analyzing the Evolution of JavaScript Code in the Wild

Dimitris Mitropoulos, Panos Louridas, Vitalis Salis, Diomidis Spinellis
https://dimitro.gr/assets/papers/MLSS19.pdf

1.1.96 The Software Heritage Graph Dataset: public software development under one roof

Antoine Pietri, Diomidis Spinellis, Stefano Zacchiroli
https://upsilon.cc/~zack/research/publications/msr-2019-swh.pdf

1.1.97 World of Code: An Infrastructure for Mining the Universe of Open Source VCS Data

Yuxing Ma, Christopher Bogart, Sadika Amreen, Russell Zaretzki, Audris Mockus

1.1.98 Crossflow: A Framework for Distributed Mining of Software Repositories

Dimitris Kolovos, Patrick Neubauer, Konstantinos Barmpis, Nicholas Matragkas, Richard Paige
https://drive.google.com/file/d/1pc81TTSbgMaq08mgh7fqw9DQ5Dg2waor

1.1.99 GreenHub Farmer: Real-world data for Android Energy Mining

Rui Pereira, Marco Couto, João Paulo Fernandes, Bruno Cabral, Hugo Matalonga, Simão Melo de Sousa, Fernando Castor
http://greenlab.di.uminho.pt/wp-content/uploads/2019/04/GreenHubFarmerFinal.pdf

1.1.100 GreenSource: a large-scale collection of Android code, tests and energy metrics

Rui Rua, Marco Couto, João Saraiva

1.1.101 The Emergence of Software Diversity in Maven Central

César Soto-Valero, Amine Benelallam, Nicolas Harrand, Olivier Barais, Benoit Baudry
https://arxiv.org/pdf/1903.05394.pdf

1.1.102 A Dataset of Parametric Cryptographic Misuses

Anna-Katharina Wickert, Michael Reif, Michael Eichberg, Anam Dodhy, Mira Mezini
https://akwick.github.io/publication/a-dataset-of-parametric-cryptographic-misuses/a-dataset-of-parametric-cryptographic-misuses.pdf https://2019.msrconf.org/details/msr-2019-Data-Showcase/12/A-Dataset-of-Parametric-Cryptographic-Misuses

1.1.103 Tracing Back Log Data to its Log Statement: From Research to Practice

Daan Schipper, Mauricio Aniche, Arie van Deursen
https://pure.tudelft.nl/portal/en/publications/tracing-back-log-data-to-its-log-statement-from-research-to-practice(9fc4a63c-57bf-4a80-aca2-48f5a8fb08a3).html

1.1.104 Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler (MSR - Technical Paper)

Timofey Bryksin, Victor Petukhov, Ilya Alexin, Stanislav Prikhodko, Alexey Shpilman, Vladimir Kovalenko, Nikita Povarov
https://arxiv.org/abs/2004.01618 https://2020.msrconf.org/details/msr-2020-papers/8/Using-Large-Scale-Anomaly-Detection-on-Code-to-Improve-Kotlin-Compiler

1.1.105 An Empirical Study of Method Chaining in Java (MSR - Technical Paper)

Tomoki Nakamaru, Tomomasa Matsunaga, Tetsuro Yamazaki, Soramichi Akiyama, Shigeru Chiba
https://static.csg.ci.i.u-tokyo.ac.jp/papers/20/nakamaru-msr2020.pdf https://2020.msrconf.org/details/msr-2020-papers/2/An-Empirical-Study-of-Method-Chaining-in-Java

1.1.106 A Tale of Docker Build Failures: A Preliminary Study (MSR - Technical Paper)

Yiwen Wu, Yang Zhang, Tao Wang, Huaimin Wang
https://www.researchgate.net/publication/339840362_An_Empirical_Study_of_Build_Failures_in_the_Docker_Context https://2020.msrconf.org/details/msr-2020-papers/44/A-Tale-of-Docker-Build-Failures-A-Preliminary-Study

1.1.107 LogChunks: A Data Set for Build Log Analysis (MSR - Data Showcase)

Carolin Brandt, Annibale Panichella, Andy Zaidman, Moritz Beller
https://pure.tudelft.nl/portal/files/71450840/paper.pdf https://2020.msrconf.org/details/msr-2020-Data-showcase/2/LogChunks-A-Data-Set-for-Build-Log-Analysis

1.1.108 A Dataset of Dockerfiles (MSR - Data Showcase)

Jordan Henkel, Christian Bird, Shuvendu K. Lahiri, Thomas Reps
https://2020.msrconf.org/details/msr-2020-Data-showcase/15/A-Dataset-of-Dockerfiles

1.1.109 Detecting Video Game-Specific Bad Smells in Unity Projects (MSR - Technical Paper)

Antonio Borrelli, Vittoria Nardone, Giuseppe Di Lucca, Gerardo Canfora, Massimiliano Di Penta
http://www.ing.unisannio.it/mdipenta/papers/unitysmells.pdf https://2020.msrconf.org/details/msr-2020-papers/15/Detecting-Video-Game-Specific-Bad-Smells-in-Unity-Projects

1.1.110 The Scent of Deep Learning Code: An Empirical Study (MSR - Technical Paper)

Hadhemi Jebnoun, Masud Rahman, Foutse Khomh, Houssem Ben Braiek
http://homepage.usask.ca/~masud.rahman/papers/hadhemi-MSR2020.pdf https://2020.msrconf.org/details/msr-2020-papers/40/The-Scent-of-Deep-Learning-Code-An-Empirical-Study

1.1.111 A Soft Alignment Model for Bug Deduplication (MSR - Technical Paper)

Irving Muller Rodrigues, Daniel Aloise, Eraldo Rezende Fernandes, Michel Dagenais
https://amdls.dorsal.polymtl.ca/system/files/MSR2020.pdf https://2020.msrconf.org/details/msr-2020-papers/31/A-Soft-Alignment-Model-for-Bug-Deduplication

1.1.112 Large-Scale Manual Validation of Bugfixing Changes (MSR - Registered Reports)

Steffen Herbold, Alexander Trautsch, Benjamin Ledel
https://osf.io/acnwk https://2020.msrconf.org/details/msr-2020-Registered-Reports/1/Large-Scale-Manual-Validation-of-Bugfixing-Changes

1.1.113 An Empirical Study on Regular Expression Bugs (MSR - Technical Paper)

Peipei Wang, Chris Brown, Jamie Jennings, Kathryn Stolee
https://wangpeipei90.github.io/papers/msr2020_preprint.pdf https://2020.msrconf.org/details/msr-2020-papers/25/An-Empirical-Study-on-Regular-Expression-Bugs

1.1.114 SoftMon: A Tool to Compare Similar Open-source Software from a Performance Perspective (MSR - Technical Paper)

Shubhankar Suman Singh, Smruti Ranjan Sarangi
http://www.cse.iitd.ac.in/~shubhankar/MSR6.pdf https://2020.msrconf.org/details/msr-2020-papers/5/SoftMon-A-Tool-to-Compare-Similar-Open-source-Software-from-a-Performance-Perspectiv

1.1.115 A Study of Potential Code Borrowing and License Violations in Java Projects on GitHub (MSR - Technical Paper)

Yaroslav Golubev, Maria Eliseeva, Nikita Povarov, Timofey Bryksin
https://arxiv.org/pdf/2002.05237.pdf https://2020.msrconf.org/details/msr-2020-papers/16/A-Study-of-Potential-Code-Borrowing-and-License-Violations-in-Java-Projects-on-GitHub

1.1.116 Did You Remember To Test Your Tokens? (MSR - Technical Paper)

Danielle Gonzalez, Michael Rath, Mehdi Mirakhorli
https://doi.org/10.1145/3379597.3387471 https://arxiv.org/pdf/2006.14553.pdf https://2020.msrconf.org/details/msr-2020-papers/32/Did-You-Remember-To-Test-Your-Tokens-

1.1.117 Embedding Java Classes with code2vec: Improvements from Variable Obfuscation (MSR - Technical Paper)

Rhys Compton, Eibe Frank, Panos Patros, Abigail Koay
https://doi.org/10.1145/3379597.3387445 https://arxiv.org/abs/2004.02942 https://2020.msrconf.org/details/msr-2020-papers/6/Embedding-Java-Classes-with-code2vec-Improvements-from-Variable-Obfuscation

1.1.118 Can We Use SE-specific Sentiment Analysis Tools in a Cross-Platform Setting? (MSR - Technical Paper)

Nicole Novielli, Fabio Calefato, Davide Dongiovanni, Daniela Girardi, Filippo Lanubile
https://doi.org/10.1145/3379597.3387446 http://collab.di.uniba.it/nicole/wp-content/uploads/sites/6/2020/03/MSR_2020_cross_within_benchmark_preprint.pdf https://2020.msrconf.org/details/msr-2020-papers/7/Can-We-Use-SE-specific-Sentiment-Analysis-Tools-in-a-Cross-Platform-Setting-

1.1.119 Ethical Mining – A Case Study on MSR Mining Challenges (ACM SIGSOFT Distinguished Paper Award; MSR - Technical Paper)

1.1.120 From Innovations to Prospects: What Is Hidden Behind Cryptocurrencies? (MSR - Technical Paper)

Ang Jia, Ming Fan, Xi Xu, Di Cui, Wenying Wei, Zijiang Yang, Kai Ye, Ting Liu
https://doi.org/10.1145/3379597.3387439 https://github.com/island255/MSR2020_cryptocurrency https://2020.msrconf.org/details/msr-2020-papers/45/From-Innovations-to-Prospects-What-Is-Hidden-Behind-Cryptocurrencies-

1.1.121 Identifying Versions of Libraries used in Stack Overflow Code Snippets

Ahmed Zerouali (Vrije Universiteit Brussel), Camilo Velázquez-Rodríguez (Vrije Universiteit Brussel), Coen De Roover (Vrije Universiteit Brussel)
http://soft.vub.ac.be/Publications/2021/vub-tr-soft-21-02.pdf

1.1.122 PSIMiner: A Tool for Mining Rich Abstract Syntax Trees from Code

Egor Spirin (JetBrains Research; National Research University Higher School of Economics), Egor Bogomolov (JetBrains Research), Vladimir Kovalenko (JetBrains Research), Timofey Bryksin (JetBrains Research; Saint Petersburg State University)
https://arxiv.org/abs/2103.09766

1.1.123 How Java Programmers Test Exceptional Behavior

Diego Marcilio (USI Università della Svizzera italiana), Carlo A. Furia (USI Università della Svizzera italiana)
https://dvmarcilio.github.io/papers/msr2021.pdf

1.1.124 On the Naturalness and Localness of Software Logs

Sina Gholamian (University of Waterloo), Paul A. S. Ward (University of Waterloo)
https://ece.uwaterloo.ca/~sgholami/gholamian2021naturalness.pdf

1.1.125 A Large-Scale Comparison of Python Code in Jupyter Notebooks and Scripts

Konstantin Grotov, Sergey Titov, Vladimir Sotnikov, Yaroslav Golubev, Timofey Bryksin
https://arxiv.org/pdf/2203.16718.pdf

1.1.126 On the Violation of Honesty in Mobile Apps: Automated Detection and Categories

Humphrey Obie, Idowu Oselumhe Ilekura, Hung Du, Mojtaba Shahin, John Grundy, Li Li, Jon Whittle, Burak Turhan
https://arxiv.org/pdf/2203.07547.pdf

1.1.127 Operationalizing Threats to MSR Studies by Simulation-Based Testing

Johannes Härtel, Ralf Laemmel
https://ieeexplore.ieee.org/document/9796185/

1.1.128 LineVul: A Transformer-based Line-Level Vulnerability Prediction

Michael Fu, Chakkrit Tantithamthavorn
https://ieeexplore.ieee.org/document/9796256

1.1.129 Painting the Landscape of Automotive Software

Sangeeth Kochanthara, Yanja Dajsuren, Loek Cleophas, Mark van den Brand
https://arxiv.org/pdf/2203.08936.pdf

1.1.130 Find something from MSR Conference:

1.1.131 Find something from SIGSOFT:

https://www.sigsoft.org/awards/distinguishedPaperAward.html#winners

1.2 SIGSOFT Winners:

1.2.1 A Tale from the Trenches: Cognitive Biases and Software Development

Souti Chattopadhyay, Nicholas Nelson, Audrey Au, Natalia Morales, Christopher Sanchez, Rahul Pandita, Anita Sarma

1.2.2 An Empirical Study on Program Failures of Deep Learning Jobs

Ru Zhang, Wencong Xiao, Hongyu Zhang, Yu Liu, Haoxiang Lin, Mao Yang

1.2.3 Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code

Rafael-Michael Karampatsis, Hlib Babii, Romain Robbes, Charles Sutton, Andrea Janes

1.2.4 Context-aware In-process Crowdworker Recommendation

Junjie Wang, Ye Yang, Song Wang, Yuanzhe Hu, Dandan Wang, Qing Wang

1.2.5 Here We Go Again: Why Is It Difficult for Developers to Learn Another Programming Language?

Nischal Shrestha, Colton Botta, Titus Barik, Chris Parnin

1.2.6 Time-travel Testing of Android Apps

Zhen Dong, Marcel Böhme, Lucia Cojocaru, Abhik Roychoudhury

1.2.7 Towards the Use of the Readily Available Tests from the Release Pipeline as Performance Tests. Are We There Yet?

Zishuo Ding, Jinfu Chen, Weiyi Shang

1.2.8 Translating Video Recordings of Mobile App Usages into Replayable Scenarios

Carlos Bernal-Cárdenas, Nathan Cooper, Kevin Moran, Oscar Chaparro, Andrian Marcus, Denys Poshyvanyk

1.2.9 Unblind Your Apps: Predicting Natural-Language Labels for Mobile GUI Components by Deep Learning

Jieshan Chen, Chunyang Chen, Zhenchang Xing, Xiwei Xu, Liming Zhu, Guoqiang Li, Jinshui Wang

1.2.10 White-box Fairness Testing through Adversarial Sampling

Peixin Zhang, Jingyi Wang, Jun Sun, Guoliang Dong, Xinyu Wang, Xingen Wang, Jin Song Dong, Ting Dai

1.2.11 An Empirical Study of Quick Remedy Commits

Fengcai Wen, Csaba Nagy, Michele Lanza, and Gabriele Bavota

1.2.12 A Self-Attentional Neural Architecture for Code Completion with Multi-Task Learning

Fang Liu, Ge Li, Bolin Wei, Xin Xia, Zhiyi Fu, and Zhi Jin

1.2.13 Automating Just-In-Time Comment Updating

Zhongxin Liu, Xin Xia, Meng Yan, Shanping Li

1.2.14 Broadening Horizons of Multilingual Static Analysis: Semantic Summary Extraction from C Code for JNI Program Analysis

Sungho Lee, Hyogun Lee, Sukyoung Ryu

1.2.15 ChemTest: An Automated Software Testing Framework for an Emerging Paradigm

Michael C. Gerten, James I. Lathrop, Myra Cohen, Titus H. Klinge

1.2.16 Problems and Opportunities in Training Deep Learning Software Systems: An Analysis of Variance

Viet Hung Pham, Shangshu Qian, Jiannan Wang, Thibaud Lutellier, Jonathan Rosenthal, Lin Tan, Yaoliang Yu, Nachiappan Nagappan

1.2.17 Scalable Multiple-View Analysis of Reactive Systems via Bidirectional Model Transformations

Christos Tsigkanos, Nianyu Li, Zhi Jin, Zhenjiang Hu, Carlo Ghezzi

1.2.18 Summary-Based Symbolic Evaluation for Smart Contracts

Yu Feng, Emina Torlak, Rastislav Bodik

1.2.19 Team Discussions and Dynamics During DevOps Tool Adoptions in OSS Projects

Likang Yin, Vladimir Filkov

1.3 Challenge Papers

1.3.1 Challenges from 2020 or earlier

[Data] A Repository with 44 Years of Unix Evolution
Diomidis Spinellis (Athens University of Economics and Business, Greece) http://www.dmst.aueb.gr/dds/pubs/conf/2015-MSR-Unix-History/html/Spi15c.html
A comparative exploration of FreeBSD bug lifetimes.
Gargi Bougie, Christoph Treude, Daniel M. Germán, Margaret-Anne D. Storey
A newbie’s guide to eclipse APIs.
Reid Holmes, Robert J. Walker
A Tale of Two Browsers.
Olga Baysal, Ian J. Davis, Michael W. Godfrey
An initial study of the growth of eclipse defects.
Hongyu Zhang
Analyzing the evolution of eclipse plugins.
Michel Wermelinger, Yijun Yu
Apples vs. oranges?: an exploration of the challenges of comparing the source code of two software systems.
Daniel M. Germán, Julius Davies
Assessment of issue handling efficiency.
Bart Luijten, Joost Visser, Andy Zaidman
Author entropy vs. file size in the gnome suite of applications.
Jason R. Casebolt, Jonathan L. Krein, Alexander C. MacLean, Charles D. Knutson, Daniel P. Delorey
Cloning and copying between GNOME projects.
Jens Krinke, Nicolas Gold, Yue Jia, David Binkley
Co-Evolution of Project Documentation and Popularity within Github
Karan Aggarwal, Abram Hindle and Eleni Stroulia (University of Alberta, Canada) http://webdocs.cs.ualberta.ca/~hindle1/2016/msr14-Documentation.pdf
Do comments explain codes adequately?: investigation by text filtering.
Yukinao Hirata, Osamu Mizuno
Evaluating process quality in GNOME based on change request data.
Holger Schackmann, Horst Lichter
Finding file clones in FreeBSD Ports Collection.
Yusuke Sasaki, Tetsuo Yamamoto, Yasuhiro Hayase, Katsuro Inoue
Forecasting the Number of Changes in Eclipse Using Time Series Analysis.
Israel Herraiz, Jesús M. González-Barahona, Gregorio Robles
Going Green: An Exploratory Analysis of Energy-Related Questions
Haroon Malik, Peng Zhao and Michael Godfrey (University of Waterloo, Canada)
Impact of the Creation of the Mozilla Foundation in the Activity of Developers.
Jesús M. González-Barahona, Gregorio Robles, Israel Herraiz
Local and Global Recency Weighting Approach to Bug Prediction.
Hemant Joshi, Chuanlei Zhang, Srini Ramaswamy, Coskun Bayrak
Mining Eclipse Developer Contributions via Author-Topic Models.
Erik Linstead, Paul Rigor, Sushil Krishna Bajracharya, Cristina Videira Lopes, Pierre Baldi
Mining security changes in FreeBSD.
Andreas Mauczka, Christian Schanes, Florian Fankhauser, Mario Bernhart, Thomas Grechenig
Mining StackOverflow to Filter out Off-topic IRC Discussion
Shaiful Chowdhury and Abram Hindle (University of Alberta, Canada) http://webdocs.cs.ualberta.ca/~hindle1/2015/shaiful-mining_so.pdf
Mining the coherence of GNOME bug reports with statistical topic models.
Erik Linstead, Pierre Baldi
On the use of Internet Relay Chat (IRC) meetings by developers of the GNOME GTK+ project.
Emad Shihab, Zhen Ming Jiang, Ahmed E. Hassan
Perspectives on bugs in the Debian bug tracking system.
Julius Davies, Hanyu Zhang, Lucas Nussbaum, Daniel M. Germán
Predicting Defects and Changes with Import Relations.
Adrian Schröter
Predicting Eclipse Bug Lifetimes.
Lucas D. Panjer
Security and Emotion: Sentiment Analysis of Security Discussions on GitHub
Daniel Pletea, Bogdan Vasilescu and Alexander Serebrenik (Eindhoven University of Technology, Netherlands)
Summarizing developer work history using time series segmentation: challenge report.
Harvey P. Siy, Parvathi Chundi, Mahadevan Subramaniam
System compatibility analysis of Eclipse and Netbeans based on bug data.
Xinlei (Oscar) Wang, Eilwoo Baik, Premkumar T. Devanbu
Towards a simplification of the bug report form in eclipse.
Israel Herraiz, Daniel M. Germán, Jesús M. González-Barahona, Gregorio Robles
Visualizing Gnome with the Small Project Observatory.
Mircea Lungu, Jacopo Malnati, Michele Lanza
What topics do Firefox and Chrome contributors discuss?
Mario Luca Bernardi, Carmine Sementa, Quirino Zagarese, Damiano Distante, Massimiliano Di Penta
Which Non-functional Requirements do Developers Focus on? An Empirical Study on Stack Overflow using Topic Analysis
Jie Zou, Ling Xu, Weikang Guo, Meng Yan, Dan Yang and Xiaohong Zhang (Chongqing University, China)
On the Differences between Unit and Integration Testing in the TravisTorrent Dataset
Manuel Gerardo Orellana Cordero, Gulsher Laghari, Alessandro Murgia and Serge Demeyer (University of Antwerp)
Cost-effective Build Outcome Prediction Using Cascaded Classifiers
Ansong Ni and Ming Li (Nanjing University)
Sentiment Analysis of Travis CI Builds
Rodrigo Souza and Bruno Silva (Salvador University - UNIFACS, Federal University of Bahia)
A Time Series Analysis of TravisTorrent: To Everything There is a Season
Abigail Atchison, Christina Berardi, Natalie Best, Elizabeth Stevens and Erik Linstead (Chapman University)
On the Interplay between Non-Functional Requirements and Builds on Continuous Integration
Klérisson Paixão, Crícia Z. Felício, Fernanda Delfim and Marcelo Maia (Instituto Federal do Triângulo Mineiro, Universidade Federal de Uberlândia, UFU)
The Impact of the Adoption of Continuous Integration on Developer Attraction and Retention
Yusaira Khan, Yash Gupta, Keheliya Gallaba and Shane McIntosh (McGill University)
Examining Programmer Practices for Locally Handling Exceptions
Mary Beth Kery, Claire Le Goues and Brad Myers (Carnegie Mellon University)
QualBoa: Reusability-aware Recommendations of Source Code Components
Themistoklis Diamantopoulos, Klearchos Thomopoulos and Andreas Symeonidis (Aristotle University of Thessaloniki)
The Dispersion of Build Maintenance Activity across Maven Lifecycle Phases
Casimir Desarmeaux, Andrea Pecatikov and Shane McIntosh (McGill University)
The Relationship between Commit Message Detail and Defect Proneness in Java Projects on GitHub
Jacob Barnett, Charles Gathuru, Luke Soldano and Shane McIntosh (McGill University)
Analysis of Exception Handling Patterns in Java Projects: An Empirical Study
Suman Nakshatri, Maithri Hegde and Sahithi Thandra (University of Waterloo)
Judging a commit by its cover: Correlating commit message entropy with build status on Travis-CI
Eddie Antonio Santos and Abram Hindle (University of Alberta)
Characterizing Energy-Aware Software Projects: Are They Different?
Shaiful Chowdhury and Abram Hindle (University of Alberta)
A deeper look into bug fixes: Patterns, replacements, deletions, and additions
Mauricio Soto, Ferdian Thung, Chu-Pan Wong, Claire Le Goues and David Lo (Carnegie Mellon University, Singapore Management University)
How Developers Use Exception Handling in Java?
Muhammad Asaduzzaman, Muhammad Ahasanuzzaman, Chanchal K. Roy and Kevin Schneider (University of Saskatchewan, University of Dhaka)
Analyzing Developer Sentiment in Commit Logs
Vinayak Sinha, Alina Lazar and Bonita Sharif (Youngstown State University)
The Hidden Cost of Code Completion: Understanding the Impact of the Recommendation-list Length on its Efficiency
Xianhao Jin, Francisco Servant (Virginia Tech, USA) http://people.cs.vt.edu/xianhao8/2018_msrch_xianhao.pdf
Enriched Event Streams: A General Dataset For Empirical Studies On In-IDE Activities Of Software Developers
Sebastian Proksch (University of Zurich), Sven Amann (Technische Universität Darmstadt), Sarah Nadi (University of Alberta)
Comprehension Effort and Programming Activities: Related? Or Not Related?
Akond Rahman https://akondrahman.github.io/papers/msr18_chall.pdf https://2018.msrconf.org/details/msr-2018-Mining-Challenge/4/Comprehension-Effort-and-Programming-Activities-Related-Or-Not-Related-
Empirical Study on the Relationship Between Developers Working Habits and Efficiency
Ariel Rodriguez, Fumiya Tanaka, Yasutaka Kamei http://posl.ait.kyushu-u.ac.jp/~kamei/publications/Rodriguez_MSR2018.pdf
Mining and Extraction of Personal Software Process measures through IDE Interaction logs
Alireza Joonbakhsh, Ashkan Sami https://doi.org/10.1145/3196398.3196462 https://github.com/unknowngithubuser1/data/blob/master/PID5276283.pdf
Predicting Developer IDE Commands with Machine Learning
Tyson Bulmer, Lloyd Montgomery, Daniela Damian http://lloydm.io/content/Bulmer et al. - 2018 - Predicting Developers’ IDE Commands with Machine Learning.pdf
Do Practitioners Use Autocompletion Features Differently Than Non-Practitioners?
Rahul Amlekar, Andrés Felipe Rincón Gamboa, Keheliya Gallaba, Shane McIntosh http://rebels.ece.mcgill.ca/papers/msr2018_amlekar.pdf
Who’s this? Developer identification using IDE event data
John Wilkie, Ziad Al Halabi, Alperen Karaoglu, Jiafeng Liao, George Ndungu, Chaiyong Ragkhitwetsagul, Matheus Paixao, Jens Krinke https://doi.org/10.1145/3196398.3196461 http://www.cs.ucl.ac.uk/staff/j.krinke/publications/msr18mc.pdf
Detecting and Characterizing Developer Behavior Following Opportunistic Reuse of Code Snippets from the Web
Agnieszka Ciborowska, Nicholas A. Kraft, Kostadin Damevski http://damevski.github.io/files/ciborowska_msr18_preprint.pdf
Revisiting “Programmers’ Build Errors” in the Visual Studio Context: A Replication Study using IDE Interaction Traces
Noam Rabbani, Mike Harvey, Sadnan Saquif, Keheliya Gallaba, Shane McIntosh http://rebels.ece.mcgill.ca/papers/msr2018_rabbani.pdf
Common Statement Kind Changes to Inform Automatic Program Repair
Mauricio Soto, Claire Le Goues http://www.cs.cmu.edu/~msotogon/Papers/CommonStatementKindChangesToInformAutomaticProgramRepair.pdf
Studying Developer Build Issues And Debugger Usage via Timeline Analysis in Visual Studio IDE
Christopher Bellman, Ahmad Seet, Olga Baysal http://olgabaysal.com/pdf/Bellman_MSR2018_Challenge_preprint.pdf
Detection and Analysis of Behavioral T-patterns in Debugging Activities
César Soto-Valero, Johann Bourcier, Benoit Baudry https://hal.inria.fr/hal-01763369/document
A Study on the Use of IDE Features for Debugging
Afsoon Afzal, Claire Le Goues http://www.cs.cmu.edu/~afsoona/papers/msr18.pdf
SOTorrent: Studying the Origin, Evolution, and Usage of Stack Overflow Code Snippets
Sebastian Baltes, Christoph Treude, Stephan Diehl https://empirical-software.engineering/assets/pdf/msr19-sotorrent.pdf
Mining Rule Violations in JavaScript Code Snippets
Uriel Ferreira Campos, Guilherme Smethurst, João Pedro Moraes, Rodrigo Bonifácio, Gustavo Pinto http://gustavopinto.github.io/lost+found/msr2019c.pdf
Snakes in Paradise?: Insecure Python-related Coding Practices in Stack Overflow
Akond Rahman, Effat Farhana, Nasif Imtiaz https://akondrahman.github.io/papers/msr19_security.pdf
Man vs Machine – A Study into language identification of Stackoverflow code snippets
Jens Dietrich, Markus Luczak-Roesch, Elroy Dalefield https://sites.google.com/site/jensdietrich/publications/preprints/man_vs_machine.pdf
Python Coding Style Compliance on Stack Overflow
Nikolaos Bafatakis, Niels Boecker, Wenjie Boon, Martin Cabello Salazar, Jens Krinke, Gazi Oznacar, Robert White http://www0.cs.ucl.ac.uk/staff/j.krinke/publications/msr19.pdf https://2019.msrconf.org/details/msr-2019-Mining-Challenge/8/Python-Coding-Style-Compliance-on-Stack-Overflow
Towards Mining Answer Edits to Extract Evolution Patterns in Stack Overflow
Themistoklis Diamantopoulos, Maria-Ioanna Sifaki, Andreas Symeonidis https://issel.ee.auth.gr/wp-content/uploads/2019/03/MSR2019.pdf https://2019.msrconf.org/details/msr-2019-Mining-Challenge/12/Towards-Mining-Answer-Edits-to-Extract-Evolution-Patterns-in-Stack-Overflow
Analyzing Comment-induced Updates on Stack Overflow
Abhishek Soni, Sarah Nadi https://dl.dropboxusercontent.com/s/664jj8qnd1pc2k6/Soni_MSR19.pdf
What Edits Are Done on Highly Answered Stack Overflow Questions? An Empirical Study
Xianhao Jin, Francisco Servant http://people.cs.vt.edu/xianhao8/MSR2019.pdf
Can Duplicate Posts on Stack Overflow Benefit the Software Development Community?
Durham Abric, Oliver Clark, Matthew Caminiti, Keheliya Gallaba, Shane McIntosh http://rebels.ece.mcgill.ca/papers/msr2019_abric.pdf
How Often and What StackOverflow Posts Do Developers Reference in Their GitHub Projects?
Saraj Singh Manes, Olga Baysal http://olgabaysal.com/pdf/Manes_Baysal-MSRChallenge19_preprint.pdf
Characterizing Duplicate Code Snippets between Stack Overflow and Tutorials
Manziba Nishi, Agnieszka Ciborowska, Kostadin Damevski http://damevski.github.io/files/nishi-msr19-preprint.pdf
Challenges with Responding to Static Analysis Tool Alerts
Nasif Imtiaz, Akond Rahman, Effat Farhana, Laurie Williams https://akondrahman.github.io/papers/msr19_sat.pdf
Impact of stack overflow code snippets on software cohesion: a preliminary study
Mashal Ahmad, Mel Ó Cinnéide https://doi.org/10.13140/RG.2.2.14791.75688 https://www.researchgate.net/publication/331928559_Impact_of_stack_overflow_code_snippets_on_software_cohesion_a_preliminary_study
We Need to Talk about Microservices: an Analysis from the Discussions on StackOverflow
Alan Bandeira, Carlos Filho, Matheus Paixao, Paulo Maia https://alanpbandeira.github.io/stackoverservices/files/stackoverservices.pdf https://2019.msrconf.org/details/msr-2019-Mining-Challenge/1/We-Need-to-Talk-about-Microservices-an-Analysis-from-the-Discussions-on-StackOverflo
What do developers know about machine learning: a study of ML discussions on StackOverflow
Hareem-e-Sahar, Abdul Ali Bangash, Alexander William Wong, Shaiful Chowdhury, Abram Hindle, Karim Ali
Cheating Death: A Statistical Survival Analysis of Publicly Available Python Projects
Ali Rao Hamza, Chelsea Parlett-Pelleriti, Erik Linstead http://www1.chapman.edu/~linstead/aliMSR2020.pdf https://2020.msrconf.org/details/msr-2020-mining-challenge/1/Cheating-Death-A-Statistical-Survival-Analysis-of-Publicly-Available-Python-Projects
An investigation to find motives behind cross-platform forks from Software Heritage dataset
Avijit Bhattacharjee, Sristy Sumana Nath, Shurui Zhou, Debasish Chakroborti, Banani Roy, Chanchal K. Roy, Kevin Schneider https://doi.org/10.1145/3379597.3387512 https://arxiv.org/pdf/2003.07970.pdf https://2020.msrconf.org/details/msr-2020-mining-challenge/2/An-investigation-to-find-motives-behind-cross-platform-forks-from-Software-Heritage-d
Exploring the Security Awareness of the Python and JavaScript Open Source Communities
Gabor Antal, Márton Keleti, Peter Hegedus https://arxiv.org/abs/2006.13652 https://2020.msrconf.org/details/msr-2020-mining-challenge/3/Exploring-the-Security-Awareness-of-the-Python-and-JavaScript-Open-Source-Communities

1.3.2 2021 Challenge

A large-scale study on human-cloned changes for automated program repair
- Fernanda Madeiral, Thomas Durieux
- https://arxiv.org/abs/2104.02386
Applying CodeBERT for Automated Program Repair of Java Simple Bugs
- Ehsan Mashhadi, Hadi Hemmati
- https://arxiv.org/abs/2103.11626
How Effective is Continuous Integration in Indicating Single-Statement Bugs?
- Jasmine Latendresse, Rabe Abdalkareem, Diego Costa, Emad Shihab
- https://zenodo.org/record/4606679
Mea culpa: How developers fix their own simple bugs differently from other developers
- Wenhan Zhu, Michael W. Godfrey
- https://arxiv.org/pdf/2103.11894
On the Distribution of “Simple Stupid Bugs” in Unit Test Files: An Exploratory Study
- Anthony Peruma, Christian D. Newman
- https://arxiv.org/abs/2103.09388
On the Effectiveness of Deep Vulnerability Detectors to Simple Stupid Bug Detection
- Jiayi Hua, Haoyu Wang
- https://doi.org/10.5281/zenodo.4626588
On the Rise and Fall of Simple Stupid Bugs: a Life-Cycle Analysis of SStuBs
- Balázs Mosolygó, Norbert Vándor, Gabor Antal, Peter Hegedus
- https://arxiv.org/abs/2103.09604
PySStuBs: Characterizing Single-Statement Bugs in Popular Open-Source Python Projects
- Arthur Veloso Kamienski, Luisa Palechor, Abram Hindle, Cor-Paul Bezemer
- https://www.researchgate.net/publication/349899864_PySStuBs_Characterizing_Single-Statement_Bugs_in_Popular_Open-Source_Python_Projects

1.3.3 2022 Challenge

An Exploratory Study on Refactoring Documentation in Issues Handling
- Eman Abdullah AlOmar, Anthony Peruma, Mohamed Wiem Mkaouer, Christian D. Newman, Ali Ouni
- https://arxiv.org/pdf/2203.10221.pdf
Between JIRA and GitHub: ASFBot and its Influence on Human Comments in Issue Trackers
- Ambarish Moharil, Dmitrii Orlov, Samar Jameel, Tristan Trouwen, Nathan Cassee, Alexander Serebrenik
- https://cassee.dev/files/asfbot.pdf
Is Refactoring Always a Good Egg? Exploring the Interconnection Between Bugs and Refactorings
- Amirreza Bagheri, Peter Hegedus
- https://conf.researchr.org/details/msr-2022/msr-2022-mining-challenge/2/Is-Refactoring-Always-a-Good-Egg-Exploring-the-Interconnection-Between-Bugs-and-Refa
On the Co-Occurrence of Refactoring of Test and Source Code
- Nicholas Nagy, Rabe Abdalkareem
- https://rabeabdalkareem.github.io/files/21-Nicholas_MSR2022.pdf
Refactoring Debt: Myth or Reality? An Exploratory Study on the Relationship Between Technical Debt and Refactoring (Best Mining Challenge Paper Award)
- Anthony Peruma, Eman Abdullah AlOmar, Christian D. Newman, Mohamed Wiem Mkaouer, Ali Ouni
- https://arxiv.org/abs/2203.05660
Studying the Impact of Continuous Delivery Adoption on Bug-Fixing Time in Apache’s Open-Source Projects
- Carlos Diego Andrade de Almeida, Diego N. Feijó, Lincoln Souza Rocha
Which bugs are missed in code reviews: An empirical study on SmartSHARK dataset
- Fatemeh Khoshnoud, Ali Rezaei Nasab, Zahra Toudeji, Ashkan Sami

CMPUT660F25 Reading List

2025/08/29