CMPUT660F25 Reading List

2025/08/29

Readings

Some of these are hand-picked, but in general good sources for papers come from:

Challenge Papers

These are short papers that respond to the MSR Mining Challenge.

MSR Papers

Longer MSR full conference technical papers:

ICSE Papers

FSE Papers

ICSME

EMSE: Empirical Software Engineering Journal

https://link.springer.com/journal/10664/articles

Transactions on Software Engineering Journal

TSE Search

TOSEM: ACM Transactions on Software Engineering and Methodology

https://dl.acm.org/loi/tosem

1 Papers

1.1 Full Papers

1.1.1 A Contextual Approach towards More Accurate Duplicate Bug Report Detection

1.1.2 An Empirical Study of End-user Programmers in the Computer Music Community

1.1.3 An empirical study on the evolution of design patterns, Aversano, L., Canfora, G., Cerulo, L., Del Grosso, C., and Di Penta, M. In Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering

1.1.4 Analysing Software Repositories to Understand Software Evolution, Marco D’Ambros, Harald Gall, Michele Lanza, and Martin Pinzger

1.1.5 Automatic identification of bug-introducing changes by Sunghun Kim, Thomas Zimmermann, Kai Pan, and E. James Whitehead, Jr.

1.1.6 Beyond Lines of Code: Do We Need More Complexity Metrics?, by Israel Herraiz and Ahmed E. Hassan

1.1.7 BugCache for Inspections: Hit or Miss?

1.1.8 Bugs as Inconsistent Behavior: A General Approach to Inferring Errors in Systems Code.

1.1.9 Change Impact Graphs: Determining the Impact of Prior Code Changes, German, D.M., Robles, G., and Hassan, A., Journal of Information and Software Technology (INFSOF), Volume 51, Number 10, pages 1394–1408, Oct 2009.

1.1.10 Characteristics of Useful Code Reviews: An Empirical Study at Microsoft

1.1.11 Clones: What is that smell?

1.1.12 Copy-Paste as a Principled Engineering Tool, by Michael Godfrey and Cory Kapser + ‘Cloning Considered Harmful’ Considered Harmful, by Cory J. Kapser and Michael W. Godfrey. Proc. of the 2006 Working Conference on Reverse Engineering (WCRE-06), 23-28 October, Benevento, Italy.

1.1.13 Cross versus Within-Company Cost Estimation Studies: A Systematic Review.

1.1.14 Evidence-Based Failure Prediction, by Nachi Nagappan and Thomas Ball + A Validation of Object-Oriented Design Metrics as Quality Indicators, by Victor R. Basili, Lionel C. Briand, and Walcelio L. Melo, IEEE Trans. on Software Engineering, 22(10), October 1996.

1.1.15 Gerrit Software Code Review Data from Android

1.1.16 GreenMiner: A Hardware Based Mining Software Repositories Software Energy Consumption Framework

1.1.17 Hipikat: recommending pertinent software development artifacts, by Davor Cubranic and Gail C. Murphy

1.1.18 How Well do Experienced Software Developers Predict Software Change?, by Mikael Lindvall and Kristian Sandahl, Journal of Systems and Software, 43(1), Jan 1998.

1.1.19 Identifying Changed Source Code Lines from Version Repositories by Gerardo Canfora, Luigi Cerulo, Massimiliano Di Penta. Proceedings of the Fourth International Workshop on Mining Software Repositories, 2007 (best paper award).

1.1.20 Identifying reasons for software change using historic databases by Audris Mockus and Larry G. Votta

1.1.21 Improving the Effectiveness of Test Suite Through Mining Historical Data

1.1.22 Macro-level software evolution: a case study of a large software compilation, Jesus M. Gonzalez-Barahona, Gregorio Robles, et al., Journal of Empirical Software Engineering, Volume 14, Number 3, June 2009. Extended version of best paper award.

1.1.23 Measuring the Progress of Projects Using the Time Dependence of Code Changes, by Omar Alam, Bram Adams and Ahmed E. Hassan.

1.1.24 Mining Android App Usages for Generating Actionable GUI-based Execution Scenarios

Mario Linares-Vásquez, Martin White, Carlos Eduardo Bernal Cardenas, Kevin Moran and Denys Poshyvanyk (The College of William and Mary, United States) http://www.cs.wm.edu/~denys/pubs/MSR'15-MonkeyLab-CRC.pdf

1.1.25 Mining email social networks

1.1.26 Mining Energy-Aware Commits

1.1.27 Mining Energy-Greedy API Usage Patterns in Android Apps: an Empirical Study

1.1.28 Mining Questions About Software Energy Consumption

1.1.29 Mining Social Networks, Christian Bird, et al. Proceedings of the 2006 international workshop on Mining software repositories.

1.1.30 Mining version histories to guide software changes, Thomas Zimmermann, Peter Weißgerber, Stephan Diehl, Andreas Zeller

1.1.31 Novel applications of Machine Learning in Testing by Lionel Briand

1.1.32 Open Borders? Immigration in Open Source Projects.

1.1.33 Scalable statistical bug isolation by Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken, and Michael I. Jordan

1.1.34 Seeking the source: software source code as a social and technical artifact, Cleidson de Souza, Jon Froehlich, and Paul Dourish

1.1.35 Software Bertillonage: Finding the provenance of an entity, by Julius Davies, Abram J. Hindle, Daniel M. German, Michael W. Godfrey.

1.1.36 Studying Developers Copy and Paste Behavior

1.1.37 Syntax Errors Just Aren’t Natural: Improving Error Reporting with Language Models

1.1.38 The Evidence for Design Patterns, by Walter Tichy + Design Pattern Detection Using Similarity Scoring, N. Tsantalis, A. Chatzigeorgiou, G. Stephanides, and S. T. Halkidis, IEEE Trans. on Software Engineering, November 2006.

1.1.39 The Impact of Code Review Coverage and Code Review Participation on Software Quality: A Case Study of the Qt, VTK, and ITK Projects

1.1.40 The Past, Present, and Future of Software Evolution, Michael W. Godfrey and Daniel M. German. Invited paper in Proc. of Frontiers of Software Maintenance track at the 2008 IEEE Intl. Conf. on Software Maintenance (ICSM-08), October 2008, Beijing, China.

1.1.41 The Promises and Perils of Mining Git. In Proceedings of the Sixth Working Conference on Mining Software Repositories (MSR 09), Vancouver, Canada, 2009. Christian Bird, Peter C. Rigby, Earl T. Barr, David J. Hamilton, Daniel M. German, Prem Devanbu.

1.1.42 The Promises and Perils of Mining GitHub

1.1.43 The secret life of bugs: Going past the errors and omissions in software repositories, by Jorge Aranda and Gina Venolia, Proc. of the 2009 Intl. Conf. on Software Engineering (ICSE-09), Vancouver, May 2009.

1.1.44 The Top Ten List: Dynamic Fault Prediction, by Ahmed E. Hassan and Richard C. Holt, Proc. of the 2005 IEEE Intl. Conf. on Software Maintenance (ICSM-05), Budapest, Hungary, Sept. 2005.

1.1.45 Toward Deep Learning Software Repositories

1.1.46 Towards Building a Universal Defect Prediction Model

1.1.47 Understanding the impact of code and process metrics on post-release defects: A case study on the Eclipse project, Emad Shihab, Zhen Ming Jiang, Walid M. Ibrahim, Bram Adams, Ahmed E. Hassan, Proc. of the 2010 ACM-IEEE Intl. Symposium on Empirical Software Engineering and Measurement (ESEM-10), Bolzano-Bozen, Italy, Sept 2010.

1.1.48 Using information fragments to answer the questions developers ask.

1.1.49 Using Software Dependencies and Churn Metrics to Predict Field Failures: An Empirical Case Study, Nachiappan Nagappan, Thomas Ball

1.1.50 Visualizing software changes, Stephen G. Eick, Todd L. Graves, Alan F. Karr, Audris Mockus, and Paul Schuster

1.1.51 What is the Gist? Understanding the Use of Public Gists on GitHub

1.1.52 What’s Hot and What’s Not: Windowing Developer Topic Analysis? by Abram J. Hindle, Michael W. Godfrey, Richard C. Holt.

1.1.53 When do changes induce fixes?: Jacek Śliwerski International Max Planck Research School, Saarbrücken, Germany

1.1.54 Who Should Fix This Bug?, by John Anvik, Lyndon Hiew and Gail C. Murphy, Proc. of the 2006 Intl. Conference on Software Engineering (ICSE-06), Shanghai, May 2006.

1.1.55 Will My Patch Make It? And How Fast?: Case Study on the Linux Kernel

1.1.56 Yesterday’s Weather: Guiding Early Reverse Engineering Efforts by Summarizing the Evolution of Changes, Tudor Girba, Stephane Ducasse, Michele Lanza, Proc. 20th IEEE Int’l Conference on Software Maintenance (ICSM'04), September 2004, pp. 40-49.

1.1.57 An Evaluation of Open-Source Software Microbenchmark Suites for Continuous Performance Assessment

1.1.58 SOTorrent: Reconstructing and Analyzing the Evolution of Stack Overflow Posts

1.1.59 Data-Driven Search-based Software Engineering

1.1.60 CLEVER: Combining Code Metrics with Clone Detection for Just-In-Time Fault Prevention and Resolution in Large Industrial Projects

1.1.62 Extracting Code Segments and Their Descriptions from Research Articles [preprint]

1.1.63 Structure and Evolution of Package Dependency Networks [preprint]

1.1.64 The Impact Of Using Regression Models to Build Defect Classifiers [preprint]

1.1.65 Choosing an NLP Library for Analyzing Software Documentation: A Systematic Literature Review and a Series of Experiments [preprint]

1.1.66 GreenOracle: Estimating Software Energy Consumption with Energy Measurement Corpora

1.1.67 Mining Performance Regression Inducing Code Changes in Evolving Software

1.1.68 An Empirical Study on the Practice of Maintaining Object-Relational Mapping Code in Java Systems

1.1.69 Software Ingredients: Detection of Third-party Component Reuse in Java Software Release

1.1.70 A Look at the Dynamics of the JavaScript Package Ecosystem

1.1.71 A Large-Scale Study On Repetitiveness, Containment, and Composability of Routines in Source Code

1.1.72 A survey of machine learning for big code and naturalness

1.1.73 Are deep neural networks the best choice for modeling source code?

1.1.74 Towards Accurate Duplicate Bug Retrieval Using Deep Learning Techniques

1.1.75 An Evaluation of Open-Source Software Microbenchmark Suites for Continuous Performance Assessment

1.1.76 CLEVER: Code Metrics with Clone Detection for Just-In-Time Fault Prevention and Resolution in Large Industrial Projects

1.1.77 Leveraging Historical Versions of Android Apps for Efficient and Precise Taint Analysis

1.1.78 Understanding the Usage, Impact, and Adoption of Non-OSI Approved Licenses

1.1.79 How Swift Developers Handle Errors

1.1.80 What are your Programming Language’s Energy-Delay Implications?

1.1.81 Automatically Assessing Code Understandability Reanalyzed: Combined Metrics Matter

1.1.82 Data-Driven Search-based Software Engineering

1.1.83 The Open-Closed Principle of Modern Machine Learning Frameworks

1.1.84 A Benchmark Study on Sentiment Analysis for Software Engineering Research

1.1.85 Natural Language or Not (NLoN) - package for Software Engineering Text Analysis Pipeline

1.1.86 Deep Learning Similarities from Different Representations of Source Code

1.1.87 Bayesian Hierarchical Modelling for Tailoring Metric Thresholds

1.1.88 SCOR: Source Code Retrieval With Semantics and Order

1.1.89 PathMiner: A Library for Mining of Path-Based Representations of Code

1.1.90 Import2vec: learning embeddings for software libraries

1.1.91 Semantic Source Code Models Using Identifier Embeddings

1.1.92 Exploring Word Embedding Techniques to Improve Sentiment Analysis of Software Engineering Texts

1.1.93 Cleaning StackOverflow for Machine Translation

1.1.94 Predicting Good Configurations for GitHub and Stack Overflow Topic Models

1.1.95 Time Present and Time Past: Analyzing the Evolution of JavaScript Code in the Wild

1.1.96 The Software Heritage Graph Dataset: public software development under one roof

1.1.97 World of Code: An Infrastructure for Mining the Universe of Open Source VCS Data

1.1.98 Crossflow: A Framework for Distributed Mining of Software Repositories

1.1.99 GreenHub Farmer: Real-world data for Android Energy Mining

1.1.100 GreenSource: a large-scale collection of Android code, tests and energy metrics

1.1.101 The Emergence of Software Diversity in Maven Central

1.1.102 A Dataset of Parametric Cryptographic Misuses

1.1.103 Tracing Back Log Data to its Log Statement: From Research to Practice

1.1.104 Using Large-Scale Anomaly Detection on Code to Improve Kotlin Compiler (MSR - Technical Paper)

1.1.105 An Empirical Study of Method Chaining in Java (MSR - Technical Paper)

1.1.106 A Tale of Docker Build Failures: A Preliminary Study (MSR - Technical Paper)

1.1.107 LogChunks: A Data Set for Build Log Analysis (MSR - Data Showcase)

1.1.108 A Dataset of Dockerfiles (MSR - Data Showcase)

1.1.109 Detecting Video Game-Specific Bad Smells in Unity Projects (MSR - Technical Paper)

1.1.110 The Scent of Deep Learning Code: An Empirical Study (MSR - Technical Paper)

1.1.111 A Soft Alignment Model for Bug Deduplication (MSR - Technical Paper)

1.1.112 Large-Scale Manual Validation of Bugfixing Changes (MSR - Registered Reports)

1.1.113 An Empirical Study on Regular Expression Bugs (MSR - Technical Paper)

1.1.114 SoftMon: A Tool to Compare Similar Open-source Software from a Performance Perspective (MSR - Technical Paper)

1.1.115 A Study of Potential Code Borrowing and License Violations in Java Projects on GitHub (MSR - Technical Paper)

1.1.116 Did You Remember To Test Your Tokens? (MSR - Technical Paper)

1.1.117 Embedding Java Classes with code2vec: Improvements from Variable Obfuscation (MSR - Technical Paper)

1.1.118 Can We Use SE-specific Sentiment Analysis Tools in a Cross-Platform Setting? (MSR - Technical Paper)

1.1.119 Ethical Mining – A Case Study on MSR Mining Challenges (ACM SIGSOFT Distinguished Paper Award; MSR - Technical Paper)

1.1.120 From Innovations to Prospects: What Is Hidden Behind Cryptocurrencies? (MSR - Technical Paper)

1.1.121 Identifying Versions of Libraries used in Stack Overflow Code Snippets

1.1.122 PSIMiner: A Tool for Mining Rich Abstract Syntax Trees from Code

1.1.123 How Java Programmers Test Exceptional Behavior

1.1.124 On the Naturalness and Localness of Software Logs

1.1.125 A Large-Scale Comparison of Python Code in Jupyter Notebooks and Scripts

1.1.126 On the Violation of Honesty in Mobile Apps: Automated Detection and Categories

1.1.127 Operationalizing Threats to MSR Studies by Simulation-Based Testing

1.1.128 LineVul: A Transformer-based Line-Level Vulnerability Prediction

1.1.129 Painting the Landscape of Automotive Software

1.1.130 Find something from MSR Conference:

1.1.131 Find something from SIGSOFT:

1.2 SIGSOFT Winners:

1.2.1 A Tale from the Trenches: Cognitive Biases and Software Development

1.2.2 An Empirical Study on Program Failures of Deep Learning Jobs

1.2.3 Big Code != Big Vocabulary: Open-Vocabulary Models for Source Code

1.2.4 Context-aware In-process Crowdworker Recommendation

1.2.5 Here We Go Again: Why Is It Difficult for Developers to Learn Another Programming Language?

1.2.6 Time-travel Testing of Android Apps

1.2.7 Towards the Use of the Readily Available Tests from the Release Pipeline as Performance Tests. Are We There Yet?

1.2.8 Translating Video Recordings of Mobile App Usages into Replayable Scenarios

1.2.9 Unblind Your Apps: Predicting Natural-Language Labels for Mobile GUI Components by Deep Learning

1.2.10 White-box Fairness Testing through Adversarial Sampling

1.2.11 An Empirical Study of Quick Remedy Commits

1.2.12 A Self-Attentional Neural Architecture for Code Completion with Multi-Task Learning

1.2.13 Automating Just-In-Time Comment Updating

1.2.14 Broadening Horizons of Multilingual Static Analysis: Semantic Summary Extraction from C Code for JNI Program Analysis

1.2.15 ChemTest: An Automated Software Testing Framework for an Emerging Paradigm

1.2.16 Problems and Opportunities in Training Deep Learning Software Systems: An Analysis of Variance

1.2.17 Scalable Multiple-View Analysis of Reactive Systems via Bidirectional Model Transformations

1.2.18 Summary-Based Symbolic Evaluation for Smart Contracts

1.2.19 Team Discussions and Dynamics During DevOps Tool Adoptions in OSS Projects

1.3 Challenge Papers

1.3.1 Challenges from 2020 or earlier

  1. [Data] A Repository with 44 Years of Unix Evolution
    Diomidis Spinellis (Athens University of Economics and Business, Greece)

    http://www.dmst.aueb.gr/dds/pubs/conf/2015-MSR-Unix-History/html/Spi15c.html

  2. A comparative exploration of FreeBSD bug lifetimes.
    Gargi Bougie, Christoph Treude, Daniel M. Germán, Margaret-Anne D. Storey

  3. A newbie’s guide to eclipse APIs.
    Reid Holmes, Robert J. Walker

  4. A Tale of Two Browsers.
    Olga Baysal, Ian J. Davis, Michael W. Godfrey

  5. An initial study of the growth of eclipse defects.
    Hongyu Zhang

  6. Analyzing the evolution of eclipse plugins.
    Michel Wermelinger, Yijun Yu

  7. Apples vs. oranges?: an exploration of the challenges of comparing the source code of two software systems.
    Daniel M. Germán, Julius Davies

  8. Assessment of issue handling efficiency.
    Bart Luijten, Joost Visser, Andy Zaidman

  9. Author entropy vs. file size in the gnome suite of applications.
    Jason R. Casebolt, Jonathan L. Krein, Alexander C. MacLean, Charles D. Knutson, Daniel P. Delorey

  10. Cloning and copying between GNOME projects.
    Jens Krinke, Nicolas Gold, Yue Jia, David Binkley

  11. Co-Evolution of Project Documentation and Popularity within Github
    Karan Aggarwal, Abram Hindle and Eleni Stroulia (University of Alberta, Canada) http://webdocs.cs.ualberta.ca/~hindle1/2016/msr14-Documentation.pdf

  12. Do comments explain codes adequately?: investigation by text filtering.
    Yukinao Hirata, Osamu Mizuno

  13. Evaluating process quality in GNOME based on change request data.
    Holger Schackmann, Horst Lichter

  14. Finding file clones in FreeBSD Ports Collection.
    Yusuke Sasaki, Tetsuo Yamamoto, Yasuhiro Hayase, Katsuro Inoue

  15. Forecasting the Number of Changes in Eclipse Using Time Series Analysis.
    Israel Herraiz, Jesús M. González-Barahona, Gregorio Robles

  16. Going Green: An Exploratory Analysis of Energy-Related Questions
    Haroon Malik, Peng Zhao and Michael Godfrey (University of Waterloo, Canada)

  17. Impact of the Creation of the Mozilla Foundation in the Activity of Developers.
    Jesús M. González-Barahona, Gregorio Robles, Israel Herraiz

  18. Local and Global Recency Weighting Approach to Bug Prediction.
    Hemant Joshi, Chuanlei Zhang, Srini Ramaswamy, Coskun Bayrak

  19. Mining Eclipse Developer Contributions via Author-Topic Models.
    Erik Linstead, Paul Rigor, Sushil Krishna Bajracharya, Cristina Videira Lopes, Pierre Baldi

  20. Mining security changes in FreeBSD.
    Andreas Mauczka, Christian Schanes, Florian Fankhauser, Mario Bernhart, Thomas Grechenig

  21. Mining StackOverflow to Filter out Off-topic IRC Discussion
    Shaiful Chowdhury and Abram Hindle (University of Alberta, Canada) http://webdocs.cs.ualberta.ca/~hindle1/2015/shaiful-mining_so.pdf

  22. Mining the coherence of GNOME bug reports with statistical topic models.
    Erik Linstead, Pierre Baldi

  23. On the use of Internet Relay Chat (IRC) meetings by developers of the GNOME GTK+ project.
    Emad Shihab, Zhen Ming Jiang, Ahmed E. Hassan

  24. Perspectives on bugs in the Debian bug tracking system.
    Julius Davies, Hanyu Zhang, Lucas Nussbaum, Daniel M. Germán

  25. Predicting Defects and Changes with Import Relations.
    Adrian Schröter

  26. Predicting Eclipse Bug Lifetimes.
    Lucas D. Panjer

  27. Security and Emotion: Sentiment Analysis of Security Discussions on GitHub
    Daniel Pletea, Bogdan Vasilescu and Alexander Serebrenik (Eindhoven University of Technology, Netherlands)

  28. Summarizing developer work history using time series segmentation: challenge report.
    Harvey P. Siy, Parvathi Chundi, Mahadevan Subramaniam

  29. System compatibility analysis of Eclipse and Netbeans based on bug data.
    Xinlei (Oscar) Wang, Eilwoo Baik, Premkumar T. Devanbu

  30. Towards a simplification of the bug report form in eclipse.
    Israel Herraiz, Daniel M. Germán, Jesús M. González-Barahona, Gregorio Robles

  31. Visualizing Gnome with the Small Project Observatory.
    Mircea Lungu, Jacopo Malnati, Michele Lanza

  32. What topics do Firefox and Chrome contributors discuss?
    Mario Luca Bernardi, Carmine Sementa, Quirino Zagarese, Damiano Distante, Massimiliano Di Penta

  33. Which Non-functional Requirements do Developers Focus on? An Empirical Study on Stack Overflow using Topic Analysis
    Jie Zou, Ling Xu, Weikang Guo, Meng Yan, Dan Yang and Xiaohong Zhang (Chongqing University, China)

  34. On the Differences between Unit and Integration Testing in the TravisTorrent Dataset [preprint]
    Manuel Gerardo Orellana Cordero, Gulsher Laghari, Alessandro Murgia and Serge Demeyer (University of Antwerp)

  35. Cost-effective Build Outcome Prediction Using Cascaded Classifiers
    Ansong Ni and Ming Li (Nanjing University)

  36. Sentiment Analysis of Travis CI Builds
    Rodrigo Souza and Bruno Silva (Salvador University - UNIFACS, Federal University of Bahia)

  37. A Time Series Analysis of TravisTorrent: To Everything There is a Season
    Abigail Atchison, Christina Berardi, Natalie Best, Elizabeth Stevens and Erik Linstead (Chapman University)

  38. On the Interplay between Non-Functional Requirements and Builds on Continuous Integration [preprint]
    Klérisson Paixão, Crícia Z. Felício, Fernanda Delfim and Marcelo Maia (Instituto Federal do Triângulo Mineiro, Universidade Federal de Uberlândia, UFU)

  39. The Impact of the Adoption of Continuous Integration on Developer Attraction and Retention
    Yusaira Khan, Yash Gupta, Keheliya Gallaba and Shane McIntosh (McGill University)

  40. The Hidden Cost of Code Completion: Understanding the Impact of the Recommendation-list Length on its Efficiency
    Ariel Rodriguez, Fumiya Tanaka, Yasutaka Kamei.

  41. Do Practitioners Use Autocompletion Features Differently Than Non-Practitioners?
    John Wilkie, Ziad Al Halabi, Alperen Karaoglu, Jiafeng Liao, George Ndungu, Chaiyong Ragkhitwetsagul, Matheus Paixão, Jens Krinke.

  42. Who’s this? Developer identification using IDE event data
    Agnieszka Ciborowska, Nicholas A. Kraft and Kostadin Damevski.

  43. Revisiting “Programmers’ Build Errors” in the Visual Studio Context: A Replication Study using IDE Interaction Traces
    Mauricio Soto and Claire Le Goues.

  44. Common Statement Kind Changes to Inform Automatic Program Repair
    Christopher Bellman, Ahmad Seet, Olga Baysal.

  45. Examining Programmer Practices for Locally Handling Exceptions
    Mary Beth Kery, Claire Le Goues and Brad Myers (Carnegie Mellon University)

  46. QualBoa: Reusability-aware Recommendations of Source Code Components
    Themistoklis Diamantopoulos, Klearchos Thomopoulos and Andreas Symeonidis (Aristotle University of Thessaloniki)

  47. The Dispersion of Build Maintenance Activity across Maven Lifecycle Phases
    Casimir Desarmeaux, Andrea Pecatikov and Shane McIntosh (McGill University)

  48. The Relationship between Commit Message Detail and Defect Proneness in Java Projects on GitHub
    Jacob Barnett, Charles Gathuru, Luke Soldano and Shane McIntosh (McGill University)

  49. Analysis of Exception Handling Patterns in Java Projects: An Empirical Study
    Suman Nakshatri, Maithri Hegde and Sahithi Thandra (University of Waterloo)

  50. Judging a commit by its cover: Correlating commit message entropy with build status on Travis-CI
    Eddie Antonio Santos and Abram Hindle (University of Alberta)

  51. Characterizing Energy-Aware Software Projects: Are They Different?
    Shaiful Chowdhury and Abram Hindle (University of Alberta)

  52. A deeper look into bug fixes: Patterns, replacements, deletions, and additions
    Mauricio Soto, Ferdian Thung, Chu-Pan Wong, Claire Le Goues and David Lo (Carnegie Mellon University, Singapore Management University)

  53. How Developers Use Exception Handling in Java?
    Muhammad Asaduzzaman, Muhammad Ahasanuzzaman, Chanchal K. Roy and Kevin Schneider (University of Saskatchewan, University of Dhaka)

  54. Analyzing Developer Sentiment in Commit Logs
    Vinayak Sinha, Alina Lazar and Bonita Sharif (Youngstown State University)

  55. The Hidden Cost of Code Completion: Understanding the Impact of the Recommendation-list Length on its Efficiency
    Xianhao Jin - Virginia Tech, USA, Francisco Servant - Virginia Tech http://people.cs.vt.edu/xianhao8/2018_msrch_xianhao.pdf

  56. Enriched Event Streams: A General Dataset For Empirical Studies On In-IDE Activities Of Software Developers
    Sebastian Proksch - University of Zurich, Sven Amann - Technische Universität Darmstadt, Sarah Nadi - University of Alberta

  58. Comprehension Effort and Programming Activities: Related? Or Not Related?
    Akond Rahman https://akondrahman.github.io/papers/msr18_chall.pdf https://2018.msrconf.org/details/msr-2018-Mining-Challenge/4/Comprehension-Effort-and-Programming-Activities-Related-Or-Not-Related-

  59. The Hidden Cost of Code Completion: Understanding the Impact of the Recommendation-list Length on its Efficiency
    Xianhao Jin, Francisco Servant http://people.cs.vt.edu/xianhao8/2018_msrch_xianhao.pdf

  60. Empirical Study on the Relationship Between Developers Working Habits and Efficiency
    Ariel Rodriguez, Fumiya Tanaka, Yasutaka Kamei http://posl.ait.kyushu-u.ac.jp/~kamei/publications/Rodriguez_MSR2018.pdf

  61. Mining and Extraction of Personal Software Process measures through IDE Interaction logs
    Alireza Joonbakhsh, Ashkan Sami https://doi.org/10.1145/3196398.3196462 https://github.com/unknowngithubuser1/data/blob/master/PID5276283.pdf

  62. Predicting Developer IDE Commands with Machine Learning
    Tyson Bulmer, Lloyd Montgomery, Daniela Damian http://lloydm.io/content/Bulmer et al. - 2018 - Predicting Developers’ IDE Commands with Machine Learning.pdf

  63. Do Practitioners Use Autocompletion Features Differently Than Non-Practitioners?
    Rahul Amlekar, Andrés Felipe Rincón Gamboa, Keheliya Gallaba, Shane McIntosh http://rebels.ece.mcgill.ca/papers/msr2018_amlekar.pdf

  64. Who’s this? Developer identification using IDE event data
    John Wilkie, Ziad Al Halabi, Alperen Karaoglu, Jiafeng Liao, George Ndungu, Chaiyong Ragkhitwetsagul, Matheus Paixao, Jens Krinke https://doi.org/10.1145/3196398.3196461 http://www.cs.ucl.ac.uk/staff/j.krinke/publications/msr18mc.pdf

  65. Detecting and Characterizing Developer Behavior Following Opportunistic Reuse of Code Snippets from the Web
    Agnieszka Ciborowska, Nicholas A. Kraft, Kostadin Damevski http://damevski.github.io/files/ciborowska_msr18_preprint.pdf

  66. Revisiting “Programmers’ Build Errors” in the Visual Studio Context: A Replication Study using IDE Interaction Traces
    Noam Rabbani, Mike Harvey, Sadnan Saquif, Keheliya Gallaba, Shane McIntosh http://rebels.ece.mcgill.ca/papers/msr2018_rabbani.pdf

  67. Common Statement Kind Changes to Inform Automatic Program Repair
    Mauricio Soto, Claire Le Goues http://www.cs.cmu.edu/~msotogon/Papers/CommonStatementKindChangesToInformAutomaticProgramRepair.pdf

  68. Studying Developer Build Issues And Debugger Usage via Timeline Analysis in Visual Studio IDE
    Christopher Bellman, Ahmad Seet, Olga Baysal http://olgabaysal.com/pdf/Bellman_MSR2018_Challenge_preprint.pdf

  69. Detection and Analysis of Behavioral T-patterns in Debugging Activities
    César Soto-Valero, Johann Bourcier, Benoit Baudry https://hal.inria.fr/hal-01763369/document

  70. A Study on the Use of IDE Features for Debugging
    Afsoon Afzal, Claire Le Goues http://www.cs.cmu.edu/~afsoona/papers/msr18.pdf

  71. SOTorrent: Studying the Origin, Evolution, and Usage of Stack Overflow Code Snippets
    Sebastian Baltes, Christoph Treude, Stephan Diehl https://empirical-software.engineering/assets/pdf/msr19-sotorrent.pdf

  72. Mining Rule Violations in JavaScript Code Snippets
    Uriel Ferreira Campos, Guilherme Smethurst, João Pedro Moraes, Rodrigo Bonifácio, Gustavo Pinto http://gustavopinto.github.io/lost+found/msr2019c.pdf

  73. Snakes in Paradise?: Insecure Python-related Coding Practices in Stack Overflow
    Akond Rahman, Effat Farhana, Nasif Imtiaz https://akondrahman.github.io/papers/msr19_security.pdf

  74. Man vs Machine – A Study into language identification of Stackoverflow code snippets
    Jens Dietrich, Markus Luczak-Roesch, Elroy Dalefield https://sites.google.com/site/jensdietrich/publications/preprints/man_vs_machine.pdf

  75. Python Coding Style Compliance on Stack Overflow
    Nikolaos Bafatakis, Niels Boecker, Wenjie Boon, Martin Cabello Salazar, Jens Krinke, Gazi Oznacar, Robert White http://www0.cs.ucl.ac.uk/staff/j.krinke/publications/msr19.pdf https://2019.msrconf.org/details/msr-2019-Mining-Challenge/8/Python-Coding-Style-Compliance-on-Stack-Overflow

  76. Towards Mining Answer Edits to Extract Evolution Patterns in Stack Overflow
    Themistoklis Diamantopoulos, Maria-Ioanna Sifaki, Andreas Symeonidis https://issel.ee.auth.gr/wp-content/uploads/2019/03/MSR2019.pdf https://2019.msrconf.org/details/msr-2019-Mining-Challenge/12/Towards-Mining-Answer-Edits-to-Extract-Evolution-Patterns-in-Stack-Overflow

  77. Analyzing Comment-induced Updates on Stack Overflow
    Abhishek Soni, Sarah Nadi https://dl.dropboxusercontent.com/s/664jj8qnd1pc2k6/Soni_MSR19.pdf

  78. What Edits Are Done on Highly Answered Stack Overflow Questions? An Empirical Study
    Xianhao Jin, Francisco Servant http://people.cs.vt.edu/xianhao8/MSR2019.pdf

  79. Can Duplicate Posts on Stack Overflow Benefit the Software Development Community?
    Durham Abric, Oliver Clark, Matthew Caminiti, Keheliya Gallaba, Shane McIntosh http://rebels.ece.mcgill.ca/papers/msr2019_abric.pdf

  80. How Often and What StackOverflow Posts Do Developers Reference in Their GitHub Projects?
    Saraj Singh Manes, Olga Baysal http://olgabaysal.com/pdf/Manes_Baysal-MSRChallenge19_preprint.pdf

  81. Characterizing Duplicate Code Snippets between Stack Overflow and Tutorials
    Manziba Nishi, Agnieszka Ciborowska, Kostadin Damevski http://damevski.github.io/files/nishi-msr19-preprint.pdf

  82. Challenges with Responding to Static Analysis Tool Alerts
    Nasif Imtiaz, Akond Rahman, Effat Farhana, Laurie Williams https://akondrahman.github.io/papers/msr19_sat.pdf

  83. Impact of stack overflow code snippets on software cohesion: a preliminary study
    Mashal Ahmad, Mel Ó Cinnéide https://doi.org/10.13140/RG.2.2.14791.75688 https://www.researchgate.net/publication/331928559_Impact_of_stack_overflow_code_snippets_on_software_cohesion_a_preliminary_study

  84. We Need to Talk about Microservices: an Analysis from the Discussions on StackOverflow
    Alan Bandeira, Carlos Filho, Matheus Paixao, Paulo Maia https://alanpbandeira.github.io/stackoverservices/files/stackoverservices.pdf https://2019.msrconf.org/details/msr-2019-Mining-Challenge/1/We-Need-to-Talk-about-Microservices-an-Analysis-from-the-Discussions-on-StackOverflo

  85. What do developers know about machine learning: a study of ML discussions on StackOverflow
    Hareem-e-Sahar, Abdul Ali Bangash, Alexander William Wong, Shaiful Chowdhury, Abram Hindle, Karim Ali

  86. Cheating Death: A Statistical Survival Analysis of Publicly Available Python Projects (MSR - Mining Challenge)
    Ali Rao Hamza, Chelsea Parlett-Pelleriti, Erik Linstead http://www1.chapman.edu/~linstead/aliMSR2020.pdf https://2020.msrconf.org/details/msr-2020-mining-challenge/1/Cheating-Death-A-Statistical-Survival-Analysis-of-Publicly-Available-Python-Projects

  87. An investigation to find motives behind cross-platform forks from Software Heritage dataset (MSR - Mining Challenge)
    Avijit Bhattacharjee, Sristy Sumana Nath, Shurui Zhou, Debasish Chakroborti, Banani Roy, Chanchal K. Roy, Kevin Schneider https://doi.org/10.1145/3379597.3387512 https://arxiv.org/pdf/2003.07970.pdf https://2020.msrconf.org/details/msr-2020-mining-challenge/2/An-investigation-to-find-motives-behind-cross-platform-forks-from-Software-Heritage-d

  88. Exploring the Security Awareness of the Python and JavaScript Open Source Communities (MSR - Mining Challenge)
    Gabor Antal, Márton Keleti, Peter Hegedus https://arxiv.org/abs/2006.13652 https://2020.msrconf.org/details/msr-2020-mining-challenge/3/Exploring-the-Security-Awareness-of-the-Python-and-JavaScript-Open-Source-Communities

1.3.2 2021 Challenge

  1. A large-scale study on human-cloned changes for automated program repair

  2. Applying CodeBERT for Automated Program Repair of Java Simple Bugs

  3. How Effective is Continuous Integration in Indicating Single-Statement Bugs?

  4. Mea culpa: How developers fix their own simple bugs differently from other developers

  5. On the Distribution of “Simple Stupid Bugs” in Unit Test Files: An Exploratory Study

  6. On the Effectiveness of Deep Vulnerability Detectors to Simple Stupid Bug Detection

  7. On the Rise and Fall of Simple Stupid Bugs: a Life-Cycle Analysis of SStuBs

  8. PySStuBs: Characterizing Single-Statement Bugs in Popular Open-Source Python Projects

1.3.3 2022 Challenge

  1. An Exploratory Study on Refactoring Documentation in Issues Handling

  2. Between JIRA and GitHub: ASFBot and its Influence on Human Comments in Issue Trackers

  3. Is Refactoring Always a Good Egg? Exploring the Interconnection Between Bugs and Refactorings

  4. On the Co-Occurrence of Refactoring of Test and Source Code

  5. Refactoring Debt: Myth or Reality? An Exploratory Study on the Relationship Between Technical Debt and Refactoring (Best Mining Challenge Paper Award)

  6. Studying the Impact of Continuous Delivery Adoption on Bug-Fixing Time in Apache’s Open-Source Projects
    Carlos Diego Andrade de Almeida, Diego N. Feijó, Lincoln Souza Rocha

  7. Which bugs are missed in code reviews: An empirical study on SmartSHARK dataset
    Fatemeh Khoshnoud, Ali Rezaei Nasab, Zahra Toudeji, Ashkan Sami
