News

DEFENSE Committee: OTÁVIO CURY DA COSTA CASTRO

A DOCTORAL DEFENSE committee has been registered by the program.
STUDENT: OTÁVIO CURY DA COSTA CASTRO
DATE: 22/09/2025
TIME: 14:00
LOCATION: PPGCC Videoconference Room
TITLE: Source Code Expertise: Improving Knowledge Models and Assessing Generative AI Impact
KEYWORDS: software repository mining, code expertise, knowledge concentration, generative artificial intelligence
PAGES: 109
MAJOR FIELD: Exact and Earth Sciences
FIELD: Computer Science
ABSTRACT:

Identifying developer expertise in source code is valuable in various Software Engineering contexts. Knowledgeable developers are best suited to perform tasks such as code review and onboarding. Numerous models have been proposed to estimate source code knowledge, making it a well-explored topic; however, important gaps remain that affect the accuracy and applicability of these models. Moreover, the increasing use of Generative Artificial Intelligence (GenAI) tools may influence how code expertise is acquired and measured. This study aims to develop more accurate models for identifying source code experts. We first investigate the correlation between development history variables and developers’ knowledge of source code files. We extract metrics from public and private repositories and survey developers about the files they contributed to. Based on these data, we propose a linear model and train machine learning classifiers, comparing their performance with existing models. We also apply the proposed models to the Truck Factor (TF) metric to assess their practical implications in identifying critical developers. To examine the impact of GenAI, we build a dataset combining code expertise metrics with information on ChatGPT-generated code integrated into open-source projects. We simulate different usage scenarios by assigning a portion of contributions to GenAI instead of developers and survey developers about their perception of GenAI’s effects on code comprehension. Our results show that First Authorship and Recency of Modification are the variables most strongly correlated with source code knowledge. The proposed machine learning models outperform linear baselines, achieving F-scores between 71% and 73%. When applied to the TF algorithm, they improved developer identification, reaching a best average F-score of 74%. GenAI usage negatively affected TF reliability, even in low proportions. Developers reported mixed perceptions, with concerns especially about use by novice programmers.


COMMITTEE MEMBERS:
External to the Institution - 064.***.***-13 - ANDRÉ CAVALCANTE HORA - UFMG
Chair - 1744590 - GUILHERME AMARAL AVELINO
External to the Institution - 694.***.***-53 - LINCOLN SOUZA ROCHA - UFC
Internal - 2025063 - ROMUERE RODRIGUES VELOSO E SILVA
Internal - 1446435 - VINICIUS PONTE MACHADO
News item registered on: 27/08/2025 08:13