Competing pull request

This page provides information about, and the dataset for, the technical report "How do Multiple Pull Requests Change the Same Code: A Study of Competing Pull Requests in GitHub" [Paper].

Terminology

1. Base Information

We selected the 100 most-forked repositories on GitHub and extracted the pull requests created in 2017 via the GitHub API (details at https://developer.github.com/v3/pulls/). After data filtering, 60 repositories remain. Table 1 shows basic information about these 60 repositories.
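
The extraction step can be sketched as follows. This is a minimal illustration, not the script used for the report: it assumes the pull requests are listed through the `GET /repos/{owner}/{repo}/pulls` endpoint with `state=all` and then filtered locally on the `created_at` field. The helper names `pulls_url` and `created_in_year` and the sample records are hypothetical.

```python
from datetime import datetime


def pulls_url(owner, repo, page=1, per_page=100):
    """Build the list-pull-requests URL for one result page.

    state=all asks for open and closed pull requests alike.
    """
    return (
        f"https://api.github.com/repos/{owner}/{repo}/pulls"
        f"?state=all&per_page={per_page}&page={page}"
    )


def created_in_year(pulls, year):
    """Keep only pull requests whose created_at timestamp falls in `year`."""
    selected = []
    for pr in pulls:
        # The API reports ISO 8601 UTC timestamps such as "2017-06-15T08:30:00Z".
        created = datetime.strptime(pr["created_at"], "%Y-%m-%dT%H:%M:%SZ")
        if created.year == year:
            selected.append(pr)
    return selected


# Hypothetical records shaped like (a small part of) the API response.
sample = [
    {"number": 1, "created_at": "2016-12-31T23:59:59Z"},
    {"number": 2, "created_at": "2017-06-15T08:30:00Z"},
    {"number": 3, "created_at": "2018-01-01T00:00:00Z"},
]

print([pr["number"] for pr in created_in_year(sample, 2017)])
```

In practice each page returned by `pulls_url` would be fetched and decoded as JSON before being passed to `created_in_year`; the filter itself does not depend on how the records were obtained.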

Table 1. Basic information about the 60 repositories

| Repository | Group of CPR | Group of XPR | Pull requests in CPR | Pull requests in XPR | Commits | Branches | Releases | Issues | Pull requests | Forks | Executable LOC | Java files |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| spring-projects/spring-boot | 3781 | 4253 | 268 | 193 | 16188 | 8 | 95 | 443 | 2534 | 17587 | 240061 | 4180 |
| spring-projects/spring-framework | 3700 | 4986 | 148 | 106 | 16389 | 13 | 123 | 269 | 1767 | 13346 | 584451 | 6795 |
| apache/incubator-dubbo | 2001 | 14029 | 131 | 113 | 2203 | 2 | 51 | 69 | 444 | 12729 | 102913 | 1287 |
| elastic/elasticsearch | 60301 | 63322 | 1321 | 1288 | 30554 | 136 | 212 | 984 | 13919 | 10423 | 731337 | 5691 |
| iluwatar/java-design-patterns | 188 | 1862 | 29 | 42 | 2022 | 10 | 12 | 121 | 338 | 10121 | 26681 | 1032 |
| zxing/zxing | 9 | 8 | 7 | 5 | 3425 | 2 | 24 | 84 | 214 | 7342 | 42631 | 490 |
| checkstyle/checkstyle | 1253 | 378 | 467 | 146 | 7670 | 1 | 87 | 171 | 3575 | 6408 | 118755 | 1731 |
| netty/netty | 6224 | 1203 | 514 | 150 | 8726 | 31 | 189 | 318 | 3809 | 6130 | 246635 | 2354 |
| square/okhttp | 1022 | 272 | 151 | 70 | 3147 | 27 | 52 | 156 | 1791 | 6054 | 54375 | 293 |
| PhilJay/MPAndroidChart | 185 | 189 | 33 | 39 | 1938 | 5 | 42 | 57 | 356 | 5949 | 24640 | 235 |
| libgdx/libgdx | 778 | 1141 | 115 | 86 | 13185 | 9 | 44 | 408 | 2492 | 5597 | 272627 | 2326 |
| ReactiveX/RxJava | 800 | 164 | 145 | 42 | 5329 | 3 | 197 | 195 | 2943 | 5598 | 266415 | 1603 |
| Blankj/AndroidUtilCode | 90 | 434 | 17 | 22 | 802 | 1 | 51 | 17 | 91 | 5543 | 19523 | 158 |
| square/retrofit | 241 | 178 | 54 | 42 | 1569 | 8 | 41 | 119 | 843 | 5314 | 18889 | 220 |
| google/guava | 160 | 1233 | 22 | 22 | 4676 | 6 | 76 | 138 | 279 | 5335 | 500504 | 3165 |
| udacity/ud851-Exercises | 1092 | 486785 | 39 | 43 | 45 | 149 | 0 | 1 | 192 | 5247 | 29534 | 599 |
| apache/kafka | 41268 | 173568 | 637 | 896 | 4871 | 20 | 74 | 419 | 4833 | 4667 | 197643 | 1698 |
| mybatis/mybatis-3 | 564 | 572 | 72 | 41 | 2424 | 5 | 26 | 108 | 512 | 4564 | 56588 | 1133 |
| shuzheng/zheng | 311 | 1613 | 29 | 21 | 1221 | 2 | 0 | 8 | 64 | 4737 | 30005 | 367 |
| alibaba/druid | 348 | 187 | 29 | 13 | 5754 | 7 | 60 | 94 | 884 | 4438 | 285254 | 3954 |
| apache/hadoop | 1688 | 945 | 70 | 65 | 18274 | 231 | 281 | 126 | 359 | 4304 | 1463778 | 6573 |
| deeplearning4j/deeplearning4j | 14837 | 236059 | 364 | 217 | 9625 | 103 | 48 | 142 | 2175 | 4153 | 191174 | 1765 |
| bumptech/glide | 295 | 146 | 53 | 27 | 2190 | 4 | 32 | 77 | 273 | 4138 | 75038 | 643 |
| JakeWharton/butterknife | 202 | 121 | 19 | 11 | 827 | 3 | 38 | 74 | 375 | 4096 | 10576 | 126 |
| alibaba/fastjson | 323 | 164 | 56 | 38 | 2701 | 13 | 103 | 69 | 250 | 4063 | 147816 | 2558 |
| DrKLO/Telegram | 3 | 345 | 2 | 1 | 311 | 2 | 0 | 11 | 318 | 3867 | 322851 | 927 |
| spring-projects/spring-petclinic | 8 | 7 | 4 | 5 | 588 | 2 | 1 | 40 | 198 | 3950 | 1416 | 36 |
| square/picasso | 17 | 23 | 3 | 14 | 1180 | 13 | 21 | 85 | 555 | 3808 | 9411 | 74 |
| pockethub/PocketHub | 148 | 29 | 33 | 6 | 3383 | 5 | 20 | 116 | 586 | 3723 | 15661 | 237 |
| SeleniumHQ/selenium | 138 | 88 | 55 | 35 | 22140 | 13 | 110 | 362 | 1253 | 3846 | 83337 | 1265 |
| udacity/ud851-Sunshine | 184 | 83 | 16 | 5 | 23 | 68 | 0 | 1 | 71 | 3725 | 216535 | 1026 |
| facebook/fresco | 74 | 123 | 11 | 12 | 1620 | 8 | 32 | 120 | 290 | 3538 | 73517 | 811 |
| apache/storm | 3770 | 3221 | 333 | 235 | 9182 | 38 | 33 | 285 | 2624 | 3553 | 275548 | 2360 |
| bigbluebutton/bigbluebutton | 1458 | 3160 | 183 | 114 | 20377 | 10 | 22 | 80 | 2578 | 3526 | 92306 | 1137 |
| firebase/quickstart-android | 2 | 6 | 3 | 3 | 472 | 16 | 0 | 34 | 122 | 3598 | 6334 | 80 |
| kdn251/interviews | 27 | 84 | 3 | 3 | 382 | 5 | 0 | 27 | 61 | 3521 | 10228 | 458 |
| google/ExoPlayer | 2287 | 4526 | 119 | 81 | 4353 | 7 | 112 | 99 | 603 | 3106 | 94454 | 679 |
| square/leakcanary | 6 | 3 | 3 | 5 | 431 | 6 | 10 | 44 | 243 | 3050 | 4046 | 58 |
| code4craft/webmagic | 106 | 69 | 23 | 16 | 1015 | 3 | 23 | 34 | 110 | 3011 | 12104 | 229 |
| airbnb/lottie-android | 528 | 82 | 90 | 27 | 896 | 4 | 32 | 44 | 186 | 3005 | 10921 | 172 |
| apache/zookeeper | 6471 | 1940 | 189 | 111 | 1629 | 11 | 82 | 52 | 499 | 2900 | 77848 | 629 |
| CymChad/BaseRecyclerViewAdapterHelper | 304 | 281 | 66 | 45 | 883 | 4 | 167 | 27 | 207 | 2811 | 5648 | 92 |
| spring-projects/spring-security-oauth | 158 | 2695 | 34 | 37 | 1226 | 7 | 48 | 80 | 333 | 2711 | 43488 | 762 |
| xetorthio/jedis | 1865 | 188 | 99 | 51 | 1490 | 5 | 35 | 136 | 768 | 2677 | 27777 | 172 |
| owncloud/android | 559 | 135 | 39 | 25 | 6508 | 64 | 72 | 61 | 765 | 2634 | 40526 | 271 |
| junit-team/junit4 | 119 | 28 | 31 | 19 | 2215 | 5 | 20 | 139 | 795 | 2626 | 29952 | 450 |
| Netflix/Hystrix | 60 | 296 | 22 | 15 | 2098 | 5 | 146 | 106 | 752 | 2628 | 49786 | 410 |
| apache/camel | 2477 | 2433 | 274 | 202 | 31930 | 33 | 123 | 384 | 2282 | 2590 | 1128281 | 18120 |
| androidannotations/androidannotations | 49 | 300 | 15 | 12 | 2774 | 3 | 28 | 58 | 588 | 2475 | 36818 | 899 |
| Activiti/Activiti | 1432 | 1535 | 121 | 83 | 7761 | 29 | 53 | 151 | 1111 | 2466 | 140061 | 1969 |
| jersey/jersey | 22 | 210 | 19 | 30 | 3161 | 14 | 97 | 64 | 338 | 2439 | 257940 | 3594 |
| signalapp/Signal-Android | 1541 | 576 | 127 | 61 | 3468 | 131 | 295 | 167 | 1866 | 2446 | 74226 | 657 |
| prestodb/presto | 22082 | 3491 | 1262 | 573 | 13018 | 4 | 212 | 214 | 7497 | 2459 | 535854 | 4687 |
| MyCATApache/Mycat-Server | 555 | 1531 | 81 | 47 | 2227 | 7 | 20 | 72 | 999 | 2458 | 93974 | 998 |
| Tencent/tinker | 36 | 124 | 14 | 12 | 293 | 3 | 21 | 16 | 58 | 2421 | 31669 | 233 |
| spring-projects/spring-security | 109 | 90 | 49 | 36 | 6838 | 12 | 98 | 180 | 580 | 2405 | 122136 | 2045 |
| afollestad/material-dialogs | 32 | 8 | 10 | 6 | 1553 | 1 | 96 | 75 | 212 | 2371 | 7314 | 44 |
| apache/flink | 37460 | 31855 | 1272 | 810 | 13564 | 86 | 45 | 377 | 5824 | 2405 | 626439 | 5994 |
| ftctechnh/ftc_app | 1718 | 45077 | 63 | 68 | 82 | 3 | 29 | 4 | 319 | 2334 | 2512 | 42 |
| nickbutcher/plaid | 58 | 80 | 18 | 22 | 453 | 6 | 7 | 32 | 103 | 2290 | 16785 | 132 |

 

2. Experimental Dataset

In the paper, we use six datasets [Download dataset]. Table 2 describes each dataset.

Table 2. Dataset description

| Name | Content | Format |
|---|---|---|
| data | raw Java source code | java |
| diffdata | content of cpr | java |
| diffdata1 | content of xpr | java |
| cpr | feature counts of cpr | csv |
| xpr | feature counts of xpr | csv |
| profcpr | feature counts of pull requests in cpr | csv |
| profxpr | feature counts of pull requests in xpr | csv |
| otherpr | feature counts of pull requests that are in neither cpr nor xpr | csv |

 

The following picture shows one file of competing pull requests (cpr for short). The first two lines name the files that change the same line; the rest shows what each of them does to that line. File 0 and file 1 are files from different pull requests.

In this cpr, file 0 is the file below, created by ryandawsonuk; its pull request was merged.

File 1 is the file below, created by Giuseppe Malanga, who changed this file at the same time.

 

The following picture shows one xpr file. Here file 0 and file 1 do not change the same line; as before, they are files from different pull requests.
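
The difference between the two kinds of file shown above comes down to whether two pull requests touch the same line of the same file. The sketch below illustrates that check only. It assumes, purely for illustration, that each pull request has already been reduced to a mapping from file path to the set of line numbers it changes; this representation, the function names, and the sample values are hypothetical and are not the dataset's own format.

```python
def shared_changed_lines(pr_a, pr_b):
    """Return {file path: sorted line numbers} for lines changed by both PRs."""
    overlap = {}
    # Only files touched by both pull requests can contain a shared line.
    for path in pr_a.keys() & pr_b.keys():
        common = pr_a[path] & pr_b[path]
        if common:
            overlap[path] = sorted(common)
    return overlap


def changes_same_line(pr_a, pr_b):
    """True when the two pull requests change at least one common line."""
    return bool(shared_changed_lines(pr_a, pr_b))


# Hypothetical pull requests: file path -> changed line numbers.
pr_0 = {"src/Foo.java": {10, 11, 12}, "README.md": {3}}
pr_1 = {"src/Foo.java": {12, 40}}
pr_2 = {"src/Foo.java": {50}}

print(shared_changed_lines(pr_0, pr_1))
print(changes_same_line(pr_0, pr_2))
```

Here `pr_0` and `pr_1` both change line 12 of the same file, as in the cpr example, while `pr_0` and `pr_2` touch the same file without sharing a line, as in the xpr example.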

3. Features of Dataset

In the paper, we analyze the features of cpr, xpr, pull requests in cpr, pull requests in xpr, and other pull requests [Download result].

The following picture shows part of cpr.csv. Each row holds the features of one group of cpr, and each column holds one feature.
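
A file with this row-per-group, column-per-feature layout can be read with Python's standard `csv` module, as in the sketch below. It assumes the first row of the file is a header naming the features, which this page does not state. The column names `group_id`, `num_prs`, and `num_files` are placeholders invented for the example and are not the actual columns of cpr.csv.

```python
import csv
import io


def load_feature_rows(handle):
    """Read a features CSV into one dict per row, keyed by the header names."""
    return list(csv.DictReader(handle))


# Stand-in for an open features file; the columns are placeholders.
sample_csv = io.StringIO("group_id,num_prs,num_files\n1,2,3\n2,4,1\n")

rows = load_feature_rows(sample_csv)
for row in rows:
    # csv yields strings, so numeric features need an explicit conversion.
    print(row["group_id"], int(row["num_prs"]))
```

To read a real file, pass `open("cpr.csv", newline="")` in place of the in-memory sample.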