Comparing Two Black-Box Testing Strategies for Software Product Lines

Abstract: Software Product Line (SPL) testing has been considered a challenging task, mainly due to the diversity of products that might be generated from an SPL. To deal with this problem, several techniques for specifying and deriving product specific functional test cases have been proposed. However, there is not much empirical evidence of the benefits and drawbacks of these techniques. To provide this kind of evidence, we conduct studies that compare two design techniques for black-box manual tests, a generic technique that we have observed in an industrial test execution environment, and a product specific technique whose functional test cases could be derived using any SPL technique that considers variations in functional tests. We evaluate their impact from the point of view of the test execution process, obtaining results that indicate that executing product specific test cases is faster and generates fewer errors.

Infrastructure Material

From the links below you can download all the material used to perform both experiments. This material includes the RGMS products with instructions on how to run them, the test suites used (written in Portuguese), the TestWatcher tool, and the training and dry run material (also written in Portuguese).


Test Environment and the database installer we use

Test Suites

Collected Data

Here you can download the data that we collected during the experiments. This material includes the sheets generated by TestWatcher and the CRs reported by each subject.

First experiment data

Second experiment data

Data Analysis

Here we provide the R scripts and the data files needed to run our data analysis. We also show some results and plots that we could not present in the JUCS paper due to lack of space.

Data to run the analysis

Below are the box plots for each experiment.


The plots below show the individual execution times. The first dot plot shows that, regardless of the feature used to run the tests, 17 of the 18 subjects ran the test suites faster using the product specific technique (ST). The second dot plot shows that 8 of the 10 subjects ran the test suites faster using the ST.


After the descriptive analysis we proceeded to the hypothesis test. Because we used a Latin square design, we created an effect model for our response variable (execution time). This model states that the response variable is the sum of the effects of the factors considered in our experiment (Latin square replica, subjects, features and technique) plus the residual. With this effect model we can run an ANOVA test to check whether the tendency observed in the descriptive plots is statistically significant. Before running the ANOVA, however, we first needed to check whether our data satisfied the ANOVA assumptions.
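As an illustration, the effect model described above can be written as follows. The symbols and index names are our own notation, chosen here for readability, and are not taken from the paper or the R scripts; in particular, the exact indexing of subjects within replicas is an assumption.

```latex
% Additive effect model for the replicated Latin square (notation assumed).
% y      : execution time (response variable)
% \mu    : overall mean
% \rho_i : effect of Latin square replica i
% \sigma_j : effect of subject j
% \phi_k : effect of feature k
% \tau_l : effect of technique l (GT or ST)
% \varepsilon_{ijkl} : residual
\[
  y_{ijkl} = \mu + \rho_i + \sigma_j + \phi_k + \tau_l + \varepsilon_{ijkl}
\]
```

Under this form, the ANOVA null hypothesis for the technique factor corresponds to all technique effects being equal to zero.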

First we checked the assumption of homogeneity of variances, that is, that the variance of the data is the same across groups. The Box-Cox plot below supports, at the 95% confidence level, that our model residuals have constant variance: the interval of values that maximizes the likelihood function (the region above the 95% line) contains the value 1.
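For reference, the standard Box-Cox family of power transformations is sketched below. This is the textbook definition, not something taken from our R scripts, and the symbol names are assumptions made for illustration.

```latex
% Box-Cox power transformation of a positive response y with parameter \lambda.
\[
  y^{(\lambda)} =
  \begin{cases}
    \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[1ex]
    \log y, & \lambda = 0.
  \end{cases}
\]
% For \lambda = 1 the transformation reduces to y - 1, a simple shift,
% so an interval containing 1 suggests that no transformation is needed.
```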


The second assumption that we examined was whether the residuals followed a normal distribution. We ran the Shapiro-Wilk hypothesis test to examine this property. It tests the null hypothesis that the data set follows a normal distribution, so a high p-value means that we cannot reject this hypothesis. At the 5% significance level (95% confidence), we could not reject the null hypothesis in either experiment. In the first experiment the p-value was 0.1456 and in the second one it was 0.4659.

The last property that we wanted to investigate was whether our model was additive, that is, whether there was no interaction between our control factors. We therefore ran the Tukey Test of Additivity, whose null hypothesis states that the model is additive. Once again we obtained high p-values for both experiments (0.5743 in the first one and 0.7976 in the second one), hence we cannot reject the null hypothesis that our model is additive.

Finally we ran the ANOVA test to examine whether the technique factor had a significant impact on the execution time. This time the null hypothesis stated that there was no significant difference between the mean execution times achieved with the generic technique (GT) and with ST. Again we used the 5% significance level (95% confidence), and in both experiments the p-values (0.0001 in the first one and 0.0109 in the second one) allowed us to reject the null hypothesis. Our conclusion is that, within the scope of our studies, there is a significant difference between the GT and the ST mean execution times. In addition, ST showed lower execution times than GT.
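The decisions reported above can be summarized with a small sketch. It only re-applies the usual decision rule (reject the null hypothesis when the p-value is below the significance level) to the p-values listed on this page; it does not recompute any statistic, and the function and variable names are hypothetical, not part of our R scripts.

```python
# Significance level implied by the 95% confidence level used on this page.
ALPHA = 0.05

# p-values as reported on this page: (first experiment, second experiment).
P_VALUES = {
    "Shapiro-Wilk (normality of residuals)": (0.1456, 0.4659),
    "Tukey (additivity)": (0.5743, 0.7976),
    "ANOVA (technique factor)": (0.0001, 0.0109),
}


def rejects_null(p_value, alpha=ALPHA):
    """Return True when the null hypothesis is rejected at level alpha."""
    return p_value < alpha


def summarize(p_values=P_VALUES, alpha=ALPHA):
    """Map each test name to a tuple of reject decisions, one per experiment."""
    return {
        name: tuple(rejects_null(p, alpha) for p in ps)
        for name, ps in p_values.items()
    }


if __name__ == "__main__":
    for name, decisions in summarize().items():
        print(name, decisions)
```

For the two assumption checks, not rejecting the null hypothesis is the desired outcome (normal residuals, additive model), whereas for the ANOVA, rejecting it indicates a significant effect of the technique.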

In case of any problem, please contact one of the following:

-- PaolaAccioly - 2012-06-01 -- PaolaAccioly - 27 Mar 2012 -- PaolaAccioly - 01 Mar 2012 -- PaolaAccioly - 15 Feb 2012
26 Feb.

Topic attachments
Attachment | Size | Date | Comment
Zip archive | 81.1 K | 2012-03-27 21:36 |
Zip archive | 193.4 K | 2013-08-29 16:33 |
Zip archive | 131.1 K | 2013-08-29 16:59 |
Zip archive | 149.0 K | 2012-05-15 10:56 |
Zip archive | 118.2 K | 2012-03-27 21:51 |
Zip archive | 999.4 K | 2012-03-27 21:56 |
Zip archive | 965.6 K | 2012-05-15 10:19 |
Zip archive | 2.0 K | 2012-03-01 19:06 | Data Analysis
Zip archive | 17.1 K | 2012-03-01 19:03 | Data Collected
How_to_run_RGMS.pdf | 349.3 K | 2012-03-01 19:40 |
Zip archive | 1306.4 K | 2013-08-28 17:29 |
Zip archive | 8017.6 K | 2012-03-01 19:37 | RGMS P1 and P2
Zip archive | 8199.5 K | 2013-08-28 17:54 |
Zip archive | 61.9 K | 2013-08-28 19:36 |
Zip archive | 12.3 K | 2012-03-01 19:11 | TestWatcher
Zip archive | 2.9 K | 2013-08-29 17:21 |
boxcoxs.png | 46.0 K | 2013-08-06 22:36 |
boxplots.png | 21.6 K | 2013-08-06 22:34 |
Zip archive | 987.9 K | 2013-08-06 19:23 |
dotplots.png | 82.5 K | 2013-08-06 22:35 |
Zip archive | 8006.2 K | 2012-03-01 19:34 | hsqldb
prga_msc_dissertation_corrected.pdf | 869.2 K | 2012-06-01 10:40 |
Topic revision: r13 - 2013-10-09 - PaolaAccioly