On The Pros and Cons of Using Excel for Regression Analysis
DOI:
https://doi.org/10.24925/turjaf.v12is2.2234-2241.6931Keywords:
Curve fit, Goodness-of-fit, Spreadsheet, Software, Summary outputAbstract
Excel is a useful and powerful software for simple regression analysis without any programming skills and that is why, it is widely preferred by the undergraduate students from various areas such as chemistry, biology and agriculture as well as many engineering disciplines such as chemical engineering, food engineering and bioengineering. Parameter values and coefficient of determination (R2) can be easily obtained together with the graphical representation for those models exist in Excel such as linear and exponential models. It is also possible to visually examine the model fit and experimental data together on the same graph. For linear models (linear in parameters) Excel Add-In Data Analysis-Regression tool creates a summary output, and parameter estimates, parameter uncertainties, adjusted R2 (R2adj) and root mean square error (RMSE) values can be found even for the models that do not exist in Excel. For nonlinear models (nonlinear in parameters) Excel Add-In Solver tool can be used to obtain parameter estimates (but not uncertainties), and R2, R2adj and RMSE can be calculated manually. Despite these advantages, there are some shortcomings of Excel for regression analysis. For linear models with no-intercept Excel reports the incorrect and overoptimistic R2 and also reports incorrect and overpessimistic R2adj. Excel has also some nonlinear models such as exponential and power models in it; however, Excel computes linear parameter estimates for those nonlinear models and again optimistic R2 is calculated. This paper aims (i) to show these inaccuracies with their reasons by using published data, (ii) how to obtain the correct results, (iii) to warn the instructors who would use Excel for regression analysis in class. Excel users (instructors, students, professionals in any field) should be aware of the pitfalls when using Excel for regression.
References
Brown, A.M. (2001). A step-by-step guide to non-linear regression analysis of experimental data using a Microsoft Excel spreadsheet. Computer Methods and Programs Biomedicine, 65, 191–200. https://doi.org/10.1016/S0169-2607(00)00124-3
Buzrul, S. (2024). Fen bilimleri ve mühendislik uygulamalarında deneysel verilerin matematik modellerle tanımlanması. Excel Uygulamalı Anlatım. Ankara, Türkiye, Akademisyen Yayınevi. http://dx.doi.org/10.37609/akya.3030
De Levie, R. (2012). Advanced Excel for scientific data analysis. New York, NY, USA: Oxford University Press.
Denton, P. (2000). Analysis of First-order kinetics using Microsoft Excel Solver. Journal of Chemical Education, 77, 1524-1525. https://doi.org/10.1021/ed077p1524
Dolan, K.D., & Mishra, D.K. (2013). Parameter estimation in food science. Annual Reviews in Food Science and Technology, 4, 401-422. https://dx.doi.org/10.1146/annurev-food-022811-101247
Halabi, A., Deglaire, A., Hamon, P., Bouhallab, S., Dupont, D., & Croguennec, T. (2020). Kinetics of heat-induced denaturation of proteins in model infant milk formulas as a function of whey protein composition. Food Chemistry, 302, 125296. https://doi.org/10.1016/j.foodchem.2019.125296
Harmer, N.J., & Hill, A.M. (2021). Unique data sets and bespoke laboratory videos: teaching and assessing of experimental methods and data analysis in a pandemic. Journal of Chemical Education, 98, 4094-4100. https://doi.org/10.1021/acs.jchemed.1c00853
Harris, D.C. (1998). Nonlinear least-squares curve fitting with Microsoft Excel Solver. Journal of Chemical Education, 75, 119-121. https://doi.org/10.1021/ed075p119
Keeling, K.B. & Pavur, R.J. (2011). Statistical accuracy of spreadsheet software. The American Statistician 65, 265–273. https://doi.org/10.1198/tas.2011.09076
Kemmer, G., & Keller, S. (2010). Nonlinear least-squares data fitting in Excel spreadsheets. Nature Protocols, 5, 267–281. https://doi.org/10.1038/nprot.2009.182
Leylak, C., Yurdakul, M., & Buzrul, S. (2020). Use of Excel in food science 1: linear regression (in Turkish). Food and Health, 6, 186-198. https://doi.org/10.3153/FH20020
Mélard, G. (2014). On the accuracy of statistical procedures in Microsoft Excel 2010. Computational Statistics, 29, 1095-1128. https://doi.org/10.1007/s00180-014-0482-5
Rubin, S.J., & Abrams, B. (2015). Teaching fundamental skills in Microsoft Excel to first-year students in quantitative Analysis. Journal of Chemical Education, 92, 1840-1845. https://doi.org/10.1021/acs.jchemed.5b00122
Serment-Moreno, V. (2021). Microbial modeling needs for the nonthermal processing of foods. Food Engineering Reviews, 13, 465-489. https://dx.doi.org/10.1007/s12393-020-09263-8
van Boekel, M.A.J.S. (1996). Statistical aspects of kinetic modeling for food science problems. Journal of Food Science, 61, 477-485, 489. https://doi.org/10.1111/j.1365-2621.1996.tb13138.x
van Boekel, M.A.J.S. (2008). Kinetic Modeling of Reactions in Foods. Boca Raton, CRC Press. https://doi.org/10.1201/9781420017410
van Boekel, M.A.J.S. (2022). Kinetics of heat-induced changes in dairy products: Developments in data analysis and modelling techniques. International Dairy Journal, 126, 105187. https://doi.org/10.1016/j.idairyj.2021.105187
Yurdakul, M., Leylak, C., & Buzrul, S. (2020). Use of Excel in food science 2: non-linear regression (in Turkish). Food and Health, 6, 199-212. https://doi.org/10.3153/FH20021
Downloads
Published
How to Cite
Issue
Section
License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.