Measuring inflation in information processing equipment is challenging because the quality and performance of devices such as computers and smartphones improve substantially over short time spans. The harmonised index of consumer prices (HICP) methodology struggles to account for these quality changes consistently, as reflected in the considerable discrepancies between estimates across EU countries. This study estimates price changes of computer equipment using hedonic regression and compares the results with the calculations published by Statistics Poland and Eurostat. We web-scraped data from the Polish price comparison platform Pepper.pl for the period 2017–2023, compiled a dataset of ICT products, and used machine learning algorithms (K-Nearest Neighbours and Span Categoriser) to impute missing hardware specifications. Our findings differ from official statistics: while Eurostat reports a 10% decline in prices of audio-visual, photographic, and information processing equipment, our analysis indicates a clear upward trend in prices. Among the computer components examined (RAM, GPUs, CPUs, and SSDs), some deflation was observed only for RAM modules.
inflation, hedonic regression, web scraping, computer equipment, harmonised index of consumer prices, HICP
C40, E31
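As an illustration of the hedonic approach named in the abstract, the sketch below fits a time-dummy hedonic regression, one common variant of the method. It is a minimal example under stated assumptions, not the specification used in the study: the abstract does not say which hedonic variant was estimated, and the function name `time_dummy_hedonic_index`, the single characteristic (log2 of RAM capacity), and the noise-free synthetic prices are all hypothetical choices made for illustration.

```python
import numpy as np


def time_dummy_hedonic_index(log_price, characteristics, period):
    """Quality-adjusted price index per period (earliest period = 1.0).

    Fits ln(p) = a + b'z + sum_t d_t * D_t by ordinary least squares, where z
    holds product characteristics and D_t are period dummies. exp(d_t) is the
    index level of period t relative to the base period.
    """
    log_price = np.asarray(log_price, dtype=float)
    z = np.asarray(characteristics, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    period = np.asarray(period)
    periods = np.unique(period)  # sorted; the first entry is the base period
    # One dummy column for every period except the base period.
    dummies = (period[:, None] == periods[None, 1:]).astype(float)
    X = np.column_stack([np.ones(len(log_price)), z, dummies])
    beta, *_ = np.linalg.lstsq(X, log_price, rcond=None)
    delta = beta[1 + z.shape[1]:]  # coefficients on the period dummies
    levels = np.concatenate([[1.0], np.exp(delta)])
    return dict(zip(periods.tolist(), levels.tolist()))


# Synthetic, noise-free data (hypothetical values, for illustration only).
ram_gb = np.array([4, 8, 16, 8, 16, 32, 4, 16, 32])
period = np.array([2017] * 3 + [2018] * 3 + [2019] * 3)
true_delta = {2017: 0.0, 2018: 0.10, 2019: -0.05}
log_price = (
    5.0
    + 0.5 * np.log2(ram_gb)
    + np.array([true_delta[t] for t in period])
)

index = time_dummy_hedonic_index(log_price, np.log2(ram_gb), period)
for year, level in index.items():
    print(year, round(level, 4))
```

Because the synthetic prices contain no noise, the regression recovers the period effects used to generate them, so the index separates pure price change from the change in the mix of RAM capacities on offer. With real scraped offers the estimates would depend on the chosen characteristics, the functional form, and any weighting.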