Analyze IMDb movies by sentiment and topic analysis

Ningjing Ouyang

doi:10.54517/esp.v8i3.1958

Published: 2023-10-25

Issue: Vol. 8 No. 3 (2023)

Section: Research Articles

License

The journal adopts the Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license, which means that anyone can reuse and redistribute the materials for non-commercial purposes as long as they follow the license terms and properly cite the original source.

Author(s) retain the copyright of their work and grant the Journal/Publisher the right of first publication, with the work concurrently licensed under this license since 2023 Vol. 8 No. 2.

Under this license, author(s) will allow third parties to download, reuse, reprint, modify, distribute and/or copy the content under the condition that the authors are given credit. No permission is required from the authors or the publisher.

This broad license intends to facilitate free access, as well as the unrestricted use of original works of all types. This ensures that the published work is freely and openly available in perpetuity.

Copyright Statement

1. The authors certify that the submitted manuscripts are original works, do not infringe the rights of others, are free from academic misconduct and confidentiality issues, and that there are no disputes over the authorship scheme of the collaborative articles. In case of infringement, academic misconduct and confidentiality issues, as well as disputes over the authorship scheme, all responsibilities will be borne by the authors.

2. The author agrees to grant the Editorial Office of Environment and Social Psychology a licence to use the reproduction right, distribution right, information network dissemination right, performance right, translation right, and compilation right of the submitted manuscript, including the work as a whole, as well as the diagrams, tables, abstracts, and any other parts that can be extracted from the work and used in accordance with the characteristics of the journal. The Editorial Board of Environment and Social Psychology has the right to use and sub-licence the above-mentioned works for wide dissemination in print, electronic and online versions, in accordance with the characteristics of the periodical, for the period of legal protection of the property right of the copyright in the work, and for the territorial scope of the work throughout the world.

3. The authors are entitled to the copyright of their works under the relevant laws of Singapore, provided that they do not exercise their rights in a manner prejudicial to the interests of the Journal.

How to Cite

Ouyang, N. (2023). Analyze IMDb movies by sentiment and topic analysis. Environment and Social Psychology, 8(3). https://doi.org/10.54517/esp.v8i3.1958

Ningjing Ouyang

School of Communication, Hong Kong Baptist University

Keywords: movie, natural language processing, sentiment analysis, topic analysis, Bi-LSTM, LDA

Abstract

Movies are an important cultural form that carries artistic, entertainment and social value. Movie review and rating data sets are large, and deep learning and natural language processing methods are now widely used to analyze them. Advances in big data and deep learning offer unprecedented opportunities to understand moviegoer behavior and preferences, and provide a cost-effective way to gain insights relevant to the entertainment industry. This project conducts sentiment analysis, topic modeling and visual statistical analysis on the IMDb movie data set to identify key factors and deeper insights that influence successful decision-making in film production. It first vectorizes the movie review text with word embeddings and then classifies sentiment with a Bidirectional Long Short-Term Memory (Bi-LSTM) network. Statistical methods, including visualization, are then used to identify trends, patterns and relationships among the variables of IMDb movies; one finding is that November has the highest average number of movie releases. Finally, a Latent Dirichlet Allocation (LDA) topic model is built. It indicates that light entertainment movies are an important topic with increasing demand, highlighting the commercial feasibility of comedy movies as a profitable business model. In summary, this project uses an emotion-topic fusion analysis method that combines Bi-LSTM sentiment classification with LDA topic modeling. The results show that the Bi-LSTM model identifies positive and negative emotions in movie reviews well, and that the LDA topic model performs well in mining popular topics.
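To make the topic modeling step concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling, written in plain Python. It is an illustration only and not the implementation used in the article, which this abstract does not describe. The function names (`fit_lda`, `topic_proportions`, `top_words`), the hyperparameter values and the four toy reviews are all assumptions introduced here.

```python
import random


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized documents.

    Returns (doc_topic counts, topic_word counts, vocabulary).
    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                 # n[d][k]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # n[k][w]
    topic_total = [0] * num_topics                               # n[k]
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # p(z = t | rest) is proportional to
                # (n[d][t] + alpha) * (n[t][w] + beta) / (n[t] + V * beta)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab


def topic_proportions(doc_topic, alpha):
    """Smoothed per-document topic distribution; each row sums to 1."""
    rows = []
    for counts in doc_topic:
        total = sum(counts) + alpha * len(counts)
        rows.append([(c + alpha) / total for c in counts])
    return rows


def top_words(topic_word, vocab, n=3):
    """The n most frequently assigned words for each topic."""
    result = []
    for row in topic_word:
        order = sorted(range(len(vocab)), key=lambda i: -row[i])
        result.append([vocab[i] for i in order[:n]])
    return result


# Invented, already tokenized toy "reviews" (not taken from the IMDb data set).
docs = [
    "funny comedy laugh funny jokes".split(),
    "comedy jokes laugh light fun".split(),
    "dark thriller murder tense plot".split(),
    "tense thriller plot twist murder".split(),
]
ALPHA = 0.1
doc_topic, topic_word, vocab = fit_lda(docs, num_topics=2, alpha=ALPHA)
theta = topic_proportions(doc_topic, ALPHA)
for k, words in enumerate(top_words(topic_word, vocab)):
    print(f"topic {k}: {words}")
```

In practice a library implementation would normally replace this hand-written sampler, and real reviews would first need tokenization and stop-word removal. The sketch only shows how per-token topic assignments yield the topic-word and document-topic counts from which popular topics are read off.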
