MARC

LEADER 00000nab a2200000uu 4500
001 3154309443
003 UK-CbPIL
022 |a 2514-9288 
022 |a 2514-9318 
022 |a 0033-0337 
022 |a 1758-7301 
024 7 |a 10.1108/DTA-03-2024-0283  |2 doi 
035 |a 3154309443 
045 2 |b d20250101  |b d20250331 
084 |a 38174  |2 nlm 
100 1 |a Bernardo Cerqueira de Lima  |u Federal University of Minas Gerais, Belo Horizonte, Brazil 
245 1 |a Optimized discovery of discourse topics in social media: science communication about COVID-19 in Brazil 
260 |b Emerald Group Publishing Limited  |c 2025 
513 |a Journal Article 
520 3 |a Purpose: Social media platforms that disseminated scientific information to the public during the COVID-19 pandemic highlighted the importance of scientific communication. Content creators in the field, as well as researchers who study the impact of scientific information online, are interested in how people react to these information resources and how they judge them. This study aims to devise a framework for extracting large social media datasets and finding specific feedback on content delivery, enabling scientific content creators to gain insights into how the public perceives scientific information. Design/methodology/approach: To collect public reactions to scientific information, the study focused on Twitter users who are doctors, researchers, science communicators or representatives of research institutes, and processed their replies for two years from the start of the pandemic. The study aimed to develop a solution, powered by topic modeling and enhanced by manual validation and other machine learning techniques such as word embeddings, that is capable of filtering massive social media datasets in search of documents related to reactions to scientific communication. The architecture developed in this paper can be replicated for finding any documents related to niche topics in social media data. As a final step of our framework, we also fine-tuned a large language model to perform the classification task with even more accuracy, removing the need for further human validation after the first step. Findings: We provide a framework that receives a large document dataset and, with a small degree of human validation at different stages, identifies the documents within the corpus that are relevant to a very underrepresented niche theme inside the database, with much higher precision than traditional state-of-the-art machine learning algorithms. 
Performance was improved even further by fine-tuning a large language model based on BERT, which would allow such a model to classify even larger unseen datasets in search of reactions to scientific communication without the need for further manual validation or topic modeling. Research limitations/implications: The challenges of scientific communication are heightened by the rampant increase of misinformation in social media and by the difficulty of competing in the saturated attention economy of the social media landscape. Our study aimed to create a solution that scientific content creators could use to better locate and understand constructive feedback on their content and how it is received, which can be hidden as a minor subject among hundreds of thousands of comments. By leveraging an ensemble of techniques ranging from heuristics to state-of-the-art machine learning algorithms, we created a framework that is able to detect texts related to very niche subjects in very large datasets, given only a small number of example texts on the subject as input. Practical implications: With this tool, scientific content creators can sift through their social media following and quickly understand how to adapt their content to their current users' needs and standards of content consumption. Originality/value: This study aimed to find reactions to scientific communication in social media. We applied three methods with human intervention and compared their performance. This study shows, for the first time, the topics of interest that were discussed in Brazil during the COVID-19 pandemic. 
651 4 |a Brazil 
653 |a Research facilities 
653 |a Dictionaries 
653 |a COVID-19 vaccines 
653 |a Science 
653 |a Social networks 
653 |a Feedback 
653 |a Data mining 
653 |a Modelling 
653 |a Human performance 
653 |a Machine learning 
653 |a State of the art 
653 |a Datasets 
653 |a Communication channels 
653 |a Large language models 
653 |a Sentiment analysis 
653 |a Texts 
653 |a Pandemics 
653 |a Documents 
653 |a Information seeking behavior 
653 |a Information resources 
653 |a Algorithms 
653 |a False information 
653 |a Digital media 
653 |a Databases 
653 |a Classification 
653 |a Physicians 
653 |a Models 
653 |a Mass media 
653 |a Academic discourse 
653 |a Institutes 
653 |a Humans 
653 |a Computer mediated communication 
653 |a Social media 
653 |a Validity 
653 |a COVID-19 
653 |a Topics 
653 |a Frame analysis 
653 |a Information 
653 |a Heuristic 
653 |a Misinformation 
653 |a Service increment for teaching 
653 |a Communication 
653 |a Language modeling 
653 |a Literature Reviews 
653 |a Predominantly White Institutions 
653 |a Nonverbal Communication 
653 |a Scientific and Technical Information 
653 |a Artificial Intelligence 
700 1 |a Renata Maria Abrantes Baracho  |u Federal University of Minas Gerais, Belo Horizonte, Brazil 
700 1 |a Mandl, Thomas  |u University of Hildesheim, Hildesheim, Germany 
700 1 |a Porto, Patricia Baracho  |u Pontifical Catholic University of Minas Gerais, Belo Horizonte, Brazil 
773 0 |t Data Technologies and Applications  |g vol. 59, no. 1 (2025), p. 180-198 
786 0 |d ProQuest  |t ABI/INFORM Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3154309443/abstract/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text  |u https://www.proquest.com/docview/3154309443/fulltext/embedded/6A8EOT78XXH2IG52?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3154309443/fulltextPDF/embedded/6A8EOT78XXH2IG52?source=fedsrch