Big Data, Natural Language Processing, and Deep Learning to Detect and Characterize Illicit COVID-19 Product Sales: Infoveillance Study on Twitter and Instagram.

Source: PubMed

Authors:

Mackey TK, Li J, Purushothaman V, Nali M, Shah N, Bardier C, Cai M, Liang B

Abstract:

The coronavirus disease (COVID-19) pandemic is perhaps the greatest global health challenge of the last century. Accompanying this pandemic is a parallel "infodemic," including the online marketing and sale of unapproved, illegal, and counterfeit COVID-19 health products such as testing kits, treatments, and other questionable "cures." Enabling the proliferation of this content is the growing ubiquity of internet-based technologies, including popular social media platforms that now have billions of global users.

This study aims to collect, analyze, identify, and enable reporting of suspected fake, counterfeit, and unapproved COVID-19-related health care products from Twitter and Instagram.

The study was conducted in two phases. The first phase collected COVID-19-related Twitter and Instagram posts by web scraping Instagram and by filtering the public streaming Twitter application programming interface for keywords associated with suspect marketing and sale of COVID-19 products. The second phase involved data analysis using natural language processing (NLP) and deep learning to identify potential sellers, whose posts were then manually annotated for characteristics of interest. We also visualized illegal selling posts on a customized data dashboard to enable public health intelligence.

We collected a total of 6,029,323 tweets and 204,597 Instagram posts filtered for terms associated with suspect marketing and sale of COVID-19 health products from March to April for Twitter and February to May for Instagram. After applying our NLP and deep learning approaches, we identified 1271 tweets and 596 Instagram posts associated with questionable sales of COVID-19-related products. Generally, product introduction came in two waves, with the first consisting of questionable immunity-boosting treatments and the second involving suspect testing kits. We also detected a low volume of pharmaceuticals that have not been approved for COVID-19 treatment. Other major themes detected included products offered in different languages, various claims of product credibility, completely unsubstantiated products, unapproved testing modalities, and different payment and seller contact methods.

Results from this study provide initial insight into one front of the "infodemic" fight against COVID-19 by characterizing what types of health products, selling claims, and types of sellers were active on two popular social media platforms at earlier stages of the pandemic. This cybercrime challenge is likely to continue as the pandemic progresses and more people seek access to COVID-19 testing and treatment. This data intelligence can help public health agencies, regulatory authorities, legitimate manufacturers, and technology platforms better remove and prevent this content from harming the public.


DOI: 10.2196/20794

Cited by: 35

Year: 2020
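The first phase described in the abstract, filtering collected posts for keywords tied to suspect marketing and sales, can be illustrated with a minimal Python sketch. The keyword lists, function names, and sample posts below are hypothetical: the abstract does not list the study's actual terms, and this is not the authors' implementation.

```python
import re

# Hypothetical keyword lists; the study's actual filter terms are not given in the abstract.
COVID_TERMS = {"covid", "covid19", "coronavirus"}
SALE_TERMS = {"buy", "order", "price", "shipping", "dm"}

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(TOKEN_RE.findall(text.lower()))


def is_candidate(text):
    """Return True when a post mentions both a COVID term and a sale term."""
    tokens = tokenize(text)
    return bool(tokens & COVID_TERMS) and bool(tokens & SALE_TERMS)


# Hypothetical sample posts, for illustration only.
posts = [
    "Buy COVID19 test kits now, DM for price",
    "Stay safe everyone, coronavirus cases are rising",
    "New phone for sale, free shipping",
]

# Keep only posts that pass the keyword filter; these would go on to
# the classification and manual annotation steps the abstract describes.
candidates = [post for post in posts if is_candidate(post)]
print(candidates)
```

A filter like this only narrows the pool of posts. In the study, the NLP and deep learning stage and the manual annotation then decide which posts actually involve questionable sales.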


References (20)

Citing articles (35)

Similar articles (314)

Machine Learning to Detect Self-Reporting of Symptoms, Testing Access, and Recovery Associated With COVID-19 on Twitter: Retrospective Big Data Infoveillance Study.

Mackey T, Purushothaman V, Li J, Shah N, Nali M, Bardier C, Liang B, Cai M, Cuomo R, et al. - JMIR Public Health and Surveillance

Cited by: 62 | Published: 2020
Identification and Characterization of Synthetic Nicotine Product Promotion and Sales on Instagram Using Natural Language Processing.

There has been a rapid proliferation of synthetic nicotine products in recent years, despite newly established regulatory authority and limited research into their health risks. Previous research has implicated social media platforms as an avenue for unregulated sales of nicotine products. Yet, little is known about synthetic nicotine product content on social media. We utilized natural language processing to characterize the sales of synthetic nicotine products on Instagram. We collected Instagram posts by querying Instagram hashtags (eg, "#tobaccofreenicotine") related to synthetic nicotine. Using Bidirectional Encoder Representations from Transformers, collected posts were categorized into thematically related topic clusters. Posts within topic clusters relevant to study aims were then manually annotated for variables related to promotion and selling (eg, cost discussion, contact information for offline sales). A total of 7425 unique posts were collected with 2219 posts identified as related to promotion and selling of synthetic nicotine products. Nicotine pouches (52.9%, n = 1174), electronic nicotine delivery systems (30.6%, n = 679), and flavored e-liquids (14.1%, n = 313) were most commonly promoted. About 16.1% (n = 345) of posts contained embedded hyperlinks and 5.8% (n = 129) provided contact information for purported offline transactions. Only 17.6% (n = 391) of posts contained synthetic nicotine-specific health warnings. In the United States, synthetic nicotine products can only be legally marketed if they have received premarket authorization from the Food and Drug Administration (FDA). Despite these prohibitions, Instagram appears to be a hub for potentially unregulated sales of synthetic and "tobacco-free" products. Efforts are needed by platforms and regulators to enhance content moderation and prevent unregulated online sales of existing and emerging synthetic nicotine products.
There is limited clinical understanding of synthetic nicotine's unique health risks and how these novel products are changing over time due to regulatory oversight. Despite synthetic nicotine-specific regulatory measures, such as the requirement for premarket authorization and FDA warning letters issued to unauthorized sellers, access to and promotion of synthetic nicotine is widely occurring on Instagram, a platform with over 2 billion users and one that is popular among youth and young adults. Activities include direct-to-consumer sales from questionable sources, inadequate health warning disclosure, and exposure with limited age restrictions, all conditions necessary for the sale of various tobacco products. Notably, the number of these Instagram posts increased in response to the announcement of new FDA regulations. In response, more robust online monitoring, content moderation, and proactive enforcement are needed from platforms, which should work collaboratively with regulators to identify, report, and remove content in clear violation of platform policies and federal laws. Regulatory implementation and enforcement should prioritize digital platforms as conduits for unregulated access to synthetic nicotine products and other future novel and emerging tobacco products.

Shah NA, Li Z, McMann T, Calac AJ, Le N, Nali MC, Cuomo RE, Mackey TK, et al. - (journal not listed)

Cited by: not listed | Published: 2024
Temporal and Location Variations, and Link Categories for the Dissemination of COVID-19-Related Information on Twitter During the SARS-CoV-2 Outbreak in Europe: Infoveillance Study.

The spread of the 2019 novel coronavirus disease, COVID-19, across Asia and Europe sparked a significant increase in public interest and media coverage, including on social media platforms such as Twitter. In this context, the origin of information plays a central role in the dissemination of evidence-based information about the SARS-CoV-2 virus and COVID-19. On February 2, 2020, the World Health Organization (WHO) declared a "massive infodemic" and argued that this situation "makes it hard for people to find trustworthy sources and reliable guidance when they need it." This infoveillance study, conducted during the early phase of the COVID-19 pandemic, focuses on the social media platform Twitter. It allows monitoring of the dynamic pandemic situation on a global scale for different aspects and topics, languages, as well as regions and even whole countries. Of particular interest are temporal and geographical variations of COVID-19-related tweets, the situation in Europe, and the categories and origin of shared external resources. Twitter's Streaming application programming interface was used to filter tweets based on 16 prevalent hashtags related to the COVID-19 outbreak. Each tweet's text and corresponding metadata as well as the user's profile information were extracted and stored in a database. Metadata included links to external resources. A link categorization scheme, introduced in a study by Chew and Eysenbach in 2009, was applied to the top 250 shared resources to analyze the relative proportion for each category. Moreover, temporal variations of global tweet volumes were analyzed and a specific analysis was conducted for the European region. Between February 9 and April 11, 2020, a total of 21,755,802 distinct tweets were collected, posted by 4,809,842 distinct Twitter accounts. The volume of #covid19-related tweets increased after the WHO announced the name of the new disease on February 11, 2020, and stabilized at the end of March at a high level.
For the regional analysis, a higher tweet volume was observed in the vicinity of major European capitals or in densely populated areas. The most frequently shared resources originated from various social media platforms (ranks 1-7). The most prevalent category in the top 50 was "Mainstream or Local News." For the category "Government or Public Health," only two information sources were found in the top 50: US Centers for Disease Control and Prevention at rank 25 and the WHO at rank 27. The first occurrence of a prevalent scientific source was Nature (rank 116). The naming of the disease by the WHO was a major signal to address the public audience with public health response via social media platforms such as Twitter. Future studies should focus on the origin and trustworthiness of shared resources, as monitoring the spread of fake news during a pandemic situation is of particular importance. In addition, it would be beneficial to analyze and uncover bot networks spreading COVID-19-related misinformation.

Pobiruchin M, Zowalla R, Wiesner M - Journal of Medical Internet Research

Cited by: 18 | Published: 2020
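The abstract above describes ranking shared external resources and computing the relative proportion of each link category. A minimal Python sketch of that kind of tally follows. The URLs, the domain-to-category mapping, and the function name are hypothetical placeholders, not data or code from the study; only the category label "Mainstream or Local News" is taken from the abstract.

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


# Hypothetical shared links, for illustration only.
links = [
    "https://www.social.example/post/1",
    "https://social.example/post/2",
    "https://news.example/story",
    "https://WWW.Social.Example/post/3",
]

# Hypothetical mapping from domain to link category.
CATEGORIES = {
    "social.example": "Social media",
    "news.example": "Mainstream or Local News",
}

# Rank domains by how often they were shared.
ranking = Counter(domain_of(url) for url in links).most_common()

# Relative proportion of each category among all shared links.
category_counts = Counter(CATEGORIES[domain_of(url)] for url in links)
proportions = {cat: n / len(links) for cat, n in category_counts.items()}

print(ranking)
print(proportions)
```

In practice the category assignment is the hard part: the study applied a published categorization scheme to the top 250 resources, which a lookup table like the one above only stands in for.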
COVID-19 and the 5G Conspiracy Theory: Social Network Analysis of Twitter Data.

Since the beginning of December 2019, the coronavirus disease (COVID-19) has spread rapidly around the world, which has led to increased discussions across online platforms. These conversations have also included various conspiracies shared by social media users. Amongst them, a popular theory has linked 5G to the spread of COVID-19, leading to misinformation and the burning of 5G towers in the United Kingdom. Understanding the drivers of fake news and adopting quick policies oriented to isolate and rebut misinformation are key to combating it. The aim of this study is to develop an understanding of the drivers of the 5G COVID-19 conspiracy theory and strategies to deal with such misinformation. This paper performs a social network analysis and content analysis of Twitter data from a 7-day period (Friday, March 27, 2020, to Saturday, April 4, 2020) in which the #5GCoronavirus hashtag was trending on Twitter in the United Kingdom. Influential users were analyzed through social network graph clusters. Nodes were sized according to their betweenness centrality score, and the graph's vertices were grouped by cluster using the Clauset-Newman-Moore algorithm. The topics and web sources used were also examined. Social network analysis identified that the two largest network structures consisted of an isolates group and a broadcast group. The analysis also revealed that there was a lack of an authority figure who was actively combating such misinformation. Content analysis revealed that, of 233 sample tweets, 34.8% (n=81) contained views that 5G and COVID-19 were linked, 32.2% (n=75) denounced the conspiracy theory, and 33.0% (n=77) were general tweets not expressing any personal views or opinions. Thus, 65.2% (n=152) of tweets derived from nonconspiracy theory supporters, which suggests that, although the topic attracted high volume, only a handful of users genuinely believed the conspiracy.
This paper also shows that fake news websites were the most popular web source shared by users, although YouTube videos were also shared. The study also identified an account whose sole aim was to spread the conspiracy theory on Twitter. The combination of quick and targeted interventions oriented to delegitimize the sources of fake information is key to reducing their impact. Those users voicing their views against the conspiracy theory, link baiting, or sharing humorous tweets inadvertently raised the profile of the topic, suggesting that policymakers should persist in efforts to isolate opinions that are based on fake news. Many social media platforms provide users with the ability to report inappropriate content, which should be used. This study is the first to analyze the 5G conspiracy theory in the context of COVID-19 on Twitter, offering practical guidance to health authorities on how, in the context of a pandemic, rumors may be combated in the future.

Ahmed W, Vidal-Alaball J, Downing J, López Seguí F, et al. - Journal of Medical Internet Research

Cited by: 214 | Published: 2020
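The abstract above ranks users by betweenness centrality, which counts how often a node lies on shortest paths between other nodes. The following is a minimal Python sketch of Brandes' algorithm for that score. It assumes an undirected, unweighted graph and uses hypothetical account names; the abstract does not say which software the authors used, and the Clauset-Newman-Moore clustering step is not shown here.

```python
from collections import deque


def betweenness_centrality(graph):
    """Unnormalized betweenness for an undirected, unweighted graph.

    `graph` maps every node to a list of its neighbors.
    """
    scores = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search from `source`, counting shortest paths.
        order = []
        preds = {node: [] for node in graph}
        sigma = {node: 0 for node in graph}  # number of shortest paths
        dist = {node: -1 for node in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:  # first visit
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes inward.
        delta = {node: 0.0 for node in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {node: score / 2 for node, score in scores.items()}


# Hypothetical hub-and-spoke network: one account linked to three others.
star = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
}
scores = betweenness_centrality(star)
print(scores)
```

In this toy network the hub sits on every shortest path between the other three accounts, so it receives the highest score, while the outer accounts score zero. Real retweet or mention networks are usually directed, which this simplified sketch ignores.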


Source journal

JMIR Public Health and Surveillance

Impact factor: 14.542

JCR quartile: not available

Chinese Academy of Sciences (CAS) journal ranking: not available