When SMILES have Language: Drug Classification using Text Classification Methods on Drug SMILES Strings

Bibliographic details
Published in: arXiv.org (Mar 27, 2024), p. n/a
Main author: Wasi, Azmine Toushik
Other authors: Šerbetar Karlo, Islam, Raima, Taki, Hasan Rafi, Dong-Kyu Chae
Published:
Cornell University Library, arXiv.org
Subjects: Classification; Machine learning; Strings; Chemical bonds; Representations; Sentences
Online access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 2972949841
003 UK-CbPIL
022 |a 2331-8422 
035 |a 2972949841 
045 0 |b d20240327 
100 1 |a Wasi, Azmine Toushik 
245 1 |a When SMILES have Language: Drug Classification using Text Classification Methods on Drug SMILES Strings 
260 |b Cornell University Library, arXiv.org  |c Mar 27, 2024 
513 |a Working Paper 
520 3 |a Complex chemical structures, like drugs, are usually defined by SMILES strings as a sequence of molecules and bonds. These SMILES strings are used in different complex machine learning-based drug-related research and representation works. Escaping from complex representation, in this work, we pose a single question: What if we treat drug SMILES as conventional sentences and engage in text classification for drug classification? Our experiments affirm the possibility with very competitive scores. The study explores the notion of viewing each atom and bond as sentence components, employing basic NLP methods to categorize drug types, proving that complex problems can also be solved with simpler perspectives. The data and code are available here: https://github.com/azminewasi/Drug-Classification-NLP. 
653 |a Classification 
653 |a Machine learning 
653 |a Strings 
653 |a Chemical bonds 
653 |a Representations 
653 |a Sentences 
700 1 |a Šerbetar Karlo 
700 1 |a Islam, Raima 
700 1 |a Taki, Hasan Rafi 
700 1 |a Dong-Kyu Chae 
773 0 |t arXiv.org  |g (Mar 27, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/2972949841/abstract/embedded/L8HZQI7Z43R0LA5T?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2403.12984