TY - JOUR
T1 - Automatic cyberbullying detection in Spanish-language social networks using sentiment analysis techniques
AU - Mercado, Rolfy Nixon Montufar
AU - Chuctaya, Hernan Faustino Chacca
AU - Castro Gutierrez, Eveling Gloria
N1 - Publisher Copyright: © 2018, (IJACSA) International Journal of Advanced Computer Science and Applications.
PY - 2018
Y1 - 2018
N2 - Cyberbullying is a growing problem in our society that can have fatal consequences and can appear in digital text, for example on online social networks. Many works address the detection of cyberbullying in English-language digital texts, but few studies address it in Spanish. This paper aims to detect this form of online harassment on Spanish-language social networks. Sentiment analysis techniques are used, including bag of words, removal of symbols and numbers, tokenization, and stemming, together with a Bayesian classifier. The training data for the Bayesian classifier were obtained from the Spanish Dictionary of Affect in Language (SDAL), a database of more than 2500 words manually rated on three affective dimensions (pleasantness, activation, and imagery), and from 595 additional words rated following the same procedure as SDAL with the help of members of the Research Center, Technology Transfer and Software Development. The resulting software achieved a 93% success rate in the validation tests carried out.
AB - Cyberbullying is a growing problem in our society that can have fatal consequences and can appear in digital text, for example on online social networks. Many works address the detection of cyberbullying in English-language digital texts, but few studies address it in Spanish. This paper aims to detect this form of online harassment on Spanish-language social networks. Sentiment analysis techniques are used, including bag of words, removal of symbols and numbers, tokenization, and stemming, together with a Bayesian classifier. The training data for the Bayesian classifier were obtained from the Spanish Dictionary of Affect in Language (SDAL), a database of more than 2500 words manually rated on three affective dimensions (pleasantness, activation, and imagery), and from 595 additional words rated following the same procedure as SDAL with the help of members of the Research Center, Technology Transfer and Software Development. The resulting software achieved a 93% success rate in the validation tests carried out.
KW - Bag of words
KW - Cyberbullying
KW - Sentiment analysis
KW - Social media analytics
KW - Stemming
KW - Tokenization
UR - http://www.scopus.com/inward/record.url?scp=85054039739&partnerID=8YFLogxK
U2 - 10.14569/IJACSA.2018.090733
DO - 10.14569/IJACSA.2018.090733
M3 - Article
AN - SCOPUS:85054039739
SN - 2158-107X
VL - 9
SP - 228
EP - 235
JO - International Journal of Advanced Computer Science and Applications
JF - International Journal of Advanced Computer Science and Applications
IS - 7
ER -