Choosing Starbucks coffee: what makes it good?
To answer the question above, we will try to analyze data scraped from the Starbucks Indonesia Twitter page, @SbuxIndonesia (23 Feb 2019, 10:00 WIB). The results are divided into: 1. Emotion classification, 2. Polarity classification, 3. Corpus analysis.
Naturally, this analysis is limited by the free quota that Twitter provides, but it already gives a brief picture of how the Indonesian public generally perceives Starbucks. To get more complete results, you only need to request the quota you want from Twitter.
Here is the R script:
Tb1: tweets classified by emotion (chart)
Tb2: tweets classified by polarity (chart)
Tb3: comparison word cloud by emotion
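#---Load libraries (each one must already be installed)
# Note: Rstem and sentiment may have to be installed from an archive
# if they are not available from CRAN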
library(plyr)
library(ggplot2)
library(wordcloud)
library(RColorBrewer)
library(httr)
library(slam)
library(mime)
library(R6)
library(twitteR)
library(bit)
library(bit64)
library(rjson)
library(DBI)
library(tm)
library(Rstem)
library(NLP)
library(sentiment)
library(Rcpp)
#---1.Connect to Twitter
# Enter your own authentication details below
# (the "xxxxxxxxxx" strings are placeholders)
api_key <- "xxxxxxxxxx"
api_secret <- "xxxxxxxxxx"
access_token <- "xxxxxxxxxx"
access_token_secret <- "xxxxxxxxxx"
# Authenticate with Twitter
setup_twitter_oauth(api_key, api_secret, access_token,
                    access_token_secret)
#---2.Harvest some tweets
some_tweets = searchTwitter("@SbuxIndonesia",
n=500, lang="en")
# get the text
some_txt = sapply(some_tweets, function(x) x$getText())
#--- 3.Prepare text for sentiment analysis
# remove retweet entities
some_txt = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", some_txt)
# remove at people
some_txt = gsub("@\\w+", "", some_txt)
# remove punctuation
some_txt = gsub("[[:punct:]]", "", some_txt)
# remove numbers
some_txt = gsub("[[:digit:]]", "", some_txt)
# remove html links (punctuation is already gone, so a link is one "http..." word)
some_txt = gsub("http\\w+", "", some_txt)
# collapse runs of spaces or tabs into a single space
# (replacing them with "" would glue neighbouring words together)
some_txt = gsub("[ \t]{2,}", " ", some_txt)
# trim leading and trailing whitespace
some_txt = gsub("^\\s+|\\s+$", "", some_txt)
# define "tolower error handling" function
try.error = function(x)
{
  # create missing value
  y = NA
  # tryCatch error
  try_error = tryCatch(tolower(x), error=function(e) e)
  # if not an error
  if (!inherits(try_error, "error"))
    y = tolower(x)
  # result
  return(y)
}
# lower case using try.error with sapply
some_txt = sapply(some_txt, try.error)
# remove NAs in some_txt
some_txt = some_txt[!is.na(some_txt)]
names(some_txt) = NULL
#---4. Perform sentiment analysis
# Please note that classifying the polarity and emotion
# of the tweets may take a few minutes
# classify emotion
class_emo = classify_emotion(some_txt,
algorithm="bayes", prior=1.0)
# get emotion best fit
emotion = class_emo[,7]
# substitute NA's by "unknown"
emotion[is.na(emotion)] = "unknown"
# classify polarity
class_pol = classify_polarity(some_txt,
algorithm="bayes")
# get polarity best fit
polarity = class_pol[,4]
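#---5.Create a data frame in order to plot the results
# data frame with results
sent_df = data.frame(text=some_txt, emotion=emotion,
                     polarity=polarity, stringsAsFactors=FALSE)
# sort data frame: order the emotion levels by frequency
# so that the bars are plotted from most to least common
sent_df = within(sent_df,
  emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE))))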
head(sent_df)
#text
#congratulationsfor winning the champions league title in milan uclfinal
#emotion polarity
#unknown positive
#---6. Plot the emotions and polarity of the tweets
# plot distribution of emotions
p <- ggplot(sent_df, aes(x = emotion)) +
  geom_bar(aes(fill = emotion)) +
  scale_fill_brewer(palette = "Dark2") +
  labs(x = "emotion categories", y = "number of comments",
       title = "Analisa Sentimen Twitter Starbuck Indonesia\n(classification by emotion)") +
  theme_bw() +
  # the title size is a theme setting, so it goes in theme(), not labs()
  theme(plot.title = element_text(size = 12))
print(p)
#---7. Plot distribution of polarity
p1 <- ggplot(sent_df, aes(x = polarity)) +
  geom_bar(aes(fill = polarity)) +
  scale_fill_brewer(palette = "RdGy") +
  labs(x = "polarity categories", y = "number of tweets",
       title = "Analisa Sentimen Twitter Starbuck Indonesia\n(classification by polarity)") +
  theme_dark() +
  # the title size is a theme setting, so it goes in theme(), not labs()
  theme(plot.title = element_text(size = 12))
print(p1)
# separating text by emotion
emos = levels(factor(sent_df$emotion))
nemo = length(emos)
emo.docs = rep("", nemo)
for (i in 1:nemo)
{
tmp = some_txt[emotion == emos[i]]
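# join tweets with a space so the last word of one tweet
# does not fuse with the first word of the next
emo.docs[i] = paste(tmp, collapse=" ")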
}
# remove stopwords
emo.docs = removeWords(emo.docs,
stopwords("english"))
# ---8.Create corpus
corpus = Corpus(VectorSource(emo.docs))
tdm = TermDocumentMatrix(corpus)
tdm = as.matrix(tdm)
colnames(tdm) = emos
# comparison word cloud
comparison.cloud(tdm, colors = brewer.pal(nemo,
"Dark2"), scale = c(3,.5),
random.order = FALSE, title.size = 1.5)
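
The cleaning step in section 3 can be tried without a Twitter connection. Below is a minimal sketch in base R that applies the same substitutions to a single made-up tweet. The helper name `clean_tweet` and the sample text are assumptions for illustration only; they are not part of the script above. Runs of spaces are collapsed to one space here, so neighbouring words stay separate.

```r
# Hypothetical helper (not part of the original script): applies the
# section 3 substitutions, in the same order, to a character vector.
clean_tweet <- function(x) {
  x <- gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", x)  # retweet entities
  x <- gsub("@\\w+", "", x)                         # mentions
  x <- gsub("[[:punct:]]", "", x)                   # punctuation
  x <- gsub("[[:digit:]]", "", x)                   # numbers
  x <- gsub("http\\w+", "", x)                      # links, now a single "http..." word
  x <- gsub("[ \t]{2,}", " ", x)                    # collapse runs of spaces/tabs to one space
  x <- gsub("^\\s+|\\s+$", "", x)                   # trim both ends
  tolower(x)
}

# Made-up sample tweet, for illustration only
sample_tweet <- "RT @SbuxIndonesia: Love the new latte!! 2 cups today https://t.co/abc123"
cleaned <- clean_tweet(sample_tweet)
print(cleaned)
```

Running the helper on a handful of real tweets before classification is a quick way to check that words are not being glued together or dropped by accident.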
Conclusions:
1. Tb1 shows that many people like Starbucks Indonesia and feel comfortable lingering there (joy = 80), although some find it less to their taste (sadness = 20). Starbucks faces a challenge here, because a large share of the data could not be classified and needs deeper study (unknown is almost 60). Starbucks should not be complacent yet either, because the surprise score is only about 5.
2. Tb2, the polarity chart, reinforces Tb1: the positive response to Starbucks Indonesia is large (almost 80), although the negative response should not be underestimated.
3. Tb3 shows that, undeniably, "taste" is still the main factor for Starbucks Indonesia to this day.
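
The figures quoted in the conclusions are read off the charts. They could also be tabulated directly from a data frame shaped like `sent_df`. The sketch below uses a small made-up data frame, `demo_df`, as a stand-in; its values are illustrative assumptions and are not the results of this analysis.

```r
# Made-up stand-in for sent_df (illustrative values only, not the real results)
demo_df <- data.frame(
  emotion  = c("joy", "joy", "sadness", "unknown", "joy", "surprise"),
  polarity = c("positive", "positive", "negative", "neutral", "positive", "positive"),
  stringsAsFactors = FALSE
)

# Count tweets per category, most frequent first
emotion_counts  <- sort(table(demo_df$emotion), decreasing = TRUE)
polarity_counts <- sort(table(demo_df$polarity), decreasing = TRUE)

# Share of each polarity category, in percent
polarity_share <- round(100 * prop.table(polarity_counts), 1)

print(emotion_counts)
print(polarity_counts)
print(polarity_share)
```

Replacing `demo_df` with `sent_df` would give exact counts instead of values estimated from the bar heights.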
References:
3. Library
4. https://developer.twitter.com/en/apps