Friday, 22 February 2019

Which Starbucks coffee should I pick?
To answer the question above, we will analyse data scraped from the Twitter account of Starbucks Indonesia, @SbuxIndonesia (23 Feb 2019, 10:00 WIB). The analysis is split into three parts: 1. emotion classification, 2. polarity classification, 3. corpus analysis.
Of course, this analysis is limited by the free quota that Twitter grants, but it already gives a quick picture of how the Indonesian public generally perceives Starbucks; for more complete results you only need to request a larger quota from Twitter.
Here is the R script:

Tb1: emotion classification plot (see step 6)

Tb2: polarity classification plot (see step 7)

Tb3: comparison word cloud (see step 8)

#--- Load libraries
library(plyr)
library(ggplot2)
library(wordcloud)
library(RColorBrewer)
library(httr)
library(slam)
library(mime)
library(R6)
library(twitteR)
library(bit)
library(bit64)
library(rjson)
library(DBI)
library(tm)
library(Rstem)
library(NLP)
library(sentiment)
library(Rcpp)
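
Note: most of these packages are on CRAN, but sentiment and Rstem have been archived and cannot be installed with a plain install.packages() call anymore. A minimal installation sketch, assuming devtools is available and that the archived source tarballs still sit at the CRAN archive URLs below (the exact versions and URLs are assumptions, check the archive listing first):

#--- 0. One-time package installation (sketch)
# CRAN packages
install.packages(c("plyr", "ggplot2", "wordcloud", "RColorBrewer", "httr",
                   "slam", "mime", "R6", "twitteR", "bit", "bit64",
                   "rjson", "DBI", "tm", "NLP", "Rcpp"))
# 'Rstem' and 'sentiment' are archived packages; install from source
# (archive URLs/versions below are assumptions - verify against the CRAN archive)
install.packages("devtools")
devtools::install_url("https://cran.r-project.org/src/contrib/Archive/Rstem/Rstem_0.4-1.tar.gz")
devtools::install_url("https://cran.r-project.org/src/contrib/Archive/sentiment/sentiment_0.2.tar.gz")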

#---1.Connect to Twitter
# Authenticate with Twitter: enter your app credentials below
# (create an app at https://developer.twitter.com/en/apps to obtain them)
api_key <- "xxxxxxxxxx"
api_secret <- "xxxxxxxxxx"
access_token <- "xxxxxxxxxx"
access_token_secret <- "xxxxxxxxxx"
setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)
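
As a quick sanity check that the OAuth handshake worked, you can look up the target account before harvesting anything. This lookup is an illustration, not part of the original script, and assumes the usual twitteR user accessors:

# optional: verify the connection by fetching the target account's profile
sbux <- getUser("SbuxIndonesia")
sbux$getFollowersCount()   # should return a number if authentication succeeded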

#---2.Harvest some tweets
some_tweets = searchTwitter("@SbuxIndonesia", n=500, lang="en")
# get the text
some_txt = sapply(some_tweets, function(x) x$getText())
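
If you also want to look at tweet metadata (who tweeted, when, how often it was retweeted) rather than just the text, twitteR can flatten the list of status objects into a data frame. A small sketch, assuming the column names twitteR normally returns:

# optional: inspect tweet metadata as a data frame
tweets_df <- twListToDF(some_tweets)
head(tweets_df[, c("screenName", "created", "retweetCount")])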

#--- 3.Prepare text for sentiment analysis
# remove retweet entities
some_txt = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", "", some_txt)
# remove at people
some_txt = gsub("@\\w+", "", some_txt)
# remove punctuation
some_txt = gsub("[[:punct:]]", "", some_txt)
# remove numbers
some_txt = gsub("[[:digit:]]", "", some_txt)
# remove html links
some_txt = gsub("http\\w+", "", some_txt)
# collapse repeated whitespace into a single space
some_txt = gsub("[ \t]{2,}", " ", some_txt)
some_txt = gsub("^\\s+|\\s+$", "", some_txt)
# define "tolower error handling" function
try.error = function(x)
{
   # create missing value
   y = NA
   # tryCatch error
   try_error = tryCatch(tolower(x), error=function(e) e)
   # if not an error
   if (!inherits(try_error, "error"))
   y = tolower(x)
   # result
   return(y)
}


# lower case using try.error with sapply
some_txt = sapply(some_txt, try.error)
# remove NAs in some_txt
some_txt = some_txt[!is.na(some_txt)]
names(some_txt) = NULL

#---4. Perform sentiment analysis
# Please note that classifying the polarity and emotion of the tweets may take a few minutes

# classify emotion
class_emo = classify_emotion(some_txt, algorithm="bayes", prior=1.0)
# get emotion best fit
emotion = class_emo[,7]
# substitute NA's by "unknown"
emotion[is.na(emotion)] = "unknown"
# classify polarity
class_pol = classify_polarity(some_txt, algorithm="bayes")
# get polarity best fit
polarity = class_pol[,4]
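
The hard-coded column indices above (7 and 4) correspond to the best-fit column in the matrices returned by the sentiment package. If you prefer to avoid magic numbers, you can check the column labels directly; the layout shown in the comments is how the package normally names its output, so treat it as an assumption to verify:

# optional: confirm which columns hold the best-fit labels
colnames(class_emo)   # ..., BEST_FIT is the 7th column
colnames(class_pol)   # POS, NEG, POS/NEG, BEST_FIT (4th column)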

#---5.Create a data frame in order plot the results
# data frame with results
sent_df = data.frame(text=some_txt, emotion=emotion,
polarity=polarity, stringsAsFactors=FALSE)

# order the emotion factor by frequency so the plotted bars are sorted
sent_df = within(sent_df,
  emotion <- factor(emotion, levels = names(sort(table(emotion), decreasing = TRUE))))
head(sent_df)
# example of one row of the output:
#   text                                                                      emotion  polarity
#   congratulationsfor winning the champions league title in milan uclfinal  unknown  positive
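
Before plotting, it can help to look at the raw counts behind the figures; these are the numbers the conclusions at the end refer to:

# raw counts behind the plots
table(sent_df$emotion)
table(sent_df$polarity)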

#---6. Plot the emotions and polarity of the tweets
# plot distribution of emotions
p <- ggplot(sent_df, aes(x = emotion)) +
  geom_bar(aes(y = ..count.., fill = emotion)) +
  scale_fill_brewer(palette = "Dark2") +
  labs(x = "emotion categories", y = "number of tweets",
       title = "Starbucks Indonesia Twitter Sentiment Analysis\n(classification by emotion)") +
  theme_bw() +
  theme(plot.title = element_text(size = 12))

#---7. Plot distribution of polarity
p1 <- ggplot(sent_df, aes(x = polarity)) +
  geom_bar(aes(y = ..count.., fill = polarity)) +
  scale_fill_brewer(palette = "RdGy") +
  labs(x = "polarity categories", y = "number of tweets",
       title = "Starbucks Indonesia Twitter Sentiment Analysis\n(classification by polarity)") +
  theme_dark() +
  theme(plot.title = element_text(size = 12))
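
Because the plots are assigned to p and p1, they are not drawn automatically when the script is sourced; print them explicitly, and optionally save them to disk (the file names below are just placeholders):

# draw the plots (needed when running as a script) and optionally save them
print(p)
print(p1)
ggsave("starbucks_emotion.png",  plot = p,  width = 7, height = 5)
ggsave("starbucks_polarity.png", plot = p1, width = 7, height = 5)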

# separating text by emotion
emos = levels(factor(sent_df$emotion))
nemo = length(emos)
emo.docs = rep("", nemo)
for (i in 1:nemo)
{
  tmp = some_txt[emotion == emos[i]]
  emo.docs[i] = paste(tmp, collapse = " ")
}
# remove stopwords
emo.docs = removeWords(emo.docs, stopwords("english"))
# ---8.Create corpus
corpus = Corpus(VectorSource(emo.docs))
tdm = TermDocumentMatrix(corpus)
tdm = as.matrix(tdm)
colnames(tdm) = emos
# comparison word cloud
comparison.cloud(tdm, colors = brewer.pal(nemo, "Dark2"), scale = c(3,.5),
                 random.order = FALSE, title.size = 1.5)
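
To see which terms drive each emotion in the cloud, you can also list the most frequent terms per column of the term-document matrix. A small sketch; the available columns depend on which emotions actually occur in your sample:

# optional: top 10 terms for each emotion category
for (e in emos) {
  cat("\n", e, ":\n")
  print(head(sort(tdm[, e], decreasing = TRUE), 10))
}
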
Conclusions:
1.  Tb1 shows that many people like Starbucks Indonesia and feel comfortable lingering there (joy = 80), although some are less satisfied (sadness = 20). There is also a challenge for Starbucks: a large share of tweets could not be classified and deserves closer study (unknown is almost 60), and Starbucks should not get complacent, since the surprise score is only around 5.
2.  Tb2, the polarity chart, reinforces Tb1: positive responses to Starbucks Indonesia dominate (almost 80), although the negative responses should not be dismissed either.
3.  Tb3 confirms that, to this day, "taste" is still the main factor people associate with Starbucks Indonesia.


