Tokopedia vs Bukalapak
The start-up world is booming in Indonesia right now, and several Indonesian start-ups have earned the title of “Unicorn”, for example Tokopedia, Bukalapak, Traveloka, Gojek, and others.
Of course, earning the “Unicorn” title is no easy feat for a start-up. Here is a short story about Bukalapak: it once nearly went bankrupt, meaning its valuation was close to zero rupiah (Rp 0.00), but its founder Achmad Zaky managed to rescue it from bankruptcy, which was certainly not easy to do.
The Bukalapak story is just one case; other start-ups surely have their own struggles and their own ways of surviving.
Here we will only compare Tokopedia and Bukalapak based on tweets about them: are they liked or not, or as they say over there, “like” or “hate”? Which of the two is liked more?
Figure 1
Figure 2
Using R, I collected tweets mentioning @tokopedia and @bukalapak on 21 February 2019 at 9:30. Here is the R script:
# Install any missing packages, then load them all
pkgs <- c('twitteR', 'ROAuth', 'httr', 'plyr', 'stringr', 'ggplot2', 'plotly')
for (p in pkgs) if (!(p %in% rownames(installed.packages()))) install.packages(p)
for (p in pkgs) suppressPackageStartupMessages(library(p, quietly = TRUE, character.only = TRUE))
# Set API keys
# To obtain the API keys, register an app at https://developer.twitter.com/en/apps
api_key <- "xxxxxxxxxxxxxxx"
api_secret <- "xxxxxxxxxxxxxxx"
access_token <- "xxxxxxxxxxxxxxx"
access_token_secret <- "xxxxxxxxxxxxxxx"
setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)
# Grab the latest tweets mentioning each account
tokopedia <- searchTwitter('@tokopedia', n = 100)
bukalapak <- searchTwitter('@bukalapak', n = 100)
# Loop over the tweets and extract their text
feed_tokopedia <- laply(tokopedia, function(t) t$getText())
feed_bukalapak <- laply(bukalapak, function(t) t$getText())
# Load the Indonesian positive and negative word lists (text after ";" is treated as a comment)
yay <- scan("D:/lexicon/positive-indo.txt", what = "character", comment.char = ";")
boo <- scan("D:/lexicon/negative-indo.txt", what = "character", comment.char = ";")
# Add a few extra negative and positive Indonesian words
bad_text <- c(boo, 'jelek', 'gagal', 'kecil', 'susah', 'mundur', 'pencitraan', 'bohong')
good_text <- c(yay, 'senang', 'selesai', 'besar', 'dukung', 'maju', 'hebat', 'sukses')
score.sentiment <- function(sentences, good_text, bad_text, .progress = 'none')
{
  require(plyr)
  require(stringr)
  # We get a vector of sentences; plyr handles a list or a vector as an "l" for us.
  # We want a simple array of scores back, so we use "l" + "a" + "ply" = "laply":
  scores <- laply(sentences, function(sentence, good_text, bad_text) {
    # Clean up sentences with R's regex-driven global substitute, gsub():
    sentence <- gsub('[[:punct:]]', '', sentence)
    sentence <- gsub('[[:cntrl:]]', '', sentence)
    sentence <- gsub('\\d+', '', sentence)
    # Remove emojis and other non-ASCII characters; sub = '' drops them
    # instead of turning the whole sentence into NA (which would score 0)
    sentence <- iconv(sentence, 'UTF-8', 'ASCII', sub = '')
    sentence <- tolower(sentence)
    # Split into words; str_split() is in the stringr package
    word.list <- str_split(sentence, '\\s+')
    # Sometimes a list() is one level of hierarchy too much
    words <- unlist(word.list)
    # Compare our words to the dictionaries of positive and negative terms
    pos.matches <- match(words, good_text)
    neg.matches <- match(words, bad_text)
    # match() returns the position of the matched term or NA;
    # we just want TRUE/FALSE:
    pos.matches <- !is.na(pos.matches)
    neg.matches <- !is.na(neg.matches)
    # Conveniently, TRUE/FALSE is treated as 1/0 by sum():
    score <- sum(pos.matches) - sum(neg.matches)
    return(score)
  }, good_text, bad_text, .progress = .progress)
  scores.df <- data.frame(score = scores, text = sentences)
  return(scores.df)
}
# Retrieve the scores and add the account name
thetokopedia <- score.sentiment(feed_tokopedia, good_text, bad_text, .progress = 'text')
thetokopedia$name <- 'tokopedia'
feelbukalapak <- score.sentiment(feed_bukalapak, good_text, bad_text, .progress = 'text')
feelbukalapak$name <- 'bukalapak'
# Merge into one data frame for plotting
plotdat <- rbind(thetokopedia, feelbukalapak)
# Drop the tweet text; it just gets in the way
plotdat <- plotdat[c("name", "score")]
# Remove neutral values of 0
plotdat <- plotdat[plotdat$score != 0, ]
# Remove anything less than -5 or greater than 5
plotdat <- plotdat[plotdat$score >= -5 & plotdat$score <= 5, ]
# Nice little quick plot
q1 <- qplot(factor(score), data = plotdat, geom = "bar", fill = factor(name), xlab = "Sentiment Score")
# Or get funky with ggplot2 + Plotly
ep <- plotdat %>%
  ggplot(aes(x = score, fill = name)) +
  geom_bar() +
  scale_fill_manual(values = c("#0067F7", "#FFFF00", "#CD853F", "#6B8E23", "#F70000")) +
  theme_classic(base_size = 12) +
  scale_x_continuous(name = "Sentiment Score") +
  scale_y_continuous(name = "Number of tweets") +
  ggtitle("Twitter Sentiment: Tokopedia vs Bukalapak", subtitle = "Twitter Scraping 21 Feb 2019") +
  theme(axis.title.y = element_text(face = "bold", colour = "#000000", size = 10),
        axis.title.x = element_text(face = "bold", colour = "#000000", size = 8),
        axis.text.x = element_text(angle = 16, vjust = 0, size = 8))
# ggplotly(ep)  # uncomment for an interactive Plotly version
ep
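The scoring step can be checked without Twitter API access. The minimal sketch below re-implements the per-sentence logic of score.sentiment() in base R; the helper score_one(), the short word lists good_demo and bad_demo, and the two example sentences are hypothetical stand-ins for the lexicon files and real tweets.

```r
# Minimal sketch of the lexicon-matching idea behind score.sentiment():
# score = (number of words in the positive list) - (number of words in the negative list).
# good_demo / bad_demo are hypothetical stand-ins for the lexicon files.
good_demo <- c("senang", "hebat", "sukses", "maju")
bad_demo  <- c("jelek", "gagal", "susah", "bohong")

score_one <- function(sentence, good, bad) {
  sentence <- gsub("[[:punct:]]", "", sentence)   # drop punctuation
  sentence <- gsub("[[:cntrl:]]", "", sentence)   # drop control characters
  sentence <- gsub("\\d+", "", sentence)          # drop digits
  # sub = "" deletes non-ASCII characters (e.g. emoji) instead of
  # returning NA for the whole sentence
  sentence <- iconv(sentence, "UTF-8", "ASCII", sub = "")
  words <- unlist(strsplit(tolower(sentence), "\\s+"))
  sum(words %in% good) - sum(words %in% bad)
}

# Hypothetical example sentences (the second one ends with an emoji)
demo <- c("Aplikasinya hebat, saya senang!",
          "Pengiriman gagal lagi, susah sekali \U0001F620")
scores <- vapply(demo, score_one, integer(1),
                 good = good_demo, bad = bad_demo, USE.NAMES = FALSE)
print(scores)
```

The first sentence contains two positive words and scores 2; the second contains two negative words and scores -2. Passing sub = "" to iconv() keeps the emoji from possibly turning the second sentence into NA, which would otherwise make it silently score 0.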
Conclusion
Looking at the charts above, we can analyze the results and draw the following conclusions:
1. Among the tweets collected (up to 100 per account), Bukalapak's negative sentiment is much larger than Tokopedia's. Why? A few days before the Twitter data was scraped, Bukalapak's CEO made a statement that netizens received negatively; he was even summoned by the President.
2. Conversely, Tokopedia comes out clearly ahead of Bukalapak.
3. Stakeholders should be careful and avoid counterproductive statements, because start-ups are very sensitive to public sentiment.
4. Indonesian start-ups are still relatively young; even with the “Unicorn” title, their negative sentiment is still fairly large, which likely reflects the many complaints from customers.
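The conclusions above are read off the bar chart by eye. A minimal sketch for putting numbers on the comparison, assuming a data frame shaped like plotdat from the script (columns name and score): the helper sentiment_summary() and the toy data frame are hypothetical, made up only to illustrate the calculation, and are not results from the scraped tweets.

```r
# Per-account tally of negative vs positive tweets from a data frame
# shaped like `plotdat` (columns `name` and `score`).
# sentiment_summary() is a hypothetical helper, not part of any package.
sentiment_summary <- function(df) {
  df <- df[df$score != 0, ]                      # ignore neutral tweets
  polarity <- factor(ifelse(df$score > 0, "positive", "negative"),
                     levels = c("negative", "positive"))
  counts <- table(df$name, polarity)             # rows: accounts, sorted alphabetically
  shares <- prop.table(counts, margin = 1)       # proportions within each account
  data.frame(
    name           = rownames(counts),
    negative       = as.integer(counts[, "negative"]),
    positive       = as.integer(counts[, "positive"]),
    share_negative = round(as.numeric(shares[, "negative"]), 2),
    stringsAsFactors = FALSE
  )
}

# Hypothetical toy input, for illustration only (not the scraped data)
toy <- data.frame(
  name  = c(rep("tokopedia", 3), rep("bukalapak", 4)),
  score = c(2, 1, -1, -2, -1, -3, 1),
  stringsAsFactors = FALSE
)
res <- sentiment_summary(toy)
print(res)
```

Calling sentiment_summary(plotdat) on the real data would give one row per account, so a statement like point 1 can be checked against the share_negative column rather than bar heights alone.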
References:
The twitteR vignettes on CRAN.
R code and data for the book R and Data Mining: Examples and Case Studies, http://www.rdatamining.com/