Real R application: Tokopedia vs Bukalapak

Tokopedia vs Bukalapak

The startup scene is booming in Indonesia right now. Several Indonesian startups have earned "unicorn" status, for example Tokopedia, Bukalapak, Traveloka, and Gojek. Reaching that status is no easy feat. Bukalapak, for instance, once came close to bankruptcy, meaning its valuation was nearly zero rupiah (Rp 0), but its founder, Achmad Zaky, managed to rescue it, which was certainly not easy to do.

The Bukalapak story is only one case. Other startups have their own struggles and their own quirks in trying to survive.

Here we only compare Tokopedia and Bukalapak in terms of tweets on Twitter: are they liked or not, "like" or "hate" as English speakers put it? Which of the two is liked more?

Figure 1

Figure 2

The analysis is done in R, using tweets mentioning @tokopedia and @bukalapak collected on 21 February 2019 at 9:30. Here is the R script:

# Install the packages

install.packages("twitteR")

install.packages("ROAuth")

install.packages("httr")

library(twitteR)

library(ROAuth)

library(httr)

pkgs <-c('twitteR','ROAuth','httr','plyr','stringr','ggplot2','plotly')

for(p in pkgs) if(p %in% rownames(installed.packages()) == FALSE) {install.packages(p)}

for(p in pkgs) suppressPackageStartupMessages(library(p, quietly=TRUE, character.only=TRUE))

# Set API Keys

# To get the API keys, register an app at https://developer.twitter.com/en/apps

api_key <- "xxxxxxxxxxxxxxx"

api_secret <- "xxxxxxxxxxxxxxx"

access_token <- "xxxxxxxxxxxxxxx"

access_token_secret <- "xxxxxxxxxxxxxxx"

setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)

# Grab latest tweets

tokopedia <- searchTwitter('@tokopedia', n=100)

bukalapak <- searchTwitter('@bukalapak', n=100)

# Loop over tweets and extract text

feed_tokopedia <- laply(tokopedia, function(t) t$getText())

feed_bukalapak <- laply(bukalapak, function(t) t$getText())

yay <- scan("D:/lexicon/positive-indo.txt",what="character", comment.char=";")

boo <- scan('D:/lexicon/negative-indo.txt', what="character", comment.char=";")

# Add a few extra negative and positive words to the lexicons

bad_text = c(boo, 'jelek', 'gagal', 'kecil', 'susah', 'mundur', 'pencitraan', 'bohong')

good_text = c(yay, 'senang', 'selesai', 'besar', 'dukung', 'maju', 'hebat', 'sukses')

score.sentiment <- function(sentences, good_text, bad_text, .progress='none')

{

require(plyr)

require(stringr)

# we got a vector of sentences. plyr will handle a list

# or a vector as an "l" for us

# we want a simple array of scores back, so we use

# "l" + "a" + "ply" = "laply":

scores = laply(sentences, function(sentence, good_text, bad_text) {

# clean up sentences with R's regex-driven global substitute, gsub():

sentence = gsub('[[:punct:]]', '', sentence)

sentence = gsub('[[:cntrl:]]', '', sentence)

sentence = gsub('\\d+', '', sentence)

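# remove emojis and other non-ASCII characters; sub='' drops them
# instead of turning the whole sentence into NA

sentence <- iconv(sentence, 'UTF-8', 'ASCII', sub='')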

sentence = tolower(sentence)

# split into words. str_split is in the stringr package

word.list = str_split(sentence, '\\s+')

# sometimes a list() is one level of hierarchy too much

words = unlist(word.list)

# compare our words to the dictionaries of positive & negative terms

pos.matches = match(words, good_text)

neg.matches = match(words, bad_text)

# match() returns the position of the matched term or NA

# we just want a TRUE/FALSE:

pos.matches = !is.na(pos.matches)

neg.matches = !is.na(neg.matches)

# and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():

score = sum(pos.matches) - sum(neg.matches)

return(score)

}, good_text, bad_text, .progress=.progress )

scores.df = data.frame(score=scores, text=sentences)

return(scores.df)

}

# Retrieve scores and add the brand name.

thetokopedia <- score.sentiment(feed_tokopedia, good_text, bad_text, .progress='text')

thetokopedia$name <- 'tokopedia'

feelbukalapak <- score.sentiment(feed_bukalapak, good_text, bad_text, .progress='text')

feelbukalapak$name <- 'bukalapak'

# Merge into one dataframe for plotting

plotdat <- rbind(thetokopedia, feelbukalapak)

# Cut the text, just gets in the way

plotdat <- plotdat[c("name", "score")]

# Remove neutral values of 0

plotdat <- plotdat[!plotdat$score == 0, ]

# Remove anything less than -5 or greater than 5

plotdat <- plotdat[!plotdat$score > 5, ]

plotdat <- plotdat[!plotdat$score < (-5), ]

# Nice little quick plot

q1 <- qplot(factor(score), data=plotdat, geom="bar",

fill=factor(name),

xlab = "Sentiment Score")

# Or get funky with ggplot2 + Plotly

ep <- plotdat %>%

ggplot(aes(x = score, fill = name)) +

geom_bar() +

scale_fill_manual(values = c("#0067F7","#FFFF00", "#CD853F", "#6B8E23","#F70000")) +

theme_classic(base_size = 12) +

scale_x_continuous(name = "Sentiment Score") +

scale_y_continuous(name = "Text count of tweets") +

ggtitle("Sentimen Twitter Tokopedia vs Bukalapak",subtitle = "Twitter Scraping 21 Feb 2019") +

theme(axis.title.y = element_text(face="bold", colour="#000000", size=10),

axis.title.x = element_text(face="bold", colour="#000000", size=8),

axis.text.x = element_text(angle=16, vjust=0, size=8))

#ggplotly(ep)
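
The scoring logic can be checked without Twitter access. Below is a minimal sketch in base R, assuming tiny made-up word lists (`good_words`, `bad_words`) and a simplified helper `score_one`; these names and the sample tweets are hypothetical and are not part of the script above. It applies the same cleaning steps as `score.sentiment` and then counts positive minus negative words.

```r
# Toy lexicons, made up for this illustration (not the lexicon files used above)
good_words <- c("senang", "hebat", "sukses")
bad_words  <- c("jelek", "gagal", "susah")

# Simplified scorer: same cleaning steps as score.sentiment, base R only
score_one <- function(sentence, good_words, bad_words) {
  sentence <- gsub("[[:punct:]]", "", sentence)            # drop punctuation
  sentence <- gsub("[[:cntrl:]]", "", sentence)            # drop control characters
  sentence <- gsub("\\d+", "", sentence)                   # drop digits
  sentence <- iconv(sentence, "UTF-8", "ASCII", sub = "")  # drop non-ASCII such as emoji
  words <- unlist(strsplit(tolower(sentence), "\\s+"))     # split into lowercase words
  sum(words %in% good_words) - sum(words %in% bad_words)   # positive minus negative hits
}

# Hypothetical sample tweets
tweets <- c("Belanja di sini hebat, sukses terus!",
            "Aplikasinya jelek dan susah dipakai",
            "Pesanan nomor 123 sudah sampai")

scores <- vapply(tweets, score_one, numeric(1),
                 good_words = good_words, bad_words = bad_words,
                 USE.NAMES = FALSE)
print(scores)
```

With these toy inputs the first tweet matches two positive words, the second matches two negative words, and the third matches none, so the scores come out as 2, -2 and 0. Neutral tweets like the third one are the rows that the script drops before plotting.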

Conclusion

Looking at the chart above, we can analyze the result and conclude:

1. Negative sentiment toward Bukalapak is far larger than toward Tokopedia. Why? A few days before the Twitter data was scraped, Bukalapak's CEO made a statement that netizens received negatively; the CEO was even summoned by the president.

2. Tokopedia, by contrast, is more solidly ahead of Bukalapak.

3. Stakeholders should be careful and avoid counterproductive statements, because startups are very sensitive to public sentiment.

4. Startups in Indonesia are still relatively new. Even those that have earned "unicorn" status still draw considerable negative sentiment, which can be attributed to the large number of customer complaints.
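
A claim such as "far larger" is easier to judge with numbers than from a bar chart alone. The following is a rough sketch, assuming a data frame shaped like `plotdat` (columns `name` and `score`, zeros already removed). The `toy` data here is invented purely for illustration and does not reproduce the scraped results.

```r
# Hypothetical scored tweets; these numbers are invented for illustration only
toy <- data.frame(
  name  = c("tokopedia", "tokopedia", "tokopedia",
            "bukalapak", "bukalapak", "bukalapak"),
  score = c(1, 2, -1, -2, -1, 1)
)

# Label each tweet by the sign of its score
toy$sentiment <- ifelse(toy$score < 0, "negative", "positive")

# Count tweets per brand and sentiment, then convert to row-wise shares
counts <- table(toy$name, toy$sentiment)
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

Replacing `toy` with the real `plotdat` would give the share of negative and positive tweets per brand, which makes the comparison between the two independent of how many non-neutral tweets each brand happened to receive.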

References:
The twitteR vignettes on CRAN
R code and data for the book R and Data Mining: Examples and Case Studies, http://www.rdatamining.com/

Wednesday, 20 February 2019
