Introduction
An online social network (OSN) is a structure built between individuals, and it is used as a source for spreading information, sharing opinions, and promoting business through social media platforms. Hashtags are a main component of Twitter and work as a tagging technique to group related messages on the same topic. The dynamic nature of hashtag usage has raised many research problems in recent years, such as hashtag recommendation [1]–[7], semantic hashtag classification [8], and event discovery [9]. Since hashtags are user-generated, redundant hashtags are created for the same topic, and it is difficult and time-consuming for a user to search for related hashtags to use. In this context, recommending the most contemporary and relevant hashtags to a user according to his/her interests and preferences is a challenging problem.
Several methods have been proposed in the literature on Twitter hashtag recommendation. The most widely accepted method treats hashtag recommendation as a multiclass classification problem that automatically labels a given tweet with a hashtag, where every hashtag is a distinct class label [3], [10]–[13]. The increasing number of these classes reduces the performance of the final recommendation. Research following this method exploits only the content of the tweet. An alternative approach is a hashtag recommendation method based on the content similarity of tweets [14]. This method extracts candidate hashtags from the set of similar tweets. Generally, the problem with this method is that it retrieves all similar tweets from a large data set whether they contain relevant hashtags or not. For example, suppose Bob is posting a tweet about his children playing in the neighborhood park. There are many similar tweets about parks and children, but they may contain hashtags that are irrelevant to Bob. Therefore, hashtag recommendation based on the content similarity of tweets has been extended with additional features, such as collaborative filtering of like-minded users through hashtag usage [15], topics [7], and social and time factors [1]. The social factor [1], [16] is significant. It utilizes the historical tweets of all mentioned users, followers, or followees of a user. However, the number of users who join social media is increasing every day, and each of these users generates massive amounts of data. For example, the former U.S. president Barack Obama’s account had 10 472 followers and 614 897 followee relationships in 2019 [17], [18]. Therefore, a huge amount of processing and memory is required to perform a task such as hashtag recommendation for every single user.
The closest state-of-the-art methods related to this article are: first, hashtag recommendation based on the content similarity of tweets applied on a large data set [19]; second, a hybrid method of content similarity of tweets and collaborative filtering of users [15]; and third, a method which recommends the most recent and frequent hashtags using the content similarity method and utilizing historical tweets of the followees [1]. Unlike these works, in this article, we propose a novel framework that we call community-based hashtag recommendation to study various factors influencing the performance of hashtag recommendation. To achieve this goal, we investigate and measure how different settings of textual and social factors affect the performance of hashtag recommendation. To deeply study the social factors, our framework uses the tweet similarity method and applies it on communities generated using various community detection algorithms from different subtleties of social network constructions to recommend hashtags. As for the textual factor, we address the effect of the semantic and words co-occurrence of the tweets on the performance of hashtag recommendation.
A. Research Contributions
A query tweet is a tweet posted by a user whose hashtags are removed so that it can be used to test the recommended hashtags. Our observations indicate that having a large number of tweets is less important for hashtag recommendation than the availability of tweets that are highly similar to the query tweet. Another observation shows that the quality of the recommended hashtags varies considerably among users; the model can perform very well for some users and poorly for others. Inspired by these observations and the fact that different formations of network graphs lead to different communities being detected, this article extends our previous work [20], where we only looked at the performance of hashtag recommendation over communities detected from the network created based on the following relations using the clique percolation method (CPM) [21] and breadth-first search (BFS). The network based on the following relations was already generated in [22]. The extension work we cover in this article includes three extra types of user–user networks.
Network based on mention: This type of network groups users who interact with each other through the mention (which can be a tweet reply or a follow-up) of their tweets [23], [24]. We construct mention-based networks in this article by finding users who have at least one interaction through mention (i.e., the @ symbol) in the whole data set.
Network based on hashtag: This type of network groups like-minded users who used similar historical hashtags. Hashtags have been used as an indication for topics discussed between users [23]. We construct hashtag-based networks in this article by finding users who used similar historical hashtags where their similarity score is over a predetermined threshold value.
Network based on topic: This type of network groups like-minded users who discuss similar topics generated using a topic model, such as latent Dirichlet allocation (LDA). LDA is one of the most popular techniques [25]; it generates the most probable topics discussed in a set of documents, together with the words representing each topic and their probabilities. We construct topic-based networks by finding users who share similar discussed topics generated using LDA where their similarity score is greater than a predetermined threshold value.
We design a community-based hashtag recommendation framework (Section III-B). This framework exploits the shared hashtag usage among dense users within communities for hashtag recommendation. The experimental results show the effectiveness of our proposed framework.
Our proposed framework helps researchers understand various factors influencing hashtag recommendation, which is more difficult to achieve with existing methods. Using our framework, we found in [20] that the social and textual factors were the most influential factors in hashtag recommendation on detected communities. In this article, we further investigate these two factors. For the textual factor, TFIDF and mean of word embeddings (MOWE) are integrated into the community-based hashtag recommendation framework, and their results are compared. The framework with MOWE achieves better results than the one integrated with TFIDF, which indicates the importance of the tweet’s semantics over word co-occurrence. We investigate the social factor in depth in terms of the network constructions and the user grouping methods through community detection algorithms. We find that hashtag recommendation on communities detected using CPM from a hashtag network outperforms all other variations, as explained in Section IV-B (Stage 3).
We introduce the use of density curves to visualize and compare the performance of the community-based hashtag recommendation frameworks under different configurations on many communities (Section III-C). The optimal result is observed when the performance has a heavy tail on the right end of the curve.
The remainder of this article is structured as follows. Section II reviews the related work. Section III gives a brief review of the basics, the proposed methodology, and the evaluation methods. In Section IV, we describe the data set, the preprocessing techniques, the parameter settings, the conducted experiments, and the results. Section V presents general discussions and limitations. Finally, the conclusion and future work are given in Section VI.
Related Work
In this section, we review the research work in the following four main areas: OSN, community detection methods, hashtag recommendation methods, and ranking methods of hashtags.
A. OSN and Community Detection Algorithms
The immense amount of metadata available in OSN allows heterogeneous social networks with various node and edge properties or correlations to be generated. The majority of existing research work focuses on the edge attributes of the networks like the social relations between users (followerships, friendships, or actual interactions) [21], [26], [28], [29]. Other research articles focus on the node attributes to connect like-minded users. Node attributes are typically stored under user profiles, which can contain explicit information about the user, such as biography (name, geo-location, and date of birth) or implicit information extracted from the OSN metadata via some user profiling techniques. Typical implicit information can be the interests of the user [30], topics of their posted messages [31], [32], or the users’ behaviors in using the social networks [23], [24].
Social relations, similar interest, similar geo-locations, and so on, naturally put users in OSN into clusters. It is, therefore, important at the onset to identify communities, which connect groups of users. Subsequent operations can focus just on the detected communities. Indeed, various community detection algorithms have been reported in the literature, including the clique-based methods, modularity-based methods, and label propagation methods. Representatives of these algorithms are briefly discussed in the following.
CPM [21] is an algorithm focusing on detecting subgraphs of fully connected users, i.e., cliques in the network. For a given value
One can consider community detection as a network construction process where, rather than processing on the metadata of the entire social network, only subnetworks are extracted and analyzed. Ideally, one should aim at detecting communities that are sufficiently large instead of many small communities. However, the number and the size distribution of communities are highly dependent on the underlying network, the community detection algorithm, and various factors. For instance, Darmon et al. [23] reported that the density of users within the communities generated based on the usage of similar hashtags and conversation between users was higher than that when similar temporal activities and social relations are taken into account.
B. Hashtag Recommendation Methods
Research on natural language processing (NLP) has shown great success in mapping words and sentences into vectors. Through feature learning, words of similar meaning are close to each other in the embedding space. This allows the meaning of words to be predicted. The most popular and effective word embedding method is the word2vec model [33]. The total, average, or concatenation of the embedding vectors [34] of surrounding words have also been used to predict the candidate words in sentences.
Hashtag recommendation analyzes the contents of tweets, as tweets of similar content are more likely to use similar hashtags. This task is commonly tackled as a multiclass classification problem of hashtags [3], [10], [35] or a content similarity problem of tweets [1], [14], [15]. Tackling hashtag recommendation as a multiclass classification problem predicts a hashtag for a given tweet, where every hashtag is considered a distinct class label for a tweet. However, two main challenges face this method. First, hashtag popularity follows a power law distribution, where the majority of hashtags are low in popularity and only a few hashtags are high in popularity. By training a classifier on an imbalanced data set, the classifier is more biased toward the more popular hashtags. Hence, only a small number of hashtags need to be handled. Weston et al. [10] proposed the #TAGSPACE model, which is a convolutional neural network (CNN) that only considers the top 100 000 popular hashtags in the classification. Gong et al. [35] worked on the local and global channels of tweets. The global channel encodes the feature vectors of every word in the tweet using a CNN, while the local channel encodes only the feature vectors of the significant words using an attention mechanism. Their best result is a recall value of 0.36. Li et al. [3] designed a long short-term memory recurrent neural network (LSTM-RNN) as a classifier to predict the hashtags. Although this method achieves the highest hit rate (0.86) among the methods mentioned earlier for the top-10 hashtag recommendations, the number of hashtags is limited to 20, which restricts the real application of the method.
An alternative approach to tackling hashtag recommendation is the content similarity of tweets. A content similarity of tweets method was first developed for hashtag recommendation by Zangerle et al. [14], where TFIDF is used to represent tweets as one-hot vectors. For each tweet in the test set, all the candidate tweets with TFIDF vectors similar to that of the query tweet are retrieved. The corresponding hashtags of these candidate tweets are then ranked, and the top-
More recently, word and sentence embeddings have superseded TFIDF vectors for content similarity measure in hashtag recommendation [7], [16], [39]. Tran et al. [16] and Kou et al. [7] recommend hashtags based on the weight of combining multiple features. In [16], content similarity, collaborative filtering of users who use similar hashtags, and the user interaction strength features are incorporated. The content similarity of tweet embeddings is calculated by summing up the embeddings of the tweet’s words generated from word2vec. The interaction strength of a user is calculated by weighting the mentioned users from a set of followers. In [7], three features are integrated: content similarity, collaborative filtering of users with similar hashtag usage, and topical interests.
In [20], we studied the hashtag recommendation problem by applying a content similarity method on communities of users detected based on the following relationships. We found that the quality of recommendation varies significantly among communities detected using CPM and BFS.
C. Ranking Methods of Hashtags
Given a list of candidate hashtags that have been identified, not all of them should be put forward for recommendation; instead, these hashtags should be ranked so that only the top few of them are recommended to the user. The aim of the hashtag ranking process is to reorder the candidate hashtags extracted by one of the hashtag recommendation methods described in the previous subsection. The reordering of importance values is usually from high to low. In the literature, candidate hashtags have been ranked based on their popularity, relevance, and recency. The definition of these ranking methods is listed below:
Hashtag Popularity: The term popularity is defined to be the number of times a hashtag has been adopted in the set of tweets being considered [14] or has been used by a given set of users [15], [40].
Global Hashtag Popularity: Different from the ranking method mentioned above, the global hashtag popularity value is calculated over the whole data set [14].
Tweet Hashtag Relevance: Hashtags are ranked in decreasing order of the similarity scores between the query tweet and the repository tweets in which they appear. Hence, the most relevant hashtags are near the top for recommendation. The similarity scores can be computed using a lexical tweet similarity [14] or a semantic tweet similarity function [39], [41].
Hashtag Recency: This ranking method uses the age (number of days) of each hashtag so that the most recently used hashtags are ranked near the top for recommendation [42].
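The popularity and relevance rankings listed above can be illustrated with a minimal sketch. It is not the implementation used in the cited works; the function names and the input layout (a list of similarity score and hashtag list pairs) are assumptions made for this example, as is the alphabetical tie-breaking.

```python
from collections import Counter

def rank_by_popularity(candidate_tweets):
    """Rank hashtags by the number of candidate tweets that contain them."""
    counts = Counter(tag for _, tags in candidate_tweets for tag in set(tags))
    # Descending count; alphabetical order keeps ties deterministic (an assumption).
    return [tag for tag, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def rank_by_relevance(candidate_tweets):
    """Rank hashtags by the highest similarity score of any tweet containing them."""
    best = {}
    for score, tags in candidate_tweets:
        for tag in tags:
            best[tag] = max(score, best.get(tag, float("-inf")))
    return [tag for tag, _ in sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))]

# Hypothetical candidates: (similarity to the query tweet, hashtags of the candidate tweet).
candidates = [
    (0.9, ["park"]),
    (0.7, ["kids", "fun"]),
    (0.6, ["kids"]),
    (0.5, ["kids", "park"]),
]
popularity_ranking = rank_by_popularity(candidates)
relevance_ranking = rank_by_relevance(candidates)
```

In this toy input, the two rankings disagree on the top hashtag, which is the kind of difference the ranking comparison in this article measures.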
Methodology
In this section, we first provide a brief review of the document representation and similarity measure. We then describe our proposed framework in detail.
A. Brief Review of the Basics
1) Tweet Representations:
An important step in hashtag recommendation is tweet representation. In NLP, the feature vector representing a document can be expressed in terms of the terms in the document. By treating tweets as small documents, the same representation for documents can be used for tweets. As reviewed in the previous section, the word2vec is a popular and effective way to represent terms in a document. This gives another possible way of representing tweets. Just like other NLP tasks, all the tweets must be preprocessed, such as removing the stop words and punctuation. In the remainder of this subsection, we will briefly outline two different ways for representing tweets: term frequency-inverse document frequency (TFIDF) and mean of word embeddings (MOWE).
a) Term frequency-inverse document frequency (TFIDF):
TFIDF is a statistic that reflects how important a term is to a document in a collection of documents. If there are \begin{equation*} {\mathbf{w}}_{d} = \mathrm{TF}_{t,d} \cdot \mathrm{IDF}_{t,D}\tag{1}\end{equation*}
\begin{equation*} \mathrm{IDF}_{t,D} = 1 + \log \left ({\frac {|D|}{1 + \mathrm{DF}_{t}} }\right)\tag{2}\end{equation*}
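As a hedged illustration of (1) and (2), the sketch below computes TFIDF weights for a few tokenized tweets. It assumes that the term frequency is the raw count of a term in a tweet and that the logarithm is natural; neither detail is confirmed by the text, and the function name is hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document, following (1) and (2)."""
    n_docs = len(docs)
    # DF_t: number of documents that contain term t.
    df = Counter(term for doc in docs for term in set(doc))
    # Equation (2): IDF = 1 + log(|D| / (1 + DF_t)); natural log is an assumption.
    idf = {t: 1.0 + math.log(n_docs / (1.0 + df_t)) for t, df_t in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # assumption: TF is the raw term count in the tweet
        # Equation (1): weight = TF * IDF for every term in the document.
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors

tweets = [["kids", "park", "park"], ["kids", "school"], ["kids", "beach"]]
vectors = tfidf_vectors(tweets)
```

A term that appears in every tweet ("kids") receives a lower weight than a term concentrated in one tweet ("park"), which is the behavior the statistic is meant to capture.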
b) Mean of word embeddings (MOWE):
Google’s word2vec model locates words that have similar meanings close together in the vector space. In this article, we adopt the skip-gram word2vec model and train it on our data rather than using Google’s pretrained model. This is because of the difference in vocabularies between proper documents and tweets. When a tweet \begin{equation*} {\mathbf{w}}_{d} = \frac{1}{T} \sum _{i=1}^{T} {\mathbf{v}}_{i}\tag{3}\end{equation*}
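The averaging in (3) can be sketched as follows. The toy embedding table stands in for a trained word2vec model, and skipping out-of-vocabulary words (so that T counts only known words) is an assumption of this example rather than a detail stated in the text.

```python
def mean_of_word_embeddings(tokens, embeddings):
    """Average the vectors of the tokens found in `embeddings`, as in (3)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None  # assumption: a tweet with no known words has no representation
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Toy 3-dimensional embeddings standing in for a trained word2vec model.
toy_embeddings = {"kids": [1.0, 0.0, 2.0], "park": [3.0, 2.0, 0.0]}
tweet_vector = mean_of_word_embeddings(["kids", "park", "unknownword"], toy_embeddings)
```

The resulting vector has the same dimension as the word embeddings, regardless of the tweet length.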
2) Similarity Measures of Tweets:
We use the cosine similarity function to measure the nearness between two tweet feature vectors \begin{equation*} sim({\mathbf{w}}_{q}, {\mathbf{w}}_{d})=\frac {{\mathbf{w}}_{q} \cdot {\mathbf{w}}_{d}}{{\| {\mathbf{w}}_{q} \|}_{2} {\| {\mathbf{w}}_{d}\|}_{2}}.\tag{4}\end{equation*}
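A minimal sketch of (4) on plain Python lists is given below. Returning 0.0 for a zero-length vector is an assumption of this example, since (4) is undefined in that case.

```python
import math

def cosine_similarity(w_q, w_d):
    """Cosine of the angle between two equal-length feature vectors, as in (4)."""
    dot = sum(a * b for a, b in zip(w_q, w_d))
    norm_q = math.sqrt(sum(a * a for a in w_q))
    norm_d = math.sqrt(sum(b * b for b in w_d))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # assumption: treat an all-zero vector as dissimilar to everything
    return dot / (norm_q * norm_d)

sim_same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
sim_orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])
```

Because the score depends only on direction, two tweets with the same relative term weights are treated as identical even if one vector has a larger magnitude.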
B. Proposed Framework
The raw input Twitter data contain entries for users, their incomplete profiles, their possible followers and followees, etc. There are also inactive and new users who have no social relationships with other users. These users are not of interest for further analysis. The very first step of our framework is, therefore, to construct a network. As the focus of this article is to investigate the influence of communities on hashtag recommendation, the second step is to detect communities from the network. The third and final step is hashtag recommendation.
1) Network Construction:
Online social networks inherently have a graph structure and are best represented as graphs.
Given a set of users
The four types of networks studied in our framework are outlined below. To simplify the graph structure, we build these networks as undirected and unweighted graphs. Hence, if
a) Network construction based on the following relation:
A followership edge
b) Network construction based on mention:
An interaction edge
c) Network construction based on hashtag:
As the name suggests, this network connects users who share one or more common hashtags. Details of the network construction procedure are outlined in Algorithm 1. The sets of historical hashtags of all the users are put together for training a word2vec model. After that, the trained model can be used to map hashtags into feature vectors. The algorithm goes through each user
Algorithm 1 Network Construction Based on Hashtag
Initialize:
Train a word2vec model on
for each user
Retrieve all the hashtags
Retrieve the hashtag feature vectors
/* Compute the profile vector of user
end for
for
if
end if
end for
return
Hashtags are usually not proper English words. They can be the concatenation of several words, short-hands, and/or replacements of words by numbers. It is, therefore, more meaningful to compute the word2vec vectors of hashtags directly, as done in Algorithm 1.
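The overall flow of the hashtag-based network construction can be sketched as below. This is an illustration under stated assumptions, not Algorithm 1 itself: the profile vector of a user is taken to be the mean of the user's hashtag vectors, the toy vectors stand in for a trained word2vec model, and the names `build_hashtag_network` and `tau_u` are hypothetical.

```python
import math
from itertools import combinations

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_hashtag_network(user_hashtags, hashtag_vectors, tau_u=0.5):
    """Return undirected edges between users whose profile similarity exceeds tau_u."""
    profiles = {}
    for user, tags in user_hashtags.items():
        known = [hashtag_vectors[t] for t in tags if t in hashtag_vectors]
        if known:
            profiles[user] = mean_vector(known)  # assumed aggregation of hashtag vectors
    edges = set()
    # Compare every pair of users once; the graph is undirected and unweighted.
    for u, v in combinations(sorted(profiles), 2):
        if cosine(profiles[u], profiles[v]) > tau_u:
            edges.add((u, v))
    return edges

# Toy 2-dimensional hashtag vectors standing in for a trained word2vec model.
toy_vectors = {"#nba": [1.0, 0.1], "#lakers": [0.9, 0.2], "#vegan": [0.0, 1.0]}
toy_users = {"alice": ["#nba", "#lakers"], "bob": ["#lakers"], "carol": ["#vegan"]}
edges = build_hashtag_network(toy_users, toy_vectors)
```

The pairwise comparison grows quadratically with the number of users, so a sketch of this form is only practical for small user sets.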
d) Network construction based on topic:
The pseudocode of the network construction process is given in Algorithm 2. We use LDA [25] to extract the topics discussed by every user in the data set. LDA does not work well on short tweets. To overcome this deficiency, we follow [38] and aggregate the tweets posted by a user to extract his/her topics. First, all the tweets are preprocessed to eliminate all the stop words, keeping only nouns. Next, an LDA model
Algorithm 2 Network Construction Based on Topics
Initialize:
Initialize: Pre-process all tweets
Train an LDA model on
for each user
for
end for
/* Compute the profile vector of user
end for
for
if
end if
end for
return
To find users who share similar topic terms and, therefore, should be connected by an edge, the cosine similarity is used to pairwisely compare the
2) Community Detection:
In our framework, we select three representative community detection methods, namely CPM, the Louvain algorithm, and LPA, to analyze their influence on hashtag recommendation. For the modularity-based methods, we choose the Louvain algorithm instead of Newman’s algorithm because of its speed in constructing the communities.
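To make the idea of label propagation concrete, a small sketch is given here. It is a simplified variant for illustration only: ties are broken deterministically by the smallest label and nodes are visited in sorted order, whereas the usual algorithm randomizes both. The function name and the adjacency-dictionary input are assumptions of this example.

```python
def label_propagation(adjacency, max_iters=100):
    """Asynchronous label propagation on an undirected graph {node: set(neighbors)}."""
    labels = {node: node for node in adjacency}  # every node starts in its own community
    for _ in range(max_iters):
        changed = False
        for node in sorted(adjacency):
            neighbors = adjacency[node]
            if not neighbors:
                continue
            # Count how often each label occurs among the neighbors.
            counts = {}
            for nb in neighbors:
                counts[labels[nb]] = counts.get(labels[nb], 0) + 1
            top = max(counts.values())
            # Deterministic tie-break (smallest label); the usual algorithm picks at random.
            new_label = min(lab for lab, c in counts.items() if c == top)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    communities = {}
    for node, label in labels.items():
        communities.setdefault(label, set()).add(node)
    return sorted(communities.values(), key=min)

# Two disconnected triangles should end up as two communities.
graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"},
    "x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"},
}
communities = label_propagation(graph)
```

Each node repeatedly adopts the label that is most common among its neighbors, so densely connected groups converge to a shared label without any objective function being optimized.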
Given the network
3) Hashtag Recommendation:
Algorithm 3 shows the pseudocode of the top-
Algorithm 3 Top-y Hashtags Recommended for a Given Query Tweet
/* Retrieve tweets similar to
for
if
end if
end for
/* Call algorithm
/* Variable
if length(
else
end if
return
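The retrieve, rank, and truncate flow of the recommendation step can be sketched as follows. This is an illustration under assumptions, not a transcription of Algorithm 3: the similarity threshold `tau` and its value in the example are hypothetical, the repository layout is invented for the example, and popularity among the retrieved tweets is used as the ranking method.

```python
import math
from collections import Counter

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend_hashtags(query_vector, repository, y, tau):
    """Return up to y hashtags for a query tweet.

    repository: list of (tweet_vector, hashtags) pairs for one community.
    tau: hypothetical similarity threshold for retrieving candidate tweets.
    """
    # Retrieve repository tweets that are similar enough to the query tweet.
    similar = []
    for vector, tags in repository:
        score = cosine(query_vector, vector)
        if score > tau:
            similar.append((score, tags))
    # Rank candidate hashtags by popularity among the retrieved tweets.
    counts = Counter(tag for _, tags in similar for tag in set(tags))
    ranked = [tag for tag, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    # Slicing returns all candidates when fewer than y are available.
    return ranked[:y]

# Toy repository of one community; vectors stand in for MOWE features.
repository = [
    ([1.0, 0.0], ["park", "kids"]),
    ([0.9, 0.1], ["park"]),
    ([0.0, 1.0], ["politics"]),
]
query = [1.0, 0.05]
top_1 = recommend_hashtags(query, repository, y=1, tau=0.6)
top_10 = recommend_hashtags(query, repository, y=10, tau=0.6)
```

Restricting the repository to one community keeps the retrieval loop short, which is the practical motivation for searching within communities rather than over the whole data set.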
a) Training the Word2Vec model:
Prior to the hashtag recommendation process, a Word2Vec model must be trained using all the tweets in the data set as described in Section III-A1. This process gives
b) Repository set:
Tweets posted by a community are defined as the MOWE feature vector
c) Ranking methods of hashtags:
We adopt two hashtag ranking methods in the framework (see Section II-C): hashtag ranking based on tweet relevance and hashtag ranking based on tweet popularity. Given a list of candidate hashtags that have been identified for recommendation, the hashtag ranking algorithm may need to look at: 1) the tweets’ similarity scores and/or 2) all the users in the community being considered. Thus, additional arguments such as the specific community
C. Evaluation Methods
We evaluate the performance of hashtag recommendation for a given tweet, a given community, and a set of communities.
1) Hashtag Recommendation Performance of a Tweet:
Three measures are commonly used to evaluate the performance of hashtag recommendation: hit rate, precision, and recall. The hit rate measure [15] gives the ratio of the number of hits to the number of attempts. Precision and recall give a ratio of matched hashtags corresponding to the top-
2) Hashtag Recommendation Performance of a Community:
In order to calculate the hashtag recommendation performance of a given community
Algorithm 4 Average Hit Rate of a Given Community
hit
/* Find common hashtags between
for
if
end if
end for
return
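A hedged sketch of the evaluation measures follows. The average hit rate counts a query tweet as a hit when at least one recommended hashtag appears in its ground truth; the precision and recall helper uses the standard set-based definitions, which is an assumption because the exact formulas are not recoverable from the text. All names are hypothetical.

```python
def average_hit_rate(recommended, ground_truth):
    """Fraction of query tweets with at least one correctly recommended hashtag.

    recommended, ground_truth: parallel lists of hashtag lists, one per query tweet.
    """
    if not recommended:
        return 0.0
    hits = 0
    for rec, truth in zip(recommended, ground_truth):
        if set(rec) & set(truth):  # common hashtags between the two lists
            hits += 1
    return hits / len(recommended)

def precision_recall(rec, truth):
    """Standard set-based precision and recall for one query tweet (assumed definitions)."""
    matched = len(set(rec) & set(truth))
    precision = matched / len(rec) if rec else 0.0
    recall = matched / len(truth) if truth else 0.0
    return precision, recall

# Toy recommendations and ground truth for three query tweets of one community.
recommended = [["park", "kids"], ["nba"], ["vegan"]]
ground_truth = [["kids"], ["lakers"], ["vegan", "food"]]
community_hit_rate = average_hit_rate(recommended, ground_truth)
first_precision, first_recall = precision_recall(recommended[0], ground_truth[0])
```

The hit rate is more lenient than precision and recall because a single matching hashtag is enough for a tweet to count as a success.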
3) Hashtag Recommendation Performance of a Set of Communities:
The process of evaluating a community is repeated over all the
Experiments and Results
In this section, we describe the data set, the preprocessing steps, the parameter settings, our experimental setup, and the results.
A. Data Set, Preprocessing, and Parameter Settings
The data set we use is the Data set-UDI-TwitterCrawl-Aug2012 [22] collected during the period from 2011 to 2012. The data set comprises: 1) a user profile folder, which is not used in this work; 2) a following network file, which we use directly for the following network; and 3) a folder containing the tweets of all users. We use this last folder to generate the hashtag, mention, and topic networks, as explained in Section III-B1.
In the preprocessing stage, we cleaned all tweets by removing stop words, punctuation except the # symbol, and hyperlinks. All tweets and hashtags were converted to lowercase. Duplicate tweets were removed. The “rt” word was also removed from retweets. In order to test the performance of hashtag recommendation and avoid randomness in the results, we use the average results of fivefold cross-validation for each community. For the test set, the ground truth hashtags removed from query tweets are used only for evaluation.
The word2vec model described in Section III-A1 is trained on all the 9 241 235 hashtagged tweets from the data set mentioned above. This training process is performed before the network construction stage so that, regardless of the network type (following, hashtag, mention, or topic), the same word2vec model is used to encode words. The hyperparameters used in the training process are as follows: the context window for words is set to 5; the number of epochs is set to 30; words that appear fewer than 3 times are ignored; and the dimension of word embeddings is set to 300. Accordingly, the dimension of the sentence embeddings is also set to 300. The training produces 1 010 768 unique words in the word2vec dictionary.
We follow [20] and set the threshold
B. Experimental Setups
We conduct our experiments in several stages: network construction, community detection, and hashtag recommendation.
Stage 1 (Network Construction):
Table I shows the statistics of the four constructed networks mentioned in Section IV-A. We can observe that the networks based on following and mention are much larger than the other two networks, with a lot more users and edges in the graphs. As the τ_u threshold decreases for the hashtag and topic networks, it allows more users to connect to each other, thus resulting in more densely connected graphs. We set τ_u to 0.5 as, at this value, the numbers of edges of the hashtag and topic networks have similar magnitudes as the number of edges in the mention network. Due to hardware constraints and to maintain the consistency across all experiments, we reduce the four networks constructed above to smaller subnetworks by taking only their first 300 000 edges.
Stage 2 (Community Detection):
Table II shows a comparison between the three community detection algorithms and the aforementioned subnetwork graphs in terms of the number of generated communities and the total number of tweets posted by users in all communities. Although all the networks are restricted to the same number of edges, the numbers of generated communities and the numbers of tweets greatly vary.
Stage 3 (Community-Based Hashtag Recommendation):
At this stage, we apply the community-based hashtag recommendation explained in Algorithm 3 on communities detected from the four subnetworks formed in the second stage using CPM, Louvain algorithm, and LPA. We investigate how the performance of hashtag recommendation is affected by the four factors: 1) the features used to represent tweets; 2) the algorithms used to construct the network; 3) the algorithms used to detect communities; and 4) the ranking methods of candidate hashtags. These four factors are closely related to each other, as demonstrated below.
C. Results
Figs. 1 and 2 show the density curves of the hashtag recommendation performances
[Figure: Distributions of the number of communities against the average hit rate value for communities generated by CPM, the Louvain algorithm, and LPA.]
[Figure: Comparison between two hashtag ranking methods: tweet hashtag popularity and hashtag relevance. The distributions show the number of communities against the average hit rates for communities generated by CPM, the Louvain algorithm, and LPA using the TFIDF features (columns 1 and 2) and the MOWE Word2Vec features (columns 3 and 4).]
The following are the results of how the performance of hashtag recommendation is affected by the four factors:
1) Features Used to Represent Tweets:
For the first factor, we compare the performance of hashtag recommendation when the feature representations of tweets are encoded using TFIDF as explained in (1) and using MOWE as explained in (3). In Fig. 1, we observe that the performances of the community-based hashtag recommendation frameworks integrated with MOWE (represented in the last two columns) are better than the performances of the frameworks integrated with TFIDF (represented in the first two columns).
2) Algorithms Used to Construct the Network:
The second factor in this comparison is the effect of the network type. By looking at the mean values of the density curves reported in the subplots in Fig. 1, we find out that communities detected from the network based on hashtag mostly tend to have a higher average hit rate performance than communities generated from other networks when the top-
3) Algorithms Used to Detect Communities:
The third factor in this comparison is the effect of the used community detection algorithm on hashtag recommendation. According to the previous observation, we focus here only on the community-based hashtag recommendation framework where the communities are extracted from the network based on hashtag, the tweet representation is MOWE, and the ranking method is the tweet hashtag popularity. Table III shows the division of the average hit rates of communities into quintiles for the three community detection algorithms for the top-10 hashtag recommendation. From this table and Fig. 1, one can see that the hashtag recommendation performance varies with respect to the community detection algorithms. Some communities achieve high performances while other communities perform very poorly. We explain our results below.
Result 1:
The two distribution curves of communities using the MOWE feature vectors in Fig. 1 (rows 1 and 2, column 4) show that the communities detected by CPM and the Louvain algorithm for the hashtag-based network achieve a similar average hit rate of 0.43. For the communities detected by LPA (row 3, column 4), the average hit rate is only 0.27.
Result 2:
In order to distinguish between CPM and Louvain algorithm, we focus on the percentage of communities with high average hit rate performances. Our results show that the Louvain algorithm performs slightly better than CPM and much better than LPA with 5.40% of the communities achieving higher than 0.8 average hit rate. However, while many Louvain communities perform well in achieving high average hit rates, there is also a large percentage of communities sitting on the other end of the spectrum. As shown in Table III, about 13.18% of Louvain communities have less than 0.2 average hit rate.
Result 3:
Considering the 4th and 5th quintiles together, our results show that 20.67% of CPM communities achieve average hit rates above 0.6. This percentage shows that CPM communities perform slightly better than those detected by the Louvain algorithm (20.31%) and much better than those from LPA, for which only 3.39% of communities achieve the same range of average hit rate.
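The quintile breakdown used in Results 2 and 3 can be sketched as below. The sketch assumes five equal-width bins of 0.2 over the range [0, 1], with a value on a boundary assigned to the upper bin; the hit rates in the example are invented and are not taken from Table III.

```python
def quintile_percentages(hit_rates):
    """Percentage of communities in each of five equal-width hit-rate bins."""
    counts = [0] * 5
    for rate in hit_rates:
        index = min(int(rate * 5), 4)  # a hit rate of exactly 1.0 falls in the top bin
        counts[index] += 1
    total = len(hit_rates)
    return [100.0 * c / total for c in counts]

# Invented average hit rates of eight communities (not data from the article).
toy_hit_rates = [0.1, 0.25, 0.45, 0.65, 0.7, 0.85, 0.9, 1.0]
percentages = quintile_percentages(toy_hit_rates)
# Share of communities in the 4th and 5th quintiles, i.e., above 0.6.
share_above_0_6 = percentages[3] + percentages[4]
```

Summing the last two bins gives the share of communities above 0.6, which is the quantity compared across the three community detection algorithms.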
4) Ranking Methods of Candidate Hashtags:
We compare the performance of the tweet hashtag popularity and the hashtag relevance ranking methods within the community-based hashtag recommendation. Fig. 2 shows that hashtag recommendation is affected by the choice of the ranking method. For TFIDF, the performance of hashtag recommendation is much better when the hashtag relevance ranking method is used: the mean value for hashtag relevance is higher than the mean value for tweet hashtag popularity. On the other hand, the performance of hashtag recommendation is superior when tweet hashtag popularity is used as the ranking method with MOWE. By and large, the best hashtag recommendation performance among the distribution plots is achieved by the model that utilizes MOWE with tweet hashtag popularity as the ranking method.
D. Processing Time
All the processes described in the previous sections are implemented using Python on an iMac equipped with Intel Core i9, 8 cores, and 64 GB of memory. Table IV shows the computation time of each process in our framework. The process that is the most time-consuming is the network construction. It takes 505.989 s to construct 1000 edges of the hashtag-based network. Although all the three community detection algorithms used in this article are fast, detecting communities using CPM takes less time than using Louvain and LPA. As for training the Word2Vec model, it takes 0.524 s to learn the vectors of 1000 tweets. If no new hashtags are introduced after the above steps have been performed, the word2vec model does not need to be retrained. In that case, the time taken to recommend a hashtag (
E. Comparison Between the Community-Based Hashtag Recommendation and State-of-the-Art Methods
Recall that, in this article, we measure the hashtag recommendation performance over a set of communities. Our community-based hashtag recommendation method performs the best when: 1) the communities are extracted using CPM from the network based on hashtag; 2) the tweet representation method is MOWE; and 3) the ranking method is Tweet Hashtag Popularity. It is described in Section IV-C that 20.67% of the communities achieve average hit rates above 0.6. In this section, Table V instead reports the average hit rate, precision, and recall over all communities for our method in comparison with the state-of-the-art methods. Our method gives the highest average hit rate but lower precision and recall; on those two measures, Kowald et al.’s method is ahead of all methods.
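A high hit rate can coexist with low precision and recall. The sketch below shows this with the commonly used definitions of the three measures, which are assumed here for illustration: a hit is scored when at least one recommended hashtag appears in the ground truth, precision divides the number of matches by the number of recommendations, and recall divides it by the number of ground truth hashtags. The function names and the sample hashtags are hypothetical.

```python
def evaluate(recommended, truth):
    """Return (hit, precision, recall) for one test tweet, using exact hashtag matches."""
    rec, gt = set(recommended), set(truth)
    matches = len(rec & gt)
    hit = 1.0 if matches else 0.0
    precision = matches / len(rec) if rec else 0.0
    recall = matches / len(gt) if gt else 0.0
    return hit, precision, recall


def average_metrics(cases):
    """Average (hit, precision, recall) over a list of (recommended, truth) pairs."""
    if not cases:
        return 0.0, 0.0, 0.0
    results = [evaluate(rec, truth) for rec, truth in cases]
    return tuple(sum(column) / len(results) for column in zip(*results))


# One matching hashtag out of four recommendations: a full hit, but low precision.
single = evaluate(["#a", "#b", "#c", "#d"], ["#a", "#x"])  # (1.0, 0.25, 0.5)
overall = average_metrics([
    (["#a", "#b", "#c", "#d"], ["#a", "#x"]),
    (["#a"], ["#z"]),
])  # (0.5, 0.125, 0.25)
```

Because a single match is enough for a hit, recommending several hashtags per tweet raises the hit rate while diluting precision.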
Discussions and Limitations
Our results in the previous section show that the CPM communities in the hashtag-based network outperform the communities detected in other networks for hashtag recommendation. Because CPM detects cliques, it appears to associate users more strictly, and so the hashtag-based network contains smaller communities. When users who are likely to use the same hashtags are grouped together, a higher average hit rate can be achieved for each community and for the whole network. It is, therefore, not unexpected that CPM communities in the hashtag-based network perform the best in hashtag recommendation. This observation is consistent with the finding of Darmon et al. [23] that users grouped from a hashtag-based network are densely connected.
Because our best results are achieved by integrating MOWE, they indicate the importance and efficiency of the semantic features of the tweets in hashtag recommendation. Semantic features, such as MOWE, are represented with fewer dimensions. With TFIDF, on the other hand, the vectors become very sparse when the number of repository tweets is very large, and the memory limit hinders the model. In addition, recommending hashtags by integrating TFIDF is very time-consuming. The ranking method based on tweet hashtag popularity outperforms the ranking method based on hashtag relevance.
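The difference in dimensionality between the two tweet representations can be sketched as follows, assuming that MOWE averages the embedding vectors of the words in a tweet. The helper names, the two-dimensional toy vectors, and the sample tweets are hypothetical; a trained Word2Vec model would supply the real word vectors.

```python
def mowe(tokens, word_vectors, dim):
    """Average the vectors of the known tokens; unknown tokens are skipped (assumption)."""
    known = [word_vectors[token] for token in tokens if token in word_vectors]
    if not known:
        return [0.0] * dim  # no known word: fall back to a zero vector
    return [sum(column) / len(known) for column in zip(*known)]


def tfidf_dimension(tokenized_tweets):
    """Length of a TFIDF vector: one dimension per distinct token in the repository."""
    return len({token for tweet in tokenized_tweets for token in tweet})


# Hypothetical toy embeddings and repository tweets (illustration only).
vectors = {"park": [1.0, 0.0], "kids": [0.0, 1.0]}
tweets = [["park", "kids"], ["kids", "play"], ["sunny", "park", "today"]]

dense = mowe(["park", "kids", "zzz"], vectors, dim=2)  # [0.5, 0.5]
sparse_length = tfidf_dimension(tweets)                # 5
```

The MOWE vector keeps the fixed embedding size however many tweets are stored, whereas the TFIDF vector length grows with the vocabulary of the repository.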
There are two limitations in our implementation that affect our research findings reported above. These limitations are summarized as follows.
1) It is difficult to generate hashtags that exactly match the ground truth hashtags. For example, if the recommended hashtag is #mba but the ground truth hashtag is #womanMBA, or if the recommended hashtag is #travel but the ground truth is #holiday, both cases yield a zero hit rate even though, in terms of meaning, both recommendations should count as excellent hashtags for the user. This limitation is the reason why the hit rates in our experiments are rather low. One possible solution is to develop semantic measures that evaluate a recommendation by how close the recommended hashtags are in meaning to the ground truth hashtags.
2) The pretrained word and sentence embedding models require that all the words already exist in the model when inferring the feature vector for a given tweet. This requirement is not realistic in real-world scenarios, which prevents hashtag recommendation methods from readily working on new tweets. There is no clear solution to this limitation other than periodically retraining the system to learn the new embedding vectors.
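One way to picture the semantic measure suggested for the first limitation is a "soft" hit that also accepts a recommended hashtag whose embedding is close to a ground truth hashtag. The following is only a sketch under stated assumptions: the function names, the threshold of 0.8, and the three-dimensional toy vectors are all hypothetical, and this measure is not part of the evaluation reported in this article.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def soft_hit(recommended, truth, vectors, threshold=0.8):
    """Return 1.0 if any recommended hashtag equals, or is close in meaning to, a true one."""
    for rec in recommended:
        for gt in truth:
            if rec == gt:
                return 1.0
            # Hashtags without a vector (unseen at training time) can only match exactly.
            if rec in vectors and gt in vectors:
                if cosine(vectors[rec], vectors[gt]) >= threshold:
                    return 1.0
    return 0.0


# Hypothetical hashtag embeddings (illustration only).
hashtag_vectors = {
    "#travel": [0.9, 0.1, 0.0],
    "#holiday": [0.8, 0.2, 0.1],
    "#mba": [0.0, 0.1, 0.9],
}
related = soft_hit(["#travel"], ["#holiday"], hashtag_vectors)  # 1.0
unrelated = soft_hit(["#mba"], ["#holiday"], hashtag_vectors)   # 0.0
```

The fallback to exact matching for hashtags without a vector reflects the second limitation: a hashtag that the embedding model has never seen cannot be compared semantically until the model is retrained.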
Conclusion and Future Work
Hashtag recommendation is a very challenging problem within social media applications. In this article, we have presented a community-based hashtag recommendation framework that can help researchers investigate the factors influencing hashtag recommendation. The framework identifies hashtags that have been shared with other community members. We have studied different aspects of the framework: network construction, community detection algorithm, tweet representation method, and ranking method. Our experimental evaluation confirms that different degrees of social relationships affect the performance of hashtag recommendation. Our results also show that the hashtag recommendation performance is better when it is applied to the communities detected using CPM and extracted from the hashtag-based network graph. In addition, for hashtag recommendation, it is more suitable to use MOWE rather than TFIDF to represent tweets.
For our future work, we will investigate the association of the four networks mentioned in this article and their combined effect on the performance of hashtag recommendation. Furthermore, we will work on developing a hashtag recommendation measure that can evaluate hashtags based on their semantic similarity.