Tweets Data Visualization with Circles and User Interaction

Arief Anbiya

Here, we will talk about how circles can be used to create an interesting and beautiful visualization of a set of tweets. The idea is that each circle corresponds to a tweet, and we arrange all the circles so that they form an interesting static image. Moreover, the visualization can be designed so that we can explore a tweet by clicking the corresponding circle on the static image. We will see that we can do this without changing any part of the static image. The implementation discussed here uses Matplotlib.

The Model

A circle will be used to represent a tweet. The transparency of the circle’s color reflects the number of retweets of the tweet (it could also be the number of likes, depending on your interest). Circles with strong color represent tweets with a high number of retweets, and circles with high transparency represent tweets with a low number of retweets. The tweet with the highest number of retweets in the dataset is drawn with no transparency at all; every other tweet is drawn with some transparency.

So far, we have added only one measurement to the visualization: the number of retweets (or number of likes). But we can use another property of a circle, namely its size (radius). What should the size represent? There are many options, but let us pick one. By default, all circles have radius 1, but if a tweet has more likes than retweets, we set its radius to 2. This way, our eyes can easily tell which tweets have more likes than retweets.
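
The two rules of the model can be sketched as one small function. This is only an illustration: the linear mapping from retweets to opacity is an assumption (the text only fixes that the most retweeted tweet is fully opaque), and the name circle_style and the sample numbers are made up.

```python
def circle_style(retweets, likes, max_retweets):
    """Return (radius, alpha) for one tweet under the model described above."""
    # Assumption: opacity grows linearly with the retweet count, so the most
    # retweeted tweet gets alpha 1.0 (no transparency).
    alpha = retweets / max_retweets if max_retweets > 0 else 0.0
    # Radius rule from the text: 2 if likes exceed retweets, otherwise 1.
    radius = 2 if likes > retweets else 1
    return radius, alpha


# Hypothetical (retweets, likes) pairs, not real data.
tweets = [(120, 300), (38, 0), (60, 45)]
max_rt = max(rt for rt, _ in tweets)
styles = [circle_style(rt, lk, max_rt) for rt, lk in tweets]
```

The resulting radius and alpha values can be passed to a Matplotlib Circle patch through its radius and alpha arguments.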

Now, how should we arrange the circles? We could scatter them randomly in the plot, but that would look messy and the circles might overlap. We could also put them in a scatter plot (with the x and y axes representing two additional measurements), but the circles might still overlap. What we want is that no two circles overlap, which requires us to write our own algorithm. For our case, we will use an algorithm similar to circle packing (see the Wikipedia article on circle packing for a general example and the codeplastic.com post on controlled circle packing for a specific one). The positions of the circles will be random but centered, and the arrangement will be made as compact as possible.

Here is an example with very few samples (for simplicity):

Visualization with small samples (5 tweets for each account)

The above image is an example of the visualization using small samples (5 tweets each) from the Twitter accounts of Sadiq Khan (Mayor of London), Donald Trump (President of the USA), Prabowo Subianto (Indonesian presidential candidate), and Joko Widodo (President of Indonesia). The tweets from Khan and Trump were collected on 25 Nov 2018, and the tweets from Prabowo and Jokowi were collected on 22 Nov 2018. We can easily see that the tweets from Donald Trump have more reactions than the others. There is only one tweet that has more retweets than likes, and it is from Sadiq Khan. Here are snapshots of two of the tweets (one from Trump, and the other from Sadiq Khan):

The numbers of retweets and likes shown in the image have since been updated (this image was captured on 1 Feb 2019).
At the time of collection, the number of likes was 0 and the number of retweets was 38. The tweet was retweeted by Sadiq Khan (this image was captured on 1 Feb 2019).

The 5 tweets from each of Sadiq Khan and Prabowo Subianto are not popular, since the colors of their circles are very close to white (very transparent).

The dataset used for the visualization consists of tweet objects, each represented by the class tweepy.models.Status (see the Tweepy package for more). The class stores a lot of information, such as the tweet text, posting time, author information, number of retweets, number of likes, and more.
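
As a rough sketch of how those fields can be read, the function below collects the attributes this article relies on from a Status-like object. The attribute names are the ones used in the on_click code of this article; the function name summarize and the stand-in object with made-up values are only for illustration, since a real object would come from Tweepy.

```python
from types import SimpleNamespace


def summarize(status):
    """Collect the fields used in this article from a Status-like object."""
    # Tweets fetched in extended mode expose full_text; others only have text.
    text = getattr(status, "full_text", None) or status.text
    return {
        "text": text,
        "created_at": status.created_at,
        "author": "{} (@{})".format(status.author.name, status.author.screen_name),
        "retweets": status.retweet_count,
        "likes": status.favorite_count,
    }


# Stand-in object with made-up values, used here instead of a real API call.
fake = SimpleNamespace(
    text="hello world",
    created_at="2018-11-25 10:00:00",
    author=SimpleNamespace(name="Example User", screen_name="example"),
    retweet_count=38,
    favorite_count=0,
)
info = summarize(fake)
```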

The Algorithm and Mathematics

In this section, we will discuss the algorithm and the mathematics behind it. To show a circle in a plot, we can use Matplotlib’s Circle class (matplotlib.patches.Circle). Our algorithm resembles a way we might arrange physical objects. For example, suppose there are 10 overlapping paper circles on the floor, all positioned very close to a point P. To arrange them as in circle packing, we can nudge each circle that overlaps another one (the nudge is such that the two circles repel each other, and the displacement is small). After a number of repetitions, all the circles will be packed, with P approximately at the center of the pack. This algorithm feels more alive; it makes the obvious “3-steps” algorithm above look very stiff.
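
The paper-circle analogy can be sketched in plain Python as follows. This is not the article’s actual packing code (which is not shown); it is a minimal version that assumes every overlapping pair is pushed apart by the same small step along the line joining the centers. The function name pack, the step size, and the iteration cap are all made up for the sketch.

```python
import math
import random


def pack(radii, step=0.05, max_iter=5000, seed=0):
    """Push overlapping circles apart in small steps until none overlap."""
    rng = random.Random(seed)
    # Start every circle very close to the common point P = (0, 0).
    centers = [[rng.uniform(-0.01, 0.01), rng.uniform(-0.01, 0.01)] for _ in radii]
    for _ in range(max_iter):
        moved = False
        for i in range(len(radii)):
            for j in range(i + 1, len(radii)):
                dx = centers[j][0] - centers[i][0]
                dy = centers[j][1] - centers[i][1]
                d = math.hypot(dx, dy)
                if d >= radii[i] + radii[j]:
                    continue  # this pair does not overlap
                if d == 0:  # identical centers: pick a random direction
                    angle = rng.uniform(0, 2 * math.pi)
                    dx, dy, d = math.cos(angle), math.sin(angle), 1.0
                # Small, equal and opposite displacement along the center line.
                ux, uy = dx / d, dy / d
                centers[i][0] -= step * ux
                centers[i][1] -= step * uy
                centers[j][0] += step * ux
                centers[j][1] += step * uy
                moved = True
        if not moved:
            break  # a full pass without any push: nothing overlaps anymore
    return centers


centers = pack([1, 1, 2, 1, 2, 1, 1, 1, 2, 1])
```

Because the two displacements in each push cancel out, the average position of the circles stays where it started, which is why P ends up near the center of the pack.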

Additionally, to make our data visualization more interesting, we can add an interactive element to it. The visualization can be designed to have two figures: one for the circles (the main data viz) and another that shows more detailed textual information about one tweet. To be more specific: if we click on a circle, some information about the corresponding tweet (text, number of retweets, number of likes, and posting date) shows up in the second figure. This can be done with a Matplotlib feature, the fig.canvas.mpl_connect function.

Adding Interactivity: Tweet Info by Click

After plotting and packing all the circles, we can make each circle work like a button. To achieve this, we use the function fig.canvas.mpl_connect. The function takes two arguments: the first is a string naming the type of interaction (in our case, "button_press_event"), and the second is a function that will be called whenever an event of that type is triggered. Example:

fig.canvas.mpl_connect('button_press_event', on_click)

The above line adds interactivity to the figure fig, limited to button press events (for another type of interaction, such as mouse movement over the figure, use "motion_notify_event"). The function passed as the second argument must accept one argument, the event object. In our case, we name the function on_click.
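
To see what such a handler receives, here is a small self-contained sketch. It assumes a headless backend and simulates a click by building a MouseEvent by hand, purely so the example runs without a window; in normal use the event comes from a real mouse click in a GUI backend. The list clicks and the axis limits are made up for the sketch.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch only
import matplotlib.pyplot as plt
from matplotlib.backend_bases import MouseEvent

fig, ax = plt.subplots()
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)

clicks = []


def on_click(event):
    # event.xdata and event.ydata are None when the click is outside the axes.
    if event.inaxes is None:
        return
    clicks.append((event.xdata, event.ydata))


# mpl_connect returns a connection id that mpl_disconnect accepts later.
cid = fig.canvas.mpl_connect("button_press_event", on_click)

# Simulate a click at data position (5, 5) by converting it to pixel coordinates.
x_pix, y_pix = ax.transData.transform((5, 5))
fig.canvas.callbacks.process(
    "button_press_event",
    MouseEvent("button_press_event", fig.canvas, x_pix, y_pix, button=1),
)
```

The handler works in data coordinates (xdata, ydata), which is what makes it possible to compare the click position with the centers and radii of the circles.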

Now, we must design the function on_click so that every time we click a circle in the main figure, the other figure shows some details of the tweet corresponding to that circle. The information should be presented neatly. Here is the version currently used for the visualization:

import matplotlib.pyplot as plt

### fig2 is the figure for the tweet information
### (fig, scattered, helv, squeeze_text, lower_circle and upper_circle are defined elsewhere)
fig2, ax2 = plt.subplots()
ax2.set_xlim([0, 50])
ax2.set_ylim([0, 50])
ax2.tick_params(colors = (0,0,0,0))

def on_click(event):
    x = event.xdata
    y = event.ydata
    print(x, y)
    # xdata/ydata are None when the click lands outside the axes
    if x is None or y is None:
        return
    ax2.cla()
    # cla() resets the axes, so re-apply the limits and the hidden ticks
    ax2.set_xlim([0, 50])
    ax2.set_ylim([0, 50])
    ax2.tick_params(colors = (0,0,0,0))
    for i in scattered:
        if abs(x-i.center[0]) <= i.radius:
            if lower_circle(x, i.radius, i.center) <= y <= upper_circle(x, i.radius, i.center):
                # use full_text when the tweet object has it, otherwise text
                try:
                    text = i.tweet.full_text
                except AttributeError:
                    text = i.tweet.text

                ax2.text(25, 35, i.tweet.created_at, fontproperties = helv,
                         ha = 'center', va = 'center', color = 'black')
                ax2.text(25, 33, squeeze_text(text), fontproperties = helv,
                         color = "white", ha = 'center', va = 'top',
                         bbox = {'boxstyle': 'round', 'ec': "black", 'fc': "gray"})
                ax2.text(25, 17, i.tweet.author.name + " (@{})".format(i.tweet.author.screen_name),
                         fontproperties = helv, ha = 'center', va = 'center', color = "black")
                ax2.text(25, 15, "Retweet: {}, Likes: {}".format(i.tweet.retweet_count, i.tweet.favorite_count),
                         fontproperties = helv, ha = 'center', va = 'center', color = "black")
                fig2.show()
                break
    print("done")

fig.canvas.mpl_connect('button_press_event', on_click)

scattered is a list containing many CircleObj objects. CircleObj is a user-defined class that inherits from Matplotlib’s Circle class. Here is a glimpse of the design:

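from math import dist  # Euclidean distance (Python 3.8+); the original dist helper is not shown
from matplotlib.patches import Circle

class CircleObj(Circle):

    def __init__(self, tweet, value, label, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.value = value
        self.label = label
        self.tweet = tweet

    def collide(self, circles):
        # two circles overlap when their centers are closer than the sum of their radii
        return [i for i in circles
                if i is not self and dist(self.center, i.center) < self.radius + i.radius]

    #other methods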

The code has circles, a global variable holding the list of all CircleObj objects before their order is randomized into the scattered list. The self.tweet attribute holds the tweet object of type tweepy.models.Status. The collide method is intended to return all CircleObj objects that collide with self. The full code that distributes and packs the circles with the algorithm described in the previous section is not given here, but it contains a block of code that repeatedly updates the locations of the circles until they are all packed with no overlap.

A tweet’s information is plotted in textual form when the conditions abs(x-i.center[0]) <= i.radius and lower_circle(x, i.radius, i.center) <= y <= upper_circle(x, i.radius, i.center) are both true. lower_circle and upper_circle are functions derived from the equation of a circle: given x, they return the y value of the point (x, y) on the lower or upper half of the circle, respectively.
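
The article does not show these two helpers, so the following is only a plausible sketch of them. It solves the circle equation (x - cx)^2 + (y - cy)^2 = r^2 for y, where (cx, cy) is the center and r the radius. The function is_inside is an extra name introduced here to show the two conditions working together.

```python
import math


def upper_circle(x, radius, center):
    """y of the point on the upper half of the circle at horizontal position x."""
    cx, cy = center
    return cy + math.sqrt(radius ** 2 - (x - cx) ** 2)


def lower_circle(x, radius, center):
    """y of the point on the lower half of the circle at horizontal position x."""
    cx, cy = center
    return cy - math.sqrt(radius ** 2 - (x - cx) ** 2)


def is_inside(x, y, radius, center):
    """The two-condition test from on_click, written as one function."""
    if abs(x - center[0]) > radius:
        return False  # this check also keeps the square root's argument non-negative
    return lower_circle(x, radius, center) <= y <= upper_circle(x, radius, center)
```

The first condition matters for more than speed: without it, math.sqrt would be called with a negative number for clicks far to the left or right of a circle.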

The on_click function is triggered whenever we press the mouse button anywhere on the figure, but it only plots a tweet’s information when we click inside a plotted circle. Here is an example of the plotted information:

[Image: example of the tweet information plotted in the second figure]

Notice that the font is not Matplotlib’s default font. You can learn about using other fonts in Matplotlib through matplotlib.font_manager. Also, notice that there is another user-defined function, squeeze_text, that formats the tweet text inside the rounded bounding box. It limits the text to a maximum of 7 words per line and a maximum of 8 lines.
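
The original squeeze_text is not shown, so here is one possible sketch that respects the two limits. How the original handles text that does not fit in 8 lines is unknown; cutting it off and appending an ellipsis is an assumption made here.

```python
def squeeze_text(text, words_per_line=7, max_lines=8):
    """Wrap text to at most words_per_line words per line and max_lines lines."""
    words = text.split()
    lines = [
        " ".join(words[i:i + words_per_line])
        for i in range(0, len(words), words_per_line)
    ]
    if len(lines) > max_lines:
        # Assumption: overflow is cut off and marked with an ellipsis.
        lines = lines[:max_lines]
        lines[-1] += " ..."
    return "\n".join(lines)


wrapped = squeeze_text("one two three four five six seven eight nine")
```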

Results with Larger Datasets

An implementation using 400 tweets usually takes about 400–500 iterations of checking all the circles, with one iteration usually taking about half a second. If the number of tweets is larger, each iteration takes longer to finish. At the first iteration, all circles are very close to the origin, the point (0,0), all within a distance of 0.01 from it. Then, in each following iteration, we give a very small nudge to every pair of overlapping circles by applying a repelling movement (the two circles push each other away), without necessarily making them non-overlapping in one step. In this section, we will view two example results of the data visualization:

  • The 1st example is made using a dataset of 1000 tweets: 500 tweets from CNN Indonesia and 500 from Metro TV, all collected on 21 Nov 2018. Here is the result:
500 tweets from CNN Indonesia and 500 tweets from Metro TV. There are two green circles with quite strong color.

We can make some conclusions from this result:

- There are 2 tweets (2 green circles) from CNN Indonesia that have a very high number of retweets compared to all the others. There does not seem to be any tweet from Metro TV with the same level of popularity.

- We cannot say whether there are more tweets with retweets > likes, or the other way around.

Although the static image does not offer many insights, we can still explore the data simply by clicking circles. Moreover, the visualization is also quite beautiful.

Okay, so our 1st data analysis is not really satisfying. Things get more interesting if we use a dataset with 4 different accounts.

  • The 2nd example is made using a dataset of 400 tweets from four politicians: Mr. Sadiq Khan, Mr. Trump, Mr. Prabowo, and Mr. Joko Widodo (100 tweets each). Tweets from Mr. Sadiq and Mr. Trump were collected on 25 Nov 2018, while tweets from Mr. Prabowo and Mr. Jokowi were collected on 22 Nov 2018. For this 2nd example, we will also see a video that shows the user interaction:
[Video: user interaction with the visualization]

Besides the video, here is the static image (slightly different from the one in the video: both provide the same information, but the circle positions differ because they come from different runs of the calculation):

400 tweets from 4 politicians. Trump’s tweets appear to be more popular than all the others, with quite a large gap.

I won’t draw many conclusions from the above visualization, but one clear thing is that Trump’s tweets are very popular compared to those of the other 3 politicians. Also, tweets that stand out are usually controversial. Here are some examples:

[Images: snapshots of example tweets that stand out]

Final Remarks

We have seen that a simple shape such as a circle can be used to represent tweets in a data visualization. We can also make the visualization interactive (although still in a very simple way). The implementation is not that difficult using Matplotlib.

This method of visualization can be extended:

  • The visualization can also be modeled using graphs: circles (tweets) may be represented by vertices, and adjacency of circles by edges. Adjacency could also be designed to carry meaning (the current design does not use adjacency to convey information). Also, the position of a circle can be made to represent one or more measurements. It is a waste to leave a circle’s position without meaning.
  • A straightforward extension: we can include the number of likes in the visualization by tying it to the radius of the circle. Notice that we do not need the exact number to show up in the main visualization, because we can just click a circle and the tweet info will appear in the other figure. Here is an example result (using 50 tweets from each account):
Visualization that includes the number of likes (represented by the size of the circle)
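
One simple way to tie the number of likes to the radius is a linear mapping between a smallest and a largest radius. The article does not say which mapping was used for the image above, so the function name radius_from_likes, the linear form, and the default bounds are all assumptions of this sketch.

```python
def radius_from_likes(likes, max_likes, r_min=0.5, r_max=3.0):
    """Map a like count to a radius between r_min and r_max (linear scaling)."""
    if max_likes <= 0:
        return r_min  # no likes anywhere in the dataset: use the smallest radius
    fraction = likes / max_likes
    return r_min + (r_max - r_min) * fraction


# Hypothetical like counts, not real data.
radii = [radius_from_likes(n, 100) for n in (0, 25, 100)]
```

A nonzero r_min keeps tweets with no likes visible and clickable, which matters because clicking is how the exact numbers are read.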

This write-up is still far from expert level. There are many more types of complex data and mathematical visualizations that can give more insight and have more applications. I appreciate constructive comments.

Via towardsdatascience.com

Ref. https://towardsdatascience.com/@anbarief, https://en.wikipedia.org/wiki/Circle_packing, http://www.codeplastic.com/2017/09/09/controlled-circle-packing-with-processing/, http://www.tweepy.org/, https://matplotlib.org/api/_as_gen/matplotlib.patches.Circle.html, https://matplotlib.org/users/event_handling.html
