Nowadays, social media can be considered the main channel of communication between people. It is much easier to plan a trip, discuss a school project, work, or simply talk with anyone through a few taps on a cellphone. Among messaging apps, one of the most widely used is Whatsapp, or, as some Brazilians say, the Zap.
When using Whatsapp, have you ever had the feeling that you are the only person who talks in a group, or that a particular subject is always the focus? Have you ever wondered at what hour and on which days people talk the most? If you have ever thought about any of these questions and want to check them, you came to the right place.
In this article, we present some interesting visualizations of the data from a Whatsapp group. You can do the same with any group that you want to check. We extended the excellent work of Saiteja Kura by anonymizing the participants of the Whatsapp group and by making a few adaptations to the code. After reading this article, feel free to read his work as well.
The code used to generate all the figures can be found at Github.
The first step is to export the group chat to a txt file. To do this, open Whatsapp, select the group that you want to analyze, and then tap “Export chat”. Then you can email the file with the messages to yourself. A sample of the resulting file is shown in the figure below:
The file contains basically four pieces of information for each message: date, time, name or number of the sender, and the content of the message. If you have the sender's contact saved on your cellphone, the saved name will be shown; otherwise, the phone number will be shown.
The next step is to create the Dataframe based on the txt file. Initially, we want to separate each line of the file into six columns: date, time, author, message, emoji, and urlcount. Taking into account the example above, we have the following result:
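As a minimal sketch of this parsing step, the snippet below splits exported lines into the six columns. It assumes an Android-style layout such as `05/05/2019 10:32 - Alice: Hello!`; exports from other phones or locales may look different, so the pattern is an assumption to adapt. The names `parse_chat`, `HEADER_PATTERN`, `URL_PATTERN`, and `EMOJI_PATTERN` are hypothetical and not taken from the original code, and the emoji match is only a rough approximation.

```python
import re

# Assumed header layout: "dd/mm/yyyy hh:mm - rest of the line".
HEADER_PATTERN = re.compile(r"^(\d{1,2}/\d{1,2}/\d{2,4}),? (\d{1,2}:\d{2}) - (.*)$")
URL_PATTERN = re.compile(r"https?://\S+")
# Rough emoji match over common Unicode blocks. It is not exhaustive, and
# skin-tone modifiers are matched as separate characters.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def parse_chat(lines):
    """Turn exported chat lines into one dict per message."""
    rows = []
    for raw in lines:
        line = raw.strip()
        header = HEADER_PATTERN.match(line)
        if header:
            date, time, rest = header.groups()
            author, sep, message = rest.partition(": ")
            if sep:  # system notices such as "X joined" have no "author: " part
                rows.append({"date": date, "time": time,
                             "author": author, "message": message})
        elif rows and line:
            # A line without a header continues the previous message.
            rows[-1]["message"] += "\n" + line
    for row in rows:
        row["emoji"] = EMOJI_PATTERN.findall(row["message"])
        row["urlcount"] = len(URL_PATTERN.findall(row["message"]))
    return rows
```

The resulting list of dicts can be passed directly to `pandas.DataFrame(rows)` to obtain the six-column table.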
In this application, we developed a function that replaces the name of each person with a number for privacy reasons, so it is not possible to know who sent each message. The code that performs this anonymization is shown below:
Basically, each person is assigned a unique number, and the person's name is then replaced with that number. By running the Data Preparation subsection of the code, you will build a dataframe like the one shown in Figure 2.
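The idea of replacing names with numbers can be sketched as follows. This is an illustration, not the function from the repository: the helper names `build_author_ids` and `anonymize` are hypothetical, and numbering authors in order of first appearance is an assumption (shuffling the numbers would also hide who spoke first).

```python
def build_author_ids(authors):
    """Map each distinct author to a number, in order of first appearance."""
    ids = {}
    for author in authors:
        if author not in ids:
            ids[author] = len(ids) + 1
    return ids


def anonymize(rows):
    """Return copies of the rows with the author replaced by 'Author N'."""
    ids = build_author_ids(row["author"] for row in rows)
    return [{**row, "author": f"Author {ids[row['author']]}"} for row in rows]
```

Note that this only hides the sender column; names mentioned inside the message text are left untouched.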
Before performing any type of analysis, we must filter out the messages that correspond to images, videos, stickers, and deleted messages. We won’t analyze them in this article, so they must be removed. This filtering is necessary because an image message, for instance, is represented as “imagem ocultada” (which means “hidden image” in English) regardless of the image that was sent. The data cleaning is performed in the Group Wise Stats section of the code.
To show some results, I decided to use the Whatsapp group of my master's research project, which has 4 researchers including me. This group has existed since February 2019, so there are almost 2 years of content to analyze.
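A possible way to drop these placeholder messages is sketched below. Only “imagem ocultada” comes from this article; the other placeholder strings are assumptions about what English-language exports contain, so check them against your own file. The names `MEDIA_PLACEHOLDERS`, `is_placeholder`, and `drop_placeholders` are hypothetical.

```python
MEDIA_PLACEHOLDERS = (
    "imagem ocultada",           # placeholder mentioned in this article
    "<Media omitted>",           # assumed English-language equivalent
    "This message was deleted",  # assumed wording for deleted messages
)


def is_placeholder(message, placeholders=MEDIA_PLACEHOLDERS):
    """True if the message contains one of the known placeholder texts."""
    text = message.lower()
    return any(p.lower() in text for p in placeholders)


def drop_placeholders(rows):
    """Keep only the rows whose message is real text."""
    return [row for row in rows if not is_placeholder(row["message"])]
```

Matching on a substring rather than on equality tolerates the invisible characters that exports sometimes place before the placeholder, at the cost of also dropping a genuine message that happens to quote one of these phrases.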
The first analysis that we are going to present concerns the emojis most used by each participant, as shown in the pie charts below.
We can see that Authors 1 and 4 use a wide variety of emojis, of which the three most used are 😂, 👍🏻 and 👏🏻. Author 2 does not use “laughing” emojis that much and mainly uses 👍🏻 and 👏🏻. Finally, Author 3 can be defined as the laughing guy, since he mainly uses the 😂 emoji (89.9%).
These results can be seen in the Emoji stats section of the code.
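The percentages behind pie charts like these could be computed roughly as follows. This sketch assumes each row is a dict with an `author` key and an `emoji` key holding the list of emojis found in the message; the function name `emoji_shares` is hypothetical.

```python
from collections import Counter, defaultdict


def emoji_shares(rows):
    """Return, per author, each emoji's share of that author's emojis in %."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["author"]].update(row["emoji"])
    shares = {}
    for author, counter in counts.items():
        total = sum(counter.values())
        if total:  # skip authors who never used an emoji
            shares[author] = {
                emoji: 100 * n / total for emoji, n in counter.most_common()
            }
    return shares
```

Each inner dictionary is ordered from most to least used, so its values can be fed straight into a pie chart.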
Number of messages per day
Another interesting analysis is to visualize the number of messages per day, so we can trace back important events that occurred in the chat group. The figure below plots the number of messages sent on each day since the beginning of the group:
We can see three dates with a large number of messages: May 5, 2019; October 8, 2019; March 22, 2020. After seeing these results, I got very curious and decided to go back and see what happened on these days. It was a lot of fun to remember these occasions, so I will summarize what happened.
May 5, 2019: On this day, we were working on submitting 4 papers to a national congress, so we kept talking about finishing pending tasks and exchanging the personal information needed to register all the papers.
October 9, 2019: We were organizing a congress at our university, so we were pretty busy too.
March 22, 2020: The Covid-19 outbreak had started in Brazil, so we kept talking about how things were going to change in our lab and how it would affect our lives.
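As a rough sketch, the daily counts behind such a plot could be obtained like this. It assumes each row is a dict whose `date` value is a string in day/month/year form; both the format and the function names `messages_per_day` and `busiest_days` are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime


def messages_per_day(rows, date_format="%d/%m/%Y"):
    """Count messages per calendar day."""
    return Counter(
        datetime.strptime(row["date"], date_format).date() for row in rows
    )


def busiest_days(rows, top=3, date_format="%d/%m/%Y"):
    """Return the `top` days with the most messages as (date, count) pairs."""
    return messages_per_day(rows, date_format).most_common(top)
```

Sorting `messages_per_day(rows).items()` gives the series in chronological order for plotting, while `busiest_days` points to the peaks worth revisiting in the chat.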
The time when participants are more active
Finally, the last graph that I will present shows the times of day at which the participants sent the most messages. It was built considering only the hour and minute, i.e., we disregarded the seconds. This modification gives a better picture of when the messages were sent, since keeping the seconds would leave too few messages at each exact time.
In this case, we can see that the group is most active during the morning, around 10 to 11 am, which is when most of us were at the university but not in the lab. Also, at this time, we generally arrange to meet and have lunch together.
This article presented a fun and interactive way for you to analyze the messages of any Whatsapp group. The idea and the majority of the code were taken from the article by Saiteja Kura, but we also added some new features, mainly anonymizing the authors and changing the way the date is represented.
Feel free to try the same thing with any group of yours! The code can be found on my Github too.
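Dropping the seconds before counting can be sketched as below. The snippet assumes time strings of the form `hh:mm` or `hh:mm:ss` in 24-hour notation; the function names `messages_per_minute` and `messages_per_hour` are hypothetical.

```python
from collections import Counter


def to_hour_minute(time_text):
    """Normalize 'h:mm' or 'hh:mm:ss' to 'hh:mm', discarding the seconds."""
    hour, minute = time_text.split(":")[:2]
    return f"{int(hour):02d}:{int(minute):02d}"


def messages_per_minute(times):
    """Count messages for each hour-and-minute of the day."""
    return Counter(to_hour_minute(t) for t in times)


def messages_per_hour(times):
    """Count messages for each hour of the day."""
    return Counter(to_hour_minute(t)[:2] for t in times)
```

Grouping by minute already merges messages sent a few seconds apart; grouping by hour is coarser still and makes the busiest part of the day easier to spot.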