German lecturer with a Ph.D., ..., I started using Python and R, and stumbled upon emojis ...

Sunday, February 5, 2023

Heat Stains on Twitter (first observations)

In order to find spots where Twitter gets hot, with people writing emphatic statements or opinions, we could start with emojis. The creation of hashtags is unpredictable, while the number of emojis is relatively limited.

Yet the most common emoji packages for Python and R still contain between 1,400 and 5,000 emojis. For me, that is far too many to loop through, making a tweet search request for each of them. So I decided to begin with my own list of emojis, which I will enrich over time.

The first entries are the emojis used in a very lively Italian gossip group ("#jerù"), where anger about the stars is often expressed; there are already 34 of them. To begin with, I will stay within the realm of Italian tweets. As the case of Polish 😆 teaches, the use of emojis largely depends on cultures defined by languages.

My first Italian list is: 👀 🐻 😅 🌜 👑 🌈 ☕ 🤍 😍 🥰 ♥️ 🤦 ❤️ 🌲 😂 🦋 📸 🤣 ⛰️ 💜 💚 ♀️ 😁 🔥 💖 💗 🙃 😋 🕛 😎 😭 😜 🌺 ✨.
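The script further down reads this list from a file called emojilistit22.csv with csv.reader. Judging from the way it is read, the file seems to hold the emojis as one comma-separated row. The following minimal sketch rests on that assumption about the file layout and uses an in-memory buffer instead of the real file. It writes such a row and reads it back:

```python
import csv
import io

# first three emojis of the Italian list; the real file would hold all 34
emoji_list = ["👀", "🐻", "😅"]

# assumption: the list is stored as a single comma-separated row
buffer = io.StringIO()
csv.writer(buffer).writerow(emoji_list)

# read the row back, as the search loop does before sending requests
buffer.seek(0)
keywords = next(csv.reader(buffer))
print(keywords == emoji_list)  # True
```

With a real file, opening it with encoding="utf-8" and newline="" keeps the emojis and line endings intact on every platform.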

Looking for signs of excitement, I will keep a record of an emoji search only if its 100 tweets contain more than 15 exclamation marks. Angry and happy people love doubling and tripling these marks.
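The selection rule can be written as a small filter. This is a sketch that assumes the tweets of one search are already available as a list of strings. The names count_exclamations and keep_search are illustrative, and the threshold of 15 refers to a batch of 100 tweets:

```python
def count_exclamations(tweets):
    """Total number of '!' characters in a batch of tweet texts."""
    return sum(text.count("!") for text in tweets)


def keep_search(tweets, threshold=15):
    """Keep the search only if the batch has more than `threshold` '!' marks."""
    return count_exclamations(tweets) > threshold


# invented example texts, not real tweets
batch = ["Che schifo!!!", "Bravo!", "Nessun commento"]
print(count_exclamations(batch), keep_search(batch))  # 4 False
```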

In a tweet search done on November 12, 2022, the most prolific emoji was 🤣.

In 100 tweets with this emoji, it appeared 231 times. Exclamation marks: 27 on 9234 characters, with eight doubled ("!!") and three quadrupled ("!!!!") marks.
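Doubled and tripled marks are easier to count as runs. The script further down uses str.count("!!"), which counts non-overlapping matches, so a "!!!!" is counted as two "!!". The following is a minimal alternative sketch of my own, not the method behind the figures above. It groups runs of "!" by their length with a regular expression and also expresses 27 marks on 9234 characters as a percentage:

```python
import re
from collections import Counter


def exclamation_runs(text):
    """Map run length -> number of runs of consecutive '!' in the text."""
    return Counter(len(run) for run in re.findall(r"!+", text))


def rate_percent(marks, characters):
    """Exclamation marks as a percentage of all characters."""
    return marks / characters * 100


sample = "Basta! Ma dai!! Incredibile!!!! Ok!"  # invented example
runs = exclamation_runs(sample)
print(runs[1], runs[2], runs[4])  # 2 1 1
print(sample.count("!!"))  # 3, because "!!!!" is counted as two "!!"
print(round(rate_percent(27, 9234), 2))  # 0.29
```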

The most frequent hashtags were:

"#taleequaleshow" "#merito" "#konpetenza" "#ottoemezzo" "#calenda" "#novax",

i.e. two about political TV shows, two used by a journalist (book title: "Damned Pacifists"), and two about politics, including the well-known hashtag "novax". Maybe this points the way to angry people. Should we move on and follow the hashtags?
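Counting hashtags needs only the standard library. Here is a minimal sketch that assumes the tweet texts are in a list. The regular expression simplifies Twitter's own hashtag rules, since the API also delivers hashtags in each tweet's entities. top_hashtags is a hypothetical helper:

```python
import re
from collections import Counter

# a hashtag is '#' followed by letters, digits or underscores (\w is Unicode-aware)
HASHTAG = re.compile(r"#\w+")


def top_hashtags(tweets, n=6):
    """The n most frequent hashtags in a batch of tweets, ignoring case."""
    counts = Counter(
        tag.lower() for text in tweets for tag in HASHTAG.findall(text)
    )
    return counts.most_common(n)


# invented example texts
tweets = ["#Calenda e il #merito", "#calenda di nuovo", "#merito #novax", "#calenda"]
print(top_hashtags(tweets, 2))  # [('#calenda', 3), ('#merito', 2)]
```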


A search for the hashtag "novax" (07/01/23, n=100) yields only 15 different emojis,

💯 🔝 ⚧️ 👋 👿 💉 👇 🐧 💳 💩 🤣 🤡 🏳️ 🇪🇺 🪳

and only four exclamation marks. Where has the excitement gone? 💩 appears ten times, behind 35 occurrences of 🤣.


The same holds for "#calenda" (an Italian politician). A search yields 18 exclamation marks, with three doubled "!!" and one "!!!". But according to R and the corresponding emoji package, only five out of 100 tweets contain emojis, 44 in all (26 unique): five ➡️, five 🇮🇹, five 🇪🇺. Do Italians use only a few emojis when it comes to politics? This could be due to the higher age of people interested in politics here.
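The three figures reported here come from R: tweets with emojis, all emojis, and distinct emojis. In Python, without an emoji package, they could be approximated as follows. This sketch assumes a fixed list of known emojis. emoji_stats is a hypothetical helper, not the R package's function:

```python
from collections import Counter


def emoji_stats(tweets, known_emojis):
    """Return (tweets with emojis, total emojis, unique emojis) for a batch.

    Simplification: only the emojis in `known_emojis` are recognised, by plain
    substring counting, so an emoji inside a longer ZWJ sequence is counted too.
    """
    counts = Counter()
    tweets_with_emoji = 0
    for text in tweets:
        found = {e: text.count(e) for e in known_emojis if e in text}
        if found:
            tweets_with_emoji += 1
            counts.update(found)
    return tweets_with_emoji, sum(counts.values()), len(counts)


known = ["🇮🇹", "🇪🇺", "🤡", "😂"]
tweets = ["Forza 🇮🇹 🇮🇹", "Bruxelles 🇪🇺 😂", "nessuna emoji"]  # invented
print(emoji_stats(tweets, known))  # (2, 4, 3)
```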


Let us try other "controversial topics". A "#Salvini" search (a right-wing politician) yields 11 out of 100 tweets with emojis: 22 🤡, 18 😂, 13 👏 and five 🇮🇹. Other "hot" topics like #bce (European Central Bank) or "nosbarchi" (against accepting refugees) give similar results. People get angry about politics, but rarely use emojis in this field.


Still, we can find angry people by searching with emojis. The clown 🤡 is used when people find something or somebody laughable. Again, in Italy, the top hashtags are about soccer and about the reality show Big Brother VIP.

We see:

feature   frequency   rank
🤡        432         1
😂        72          2
🤣        38          3
🤮        26          4
😅        16          5
👇🏻        12          6
💩        12          6
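A table of this kind, with feature, frequency and a rank shared by tied frequencies, can be sketched in Python as follows. The tie rule is an assumption. The table shows 👇🏻 and 💩 both at rank 6, which fits the "competition" ranking (1, 2, 2, 4) used here but would fit dense ranking just as well:

```python
from collections import Counter


def frequency_table(tokens):
    """Rows (feature, frequency, rank), most frequent first.

    Tied frequencies share the lowest rank; the next rank skips accordingly.
    """
    rows = []
    for position, (feature, freq) in enumerate(Counter(tokens).most_common(), start=1):
        if rows and freq == rows[-1][1]:
            rank = rows[-1][2]  # same frequency as the previous row
        else:
            rank = position
        rows.append((feature, freq, rank))
    return rows


tokens = ["🤡"] * 3 + ["😂"] * 2 + ["💩"] * 2 + ["🤮"]  # invented counts
for row in frequency_table(tokens):
    print(*row)
```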

People are angry. Searching for anger through the emojis that surround it could be helpful.

The number of unique emojis here is only 23:
🤬 🤣 ✅ 🤦 😏 😱 😡 ⚫ 🖕 😵 💸 🤡 😂 🤢 💩 🤮 💫 ⏩ 😁 📹.
Maybe this should be the basic list of emojis for the search for angry people, i.e. for heat stains on Twitter.

Tweets with the middle finger 🖕, for example, are presumably quite aggressive. An API search (n=10) on November 12, 2022, returns antisemitic content, but only two exclamation marks. The number grows again with 🤮: seven in ten tweets, including two doubled "!!". 😡 gives eight exclamation marks (two doubled), in tweets about animal rights. The 💣 brings up tweets about conspiracy theories, without any "!".
A 😡 search gives 42 different emojis in 10 tweets, namely
🤬 🤣 💯 👇 😠 💪 🤦 ➡️ 🙏 😢 👌 ♀️ 💥 😤 👿 😱 😡 🤔 ❣️ 🇮🇹 🔻 🖕 ♂️ 🔴 👎 😈 🤞 😂 🤢 💩 🤮 😅 🥲 🤪 🥺 ‼️ 😖 ⏩ 🤨 😁 📹 ☕
The exclamation marks are here, among the emojis (‼️), not among the typed characters. We should get to know emojis by the company they keep.
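Getting to know an emoji "by the company it keeps" amounts to counting co-occurrences. Here is a minimal sketch that assumes a list of tweet texts and a fixed list of emojis to look for. emoji_company is a hypothetical name:

```python
from collections import Counter


def emoji_company(tweets, target, known_emojis):
    """Count the other known emojis that appear in tweets containing `target`."""
    company = Counter()
    for text in tweets:
        if target not in text:
            continue
        for e in known_emojis:
            if e != target and e in text:
                company[e] += text.count(e)
    return company


known = ["😡", "🤬", "😂", "💩"]
tweets = ["😡🤬 basta", "😡 😂😂", "solo 😂 💩"]  # invented examples
print(emoji_company(tweets, "😡", known).most_common())  # [('😂', 2), ('🤬', 1)]
```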

Emoji diversity


For comparison, a ☕ search yields 62 unique emojis out of 421 in total, a lexical emoji diversity (unique divided by total) of .147. Lexical diversity among emojis is low in all cases; the maximums are .23 with 🔥, .22 with ❤️, .21 with 😎 and .20 with 😜. This does not simply depend on the number of emojis (for example, 254 emojis correspond to .10, 601 to .15). The variety of accompanying emojis seems rather to be a characteristic of the emoji itself.
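The diversity measure works as a type-token ratio: the number of distinct emojis divided by the number of all emojis. The ☕ figures confirm it, since 62 / 421 ≈ .147. A short sketch follows, with emoji_diversity as an illustrative name:

```python
def emoji_diversity(emoji_tokens):
    """Unique emojis divided by all emojis (type-token ratio)."""
    if not emoji_tokens:
        return 0.0
    return len(set(emoji_tokens)) / len(emoji_tokens)


print(round(62 / 421, 3))  # 0.147, the ☕ search
tokens = ["☕", "☕", "😍", "☕"]  # invented example: 2 unique out of 4
print(emoji_diversity(tokens))  # 0.5
```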




technically

(ok, it is not elegant, but I had learned programming with Algol W, in 1975)

import csv

import emojis
import tweepy

# see also https://stackoverflow.com/questions/49113909/split-and-count-emojis-and-words-in-agiven-string-in-python

# <authentication stuff>  (not shown; it must create the tweepy handler `auth`)

api = tweepy.API(auth)

# read the emoji keywords; the file may hold one or more comma-separated rows
with open('emojilistit22.csv', 'r', encoding='utf-8', newline='') as mine:
    leser = csv.reader(mine, delimiter=',')
    leserl = [kw for row in leser for kw in row if kw]

tweetCount = 100

for kw in leserl:
    print(kw)
    results = api.search_tweets(kw, count=tweetCount, lang="it")
    container = [tweet.text for tweet in results]
    row_count = len(container)
    print("number of tweets ", row_count)

    # archive the tweets, one per row; UTF-8 keeps the emojis intact
    filename = "exclamation_basis_" + kw + "2022-11" + ".csv"
    with open(filename, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        writer.writerows([text] for text in container)

    # count directly on the tweet texts: re-reading a one-row CSV with pandas
    # turned the tweets into column headers of an empty DataFrame
    zeichenkette = "\n".join(container)
    lang = len(zeichenkette)
    print("Number of characters: ", lang)
    if lang == 0:
        continue  # nothing found for this emoji

    x = zeichenkette.count(".")
    print("Number of single points (full stops? .)", x)
    prozent = x / lang * 100
    print(prozent, "%")

    y = zeichenkette.count("...")
    print("Number of three points (ellipsis ...)", y)
    prozent = y / lang * 100
    print(prozent, "%")
    if y > 0:
        relation = x / y
        print("Relation: ", relation)

    x = zeichenkette.count("!")
    print("Number of exclamation marks", x)
    prozent = x / lang * 100
    print(prozent, "%")
    if x > 15:
        anteil = x / lang * 100
        print(kw, "exclamation marks:", x, "on", lang, "characters", anteil, "%")

    # str.count() counts non-overlapping matches: "!!!!" counts as two "!!"
    y = zeichenkette.count("!!")
    print("Number of two exclamation marks", y)
    prozent = y / lang * 100
    print(prozent, "%")
    if y > 0:
        relation = x / y
        print("Relation: ", relation)

    x = zeichenkette.count("!!!")
    print("Number of three exclamation marks", x)
    y = zeichenkette.count("!!!!")
    print("Number of four exclamation marks", y)

    # just having a look at the emojis, will construct an archive later
    material = zeichenkette
    emoji_hier = emojis.get(material)
    print(*emoji_hier)
    zahlein = emojis.count(material, unique=True)
    zahlall = emojis.count(material)
    print(zahlein, zahlall)

 


 
