German lecturer with Ph.D., ..., I started using Python and R, and stumbled upon emojis ...

Thursday, October 20, 2022

Doubling Emojis

In Italian tweets, the signs of love rarely appear alone. Love prefers to show up in bigrams, and even trigrams are quite frequent. To be exact, among the 2- to 4-grams in 100 documents, they dominate: hashtags only appear from the fourth rank onwards.

[1] "😍"

[1] 100

        feature frequency rank

1         😍_😍       147    1

2      😍_😍_😍       106    2

3   😍_😍_😍_😍        71    3
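How a table like this comes about can be sketched in a few lines of plain Python. This is not the quanteda code behind the post, only a rough illustration of the same idea: split each document into tokens, build all 2- to 4-grams joined with "_", and count how often each one occurs (frequency) and in how many documents (docfreq). The tokenizer, the function names and the toy documents are assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: one token per word and one token per non-word character,
# so that every single-code-point emoji becomes its own token.
TOKEN = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    return TOKEN.findall(text.lower())

def ngrams(tokens, sizes=(2, 3, 4)):
    """Yield all n-grams of the given sizes, joined with '_'."""
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            yield "_".join(tokens[i:i + n])

def ngram_frequencies(docs, sizes=(2, 3, 4)):
    """Return (feature, frequency, docfreq) rows, most frequent first."""
    freq, docfreq = Counter(), Counter()
    for doc in docs:
        grams = list(ngrams(tokenize(doc), sizes))
        freq.update(grams)          # every occurrence counts
        docfreq.update(set(grams))  # each document counts once
    return [(g, c, docfreq[g]) for g, c in freq.most_common()]

# Made-up toy documents, not the tweets analysed in this post.
docs = ["ti amo 😍😍😍", "buongiorno 😍😍", "ciao 😍"]
for rank, row in enumerate(ngram_frequencies(docs)[:3], start=1):
    print(rank, *row)
```

The rank here is simply the position in the sorted list; ties may be ranked differently by a statistics package.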


Looking for the same emoji in German tweets: 

tokens_ngrams(n = 2:4)


         feature frequency rank docfreq group

1          😍_😍        26    1      10   all

2       😍_😍_😍        16    2       9   all

3   guten_morgen        15    3      15   all


This looks like a cultural difference. Even so, the most frequent bigrams are still these 😍 pairs.


The general rule could be:

If you search tweets for an emoji, the most frequent bigrams you get are emojis as well. Other examples:


[1] "🤥"

[1] 100

        feature frequency rank docfreq group

1         🤥_🤥       165    1      31   all

2      🤥_🤥_🤥       134    2      22   all

3   🤥_🤥_🤥_🤥       112    3      12   all


[1] "😂"

[1] 100

     feature frequency rank docfreq

1      😂_😂       113    1      48

2   😂_😂_😂        65    2      32


Laughter ("Rolling ...") comes alone in only 55% of the cases: the 🤣_🤣 bigram below appears in 45 of the 100 documents.

[1] "🤣"

[1] 100

        feature frequency rank docfreq group

1         🤣_🤣       143    1      45   all

2      🤣_🤣_🤣        96    2      32   all

3   🤣_🤣_🤣_🤣        62    3      13   all

In 100 documents, there are 246 🤣. It looks like an echo.
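Figures like the total number of 🤣 and the share of documents in which the emoji stays alone are simple counts. A minimal Python sketch, assuming that "alone" means "no two copies next to each other, with or without a space in between"; the function name echo_stats and the toy documents are made up for illustration and are not the tweets analysed here.

```python
import re

def echo_stats(docs, emoji):
    """Return (total occurrences, docs with a doubled emoji, share where it stays alone)."""
    # Two copies of the emoji, optionally separated by whitespace.
    doubled = re.compile(re.escape(emoji) + r"\s*" + re.escape(emoji))
    total = sum(doc.count(emoji) for doc in docs)
    containing = [doc for doc in docs if emoji in doc]
    n_doubled = sum(1 for doc in containing if doubled.search(doc))
    alone_share = 1 - n_doubled / len(containing) if containing else 0.0
    return total, n_doubled, alone_share

# Made-up toy documents.
docs = ["che ridere 🤣🤣🤣", "ahah 🤣", "🤣 🤣 no", "mah 🤣 ok"]
print(echo_stats(docs, "🤣"))
```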


Again, in German tweets, the tendency is weaker.

       feature frequency rank docfreq group

1        🤣_🤣        78    1      35   all

2     🤣_🤣_🤣        42    2      28   all

3          ._.        38    3      10   all

12        ?_🤣         5   12       5   all

This circumstance will be explored later. 


Anger seems to be contagious as well. Take a look at:

[1] "😡"

[1] 100

     feature frequency rank docfreq

1      😡_😡       114    1      49

2        ._.        75    2      20

3   😡_😡_😡        65    3      36


And, uhm 

[1] 100

     feature frequency rank docfreq group

1      💩_💩       102    1      35   all

2        ._.        73    2      21   all

3   💩_💩_💩        67    3      27   all


Washing it away:

[1] "💦"

[1] 100

     feature frequency rank docfreq

1      💦_💦       137    1      60

2   💦_💦_💦        77    2      39


A rather strange guy:

[1] "👺"

[1] 100

     feature frequency rank

1      👺_👺        88    1

2-4 user names

5   👺_👺_👺        37    2

6       :_👺        20    6


Number six combines the tengu (or leprechaun) with a colon. Compare "!_😍 131"!


We know that punctuation marks behave strangely in text messages. They often appear in pairs or triples, and we are getting used to phenomena like "!!!!". Meanwhile, the simple full stop is weakened. What if punctuation marks lost their grammatical meaning and became emojis in their own right?
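Doubled and tripled punctuation can be counted in much the same way as doubled emojis. The following Python sketch looks for runs of one repeated punctuation mark with a regular expression. The set of marks, the function name and the toy documents are assumptions chosen for illustration.

```python
import re
from collections import Counter

# A punctuation mark followed by one or more copies of itself,
# e.g. "!!!!", "??" or "...".
RUN = re.compile(r"([!?.,;:])\1+")

def punctuation_runs(docs):
    """Count every run of a repeated punctuation mark across all documents."""
    counts = Counter()
    for doc in docs:
        for match in RUN.finditer(doc):
            counts[match.group(0)] += 1
    return counts

# Made-up toy documents.
docs = ["Che bello!!!!", "Davvero?? Ma dai...", "Ok."]
print(punctuation_runs(docs))
```

A single full stop, as in "Ok.", is not counted, since only runs of two or more marks match.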


A second lead for further research is "evoking emojis". They are not doubled but complemented by other emojis, seemingly according to certain rules.


The general rule, for now, would be:

If you search tweets for an emoji, the most frequent bigrams you get are emojis as well, though not always the same emoji.
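To tell doubled emojis from "evoking" ones, neighbouring symbols can be split into same-emoji pairs and mixed pairs. The Python sketch below does this under a rough assumption: a character counts as an emoji if its Unicode category is "So" (Symbol, other). That is only a proxy, not a full emoji definition, and the function names and toy documents are invented for the example.

```python
import unicodedata
from collections import Counter

def is_symbol(ch):
    # Rough stand-in for "is an emoji": Unicode category "So" (Symbol, other).
    return unicodedata.category(ch) == "So"

def emoji_pairs(docs):
    """Return two Counters: pairs of identical symbols and pairs of different symbols."""
    same, mixed = Counter(), Counter()
    for doc in docs:
        # Drop whitespace and the variation selector so that neighbours line up.
        chars = [c for c in doc if not c.isspace() and c != "\ufe0f"]
        for a, b in zip(chars, chars[1:]):
            if is_symbol(a) and is_symbol(b):
                target = same if a == b else mixed
                target[a + "_" + b] += 1
    return same, mixed

# Made-up toy documents; the last heart carries a variation selector.
docs = ["ti voglio bene 💛❤", "😍😍 che bello", "sole 💛 ❤\ufe0f"]
same, mixed = emoji_pairs(docs)
print(same)
print(mixed)
```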


[1] "🤍" white heart

[1] 100

        feature frequency rank

1         😍_😍       130    1

2      😍_😍_😍       106    2

3   😍_😍_😍_😍        82    3


Weaker:

[1] "💛"

[1] 100

  feature frequency rank docfreq

1    💛_❤        23    1      22




Technically

Run on 14/10/2022 with n = 100 and lang = "italian", over the first 300 emojis of the Emoji Package, simply with a for loop. The statistical methods come from Quanteda.
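The loop itself might look roughly like the Python sketch below. It is not the code used for this post: the search function is a hypothetical placeholder passed in as a parameter, since the actual way the tweets were retrieved is not shown here, and fake_search only returns canned, made-up texts so that the example runs on its own.

```python
import re
from collections import Counter

# Assumption: one token per word and one per non-word character.
TOKEN = re.compile(r"\w+|[^\w\s]")

def top_ngram(docs, sizes=(2, 3, 4)):
    """Most frequent n-gram across docs as (feature, frequency), or None."""
    counts = Counter()
    for doc in docs:
        tokens = TOKEN.findall(doc.lower())
        for n in sizes:
            counts.update("_".join(tokens[i:i + n])
                          for i in range(len(tokens) - n + 1))
    return counts.most_common(1)[0] if counts else None

def survey(emojis, search, n=100, lang="italian"):
    """Loop over emojis; `search` is a hypothetical callable returning tweet texts."""
    results = {}
    for emoji in emojis:
        docs = search(emoji, n=n, lang=lang)
        results[emoji] = top_ngram(docs)
    return results

def fake_search(emoji, n, lang):
    # Stand-in for a real tweet search: canned, made-up texts.
    return [f"ciao {emoji}{emoji}", f"{emoji}{emoji} bello"][:n]

print(survey(["😍", "💦"], fake_search))
```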

