talking emoji: Punctuation in tweets compared, with R: ! 👏 👏 👏

One might expect the full stop to be needed less in chats than in other written texts, since the end of a sentence often coincides with the end of a message, which the "send" button already marks sufficiently.

The ellipsis ("..."), by contrast, could be more frequent, as a fast and allusive way of communicating within a social group.

First try: looking for tweets with full stops

Searching for tweets that contain full stops gives a first impression of how punctuation marks are distributed. I ran tweet searches (22-28 October 2022, n = 1000) in German, Italian, and Polish.

German

From the general feature frequency table,

textstat_frequency(matrix, n=20)

feature frequency docfreq

. 783 427

Does this mean that, of the 1000 tweets returned for the keyword ".", only 427 actually contain the mark? Something odd is going on in the punctuation count (see "Technically" below).
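The two columns count different things: frequency is the total number of occurrences of a feature, docfreq the number of documents (here: tweets) in which it occurs at least once. A minimal Python sketch on three invented, already tokenized tweets makes the difference concrete; the data and the function name `frequency_and_docfreq` are illustrative assumptions, not taken from the search results.

```python
from collections import Counter

def frequency_and_docfreq(docs):
    """Return (total occurrences, number of documents containing) per token."""
    frequency = Counter()
    docfreq = Counter()
    for tokens in docs:
        frequency.update(tokens)     # counts every occurrence
        docfreq.update(set(tokens))  # counts each token once per document
    return frequency, docfreq

# Invented toy data: each inner list stands for one tokenized tweet.
toy_tweets = [
    ["gut", ".", "sehr", "gut", "."],
    ["na", "ja", "…"],
    ["ok", "."],
]

frequency, docfreq = frequency_and_docfreq(toy_tweets)
print(frequency["."], docfreq["."])  # 3 occurrences in 2 tweets
```

A frequency above docfreq, as in 783 against 427, only means that some tweets contain the mark more than once; it does not by itself explain why docfreq stays below 1000.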

At least one result is clear: the full stop is still there. By no means all "."s are absorbed by "...".

The rest of the table gives an impression of the situation. Nearly a quarter of the "." tweets also use the ellipsis, usually only once per tweet.

, 568 354

: 502 412

rt 350 350

… 227 225

In Italian, full stops are less frequent, and so are ellipses, although the ratio of "." to "…" is fairly similar (783/227 ≈ 3.45 in German against 537/195 ≈ 2.75 in Italian).

. 537 281

: 431 375

, 345 220

… 195 183

! 121 77

In Polish, there are fewer full stops (500) than colons (555), while "…" does not appear among the twelve most frequent features.

Obviously, since the search itself targeted tweets with ".", these counts do not show the real frequency of full stops in tweets.

A small surprise, though, emerges when looking at skip-grams (4, 2:4).

In German, the most frequent ones are

1 🫶 🫶 🫶 🫶

2 🌹 🌹 🌻 🌻

In Italian, the first one is

! 👏 👏 👏 194

And in Polish, we see

🌱 ✨ 💚 ✨

Emojis dominate the scene. In the Italian result, it may even seem that the exclamation mark has been absorbed by the emojis, becoming an emoji itself.
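For readers unfamiliar with skip-grams: they are n-grams whose tokens need not be adjacent, so a certain number of tokens may be skipped between neighbours. The following Python sketch is a simplified illustration only. The function `skipgrams`, its parameters `n` and `skips`, and the toy tokens are assumptions made for this example; it is not guaranteed to reproduce the output of the R function behind the results above.

```python
from collections import Counter
from itertools import product

def skipgrams(tokens, n, skips):
    """Build n-token tuples, skipping k tokens (k in skips) between neighbours."""
    grams = []
    for start in range(len(tokens)):
        # Try every combination of gap sizes for the n - 1 gaps.
        for gaps in product(skips, repeat=n - 1):
            positions = [start]
            for gap in gaps:
                positions.append(positions[-1] + gap + 1)
            if positions[-1] < len(tokens):
                grams.append(tuple(tokens[p] for p in positions))
    return grams

# Invented toy tweet, already tokenized.
toy = ["brava", "!", "👏", "👏", "👏"]

counts = Counter(skipgrams(toy, n=4, skips=[0, 1]))
for gram, count in counts.most_common(3):
    print(" ".join(gram), count)
```

Because repeated emojis are identical tokens, different skip patterns can yield the same skip-gram, which is one way such sequences climb to the top of a frequency list.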

For further investigation, I will use Python rather than R, because I prefer to control directly what is being counted.
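As a hint of what such direct control could look like, here is a minimal Python sketch that counts punctuation marks in raw tweet text with a regular expression. The selection of marks mirrors the `keeping` vector in the "Technically" section; the function name `count_marks`, the decision to fold runs of three or more dots into "…", and the toy tweets are assumptions of this sketch, not a finished procedure.

```python
import re
from collections import Counter

# The ellipsis alternatives come first, so "..." is not read as three full stops.
MARKS = re.compile(r"\.{3,}|…|[.,!?:;-]")

def count_marks(tweets):
    """Count punctuation marks over all tweets; dot runs count as one ellipsis."""
    counts = Counter()
    for text in tweets:
        for mark in MARKS.findall(text):
            if mark.startswith("..."):
                mark = "…"  # assumption: fold "..." and longer runs into "…"
            counts[mark] += 1
    return counts

# Invented toy tweets.
toy_tweets = ["Eins. Zwei. Drei...", "Na ja… ok!", "Bravo! 👏 👏 👏"]

counts = count_marks(toy_tweets)
print(counts.most_common())
```

Note that this naive pattern also counts dots inside URLs or numbers and hyphens inside words, so real tweet data would need some filtering first.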

Technically

Search command

library(rtweet)  # provides search_tweets()

fund <- search_tweets(".", n = 1000, retryonratelimit = TRUE, include_rts = TRUE, lang = "de")

Punctuation mark count

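library(quanteda)  # provides tokens_select(), dfm(), kwic()

library(quanteda.textstats)  # provides textstat_frequency()

keeping <- c(".", "...", ",", "!", "?", ":", "-", ";")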

nmatrix <- tokens_select(fund_toks, keeping, selection = "keep")

schau <- dfm(nmatrix)

textstat_frequency(schau)

kwic search

kontext1 <- kwic(fund_toks, ".", valuetype = "glob", window = 10)

kontext2 <- kwic(fund_toks, pattern= ":", window=10)

kontext3 <- kwic(fund_toks, pattern = "...", window = 10)

The last command does not return any results. I posted the question on Stack Overflow, but have not received a response.
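One possible reason, offered here only as an untested guess: the frequency tables above list the single character "…" (U+2026, horizontal ellipsis), whereas the pattern in the last command consists of three separate full stops. The two are different strings, as this short Python check shows. It uses only the standard library and says nothing about how the R tokenizer treats either form.

```python
import unicodedata

ascii_dots = "..."   # three full stops, as typed in the kwic pattern
ellipsis = "\u2026"  # the single character shown in the frequency tables

print(ascii_dots == ellipsis)          # False
print(len(ascii_dots), len(ellipsis))  # 3 1
print(unicodedata.name(ellipsis))      # HORIZONTAL ELLIPSIS
# NFKC normalization maps the single character onto three full stops.
print(unicodedata.normalize("NFKC", ellipsis) == ascii_dots)  # True
```

If that is indeed the cause, searching for "…" instead of "..." would be the first thing to try.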

Stackoverflow post


German lecturer with Ph.D, ..., I started using Python and R, .and stumbled about emojis ...

Friday, October 28, 2022
