German lecturer with a Ph.D., ..., I started using Python and R, and stumbled upon emojis ...

Friday, October 28, 2022

Punctuation in tweets compared, with R: ! 👏 👏 👏

We might think that in chats the full stop is needed less than in other written texts, since the end of a sentence often coincides with the end of a message, which the "send" button already marks sufficiently.

The ellipsis ("..."), by contrast, could be more frequent, as a fast and allusive device for communicating within a social group.

First try: looking for tweets with full stops

Searching for tweets that contain full stops gives a first impression of how punctuation marks are distributed. I ran tweet searches (22-28 October 2022, n = 1000) in German, Italian, and Polish.

German

From the general feature frequency table,

textstat_frequency(matrix, n=20)

feature frequency docfreq 

.            783          427 


Does this mean that of the 1000 tweets returned for the keyword ".", only 427 actually contain the mark? Something strange is happening in the punctuation count (see "Technically" below).
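The difference between the two columns can be illustrated with a minimal base R sketch. It does not reproduce what quanteda does internally; the three tweets are invented for illustration and the variable names are hypothetical. It only shows that "frequency" counts every occurrence, while "docfreq" counts the tweets in which the mark occurs at least once.

```r
# Toy tweets, invented for illustration (not taken from the collected data)
tweets <- c("Ja. Nein. Vielleicht.", "Kein Punkt hier", "Na gut.")

# Locate literal full stops in each tweet;
# fixed = TRUE keeps "." from acting as a regex wildcard
hits <- gregexpr(".", tweets, fixed = TRUE)

# gregexpr() returns -1 when nothing matches, so count positive positions only
per_tweet <- vapply(hits, function(h) sum(h > 0), integer(1))

frequency <- sum(per_tweet)      # total occurrences across all tweets
docfreq   <- sum(per_tweet > 0)  # tweets with at least one occurrence

print(c(frequency = frequency, docfreq = docfreq))
```

In this toy case the full stop occurs 4 times, but in only 2 of the 3 tweets. A docfreq below the number of retrieved tweets therefore means that some tweets yielded no "." token at all; why that happens in the real data is left open here.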


At least one result is clear: the full stop is still there. By no means have all the "."s been absorbed by "...".


The rest of the table gives an impression of the situation. Nearly a quarter of the "." tweets also use the ellipsis, usually once per tweet.


,         568    354

:         502    412

rt        350    350

…         227    225


In Italian, full stops are less frequent, and so are "...", although the ratio of "." to "..." is fairly similar (783/227 = 3.45 against 537/195 = 2.75).


.         537    281

:         431    375

,         345    220

…         195    183

!         121    77
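The two ratios quoted for German and Italian can be recomputed from the table counts. This is only a small arithmetic check in base R; the data frame and its column names are hypothetical, and the numbers are the frequencies reported in the tables.

```r
# Frequencies of "." and of the ellipsis, as reported in the tables
counts <- data.frame(
  language  = c("German", "Italian"),
  full_stop = c(783, 537),
  ellipsis  = c(227, 195)
)

# Full stops per ellipsis, rounded to two decimals
counts$ratio <- round(counts$full_stop / counts$ellipsis, 2)

print(counts)
```

Since the tweets were selected because they contain ".", these ratios describe the sample only, not tweets in general.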


In Polish, there are fewer full stops (500) than colons (555), while “…” does not appear among the first twelve features.


Obviously, since we searched specifically for tweets containing ".", this does not show the real frequency of full stops in tweets.


A small surprise, though, emerges when we look at skipgrams (4, 2:4).


In German, the most frequent ones are


1 🫶 🫶 🫶 🫶

2 🌹 🌹 🌻 🌻


In Italian, the first one is

! 👏 👏 👏 194 


And in Polish, we see

🌱 ✨ 💚 ✨


The scene is dominated by emojis. In the Italian result, it may even seem that the exclamation mark has been absorbed by the emojis, becoming an emoji itself.
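For readers unfamiliar with skipgrams, here is a minimal base R sketch of the idea. It is a simplification and not quanteda's implementation: the hypothetical helper skipgrams_fixed() uses one fixed skip distance for every gap, whereas a range such as 2:4 presumably allows several distances. The token vector is invented.

```r
# Build n-grams whose neighbouring members are exactly `skip` tokens apart.
# skip = 0 gives ordinary n-grams; skip = 1 takes every second token, etc.
skipgrams_fixed <- function(tokens, n, skip) {
  step <- skip + 1
  span <- (n - 1) * step  # distance from the first to the last member
  if (length(tokens) <= span) return(character(0))
  starts <- seq_len(length(tokens) - span)
  vapply(
    starts,
    function(i) paste(tokens[i + (0:(n - 1)) * step], collapse = " "),
    character(1)
  )
}

toks <- c("a", "b", "c", "d", "e", "f", "g")

print(skipgrams_fixed(toks, n = 3, skip = 0))  # adjacent tokens
print(skipgrams_fixed(toks, n = 3, skip = 1))  # one token skipped per gap
```

With skips, a sequence counts as a gram even when other tokens sit between its members, which may be one reason why repeated emojis surface so easily in such lists.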


For further investigation, I will use Python rather than R, because I prefer to control directly what is being counted.


Technically

Search command

library(rtweet)

fund <- search_tweets(".", n = 1000, retryonratelimit = TRUE, include_rts = TRUE, lang = "de")

Punctuation mark count

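library(quanteda)
library(quanteda.textstats)  # textstat_frequency() lives here in quanteda 3 and later

# fund_toks is not defined in this post; it is assumed to be a quanteda tokens object built from the tweet texts
keeping <- c(".", "...", ",", "!", "?", ":", "-", ";")

# valuetype = "fixed" matches the marks literally; under the default "glob", "?" is a wildcard for any single character
nmatrix <- tokens_select(fund_toks, keeping, selection = "keep", valuetype = "fixed")

schau <- dfm(nmatrix)

textstat_frequency(schau)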
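To see exactly what is being counted, the same kind of tally can be sketched in base R without any tokenizer. This is an illustrative alternative, not the quanteda pipeline above: the tweets are invented, the variable names are hypothetical, and treating "..." and "…" as one and the same mark is an assumption made here.

```r
# Toy tweets, invented for illustration
tweets <- c("Gut... sehr gut!", "Wirklich? Ja, klar.", "Na ja\u2026 vielleicht.")

# Ellipsis forms come first in the alternation, so three dots
# are counted as one ellipsis rather than as three full stops
pattern <- "\\.{3}|\u2026|[.,!?:;-]"

marks <- unlist(regmatches(tweets, gregexpr(pattern, tweets)))

# Fold the single-character ellipsis (U+2026) into the three-dot form
marks[marks == "\u2026"] <- "..."

counts <- table(marks)
print(sort(counts, decreasing = TRUE))
```

Here each decision (what counts as an ellipsis, which marks are included) is visible in the pattern itself, at the cost of ignoring everything a real tokenizer would handle, such as URLs or decimal numbers.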


kwic search 

kontext1 <- kwic(fund_toks, ".", valuetype = "glob", window = 10)

kontext2 <- kwic(fund_toks, pattern= ":", window=10)

kontext3 <- kwic(fund_toks, pattern = "...", window = 10)


The last command does not return any results. I asked about this on Stack Overflow, but have not received a response.

Stackoverflow post
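One thing that may be worth checking: the German frequency table appears to list the single character "…" rather than three separate full stops. Whether this explains the empty kwic result has not been verified against the collected tweets. The base R sketch below, with invented tweets and hypothetical variable names, only shows that the two forms are different strings, so a literal search for one does not find the other.

```r
ascii_dots    <- "..."     # three separate full stops
ellipsis_char <- "\u2026"  # the single character U+2026

# Toy tweets, invented for illustration
tweets <- c("Na ja\u2026", "Na ja...", "Na ja.")

# fixed = TRUE searches for the literal string, not a regular expression
has_ascii <- grepl(ascii_dots, tweets, fixed = TRUE)
has_char  <- grepl(ellipsis_char, tweets, fixed = TRUE)

print(data.frame(tweet = tweets, has_ascii, has_char))
```

If the tweets mostly contain the single-character form, a search for it would be the natural next test.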






