Dissertation: Experiment 1

Is Information Processed During Reddit Conversations?

Part I of my experiment explored whether Reddit is a site of information processing. If information processing is taking place, I hypothesise that it should have an effect on information richness. As an indicator of information richness, I use the ratio of common words (and, the, but, or, yet) to uncommon words (scientific, CRISPR, methodology, information).
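As a rough illustration of this indicator, the sketch below counts the uncommon words in a piece of text. The common-word list and the function name `count_uncommon` are assumptions made for the example; the list used by the word analysis tool is not reproduced here, so real counts would differ.

```python
import re

# Hypothetical common-word list, for illustration only.
COMMON_WORDS = {"and", "the", "but", "or", "yet", "a", "an", "of", "to", "in", "is", "it"}

def count_uncommon(text, common=COMMON_WORDS):
    """Return (uncommon_count, total_words) for a piece of text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    uncommon = [w for w in words if w not in common]
    return len(uncommon), len(words)

sample = "The methodology of CRISPR is scientific, but the information is new."
uncommon, total = count_uncommon(sample)
print(uncommon, total, round(uncommon / total, 2))  # 5 11 0.45
```

With a 1000-word sample, the first number returned is directly comparable to the per-post counts reported in this experiment.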

For my experiment, I scraped 50 Reddit conversations, took the first 1000 words and the last 1000 words of each, and counted the uncommon words in each sample. The mean number of uncommon words was 516.14 in the first 1000 words and 497.42 in the last 1000 words. To see whether this difference is statistically significant, I ran a paired t-test. My alternative hypothesis was that whether the words come from the first 1000 or the last 1000 has an effect on the mean number of uncommon words. My null hypothesis was therefore that there would be no difference. With a t statistic of 4.58 (df = 49) and a p-value of < 0.001, we reject the null hypothesis and conclude that whether the words come from the beginning or the end of a conversation is associated with the number of uncommon words.
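The sampling step could be sketched as follows. This is only an assumed reconstruction: it treats a conversation as a list of comment bodies in display order and splits on whitespace, and the function name `first_and_last_words` is hypothetical.

```python
def first_and_last_words(comments, n=1000):
    """Join comment bodies in order and return the first n and last n words."""
    words = " ".join(comments).split()
    return words[:n], words[-n:]

# Tiny demonstration with n=3 instead of 1000.
comments = ["one two three", "four five", "six seven eight"]
first, last = first_and_last_words(comments, n=3)
print(first)  # ['one', 'two', 'three']
print(last)   # ['six', 'seven', 'eight']
```

Note that for a conversation shorter than 2000 words the two samples would overlap, so the sketch only makes sense for longer threads.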

Reddit Scraping Script

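import praw

# Reddit submission ID of the conversation to scrape (e.g. 5hwhlp)
person = input('Enter code: ')
# Replace the placeholder credentials with your own Reddit API client ID and secret
reddit = praw.Reddit(client_id='xxxxx', client_secret='xxxxxxx', user_agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.1 Safari/603.1.30')
submission = reddit.submission(id=person)
# Drop the "load more comments" placeholders, then print every comment body
submission.comments.replace_more(limit=0)
for comment in submission.comments.list():
    print(comment.body)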

Link to the Word Analysis Tool I Used: 

http://www.textfixer.com/tools/online-word-counter.php#newText2

Posts Analysed and Uncommon Words

Post ID   First 1000 Words   Last 1000 Words
Global Warming
5hwhlp 536 520
1umk34 566 515
5yfc4y 522 493
27di3k 486 503
616x8p 501 526
5e2gq7 516 504
4ktef6 542 503
3a4s9y 552 518
1nq1kt 508 501
5m11tu 500 500
GMO
62e4cm 542 517
g3gc4cm 488 508
3ggz62 514 514
2j0gis 513 502
2vdrk1 516 524
4qzxq1 530 467
34thm6 525 514
2af33s 516 486
4p16rf 505 485
31kqoq 497 487
Vaccine
57mfgu 561 506
3lahv8 521 518
2lou8v 553 494
3lahwa 451 489
31akrv 510 448
5mlz2h 459 478
5oxpqt 523 467
1z35k1 491 492
48m2wt 490 487
5066yp 581 473
AI
5g3ezx 492 501
4brgvm 518 480
5xdou5 497 482
5jjy0k 485 471
5ay137 487 486
45l03x 470 445
59thrj 533 467
6cfm24 514 493
61rmzt 507 503
4j5q9d 549 535
CRISPR
4fpvqv 496 487
5rd5f1 529 488
5ymfbu 525 494
6cral1 509 547
4knox6 531 508
4p89wd 487 490
5kbaxt 507 509
5kaf3w 516 498
579z69 535 503
5d55dj 605 545

Experiment Descriptive Statistics and T-Test

Statistic   First 1000 Words   Last 1000 Words
Mean 516.14 497.42
Standard Error 4.117559214 3.005856868
Median 515 499
Mode 516 503
Standard Deviation 29.11554042 21.25461775
Sample Variance 847.7146939 451.7587755
Kurtosis 1.145727484 0.506098644
Skewness 0.52722074 -0.060862828
Range 154 102
Minimum 451 445
Maximum 605 547
Sum 25807 24871
Count 50 50

First 1000 Words Last 1000 Words
Mean 516.14 497.42
Variance 847.7146939 451.7587755
Observations 50 50
Pearson Correlation 0.375985414
Hypothesized Mean Difference 0
df 49
t Stat 4.583270786
P(T<=t) one-tail 1.58809E-05
t Critical one-tail 1.676550893
P(T<=t) two-tail 3.17618E-05
t Critical two-tail 2.009575237  
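The t statistic above can be recomputed from the per-post counts listed earlier. The sketch below uses only Python's standard library and assumes a paired test on the 50 first/last pairs, which is what df = 49 for 50 observations implies; the helper name `paired_t` is made up for the example.

```python
import math
import statistics

# Uncommon-word counts copied from the table of posts, in table order.
first = [536, 566, 522, 486, 501, 516, 542, 552, 508, 500,   # Global Warming
         542, 488, 514, 513, 516, 530, 525, 516, 505, 497,   # GMO
         561, 521, 553, 451, 510, 459, 523, 491, 490, 581,   # Vaccine
         492, 518, 497, 485, 487, 470, 533, 514, 507, 549,   # AI
         496, 529, 525, 509, 531, 487, 507, 516, 535, 605]   # CRISPR
last = [520, 515, 493, 503, 526, 504, 503, 518, 501, 500,    # Global Warming
        517, 508, 514, 502, 524, 467, 514, 486, 485, 487,    # GMO
        506, 518, 494, 489, 448, 478, 467, 492, 487, 473,    # Vaccine
        501, 480, 482, 471, 486, 445, 467, 493, 503, 535,    # AI
        487, 488, 494, 547, 508, 490, 509, 498, 503, 545]    # CRISPR

def paired_t(x, y):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # Standard error of the mean difference, using the sample standard deviation.
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

t_stat, df = paired_t(first, last)
print(round(t_stat, 2), df)  # 4.58 49
```

Passing one topic's ten pairs to the same function gives the corresponding per-topic t statistic with df = 9.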

 

CRISPR T Test

First 1000 Words Last 1000 Words
Mean 524 506.9
Variance 1054.222222 484.9888889
Observations 10 10
Pearson Correlation 0.553191516
Hypothesized Mean Difference 0
df 9
t Stat 1.977043764
P(T<=t) one-tail 0.039718353
t Critical one-tail 1.833112933
P(T<=t) two-tail 0.079436705
t Critical two-tail 2.262157163

AI T Test

First 1000 Words Last 1000 Words
Mean 505.2 486.3
Variance 572.8444444 586.9
Observations 10 10
Pearson Correlation 0.615966371
Hypothesized Mean Difference 0
df 9
t Stat 2.831851316
P(T<=t) one-tail 0.009831452
t Critical one-tail 1.833112933
P(T<=t) two-tail 0.019662904
t Critical two-tail 2.262157163

Vaccine T Test

First 1000 Words Last 1000 Words
Mean 514 485.2
Variance 1829.333333 396.1777778
Observations 10 10
Pearson Correlation 0.088098758
Hypothesized Mean Difference 0
df 9
t Stat 1.999078997
P(T<=t) one-tail 0.038333289
t Critical one-tail 1.833112933
P(T<=t) two-tail 0.076666577
t Critical two-tail 2.262157163

GMO T Test

First 1000 Words Last 1000 Words
Mean 514.6 500.4
Variance 245.8222222 333.6
Observations 10 10
Pearson Correlation 0.103053316
Hypothesized Mean Difference 0
df 9
t Stat 1.968428754
P(T<=t) one-tail 0.040272714
t Critical one-tail 1.833112933
P(T<=t) two-tail 0.080545428
t Critical two-tail 2.262157163

Global Warming T Test

First 1000 Words Last 1000 Words
Mean 522.9 508.3
Variance 652.9888889 113.3444444
Observations 10 10
Pearson Correlation 0.31582935
Hypothesized Mean Difference 0
df 9
t Stat 1.893568345
P(T<=t) one-tail 0.045408491
t Critical one-tail 1.833112933
P(T<=t) two-tail 0.090816982
t Critical two-tail 2.262157163
