Is Information Processed During Reddit Conversations?
Part I of my experiment explored whether information processing takes place on Reddit. If it does, I hypothesise that it should have an effect on information richness. As an indicator of information richness, I look at the ratio of common words (and, the, but, or, yet) to uncommon words (scientific, CRISPR, methodology, information), measured here as the number of uncommon words per 1000 words of conversation.
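To make the measure concrete, here is a minimal sketch of counting uncommon words in a piece of text. The counts in this post came from the online tool linked below, not from this code, and the `COMMON_WORDS` list and `count_uncommon` helper are illustrative assumptions, since the tool's actual word list is not given here.

```python
import re

# Assumption: a tiny illustrative list of common words. The word list used
# by the online tool is not given in this post.
COMMON_WORDS = {"and", "the", "but", "or", "yet", "a", "an",
                "of", "to", "in", "is", "it"}

def count_uncommon(text, common=COMMON_WORDS):
    """Return (uncommon_count, total_count) for the words in text."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    uncommon = [w for w in words if w not in common]
    return len(uncommon), len(words)

sample = "The methodology of the CRISPR study is sound, but it is yet to be replicated."
uncommon, total = count_uncommon(sample)
print(uncommon, "uncommon words out of", total)
```

A longer common-word list would lower the uncommon count, so counts are only comparable when the same list is used for every sample.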
For my experiment, I scraped the first 1000 words and the last 1000 words of each of 50 Reddit conversations and counted the uncommon words in each sample. The mean number of uncommon words in the first 1000 words was 516.14. The mean number of uncommon words in the last 1000 words was 497.42. To see if this difference is statistically significant, I ran a paired t-test. My alternative hypothesis was that whether the words examined are the first 1000 or the last 1000 has an effect on the mean number of uncommon words. My null hypothesis was therefore that there is no difference between the two means. With a t statistic of 4.58 (df = 49) and a two-tailed p-value of < 0.001, we reject the null hypothesis and conclude that the mean number of uncommon words differs between the beginning and the end of a conversation.
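The t statistic can be reproduced by hand. The sketch below assumes a paired t-test, which is what the degrees of freedom (one less than the number of posts) and the reported Pearson correlation in the result tables suggest. The `paired_t` helper is a hypothetical name, and the data are the ten AI rows from the table of posts in this write-up.

```python
import math
import statistics

def paired_t(first, last):
    """Paired t statistic and degrees of freedom for two equal-length samples."""
    diffs = [a - b for a, b in zip(first, last)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation / sqrt(n)).
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_error, n - 1

# Uncommon-word counts for the ten AI posts: first 1000 and last 1000 words.
ai_first = [492, 518, 497, 485, 487, 470, 533, 514, 507, 549]
ai_last = [501, 480, 482, 471, 486, 445, 467, 493, 503, 535]

t_stat, df = paired_t(ai_first, ai_last)
print(round(t_stat, 3), df)
```

For these ten rows the result agrees with the t Stat and df shown in the AI T Test table. Passing all 50 pairs to the same function should give the overall test.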
Reddit Scraping Script
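import praw

# Reddit submission ID, e.g. 5hwhlp
person = input('Enter code: ')
# Replace the placeholder credentials with your own
reddit = praw.Reddit(client_id='xxxxx', client_secret='xxxxxxx', user_agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.1 Safari/603.1.30')
submission = reddit.submission(id=person)
# Drop the "load more comments" placeholders so only real comments remain
submission.comments.replace_more(limit=0)
# Print the text of every comment in the thread
for comment in submission.comments.list():
    print(comment.body)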
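Once the comment text has been collected, the first and last 1000 words can be cut out with list slicing. This is only a sketch of one way to do it, not necessarily how the samples in this post were cut. The `first_and_last_words` helper is a hypothetical name, and the toy input stands in for the `comment.body` strings.

```python
def first_and_last_words(comment_bodies, n=1000):
    """Join comment texts in order and return the first n and last n words."""
    words = " ".join(comment_bodies).split()
    return words[:n], words[-n:]

# Toy input standing in for the comment.body strings from a real thread.
bodies = ["one two three", "four five", "six seven eight nine"]
first, last = first_and_last_words(bodies, n=4)
print(first)
print(last)
```

If a thread contains fewer than 2n words, the two slices overlap, so very short conversations would need to be excluded or handled separately.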
Link to the Word Analysis Tool I Used:
http://www.textfixer.com/tools/online-word-counter.php#newText2
Posts Analysed and Uncommon Words
Post ID | First 1000 Words | Last 1000 Words |
Global Warming | ||
5hwhlp | 536 | 520 |
1umk34 | 566 | 515 |
5yfc4y | 522 | 493 |
27di3k | 486 | 503 |
616x8p | 501 | 526 |
5e2gq7 | 516 | 504 |
4ktef6 | 542 | 503 |
3a4s9y | 552 | 518 |
1nq1kt | 508 | 501 |
5m11tu | 500 | 500 |
GMO | ||
62e4cm | 542 | 517 |
g3gc4cm | 488 | 508 |
3ggz62 | 514 | 514 |
2j0gis | 513 | 502 |
2vdrk1 | 516 | 524 |
4qzxq1 | 530 | 467 |
34thm6 | 525 | 514 |
2af33s | 516 | 486 |
4p16rf | 505 | 485 |
31kqoq | 497 | 487 |
Vaccine | ||
57mfgu | 561 | 506 |
3lahv8 | 521 | 518 |
2lou8v | 553 | 494 |
3lahwa | 451 | 489 |
31akrv | 510 | 448 |
5mlz2h | 459 | 478 |
5oxpqt | 523 | 467 |
1z35k1 | 491 | 492 |
48m2wt | 490 | 487 |
5066yp | 581 | 473 |
AI | ||
5g3ezx | 492 | 501 |
4brgvm | 518 | 480 |
5xdou5 | 497 | 482 |
5jjy0k | 485 | 471 |
5ay137 | 487 | 486 |
45l03x | 470 | 445 |
59thrj | 533 | 467 |
6cfm24 | 514 | 493 |
61rmzt | 507 | 503 |
4j5q9d | 549 | 535 |
CRISPR | ||
4fpvqv | 496 | 487 |
5rd5f1 | 529 | 488 |
5ymfbu | 525 | 494 |
6cral1 | 509 | 547 |
4knox6 | 531 | 508 |
4p89wd | 487 | 490 |
5kbaxt | 507 | 509 |
5kaf3w | 516 | 498 |
579z69 | 535 | 503 |
5d55dj | 605 | 545 |
Experiment Descriptive Statistics and T-Test
First 1000 Words | | Last 1000 Words | |
Mean | 516.14 | Mean | 497.42 |
Standard Error | 4.117559214 | Standard Error | 3.005856868 |
Median | 515 | Median | 499 |
Mode | 516 | Mode | 503 |
Standard Deviation | 29.11554042 | Standard Deviation | 21.25461775 |
Sample Variance | 847.7146939 | Sample Variance | 451.7587755 |
Kurtosis | 1.145727484 | Kurtosis | 0.506098644 |
Skewness | 0.52722074 | Skewness | -0.060862828 |
Range | 154 | Range | 102 |
Minimum | 451 | Minimum | 445 |
Maximum | 605 | Maximum | 547 |
Sum | 25807 | Sum | 24871 |
Count | 50 | Count | 50 |
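The main descriptive statistics can be recomputed with Python's standard `statistics` module. The sketch below uses the 50 first-1000-word counts from the table of posts. The `summary` dictionary and its key names are illustrative choices.

```python
import math
import statistics

# Uncommon-word counts for the first 1000 words of all 50 posts,
# in table order (Global Warming, GMO, Vaccine, AI, CRISPR).
first_1000 = [
    536, 566, 522, 486, 501, 516, 542, 552, 508, 500,
    542, 488, 514, 513, 516, 530, 525, 516, 505, 497,
    561, 521, 553, 451, 510, 459, 523, 491, 490, 581,
    492, 518, 497, 485, 487, 470, 533, 514, 507, 549,
    496, 529, 525, 509, 531, 487, 507, 516, 535, 605,
]

n = len(first_1000)
std_dev = statistics.stdev(first_1000)  # sample standard deviation

summary = {
    "count": n,
    "sum": sum(first_1000),
    "mean": statistics.mean(first_1000),
    "median": statistics.median(first_1000),
    "standard_deviation": std_dev,
    "sample_variance": statistics.variance(first_1000),
    "standard_error": std_dev / math.sqrt(n),
    "minimum": min(first_1000),
    "maximum": max(first_1000),
    "range": max(first_1000) - min(first_1000),
}

for name, value in summary.items():
    print(name, value)
```

Swapping in the last-1000-word column gives the second half of the table.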
Statistic | First 1000 Words | Last 1000 Words |
Mean | 516.14 | 497.42 |
Variance | 847.7146939 | 451.7587755 |
Observations | 50 | 50 |
Pearson Correlation | 0.375985414 | |
Hypothesized Mean Difference | 0 | |
df | 49 | |
t Stat | 4.583270786 | |
P(T<=t) one-tail | 1.58809E-05 | |
t Critical one-tail | 1.676550893 | |
P(T<=t) two-tail | 3.17618E-05 | |
t Critical two-tail | 2.009575237 |
CRISPR T Test
Statistic | First 1000 Words | Last 1000 Words |
Mean | 524 | 506.9 |
Variance | 1054.222222 | 484.9888889 |
Observations | 10 | 10 |
Pearson Correlation | 0.553191516 | |
Hypothesized Mean Difference | 0 | |
df | 9 | |
t Stat | 1.977043764 | |
P(T<=t) one-tail | 0.039718353 | |
t Critical one-tail | 1.833112933 | |
P(T<=t) two-tail | 0.079436705 | |
t Critical two-tail | 2.262157163 |
AI T Test
Statistic | First 1000 Words | Last 1000 Words |
Mean | 505.2 | 486.3 |
Variance | 572.8444444 | 586.9 |
Observations | 10 | 10 |
Pearson Correlation | 0.615966371 | |
Hypothesized Mean Difference | 0 | |
df | 9 | |
t Stat | 2.831851316 | |
P(T<=t) one-tail | 0.009831452 | |
t Critical one-tail | 1.833112933 | |
P(T<=t) two-tail | 0.019662904 | |
t Critical two-tail | 2.262157163 |
Vaccine T Test
Statistic | First 1000 Words | Last 1000 Words |
Mean | 514 | 485.2 |
Variance | 1829.333333 | 396.1777778 |
Observations | 10 | 10 |
Pearson Correlation | 0.088098758 | |
Hypothesized Mean Difference | 0 | |
df | 9 | |
t Stat | 1.999078997 | |
P(T<=t) one-tail | 0.038333289 | |
t Critical one-tail | 1.833112933 | |
P(T<=t) two-tail | 0.076666577 | |
t Critical two-tail | 2.262157163 |
GMO T Test
Statistic | First 1000 Words | Last 1000 Words |
Mean | 514.6 | 500.4 |
Variance | 245.8222222 | 333.6 |
Observations | 10 | 10 |
Pearson Correlation | 0.103053316 | |
Hypothesized Mean Difference | 0 | |
df | 9 | |
t Stat | 1.968428754 | |
P(T<=t) one-tail | 0.040272714 | |
t Critical one-tail | 1.833112933 | |
P(T<=t) two-tail | 0.080545428 | |
t Critical two-tail | 2.262157163 |
Global Warming T Test
Statistic | First 1000 Words | Last 1000 Words |
Mean | 522.9 | 508.3 |
Variance | 652.9888889 | 113.3444444 |
Observations | 10 | 10 |
Pearson Correlation | 0.31582935 | |
Hypothesized Mean Difference | 0 | |
df | 9 | |
t Stat | 1.893568345 | |
P(T<=t) one-tail | 0.045408491 | |
t Critical one-tail | 1.833112933 | |
P(T<=t) two-tail | 0.090816982 | |
t Critical two-tail | 2.262157163 |
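The per-topic tables can be read by comparing each t Stat with the critical values they list. The sketch below copies those numbers from the tables and assumes a significance level of 0.05, which is what the critical values 1.833 (one-tail) and 2.262 (two-tail) correspond to at df = 9.

```python
# Critical values for df = 9 at alpha = 0.05, as listed in the tables above.
T_CRIT_ONE_TAIL = 1.833112933
T_CRIT_TWO_TAIL = 2.262157163

# t statistics copied from the per-topic paired t-test tables.
t_stats = {
    "CRISPR": 1.977043764,
    "AI": 2.831851316,
    "Vaccine": 1.999078997,
    "GMO": 1.968428754,
    "Global Warming": 1.893568345,
}

# Topics whose t statistic exceeds each critical value.
one_tail = [topic for topic, t in t_stats.items() if t > T_CRIT_ONE_TAIL]
two_tail = [topic for topic, t in t_stats.items() if abs(t) > T_CRIT_TWO_TAIL]

print("Significant (one-tail):", one_tail)
print("Significant (two-tail):", two_tail)
```

On these numbers every topic clears the one-tailed threshold, but only AI clears the two-tailed one, so the per-topic results are weaker than the pooled 50-post test.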