Dissertation: Experiment 1

Is information processed during Reddit conversations?

Part I of my experiment explored whether Reddit is a site of information processing. If information processing is taking place, I hypothesise that it should have an effect on information richness. As an indicator of information richness, I use the ratio of common words (and, the, but, or, yet) to uncommon words (scientific, CRISPR, methodology, information).
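As a rough illustration of the measure described above, the sketch below counts uncommon words in a piece of text. The list of common words here is an assumption made up for the example (the list used by the actual word analysis tool is not given), and the function name count_uncommon is hypothetical.

```python
import re

# Assumption: a small illustrative list of common words, not the tool's real list.
COMMON_WORDS = {"and", "the", "but", "or", "yet", "a", "an", "of", "to", "in", "is", "it"}


def count_uncommon(text, common_words=COMMON_WORDS):
    """Return (number of uncommon words, total number of words) for a text."""
    # Lower-case the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[A-Za-z']+", text.lower())
    uncommon = sum(1 for word in words if word not in common_words)
    return uncommon, len(words)


sample = "The methodology is sound, but the CRISPR evidence is limited."
uncommon, total = count_uncommon(sample)
print(f"{uncommon} uncommon words out of {total}")
```

With a fixed sample size of 1000 words, the uncommon-word count on its own is enough to compare two samples, since the common-word count is simply the remainder.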

For the experiment, I scraped 50 Reddit conversations (ten on each of five topics) and took the first 1000 words and the last 1000 words of each, then counted the uncommon words in both samples. The mean number of uncommon words was 516.14 in the first 1000 words and 497.42 in the last 1000 words. To see whether this difference is statistically significant, I ran a paired t-test. My alternative hypothesis was that whether the words come from the first 1000 or the last 1000 has an effect on the mean number of uncommon words. My null hypothesis was therefore that there would be no difference. With a t statistic of 4.58 and a p-value of < 0.001, we reject the null hypothesis and conclude that whether the words come from the beginning or the end of a conversation is associated with the number of uncommon words.

Reddit Scraping Script

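import praw

# Ask for the ID of the Reddit submission to scrape
person = input('Enter code: ')
# Replace the placeholder credentials with your own Reddit API credentials
reddit = praw.Reddit(client_id='xxxxx', client_secret='xxxxxxx', user_agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.1 Safari/603.1.30')
submission = reddit.submission(id=person)
# Remove the "load more comments" placeholders without fetching them
submission.comments.replace_more(limit=0)
# Print the body of every comment in the flattened comment tree
for comment in submission.comments.list():
    print(comment.body)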
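The script prints the comment bodies, but how the first and last 1000 words were cut from that output is not stated. One possible way, sketched below, assumes the comment bodies are joined in the order they were printed and split on whitespace. The function name first_and_last_words is hypothetical.

```python
def first_and_last_words(comment_bodies, n=1000):
    """Join comment bodies in order and return the first n and last n words."""
    # Assumption: a "word" is any whitespace-separated token.
    words = " ".join(comment_bodies).split()
    return words[:n], words[-n:]


# Tiny stand-in for the list of comment bodies collected by the scraper.
bodies = ["one two three", "four five", "six seven eight"]
first, last = first_and_last_words(bodies, n=3)
print(first)
print(last)
```

For a conversation shorter than 2n words the two samples would overlap, so such conversations would need to be excluded or handled separately.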

Link to the Word Analysis Tool I Used:

http://www.textfixer.com/tools/online-word-counter.php#newText2

Posts Analysed and Uncommon Words

Post ID	First 1000 Words	Last 1000 Words
Global Warming
5hwhlp	536	520
1umk34	566	515
5yfc4y	522	493
27di3k	486	503
616x8p	501	526
5e2gq7	516	504
4ktef6	542	503
3a4s9y	552	518
1nq1kt	508	501
5m11tu	500	500
GMO
62e4cm	542	517
g3gc4cm	488	508
3ggz62	514	514
2j0gis	513	502
2vdrk1	516	524
4qzxq1	530	467
34thm6	525	514
2af33s	516	486
4p16rf	505	485
31kqoq	497	487
Vaccine
57mfgu	561	506
3lahv8	521	518
2lou8v	553	494
3lahwa	451	489
31akrv	510	448
5mlz2h	459	478
5oxpqt	523	467
1z35k1	491	492
48m2wt	490	487
5066yp	581	473
AI
5g3ezx	492	501
4brgvm	518	480
5xdou5	497	482
5jjy0k	485	471
5ay137	487	486
45l03x	470	445
59thrj	533	467
6cfm24	514	493
61rmzt	507	503
4j5q9d	549	535
CRISPR
4fpvqv	496	487
5rd5f1	529	488
5ymfbu	525	494
6cral1	509	547
4knox6	531	508
4p89wd	487	490
5kbaxt	507	509
5kaf3w	516	498
579z69	535	503
5d55dj	605	545

Experiment Descriptive Statistics and T-Test

Statistic	First 1000 Words	Last 1000 Words
Mean	516.14	497.42
Standard Error	4.117559214	3.005856868
Median	515	499
Mode	516	503
Standard Deviation	29.11554042	21.25461775
Sample Variance	847.7146939	451.7587755
Kurtosis	1.145727484	0.506098644
Skewness	0.52722074	-0.060862828
Range	154	102
Minimum	451	445
Maximum	605	547
Sum	25807	24871
Count	50	50

	First 1000 Words	Last 1000 Words
Mean	516.14	497.42
Variance	847.7146939	451.7587755
Observations	50	50
Pearson Correlation	0.375985414
Hypothesized Mean Difference	0
df	49
t Stat	4.583270786
P(T<=t) one-tail	1.58809E-05
t Critical one-tail	1.676550893
P(T<=t) two-tail	3.17618E-05
t Critical two-tail	2.009575237
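The t statistic reported here can be checked from the per-post counts listed under "Posts Analysed and Uncommon Words". The sketch below is a minimal paired t-test using only the Python standard library; it assumes the test was paired (the 49 degrees of freedom for 50 posts suggest so), and the names COUNTS and paired_t are hypothetical.

```python
import math
import statistics

# Uncommon-word counts copied from the table of posts: (first 1000, last 1000)
COUNTS = {
    "Global Warming": [(536, 520), (566, 515), (522, 493), (486, 503), (501, 526),
                       (516, 504), (542, 503), (552, 518), (508, 501), (500, 500)],
    "GMO": [(542, 517), (488, 508), (514, 514), (513, 502), (516, 524),
            (530, 467), (525, 514), (516, 486), (505, 485), (497, 487)],
    "Vaccine": [(561, 506), (521, 518), (553, 494), (451, 489), (510, 448),
                (459, 478), (523, 467), (491, 492), (490, 487), (581, 473)],
    "AI": [(492, 501), (518, 480), (497, 482), (485, 471), (487, 486),
           (470, 445), (533, 467), (514, 493), (507, 503), (549, 535)],
    "CRISPR": [(496, 487), (529, 488), (525, 494), (509, 547), (531, 508),
               (487, 490), (507, 509), (516, 498), (535, 503), (605, 545)],
}


def paired_t(pairs):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    # Work on the per-post differences between the two samples.
    diffs = [first - last for first, last in pairs]
    n = len(diffs)
    # Standard error of the mean difference, using the sample standard deviation.
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


all_pairs = [pair for pairs in COUNTS.values() for pair in pairs]
t_all, df_all = paired_t(all_pairs)
print(f"All posts: t = {t_all:.2f}, df = {df_all}")
for topic, pairs in COUNTS.items():
    t, df = paired_t(pairs)
    print(f"{topic}: t = {t:.2f}, df = {df}")
```

Run on all 50 posts, this should give a t statistic of about 4.58 with 49 degrees of freedom. The p-values are not computed here because the standard library has no t-distribution.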

CRISPR T Test

	First 1000 Words	Last 1000 Words
Mean	524	506.9
Variance	1054.222222	484.9888889
Observations	10	10
Pearson Correlation	0.553191516
Hypothesized Mean Difference	0
df	9
t Stat	1.977043764
P(T<=t) one-tail	0.039718353
t Critical one-tail	1.833112933
P(T<=t) two-tail	0.079436705
t Critical two-tail	2.262157163

AI T Test

	First 1000 Words	Last 1000 Words
Mean	505.2	486.3
Variance	572.8444444	586.9
Observations	10	10
Pearson Correlation	0.615966371
Hypothesized Mean Difference	0
df	9
t Stat	2.831851316
P(T<=t) one-tail	0.009831452
t Critical one-tail	1.833112933
P(T<=t) two-tail	0.019662904
t Critical two-tail	2.262157163

Vaccine T Test

	First 1000 Words	Last 1000 Words
Mean	514	485.2
Variance	1829.333333	396.1777778
Observations	10	10
Pearson Correlation	0.088098758
Hypothesized Mean Difference	0
df	9
t Stat	1.999078997
P(T<=t) one-tail	0.038333289
t Critical one-tail	1.833112933
P(T<=t) two-tail	0.076666577
t Critical two-tail	2.262157163

GMO T Test

	First 1000 Words	Last 1000 Words
Mean	514.6	500.4
Variance	245.8222222	333.6
Observations	10	10
Pearson Correlation	0.103053316
Hypothesized Mean Difference	0
df	9
t Stat	1.968428754
P(T<=t) one-tail	0.040272714
t Critical one-tail	1.833112933
P(T<=t) two-tail	0.080545428
t Critical two-tail	2.262157163

Global Warming T Test

	First 1000 Words	Last 1000 Words
Mean	522.9	508.3
Variance	652.9888889	113.3444444
Observations	10	10
Pearson Correlation	0.31582935
Hypothesized Mean Difference	0
df	9
t Stat	1.893568345
P(T<=t) one-tail	0.045408491
t Critical one-tail	1.833112933
P(T<=t) two-tail	0.090816982
t Critical two-tail	2.262157163
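To read the five per-topic tables side by side, the sketch below compares each topic's t statistic with the one-tail and two-tail critical values reported for 9 degrees of freedom. All numbers are copied from the tables; the 0.05 significance level is an assumption (it is the level those critical values correspond to), and the names used are hypothetical.

```python
# t statistics copied from the per-topic tables (df = 9 in every case)
T_STATS = {
    "CRISPR": 1.977043764,
    "AI": 2.831851316,
    "Vaccine": 1.999078997,
    "GMO": 1.968428754,
    "Global Warming": 1.893568345,
}
# Critical values copied from the same tables
T_CRITICAL_ONE_TAIL = 1.833112933
T_CRITICAL_TWO_TAIL = 2.262157163

# One-tail test in the direction "first 1000 words have more uncommon words"
one_tail_significant = [topic for topic, t in T_STATS.items() if t > T_CRITICAL_ONE_TAIL]
# Two-tail test: a difference in either direction
two_tail_significant = [topic for topic, t in T_STATS.items() if abs(t) > T_CRITICAL_TWO_TAIL]

print("Significant (one-tail):", one_tail_significant)
print("Significant (two-tail):", two_tail_significant)
```

On these numbers every topic passes the one-tail threshold, but only AI passes the two-tail threshold, which is the relevant one for a hypothesis that does not specify a direction. With ten posts per topic, the per-topic tests have much less power than the combined test on all 50 posts.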
