Archive

Posts Tagged ‘statistics’

Zaman: A Bayesian Approach for Predicting the Popularity of Tweets

2013 May 3 Comments off


FIG 7. Graphical model of the Bayesian log-normal-binomial model for the evolution of retweet graphs. Hyper-priors are omitted for simplicity. The plates denote replication over tweets x and users v_j^x.

We predict the popularity of short messages called tweets created in the micro-blogging site known as Twitter. We measure the popularity of a tweet by the time-series path of its retweets, which is when people forward the tweet to others. We develop a probabilistic model for the evolution of the retweets using a Bayesian approach, and form predictions using only observations on the retweet times and the local network or “graph” structure of the retweeters. We obtain good step ahead forecasts and predictions of the final total number of retweets even when only a small fraction (i.e. less than one tenth) of the retweet paths are observed. This translates to good predictions within a few minutes of a tweet being posted and has potential implications for understanding the spread of broader ideas, memes, or trends in social networks and also revenue models for both individuals who “sell tweets” and for those looking to monetize their reach.

A Bayesian Approach for Predicting the Popularity of Tweets
Tauhid Zaman, Emily B. Fox, Eric T. Bradlow
arXiv:1304.6777 [cs.SI]

SSES 2012: Warming in North America, 2041-2070

2012 May 16 1 comment

This Web-Project represents an accounting of temperature change that is projected for North America in 2041-2070. Regional Climate Models (RCMs) are run 60 years into the future for small, 50 km x 50 km regions in North America, and their results are analyzed statistically for all regions and all four Boreal seasons. The preponderance of results throughout all of North America is one of warming, usually more than 2°C (3.6°F). A Bayesian, spatial, two-way analysis of variance (ANOVA) model is used to analyze RCM data from the North American Regional Climate Change Assessment Program (NARCCAP).

http://www.stat.osu.edu/~sses/collab_warming.html

Read more…

NCDC DS-9640: CONUS Temperature Anomalies and a couple of ax-grinders

2012 May 12 11 comments

For some reason, two posts at The Blackboard, It’s “Fancy,” Sort of … (Shollenberger) and “To get what he wanted”: Upturned end points. (Lucia), seem to have difficulty with the mechanics of another post at “Open Mind”, In the Classroom (Tamino). But there is nothing unusual or difficult about the methods Tamino used to create the charts that have generated so much smoke and apparent frustration at The Blackboard – resulting in an outbreak of mcintyretude: scorn, derision, insults, and the questioning of motives. Since this is mostly a quick walk-through of some code to clear the smoke, I will leave the generated charts for the end of the post.

Read more…

Lines, Sines, and Curve Fitting 12 – heteroskedasticity 1

2011 January 21 3 comments

In statistics, a sequence of random variables is heteroscedastic, or heteroskedastic, if the random variables have different variances. The term means “differing variance” and comes from the Greek “hetero” (‘different’) and “skedasis” (‘dispersion’). In contrast, a sequence of random variables is called homoscedastic if it has constant variance.

http://en.wikipedia.org/wiki/Heteroscedasticity
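To make the definition concrete, here is a minimal Python sketch using synthetic data only. The noise model (a standard deviation that grows linearly from 0.1 to 2.1), the sample size, and the seed are all assumptions chosen for illustration; none of it comes from the temperature series discussed in this series of posts.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 2000
xs = [i / n for i in range(n)]

# Heteroskedastic noise: the standard deviation depends on x
# (hypothetical choice: sd = 0.1 + 2.0 * x).
noise = [random.gauss(0.0, 0.1 + 2.0 * x) for x in xs]

# Compare the sample variance of the first and second halves.
half = n // 2
var_low = statistics.variance(noise[:half])
var_high = statistics.variance(noise[half:])

print(f"variance, first half:  {var_low:.3f}")
print(f"variance, second half: {var_high:.3f}")
```

For a homoskedastic sequence the two variances would be roughly equal; here the second half should be several times larger than the first.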

Read more…

Lines, Sines, and Curve Fitting 11 – more extrapolation

2011 January 19 6 comments

I’ve mostly been working through GISTEMP in this series, but the exp+sine results were interesting enough that I wanted to pause and look at both line+sine and exp+sine in all three data sets.

Read more…

Lines, Sines, and Curve Fitting 10 – nls

2011 January 18 19 comments

The “nonlinear least squares” (nls) function is part of the core of R. John Fox wrote an introduction to it: Nonlinear Regression and Nonlinear Least Squares. In a few dozen iterations, this function returns a better fit than my brain-dead looping through parameter space a few tens of thousands of times.
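To show why an iterative fit needs so few steps, here is a minimal Gauss-Newton sketch with step halving, written in Python rather than R. It is not R's nls implementation. The model y = a*exp(b*x), the noise-free synthetic data, the starting values, and every function name are assumptions made up for this illustration.

```python
import math

def model(x, a, b):
    """Hypothetical two-parameter model: y = a * exp(b * x)."""
    return a * math.exp(b * x)

def rss(xs, ys, a, b):
    """Residual sum of squares for parameters (a, b)."""
    return sum((y - model(x, a, b)) ** 2 for x, y in zip(xs, ys))

def gauss_newton(xs, ys, a, b, max_iter=50, tol=1e-12):
    """Fit (a, b) by Gauss-Newton iterations with simple step halving."""
    for _ in range(max_iter):
        r = [y - model(x, a, b) for x, y in zip(xs, ys)]
        ja = [math.exp(b * x) for x in xs]          # d(model)/da
        jb = [a * x * math.exp(b * x) for x in xs]  # d(model)/db
        # Normal equations (J^T J) delta = J^T r for the 2x2 case.
        saa = sum(u * u for u in ja)
        sab = sum(u * v for u, v in zip(ja, jb))
        sbb = sum(v * v for v in jb)
        ga = sum(u * ri for u, ri in zip(ja, r))
        gb = sum(v * ri for v, ri in zip(jb, r))
        det = saa * sbb - sab * sab
        if det == 0:
            break
        da = (sbb * ga - sab * gb) / det
        db = (saa * gb - sab * ga) / det
        # Halve the step while the full step would make the fit worse.
        current = rss(xs, ys, a, b)
        step = 1.0
        while step > 1e-10 and rss(xs, ys, a + step * da, b + step * db) > current:
            step /= 2
        a += step * da
        b += step * db
        if abs(step * da) + abs(step * db) < tol:
            break
    return a, b

# Synthetic, noise-free data generated from a = 2.0, b = 0.5.
xs = [0.5 * i for i in range(9)]
ys = [model(x, 2.0, 0.5) for x in xs]

a_hat, b_hat = gauss_newton(xs, ys, a=1.5, b=0.4)
print(a_hat, b_hat)
```

Each iteration uses the local slope of the model to choose its next guess, which is why a handful of steps can beat a blind sweep over tens of thousands of parameter combinations.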

Read more…

Lines, Sines, and Curve Fitting 9 – Girma

2011 January 17 39 comments

Dr G. Orssengo recently brought to my attention his “line+sine” model, which was presented at WUWT in April 2009. In short, his model is

y = (m*(x-1880) + b) + (A * cos(((x-1880)/T)*(2*pi)))

intercept (1880) b = -0.53
slope m = 0.0059 °C/yr
amplitude A = 0.3 °C
period T = 60 years
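As a quick check on the formula, the model can be written as a small Python function. This is only a sketch: the function and constant names are made up here, and the parameter values are the ones listed above.

```python
import math

# Parameter values as listed above.
B = -0.53    # intercept at 1880
M = 0.0059   # slope, degrees C per year
A = 0.3      # amplitude, degrees C
T = 60.0     # period, years

def line_plus_sine(year, m=M, b=B, a=A, t=T):
    """Evaluate y = (m*(x-1880) + b) + a*cos(((x-1880)/t)*(2*pi))."""
    dx = year - 1880
    return (m * dx + b) + a * math.cos((dx / t) * (2 * math.pi))

for year in (1880, 1910, 1940):
    print(year, round(line_plus_sine(year), 3))
```

Note that the periodic term is a cosine, so the cycle peaks at 1880 and again every 60 years after that (1940, 2000), with troughs halfway between (1910, 1970).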

He displays it as such:

Read more…

Lines, Sines, and Curve Fitting 8 – D'Agostino

2011 January 16 Comments off

The eyeball and quick sigma population checks in the previous post provided some confidence that the global temperature anomalies are normally distributed about the mean. But there are more formal tests, including the D’Agostino normality test.

Read more…

GHCN: Pearson's Chi-squared test

2010 May 30 Comments off

Introduction

Various players have looked at changes in trends due to loss of stations, loss of rural stations, loss of high-latitude stations, and loss of high-altitude stations. Other cuts have included brightness and GPW population or population density. Recently, Zeke added airports to the list.

Pearson’s Chi-squared test is used to test the independence of categorical variables. Make no mistake, I am only playing at being a statistician in this post. I welcome comments and corrections in what follows and suggestions of more appropriate category tests.
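For readers who have not met the test before, here is a minimal Python sketch of the statistic for a contingency table. The counts are invented for illustration and are not GHCN data; the row and column labels in the comments are likewise hypothetical.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = station kept / dropped, columns = rural / urban.
observed = [[10, 20],
            [20, 10]]

stat, dof = chi_squared_statistic(observed)
print(f"chi-squared = {stat:.3f}, degrees of freedom = {dof}")
```

The statistic is then compared with a chi-squared distribution with the given degrees of freedom. With 1 degree of freedom the 5% critical value is about 3.84, so a statistic above that would count as evidence against independence at that level.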

Read more…
