Preferential Attachment in Feeds

"The size of the blogosphere continues to double every six months," according to the latest quarterly State of the Blogosphere report by David Sifry. The report counts 33.5 million weblogs, many of them actively posting. Last year, Jim Lanzone of Ask posted on the question of which feeds matter. According to Bloglines/Ask, in July 2005 there were about 1.12 million feeds that really matter, a figure based on the feeds subscribed to by all Bloglines users. A study of the feeds on Bloglines in April 2005 found about 32,415 public subscribers, whose feeds accounted for 1,059,140 public feed subscriptions.

We collected similar data on the publicly listed users of Bloglines. Since last year, the number of publicly listed subscribers has increased to 82,428 (about 2.5 times last year's figure), and there are 1,833,913 listed feeds (about 1.7 times) on the Bloglines site. Hence, even though the blogosphere is almost doubling every six months, the number of feeds that "really matter" probably doubles roughly every year. Even so, it may still be only a small fraction of the blogosphere.
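
As a rough check on these growth figures, the ratios and the implied doubling times can be computed directly. This is a minimal sketch that assumes steady exponential growth and an elapsed period of about 12 months between the two snapshots. The exact interval is not stated here, so `ELAPSED_MONTHS` is an assumption, and `doubling_time` is an illustrative helper name.

```python
import math

# Assumption: roughly one year separates the April 2005 study from our crawl.
ELAPSED_MONTHS = 12


def doubling_time(old, new, elapsed_months):
    """Months needed to double, assuming steady exponential growth."""
    growth = new / old
    return elapsed_months * math.log(2) / math.log(growth)


# Figures quoted in the post.
users_old, users_new = 32_415, 82_428
feeds_old, feeds_new = 1_059_140, 1_833_913

users_ratio = users_new / users_old
feeds_ratio = feeds_new / feeds_old

users_doubling = doubling_time(users_old, users_new, ELAPSED_MONTHS)
feeds_doubling = doubling_time(feeds_old, feeds_new, ELAPSED_MONTHS)

print(f"public subscribers grew {users_ratio:.1f}x")
print(f"listed feeds grew {feeds_ratio:.1f}x")
print(f"implied doubling time for subscribers: {users_doubling:.1f} months")
print(f"implied doubling time for feeds: {feeds_doubling:.1f} months")
```

Under these assumptions the feed count takes somewhat more than a year to double, which is in line with the "roughly every year" estimate. A shorter or longer interval between the snapshots would scale the doubling times proportionally.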

This leads me to think that there is some preferential attachment for feeds. A new user who joins Bloglines would subscribe to some feeds from the long tail (those of friends, or ones matching personal interests), but most would also tend to subscribe to feeds that are already popular (such as Slashdot or other top feeds).
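
One way to make this intuition concrete is a toy simulation. The sketch below is an assumed model, not something fitted to the Bloglines data: each new subscription goes, with probability `p_popular`, to a feed picked in proportion to its current subscriber count, and otherwise to a brand-new long-tail feed. The function name, the parameter values and the seed are all illustrative.

```python
import random
from collections import Counter


def simulate_subscriptions(n_subscriptions, p_popular=0.8, seed=0):
    """Toy preferential-attachment model of feed subscriptions.

    Returns a Counter mapping feed id -> number of subscribers.
    """
    rng = random.Random(seed)
    history = [0]       # feed id of every subscription so far; feed 0 seeds the process
    next_feed_id = 1
    for _ in range(n_subscriptions - 1):
        if rng.random() < p_popular:
            # Picking a past subscription uniformly at random selects a feed
            # with probability proportional to its current popularity.
            feed = rng.choice(history)
        else:
            # A long-tail feed that nobody has subscribed to yet.
            feed = next_feed_id
            next_feed_id += 1
        history.append(feed)
    return Counter(history)


counts = simulate_subscriptions(10_000)
top_feed, top_subscribers = counts.most_common(1)[0]
single_subscriber_feeds = sum(1 for c in counts.values() if c == 1)

print(f"{len(counts)} feeds in total")
print(f"most popular feed has {top_subscribers} subscribers")
print(f"{single_subscriber_feeds} feeds have exactly one subscriber")
```

In a model of this kind a few feeds collect a large share of the subscriptions while many feeds keep a single subscriber, which is the skew one would expect if preferential attachment is at work.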

Number of subscribers per feed

There is also an inherent limit on the amount of information that a user can keep track of at any given time. To study this, we show the number of feeds subscribed to by the publicly listed users on Bloglines.

feeds per user

From the graph it can be observed that although some users monitor more than 5k feeds (these might not be real users but programs using the Bloglines API), the majority are normal users who subscribe to the blogs and news feeds they want to follow regularly. Most of these users monitor somewhere between 30 and 100 feeds. This might explain why the graph deviates from a typical power-law curve.
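
Both distributions can be derived from a list of (user, feed) subscription pairs. The sketch below assumes such a list has already been collected. The pair format, the helper names and the sample data are hypothetical and are not the data behind the graphs.

```python
from collections import Counter


def degree_distributions(subscriptions):
    """Return (feeds per user, subscribers per feed) as two Counters.

    `subscriptions` is an iterable of (user, feed) pairs; duplicate pairs
    are counted once.
    """
    unique_pairs = set(subscriptions)
    feeds_per_user = Counter(user for user, _ in unique_pairs)
    subscribers_per_feed = Counter(feed for _, feed in unique_pairs)
    return feeds_per_user, subscribers_per_feed


def share_in_range(counts, low, high):
    """Fraction of entries whose count lies in [low, high]."""
    values = list(counts.values())
    return sum(low <= v <= high for v in values) / len(values)


# Hypothetical sample: three users, two feeds, one duplicated pair.
sample = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "a"), ("u3", "a")]
per_user, per_feed = degree_distributions(sample)

print(dict(per_user))
print(dict(per_feed))
print(share_in_range(per_user, 1, 1))
```

On real data, `share_in_range(per_user, 30, 100)` would give the fraction of users in the 30 to 100 feed band discussed above.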

To summarize:

  • The blogosphere continues to grow as does the number of people who follow blogs.
  • While this is still a rough estimate, the number of feeds that really matter is a very small fraction of the entire blogosphere.
  • The number of feeds that really matter roughly doubles each year, whereas the size of the blogosphere doubles every six months.
  • Most users tend to follow a relatively modest number of feeds.


Which domains matter on the blogosphere?

We recently analyzed data from three different sources: Bloglines, which manages feeds subscribed to by users; a sample of the Blogpulse index made available for the WWW Weblogging Ecosystems Workshop; and Blogwise, a popular blog directory.

Bloglines

Bloglines domain distribution

Bloglines has more than 83,000 publicly listed users who subscribe to about 2,786,687 feeds in all, of which about 496,893 are unique. These are feeds that matter, since users have actually subscribed to them. The above chart shows the top domains for these feeds. It is interesting to note that Blogspot contributes 45% of the feeds that matter, followed by Xanga and Flickr. We also see a substantial presence of Web 2.0 sites such as Flickr, del.icio.us and Technorati that provide their content in RSS.
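
A domain breakdown of this kind can be sketched in a few lines. The version below is an assumption about the method rather than the exact procedure used for the chart: it deduplicates the feed URLs and groups them by the last two labels of the hostname, which is a naive rule that works for hosts like `blogspot.com` but not for suffixes such as `co.uk`. The helper names and example URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def hosting_domain(feed_url):
    """Naive hosting domain: the last two labels of the hostname."""
    host = urlparse(feed_url).hostname or ""
    labels = host.lower().split(".")
    return ".".join(labels[-2:])


def domain_shares(feed_urls):
    """Return (domain, share of unique feeds) pairs, largest share first."""
    unique_feeds = set(feed_urls)
    counts = Counter(hosting_domain(url) for url in unique_feeds)
    total = sum(counts.values())
    return [(domain, count / total) for domain, count in counts.most_common()]


# Hypothetical feed URLs; the repeated one is counted once.
feeds = [
    "http://alice.blogspot.com/atom.xml",
    "http://bob.blogspot.com/atom.xml",
    "http://bob.blogspot.com/atom.xml",
    "http://www.xanga.com/carol/rss",
    "http://www.flickr.com/photos/dave/feed",
]

shares = domain_shares(feeds)
for domain, share in shares:
    print(f"{domain}: {share:.0%}")
```

A more careful version would resolve registered domains against a public suffix list instead of counting labels.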

Blogpulse

Blogpulse domain distribution

The Blogpulse data contains 1.3 million blogs from a 21-day period. 50% of the top domains are contributed by Livejournal, and most of the domains are those of blog hosting sites. More analysis of this data can be found in the paper "Characterizing the Splogosphere". A related post by Matthew Hurst talks about community structure on the blogosphere that cuts across different domains. Also compare this with last year's post on ranking blog hosts and other related posts here and here.

While this data is only a sample of the Blogpulse index, it shows a very interesting difference between the content indexed by blog search engines and the feeds that users actually subscribe to in Bloglines. The difference is understandable: blog search engines should also cater to collective mining for trends, and sources like Livejournal lend themselves well to this.

Blogwise

Blogwise

Blogwise is a blog directory with a relatively small index of 71,252 blogs, most of which are contributed by Blogger. The remaining domains are mostly blog hosting sites.

Summary

  • Based on Bloglines user subscriptions, Blogspot still contributes a significant portion of the feeds that matter on the blogosphere, even though it has had serious splog issues.
  • A number of Bloglines users subscribe to Web 2.0 sites and to dynamically generated RSS feeds over customized queries.
  • Finally, in any index of the blogosphere, the number of blogs that are indexed may not be as important as indexing the feeds that really matter to the user.

Acknowledgement

Thanks to Pranam Kolari for ideas and help with this post, and to Bloglines, Blogpulse and Blogwise for making some of their data publicly available.