A fascinating MarketingProfs.com article by Matthew Syrett, "Exploring Blogs for Brand Insights," looks at how to use weblogs to gather marketing data. He starts by explaining why weblog data might be better than survey data:
The problem with surveys lies in the contrived way they take place. A survey does not observe consumers in their natural setting, where they would behave completely normally.
Instead, it places consumers in an artificial situation and proceeds to ask questions that the consumer may or may not answer correctly. In a way, survey insights are like those about animal behavior observed in a zoo—such insights are not fully reflective of how the animal behaves in the wild.
Then Syrett describes how to measure consumer sentiment by looking for connections between a brand name and the ideas consumers associate with it, using a natural language processing method called Latent Semantic Indexing (LSI).
Accordingly, LSI can be readily repurposed to explore conceptual connections between words identifying brand (i.e., brand names) and the words marking their brand equities.
...
The key to applying LSI to branding analysis is having access to a large consumer-authored text repository upon which to run the analysis. Blogs are such a repository.
The recent surge in blogging by consumers offers an ideal environment for LSI analysis for brand health. Consumer-written blogs reveal largely unedited consumer attitudes in numerous diary-like documents that can be readily harvested for text mining through their RSS or ATOM feeds. Furthermore, the existence of date stamps within these feeds offers better temporal control of data than normal scraping of text content from standard Web sites.
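To make the point about date stamps concrete, here is a minimal sketch of filtering feed items by publication date. It is not from the article: the sample feed, the post titles, and the function name posts_in_window are made up for illustration, and it assumes an RSS 2.0 feed with RFC 822 style pubDate values.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical feed content, invented for this sketch.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example journal</title>
<item><title>Too early</title>
<description>Had a Pepsi at lunch.</description>
<pubDate>Tue, 28 Sep 2004 12:00:00 GMT</pubDate></item>
<item><title>Road trip</title>
<description>Stopped for a Coke on the way.</description>
<pubDate>Sat, 16 Oct 2004 09:30:00 GMT</pubDate></item>
<item><title>Movie night</title>
<description>Soda and popcorn.</description>
<pubDate>Sat, 20 Nov 2004 21:15:00 GMT</pubDate></item>
<item><title>Too late</title>
<description>Coca-Cola holiday ads are back.</description>
<pubDate>Thu, 02 Dec 2004 08:00:00 GMT</pubDate></item>
</channel></rss>"""


def posts_in_window(rss_text, start, end):
    """Return (published, title, description) for items with start <= pubDate < end."""
    root = ET.fromstring(rss_text)
    kept = []
    for item in root.iter("item"):
        raw_date = item.findtext("pubDate")
        if raw_date is None:
            continue  # no date stamp, so the post cannot be placed in time
        published = parsedate_to_datetime(raw_date)
        if start <= published < end:
            kept.append((published, item.findtext("title"), item.findtext("description")))
    return kept


# October and November 2004, the window used in the article's test project.
START = datetime(2004, 10, 1, tzinfo=timezone.utc)
END = datetime(2004, 12, 1, tzinfo=timezone.utc)

for published, title, _ in posts_in_window(SAMPLE_RSS, START, END):
    print(published.date(), title)
```

Scraped Web pages rarely carry a reliable date, so this kind of filter is much harder to do without a feed.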
Then he shows how to use Google to gather the data:
To assess the viability of the LSI blog-mining for brand health work, I created a test-bed project exploring the brand equities of the two leading carbonated soft drinks: Coke and Pepsi. I began by harvesting all the posts I could from the Live Journal (www.livejournal.com) blog site authored in October and November 2004 that contained the terms Coke, Coca-Cola, Pepsi, or soda in them, via a custom Google query ("site:livejournal.com coke OR coca-cola OR pepsi OR soda").
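The query string quoted above is easy to build for any site and list of terms. The sketch below only assembles the query and a search URL; it does not fetch anything. The function names are made up, and the https://www.google.com/search base URL is an assumption on my part, not something the article specifies.

```python
from urllib.parse import urlencode


def build_site_query(site, terms):
    """Build a query like: site:example.com term1 OR term2 OR term3"""
    return f"site:{site} " + " OR ".join(terms)


def search_url(query, base="https://www.google.com/search"):
    """Return a URL with the query safely percent-encoded (base URL is an assumption)."""
    return base + "?" + urlencode({"q": query})


query = build_site_query("livejournal.com", ["coke", "coca-cola", "pepsi", "soda"])
print(query)
print(search_url(query))
```

Anyone automating searches like this should check the search engine's terms of service first, or use an official search API where one is available.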
There is a lot more in the article. It does not present the exact mechanics of generating LSI scores or the scripts for pulling the posts from Google. But the basic tactic of pulling user data from weblogs via a search engine and using that data to get a snapshot of the market makes a great deal of sense.
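For readers who want a feel for what LSI does, here is a generic textbook-style sketch. It is not Syrett's method, which the article does not spell out. It assumes NumPy is installed, uses raw word counts (real LSI setups usually apply a weighting such as tf-idf), keeps only two latent dimensions, and runs on six toy "posts" invented for illustration.

```python
import numpy as np

# Invented toy posts. Note that "coke" and "refreshing" never share a post.
TOY_POSTS = [
    "coke classic taste",
    "classic refreshing",
    "taste refreshing",
    "pepsi sweet",
    "pepsi young sweet",
    "young sweet",
]


def lsi_term_vectors(docs, k):
    """Return (word -> row index, term vectors reduced to k latent dimensions)."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}

    # Term-document matrix: one row per word, one column per post.
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            counts[index[word], j] += 1.0

    # Truncated SVD: keep the k largest singular values and their directions.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return index, u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two vectors, 0.0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


index, vectors = lsi_term_vectors(TOY_POSTS, k=2)
for other in ("refreshing", "sweet"):
    score = cosine(vectors[index["coke"]], vectors[index[other]])
    print(f"coke vs {other}: {score:.2f}")
```

On this toy data, "coke" should score close to "refreshing" even though the two words never appear in the same post, because both appear alongside "classic" and "taste". That indirect link is the kind of brand association the article is after. The score against "sweet", which only shows up in the Pepsi posts, should be near zero.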
You don't even have to do a semantic analysis of the data to get value from it. You could get a useful snapshot of consumer opinion by spending an evening reading a couple of hundred weblog posts that mention a company's name. I'll bet most people would be surprised by what consumers are saying about their companies and their competitors.
Posted by georgegmacdonald at May 4, 2005 04:10 PM