Sunday, 1 March 2009

Politics and Web2.0 Research Methods Workshop

19 – 20 February 2009, Manchester

The workshop was organized by the Infoscape Research Lab at Ryerson University (Canada) and the National Centre for e-Social Science at the University of Manchester (UK).

As with my blogging on the CIVICWEB seminar, I will briefly present some of the presentations and parts of the discussion that I found interesting.

I want to state explicitly that these are not direct quotations from the participants; some things may have been left out and others misheard, so please treat the conversations as informative in nature but not as suitable for quotation.

Maja Turnšek Hančič

Online deliberation analysis on YouTube

I talked about methodological problems in trying to employ online discourse analysis for video sharing environments, specifically YouTube.

Andy Williamson

Understanding social interactions within digital political spaces

Andy is from the Hansard Society, where they work very closely with government and research citizen engagement and political communication. They are interested in politically engaged people – how they use the internet. But beyond the engaged minority, they are also interested in how re-engagement can happen.

He stressed three points which I found interesting:

(a) online engagement methods fail when organisations are risk averse – the public know how to recognize fake invitations.

(b) how are e-democracy initiatives democratic if we exclude the 60% of people who do not have access to the internet? The question is thus how to get people online. “If we don't talk about that we have an elephant in the room and we can't just ignore it.”

(c) old political thinking versus new political thinking - old political thinking is going online but it doesn’t change the old thinking of just delivering information.

He talked about an example of Greenpeace, which had just run a campaign against the third runway at Heathrow – they flooded the email services of 57 MPs. Greenpeace thought it was a great success – but half of the 57 MPs voted for the third runway. It didn't convince them, it just caused them technical trouble. “One individually written letter is worth more than 10,000 pre-formed emailed ones. Greenpeace thus also still thinks in numbers and not quality.”

I thought it was interesting, from an ethical perspective, that he said they as researchers have a special Facebook account (not a personal one) which they use to befriend UK MPs on Facebook. More and more MPs are becoming very active. Their strategies differ, from “I accept everybody” to “I accept only friends who I know are friends of the campaign”. He also mentioned the growing use of Twitter.

He also mentioned a mass silent dance in the UK which was coordinated through Facebook – the technology backed up mobile phones and word of mouth – “new tools for the same purpose”. Andy stressed that the technology doesn't change mobilisation in its essence but changes the scale and the speed of mobilisation.

Maurice Vergeer

Features on political party websites

Maurice presented the planned Comparative European New Media and Elections Project, organized by Maurice Vergeer, Carlos Cunha and Gerrit Voerman. It is basically a follow-up to the Internet and National Elections project. They will analyze political party and candidate websites during the European Parliament elections. The focus will be on Web 1.0 and Web 2.0 features.

Methodological problems that Maurice discussed were archiving and sampling (only bigger parties and only major candidates?).

Maurice presented some first results from secondary analysis of the data of previous research whose authors were kind enough to share the data with him:

Norris 2001, 2003

Jankowski et al.

Gibson & Ward

He tested for scaling using factor analysis for dichotomous variables and discovered there was only one dimension (Cronbach's alpha = 0.875), whereas Pippa Norris assumed two: information and communication. This says that there is only one consistent dimension, but he was puzzled by its meaning.
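For dichotomous (0/1) feature codes like these, Cronbach's alpha can be computed directly from the item variances and the variance of the summed scale. A minimal sketch in Python – the feature matrix below is invented for illustration, not Maurice's data:

```python
def cronbach_alpha(items):
    """items: one list per feature, each holding 0/1 codes per website."""
    k = len(items)                       # number of features in the scale
    n = len(items[0])                    # number of websites coded

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_var = sum(variance(col) for col in items)
    # Total score per website = sum of its feature codes.
    totals = [sum(col[i] for col in items) for i in range(n)]
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Hypothetical coding: 4 features (rows) across 6 party websites.
features = [
    [1, 1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 0],
]
print(round(cronbach_alpha(features), 3))  # → 0.823
```

A high alpha across all features, as here, is exactly the single-dimension situation Maurice described.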

Some interesting questions were raised. Rachel Gibson asked to what depth Maurice and his colleagues plan to analyze Web2.0 features: only presence/absence, or also more in depth? Maurice answered that they hope to manage to archive the material (e.g. all YouTube videos) so that the more in-depth analysis could be done later. He stressed that the first problem of international research and comparative analyses is where to stop the depth of the coding book, so that it remains doable. The second problem is cultural differences: for example, the Liberal party in the UK is not comparable to a party that terms itself liberal in some other country.

Elisa Pieri

ID cards: A frame analysis of the debate in the UK printed media

Elisa presented herself as coming from a qualitative background – building on grounded theory. The research she presented is part of a larger text-mining project.

She talked about two understandings of frames: (a) frame as a cognitive process versus (b) frame as a strategic persuasive method (Goffman versus Entman).

The case study focuses on the public debate on the National Identity Scheme in the UK – the creation of a large database which will be shared by various governmental bodies and private organisations. The aim of her research was to investigate the debate in national newspapers and explore which kinds of arguments are used in support, in opposition, or in mere presentation of the scheme.

Sample: 280 newspaper texts, using LexisNexis

She identified various frames. The ID scheme is framed as: not secure, lacking accountability, compulsory versus based on choice, universal, tough on immigration, an imbalance between liberty and security, one in a series of failed government projects, etc. In this case study she then focused on the “government is being tough on immigration” issue. She first identified how the question was framed in the original governmental document and then compared that to how newspapers reported on the scheme.

She discovered two ways in which the newspaper articles inverted the original frame “being tough on illegal immigration” (that of the document):

(a) as empty rhetoric, also through the use of satire – basically saying that the scheme will not succeed in preventing immigration and that it does not go far enough;

(b) as an example of the government's extreme-right ideology of reinforcing “Britishness” and controlling immigration.

In the discussion Greg Elmer asked whether tags could be conceived as frames in themselves. Elisa stressed that not every code could be a frame – e.g. a “prime minister” tag would not be a frame. A frame involves more crafting of the argument; it is also broader.

Rachel Gibson

News Blogs: Setting or following the agenda?

Rachel talked about gaining interest in the question of blogs and the news agenda after some “high profile scoops” were reported heavily by the mass media: Rathergate in 2004 and Trent Lott in 2002 – blogs were leading the news agenda, either by discovering the news or by making it more prominent.

The question is thus, can they be seen to challenge the dominance of mainstream media?

She tried to answer this question through social network analysis, using the VOSON software, which gives information on in- and out-bound links plus information collected from the websites themselves.

5 seed sites in each category:

mainstream news media (The Guardian, The Independent…)

mainstream news media blogs – derived from a list of twenty important bloggers and a list of 76 news blogs in mainstream media, taking the overlap of the two

indy blogs

She removed all sites that had fewer than 5 links, which showed that the density among indy blogs is much stronger than among mainstream media. The question Rachel posed later was: how many links to other sites does a website need for us to leave it “in the picture”?
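Rachel's pruning step can be sketched as a simple degree filter: count each site's links, keep only sites above the threshold, and drop edges touching the rest. A hedged Python illustration with an invented link list (not her data):

```python
from collections import Counter

# Hypothetical (source, target) hyperlink pairs between seed sites.
links = [
    ("blogA", "guardian"), ("blogA", "blogB"), ("blogB", "blogA"),
    ("blogB", "guardian"), ("blogC", "blogA"), ("guardian", "bbc"),
]

def prune(links, min_degree):
    """Keep only edges whose endpoints both have >= min_degree links."""
    degree = Counter()
    for src, dst in links:
        degree[src] += 1
        degree[dst] += 1
    keep = {site for site, d in degree.items() if d >= min_degree}
    return [(s, t) for s, t in links if s in keep and t in keep]

pruned = prune(links, 3)   # "blogC" and "bbc" fall out of the picture
print(pruned)
```

The open question from the workshop is exactly the choice of `min_degree`: the picture you get depends directly on where that cut-off is set.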

The discussion later revolved around the main problem of social network analysis of hyperlinks: which seeds to take?

The second question was how to determine causality, and the consensus seemed to be that the timeline should necessarily be taken into account. Here Maurice gave an important suggestion: blogs have RSS feeds, which are text and contain the date, so you don't have to save pages to find out who hyperlinked first – you just take it from the RSS. Greg said that is exactly what they have been doing at the Infoscape Lab.
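Maurice's suggestion can be sketched in a few lines: parse an RSS feed with the standard library and sort items by their pubDate to see which post (and its hyperlink) came first. The feed snippet below is invented for illustration; real feeds carry the same `pubDate` element:

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Invented two-item RSS feed.
feed = """<rss><channel>
  <item><title>First scoop</title>
    <link>http://example.org/a</link>
    <pubDate>Mon, 16 Feb 2009 09:00:00 +0000</pubDate></item>
  <item><title>Follow-up</title>
    <link>http://example.org/b</link>
    <pubDate>Tue, 17 Feb 2009 12:30:00 +0000</pubDate></item>
</channel></rss>"""

root = ET.fromstring(feed)
# Pair each item's parsed date with its link, earliest first.
posts = sorted((parsedate_to_datetime(i.findtext("pubDate")),
                i.findtext("link")) for i in root.iter("item"))
print(posts[0][1])  # the earliest item: who hyperlinked first
```

Because the date travels with the text, no page archiving is needed for the "who linked first" question.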

Han Woo Park

Hyperlink network analysis of 2007 Korea presidential election

Han presented two case studies and social network hyperlink analysis of websites and social networking sites:

(a) case study of 2007 presidential race within the Grand National Party

(b) case study of a hyperlink network of 2007 presidential election

The discussion followed the line of the discussion that started with Rachel's presentation. Again, participants stressed the necessity of going beyond descriptive analysis with social network analysis and finding the causality behind the visualizations.

Some points from the final discussion on Thursday

Greg Elmer talked about how we as researchers get overwhelmed by the complexity of Web2.0. We need to think of Web2.0 as a network phenomenon: whatever there is, it is now placed on multiple platforms. We need to think conceptually about the architecture of all these different platforms. How, for example, are communication campaigns networked across Facebook and other platforms? On which platforms are YouTube videos present? Etc.

Elisa Pieri stressed the importance of going back to the theory we already know. We need to reflect more on the means of interpretation. It could well be that we can understand all these new phenomena in terms of theory we already have. This is even more acute since research is becoming increasingly interdisciplinary.

Maurice stressed that all methodological standards still apply to Web2.0 methods: validity, reliability, etc. But specific questions, such as how to sample on Web2.0, may be quite difficult – sampling in a network environment is even more so – and we should be careful. He agreed that it is very important to tackle the issue of the architecture; we should make an inventory of how to analyze it and how to deal with it. He gave what I think was great advice: if you do research on Web2.0 from a theoretical point of view, be modest – don't be so ambitious that you get entangled with all the problems of networking etc. You don't need to understand the architecture behind it; why would you be entangled with it if you don't need it? Do not go beyond the boundaries of your research project if it is not necessary.

I added the question of difficulty with analyzing the multimodal content.

Greg furthermore stressed that we need to do more longitudinal research on the one hand, and on the other, we need to find ways to analyze fast-moving, quickly disappearing phenomena such as live discussions. Later Rachel connected the dimension of time to the dimension of space – with mobile phones this becomes even more complicated, and mobile phones accelerate the dimension of time, since political campaigners, for example, can publish from anywhere and at any time.

The debate then developed into the question of how we do that while still preserving the in-depth analysis and the insight we need while researching. The question of archiving and making public archives was the second major point introduced in the debate. This was then connected to questions of ethics (is the data private or public, or is it better to define it as sensitive?) and questions of proprietary law (private, public, collective?).

Robert Ackland

Virtual Observatory for the Study of Online Networks (VOSON)

His presentation was made through Skype.

He presented VOSON – a social network tool for mapping and social network analysis.

Tools similar to VOSON are Issue Crawler (Richard Rogers) and SocSciBot (Mike Thelwall).

VOSON comes from a different disciplinary perspective than Issue Crawler and SocSciBot.

Network scientists: measurement of the properties of large-scale networks and models

Information scientists: the web as a citation network – using regression on hyperlink data

Social scientists: the web as a social organizational network, drawing on behavioural theory

The differences among these three approaches are even greater in Web2.0.

Social network analysis recognizes that in a network we have self-organizing behaviour (e.g. in-links to oneself), and it puts emphasis on the need for sampling.

Robert vigorously defended the idea that social scientists should thus actively participate in the design of research tools. Specifically he talked about social network analysis (SNA). He proposed three potential approaches:

(a) data sharing deal with existing social networking sites (SNS) (e.g. Facebook or Orkut)

(b) build an SNA application within an existing SNS – users install your application: you can ask them to add the Facebook application and their data would be collected by researchers

(c) build a new SNS or a “niche” SNS – not suitable for all research projects, but what about segments of the population who are not necessarily well represented in other SNS? – e.g. researching social inclusion among the elderly. A researcher and a partner organization (e.g. a seniors lobby group) could build an SNS together. He built his own social network service as a test – using Drupal, for ANU masters students.

Davy Weissenbacher

Text-mining Tools for frame analysis

But Davy did not talk about frame analysis. He presented the workings of the ASSIST project, which aims to deliver a service for searching and qualitatively analysing social science documents. NaCTeM is designing and evaluating an innovative search engine embedding text-mining components.

5,000 documents come from LexisNexis, and the search engine allows you to search them. It looks at semantic content and provides different information about each document – basically you can put in a search word and then get information on each document (e.g. author, location, persons described). The point: it differentiates within the document – it recognizes, for example, which part of the document is the name of the author.

We can then process the data with different operators. For example, if I am interested in a specific author, the search will focus on the metadata. Or if I am interested in documents writing about a city, e.g. London, I get all documents where the city London has been tagged as the main entity.

All the codes are designed by the computer. For location, for example, it draws on dictionaries listing all the cities in the UK. Another corpus is Wikipedia, used for celebrities; another, a university's educational portal.

There is, however, still a lot of “noise”. They plan to use machine learning techniques so that the computer learns from context to differentiate, for example, the title of a newspaper from the city itself.

They also work on sentiment analysis: possibilities for automatically computing the opinion of the author. The system highlights sentences which use specific terms that signal negative or positive opinion, colouring the sentences as negative or positive (going from different shades of green to different shades of red). They analyze the syntax of the sentence and within it find words associated with positive or negative sentiment. But again: the system counts “high” as positive but does not count “criminality” as negative, and then you get “high criminality = positive”.

Example: “Identity cards will intrude on our private lives, MPs and peers said last night.” – coded as negative because of the word “intrude”.

The main problem is thus that a corpus should be built which takes into account all words and word combinations.
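The failure mode Davy described – a lexicon that scores “high” as positive while “criminality” is simply missing from the negative list – is easy to reproduce with a naive word-list scorer. A sketch with invented word lists, not NaCTeM's actual system:

```python
# Invented lexicons: "criminality" is deliberately absent from the
# negative list, mirroring the incomplete-corpus problem above.
POSITIVE = {"high", "good", "secure"}
NEGATIVE = {"intrude", "fail", "fiasco"}

def score(sentence):
    """Sum +1 for each positive word, -1 for each negative word."""
    words = sentence.lower().rstrip(".").split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

print(score("Identity cards will intrude on our private lives."))  # → -1
print(score("high criminality"))  # → 1: wrongly scored positive
```

The first sentence is correctly negative because of “intrude”; the second comes out positive because only “high” is in the lexicon – exactly the “high criminality = positive” error.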

Maurice pointed to work on automatic content analysis by Jan Kleinnijenhuis and Wouter van Atteveldt. They use the patient-agent relation within a sentence; the software deconstructs sentences.

Greg Elmer

The Permanent Campaign

Greg presented Infoscape's project Code Politics, a 3-year project analyzing how we can study software code as a political phenomenon. They have focused largely on online elections – since 2004 Canada has had 3 federal elections. The aim of the project was to develop methods, tools and visualizations; one component of visualizing results is making them comprehensible to a larger public. They focused on embedded research – not only empirical research and the development of methods and analysis, but also engagement with the media and the political process. They partnered with a public broadcaster and worked directly with journalists, producers, reporters etc. to provide them with visualizations and data. They also met with political bloggers and online campaigners to learn about their strategies.

Theoretically, Greg builds on the concept of the permanent campaign (coined by Patrick H. Caddell) – a recognition that campaigns need to be better administered through controlling the campaign and influencing mass media reporting. Reasons for the emergence of the permanent campaign:

(a) intensification of party politics,

(b) rise of national campaigning,

(c) emergence of the 24/7 news cycle,

(d) expansion of media spheres to network computing (citizen journalists also tend to be those who are the most politically partisan).

Components of the permanent campaign

If there is an ongoing state of preparedness – a permanent campaign – there is a greater readiness to have more personnel working for you. It needs to be recognized that campaigns are networked – there is no single platform where candidates can go to launch their campaigns; it is a platform of platforms. This also means the integration of political strategies and tactics, and furthermore a redefinition of the “campaign” period as a state of elections, crises and controversies. The idea of the permanent campaign is an important concept for seeing more fully which campaigns exist – leadership campaigns, for example, like Obama's. This provides one answer to why he was so open: he had to build a network of supporters, since his campaigners did not have the names and database that Hillary Clinton had from the Democrats from the beginning.

Greg focused on the work of bloggers within the permanent campaign.

Who are they referring to? (answer: mostly other bloggers with similar political views) Whom do they support? What are they talking about? (answer: mainly newspaper articles) How does their coverage differ from news coverage (do they talk about the same issues)?

Videobloggers or vloggers: where are the videos embedded? He provided some of the answers using the tools developed by Infoscape. One PowerPoint slide, for example, shows the top referrals to the official Kevin Rudd YouTube channel videos:

Picture taken from Greg's PowerPoint presentation uploaded at Infoscape

Who refers to them?

Picture taken from Greg's PowerPoint presentation uploaded at Infoscape

Fenwick McKelvey

An Overview of Blog Research

Fenwick presented in detail the tools and methodology developed and used at Infoscape for researching and analyzing blogs.

Next to the possibilities already presented by Greg, Fenwick talked more about how they were able to track discussions on leaders – who was getting more attention, and around which themes?

The Infoscape developed various methods and tools, those specifically designed for analyzing blogs are:



All blogs use RSS feeds, so they use RSS feeds as their source of data. They scrape the content (and not the navigational aspects). They decided to focus only on partisan blogs.

They were able to determine the most-discussed hyperlink in blogs and then also check that content. They could also go back to the sources linking to a post. They did this 3 times per week.

The tools are divided into 4 categories:


Data – data sets



I was specifically interested in their work on party politics and YouTube.

The YouTube search scraper uses the YouTube API to extract data. Once it has a list of keywords, it queries the YouTube API on a set schedule. By comparing the resulting list of videos week over week it can track changes over time: you can say which video was the most popular in a given time span, and make it temporal – seeing which videos are popular at which time.

It archives the XML and the database. It works on keywords; the API uses tags and descriptions. When that information is extracted from YouTube's API it gives the number of views, comments, ratings, tags, and the date when the video was uploaded and by whom – the ID of the person as a Google account. They have been able to see which videos have been going up and down in the ratings. YouTube's own search engine namely also returns as “most watched” videos which are very popular overall; however, those videos could have accumulated their numbers a year ago without necessarily having any recent views. They managed to overcome this problem by analyzing the ratings on a weekly basis.
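The weekly comparison can be sketched as a diff between two snapshots of per-video view counts: a video that is merely popular overall shows a small week-over-week gain, while a currently viral one shows a large gain. The snapshot data below is invented, not Infoscape's:

```python
# Invented per-video view counts from two weekly API snapshots.
week1 = {"vid_a": 120_000, "vid_b": 5_000, "vid_c": 900}
week2 = {"vid_a": 120_300, "vid_b": 21_000, "vid_c": 1_200}

# Week-over-week gain: what is actually being watched *now*.
gains = {vid: week2[vid] - week1.get(vid, 0) for vid in week2}
trending = max(gains, key=gains.get)
print(trending, gains[trending])  # → vid_b 16000
```

Here `vid_a` has by far the most total views but almost no recent ones; the diff surfaces `vid_b` as the video currently gaining attention.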

Ganaele Langlois

Facebook and politics 2.0

She talked about the research challenges in analyzing Facebook: it has a black-boxed architecture and a first-person perspective, and previous scraping methodologies (e.g. hyperlink scrapers) are not feasible.

Facebook uses HTML, but also other languages, which makes it extremely difficult to analyze. Hyperlinking in the old-fashioned sense doesn't really happen.

Three types of political activity on Facebook:

a) users can state their political affiliation on their profile,

b) can become fans or supporters of a politician's page,

c) users can create or join groups and events.

Ganaele created a robot that automatically scrapes Facebook, but Facebook has found a way to prevent it.

Greg & Ganaele then talked about their forthcoming article, Wikipedia Leeches: they are looking at where and how a particular phrase has been taken up. It is not just through hyperlinks that you can find networks; it is also through text. They tried to figure out all the ways in which web objects are individually identifiable, what we can glean from that information, and whether we can use those unique identifiers to see where a particular object is reproduced across the web. This is not only about recognizing content but also about recognizing the ways in which content is circulated. You can, for example, take the ID of a particular video and determine how that video became viral.
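The unique-identifier idea can be sketched for the YouTube case: extract the 11-character video ID from watch/embed URLs in scraped pages, then record which sites each ID reappears on. The page bodies and the URL patterns matched here are a simplified assumption for illustration, not Infoscape's actual scraper:

```python
import re

# Invented page bodies containing YouTube URLs in two common forms.
pages = {
    "blog1.example": 'see <a href="http://youtube.com/watch?v=AbC123xYz_0">this</a>',
    "blog2.example": '<embed src="http://youtube.com/v/AbC123xYz_0">',
    "news.example":  'watch?v=ZZZ999zzz_9 was also shared',
}

# Capture the 11-character ID after "watch?v=" or "/v/".
VIDEO_ID = re.compile(r"(?:watch\?v=|/v/)([\w-]{11})")

sightings = {}
for site, html in pages.items():
    for vid in VIDEO_ID.findall(html):
        sightings.setdefault(vid, set()).add(site)

print(sightings)  # one video reproduced on two sites, one on a single site
```

Counting the sites per ID over time is one simple way to watch a particular object circulate across the web.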