Friday, May 6, 2011

What will you work for? Incentive system in Crowdsourcing

A few days ago, I was pitching an idea at the Entrepreneurship Club at CMU: move away from micro-payments in crowdsourcing to something that people would actually care about. While micro-payments like $0.01 are meaningful in developing countries, they are not attractive to most others. The point here is that the reward mechanism has severely narrowed down the kinds of tasks that can be performed online in a crowdsourcing manner. If we can find an incentive system that is attractive to a wider audience, it will automatically drive more users to participate, and in turn more companies will start moving towards crowdsourcing.

Here is a potential list of things that I think people would work for. Surprisingly, most of them are still unexplored as incentive systems. Unless current platforms expand and provide flexible incentive systems, the adoption of crowdsourcing as a mainstream practice will be limited.
1. Micro-payments (Mturk, Crowdflower etc)
2. Free mobile minutes (Txteagle)
3. Frequent flyer miles (?)
4. Coupons (?)
5. Movie tickets (?)
6. Discount points that can be used at other sites
7. Facebook points, Zynga points (Gambit+Crowdflower)
8. .....

Think about it. Amazon's Mechanical Turk is where most of the crowdsourcing happens today, and at any given instant the number of tasks on the site is still less than 200,000! That is very little for such a widespread concept as crowdsourcing. For more details on studies related to MTurk, I will refer you to Panos's blog.

Wednesday, May 4, 2011

News Games: Annotating information nuggets in news via Games

Most of us read news every morning, some more often than others. While reading, can we have a bit of fun with others who are reading the same article? And by doing so, can we in turn help enhance the reading experience of everyone else as well?

The idea is to devise a bunch of simple 2-player games around the content of the newspaper. The annotations these games produce give information that can be used to improve the presentation of the newspaper as well. Imagine the following scenarios:

Scenario 1: You are reading a newspaper in your local language (I speak Telugu). Do "YSR" and "Y.S. Rajasekar Reddy" mean the same person? How does a computer know? Can a single player play a "Spot the Person" game, where you click on a word, thereby highlighting it, and get points for identifying the maximum number of occurrences?

Scenario 2: The same scenario can now be played as a 2-player game. Consider a "sentiment analysis" task. Given a news article, ask both participants a question such as "What is Jagan doing in Hyderabad?", and each of them selects a sentence that highlights the activity, let's say "campaigning". Both users get points when they agree on the click!
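
The agreement rule in Scenario 2 can be sketched in a few lines. This is only a minimal illustration, not a finished game design: the function names, the 10-point reward, and the idea of identifying a click by its sentence index are all assumptions made for the example.

```python
from typing import List, Optional


def score_round(choice_a: int, choice_b: int, points: int = 10) -> int:
    """Points awarded to EACH player: `points` if both clicked the same sentence, else 0."""
    return points if choice_a == choice_b else 0


def agreed_annotation(choice_a: int, choice_b: int, sentences: List[str]) -> Optional[str]:
    """Return the sentence both players selected, or None when they disagree."""
    if choice_a == choice_b and 0 <= choice_a < len(sentences):
        return sentences[choice_a]
    return None


# Toy article, invented for illustration only.
article = [
    "Jagan arrived in Hyderabad on Monday.",
    "He is campaigning in the city this week.",
]
```

Only clicks that two independent readers agree on are kept as annotations, which is what makes the output usable without a trusted reference answer.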

What does all this mean to your local news reading experience?
- We can now cluster news around people causing them (Named Entity Tagging)
- We can highlight events and relationships between people (Relation Extraction)
- We can better analyze news to create timelines, track locations, and also understand the general pulse of the people (Sentiment Analysis)

Tuesday, May 3, 2011

Donate a word


The idea of the site is to collect an English word and its translation into a favorite language of anyone willing to contribute. This summer, I wish to enhance the collection process using a web interface with ample feedback and visualization.

Motivating factors for people to contribute to such a site:
1. Work towards a noble cause (donate rice -
2. Fun to play (A 2-player game, or a scrabble style game) with a point system
3. Learning experience (123teachmespanish style)

Our motivation here is to enable "assistive reading" technologies. An accompanying application/plugin will be built to enhance an average non-English speaker's reading experience on Wikipedia by suggesting word-level translations in their native language.
The source vocabulary will cover all words from English Wikipedia, and a user can select a target language for providing a translation. Existing free online dictionaries will be used to seed the application/plugin so that users get a feel for what their contributions can achieve.

Visualizations will be the motivation:
1. How much of Wikipedia has been covered for the user's language
2. How many articles will the user help by translating a particular word
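
As a rough sketch of how these two numbers could be computed, assume each article is already tokenized into a list of words and the donated translations are kept as a set of English words. The function names and the toy data are hypothetical, and counting tokens is just one possible definition of "coverage".

```python
from typing import Dict, List, Set


def token_coverage(articles: Dict[str, List[str]], translated: Set[str]) -> float:
    """Fraction of all word tokens, across all articles, that have a donated translation."""
    total = 0
    covered = 0
    for words in articles.values():
        for word in words:
            total += 1
            if word.lower() in translated:
                covered += 1
    return covered / total if total else 0.0


def articles_helped(articles: Dict[str, List[str]], word: str) -> int:
    """Number of articles that contain `word` at least once."""
    target = word.lower()
    return sum(1 for words in articles.values() if target in {w.lower() for w in words})


# Toy data, invented for illustration only.
toy_articles = {
    "Rice": ["Rice", "is", "a", "grain"],
    "Water": ["Water", "is", "a", "liquid"],
}
toy_translated = {"is", "a"}
```

Here `token_coverage(toy_articles, toy_translated)` answers the first question for one language, and `articles_helped(toy_articles, "rice")` answers the second for one candidate word.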

Visualizing twitter streams from different languages

Twitter has become a great hub that gives a sneak peek at what the world is talking about. However, we are only equipped to analyze English tweets (e.g., recent work on dialect identification on Twitter, part-of-speech analysis, sentiment analysis, etc.). Can we go beyond English and analyze other languages?

Why do we need to do this?
More than 50 percent of the messages on Twitter are non-English. In fact, as of today the world speaks about 4000 languages, and restricting ourselves to English tweets does not give us a complete picture of the world's communication.

For instance, the Egypt problem as viewed from the United States is different from how people in Egypt see it. What about the opinion of people who are on Twitter but blog/tweet in a different language? Does this have to do with the language Twitter speaks? What if they don't tweet in English and don't follow the conventions of #hashtags?

1. Visualize the Twitter/Facebook streams for a particular query, e.g. "egypt revolution", and overlay them on the world map
2. Use a different lens (a dictionary for Arabic-English), translate more of the Twitter streams (word for word), and overlay them on the map. Does this look radically different?
3. Zoom-in and out on different countries to see what they think about an issue.
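
A minimal sketch of the "lens" idea in steps 1 and 2: gloss each tweet word for word through a bilingual dictionary, then count the tweets per country that match the query. The tiny romanized dictionary, the (country, text) tweet tuples, and the function names are all assumptions for illustration; a real system would need proper tokenization, script handling, and geolocation.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def gloss(text: str, dictionary: Dict[str, str]) -> str:
    """Translate word for word; words missing from the dictionary are left unchanged."""
    return " ".join(dictionary.get(token.lower(), token) for token in text.split())


def matches_by_country(
    tweets: Iterable[Tuple[str, str]],
    dictionary: Dict[str, str],
    query_terms: Set[str],
) -> Counter:
    """Count tweets per country whose glossed text contains every query term."""
    counts: Counter = Counter()
    for country, text in tweets:
        words = set(gloss(text, dictionary).lower().split())
        if query_terms <= words:
            counts[country] += 1
    return counts


# Toy data, invented for illustration only (romanized Arabic for readability).
toy_tweets = [("US", "egypt revolution now"), ("EG", "thawra misr"), ("EG", "hello world")]
toy_lens = {"thawra": "revolution", "misr": "egypt"}
query = {"egypt", "revolution"}
```

Calling `matches_by_country(toy_tweets, {}, query)` gives the English-only picture, while passing `toy_lens` adds the tweets that only match after glossing. The per-country counts are what would be overlaid on the map.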

Well, what about the rest of the people who don't blog, tweet, or use Facebook? That's a problem for another day.

Monday, May 2, 2011

Language Translation in the Crowd: Part 1

In the past I have tried involving crowds for translating text to be used in seeding automatic translation systems (refer: 2010, Ambati and Vogel 2010).

A few problems with the crowd:
1. Too many spammers. How do you know who is doing the right thing when you don't know what's right?
2. Too few bilingual speakers for any language pair you pick. There are 50 major languages, and 3950 other languages in the world. Think of creating translation systems for translating between 4000 x 4000 languages!
3. How do you make it interesting for users to contribute to the site without feeling that they are being robbed of their intellectual property? (Give them money? Not feasible when you think of the few thousand language pairs you are considering.)
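
To make the scale in item 2 concrete, here is a small back-of-the-envelope calculation. It assumes one translation system per ordered pair of distinct languages (source to target), which is a simplification introduced for this example.

```python
def directed_pairs(num_languages: int) -> int:
    """Number of ordered (source, target) pairs of distinct languages."""
    return num_languages * (num_languages - 1)


def undirected_pairs(num_languages: int) -> int:
    """Number of unordered language pairs, i.e. bilingual speaker groups needed."""
    return num_languages * (num_languages - 1) // 2


all_directed = directed_pairs(4000)      # 15,996,000 translation directions
all_undirected = undirected_pairs(4000)  # 7,998,000 distinct language pairs
major_directed = directed_pairs(50)      # 2,450 directions among 50 languages
```

Even restricting to 50 languages leaves thousands of translation directions, each of which needs its own bilingual contributors.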

Now there are some projects out there which have looked at a subset of the problems I mention above, although I am not yet convinced that we have a silver bullet.
1. Monotrans: An effort from Maryland, which is by now well published, and whose results have been applied to the translation of children's books from the ICDL
2. Duolingo has been making some noise for about a year now, but it has not yet been seen by the world outside. I hope and wish it's really good, because the success of such projects gives focus to similar efforts!

My take is that the success of translation in the crowd is going to need the following (some of which I am working on and will be publishing in my research!):
1. Translation task needs to become verifiable: Task-breakdown
2. Involve two vs. one person: Collaboration
3. Make it fun or a learning experience for all: Challenging Innovative Games

In a continuing post, I will talk about some of the designs we have come up with to build collaborative, constructive and verifiable methods for involving the crowd. Hopefully they are fun and motivating enough for people to contribute without making them feel they are robbed of their time or knowledge. After all, knowledge can only be shared, and it rightly should be!

Sunday, May 1, 2011

PollSense: It's the AdSense of Polls

Polls when, where and how?: Task-integrated, context-sensitive polls

Ask them when they are willing to provide an opinion.

We need the opinion of people who are well educated on the issue, so identifying the context gives us an understanding of our user. If a user is reading about the "Health bill", they are perhaps the right user to vote on it. This reduces noise when compared to traditional polling methods.
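
One simple way to picture the context matching is keyword overlap between the page a user is reading and a set of candidate polls. The sketch below is an assumption-laden illustration, not a description of an existing PollSense system: the function name, the keyword sets, and the overlap threshold are all made up for the example.

```python
import re
from typing import Dict, Optional, Set


def pick_poll(page_text: str, polls: Dict[str, Set[str]], min_overlap: int = 1) -> Optional[str]:
    """Return the poll question whose keywords overlap most with the page, or None."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    best_question, best_score = None, 0
    for question, keywords in polls.items():
        score = len(words & keywords)
        if score > best_score:
            best_question, best_score = question, score
    return best_question if best_score >= min_overlap else None


# Toy polls, invented for illustration only.
toy_polls = {
    "Do you support the health bill?": {"health", "bill"},
    "Who will win the cup?": {"cup", "match"},
}
```

A page that mentions none of the keywords gets no poll at all, which is how the noise reduction described above would show up in practice.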

Engage users. Similar to context-sensitive ads.
Have widgets and provide services

Language independent, domain independent
Time-sensitive, based on social media, current affairs?

- As more sites adopt it, you gain an understanding of page ranks, what pages people visit, etc.


- Create thousands of automatic polls from news, and provide a toolbar for users
- Share data or just results?
- Can we learn more about the users themselves (tie up with the web)

Bloomberg is getting into it.

What's out there:

Create content for your site: