Tuesday, June 14, 2011

Voicify: Crowdsourcing Voice Acquisition

Workflow:
1. Upload a photo of text, a document, or a URL pointing to the data you want speechified.
2. Select a voice from a list of existing users, judging by their samples, or open an audition for new samples.
3. If you are satisfied with the sample, continue to completion.
4. The website provides quality assistance: cleanup of voice samples and a voting mechanism for rating them.

Acquiring userbase:
1. List the people who post voice-over offers on Elance and MTurk
2. Sign up people from MTurk / CrowdFlower and ask them to post voice samples
3. Spread the word!

Research and Technology:
1. Storage on E2 cluster
2. Background noise cleanup (filters etc)
3. Quick samples can be obtained on the web or on devices, similar to blip.me and voice.ly
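
The notes above only say "filters etc" for background noise cleanup, so the following is just one possible reading: a minimal noise-gate sketch that silences low-energy frames. The function name `noise_gate`, the frame size and the threshold are all assumptions made for illustration, and a real cleanup step would likely need proper spectral filtering.

```python
import math


def noise_gate(samples, frame_size=4, threshold=0.05):
    """Silence frames whose RMS energy falls below `threshold`.

    `samples` is a list of floats in [-1.0, 1.0]. The frame size and
    threshold are illustrative assumptions, not tuned values.
    """
    cleaned = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        # Root-mean-square energy of this frame.
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        # Keep loud frames untouched, replace quiet ones with silence.
        cleaned.extend(frame if rms >= threshold else [0.0] * len(frame))
    return cleaned
```

A gate like this removes hiss between words but does nothing about noise underneath the speech itself, which is where real filters would come in.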

Who may be interested:
1. Animation companies (the project description must be more detailed to accommodate them)
2. Speech Researchers looking for voice overs
3. Startups that want to spread their wings into new domains, possibly with a video
4. Audio books
5. Voicify your websites to reach people with visual disabilities

Existing:
www.audiodraft.com (A site for crowdsourcing music)
www.castingwords.com (A site for crowdsourcing speech transcription, i.e. the other way round)
http://voiceover.com/
http://vocaroo.com/ (Record and send voice emails)

Wednesday, June 8, 2011

CollabSum: Collaborative Summarization of Webpages

Every page on the internet is read by at least two users. We can observe a power-law distribution over the number of users that read a webpage. I am convinced at this point that the most useful pages on the internet are read by a few thousand users every day. Imagine some of those users clicking on at least one sentence of the article to let us know its importance to the meaning of the article. Now imagine accumulating that information from a few hundred such users and producing a synopsis of the article for quick consumption by everyone else.

I think this is technically an easy-to-build application that has a far-reaching impact and changes the way we consume content on the Internet.

So what do we need to achieve this:
1. A client-side plugin that records users' clicks on a sentence of the news article.
2. A server-side component that records these clicks and accumulates them.
3. A method to push the accumulated summary back to the client.
4. A way to overlay the summary on the article (highlighted sentences, heatmap, pop-up).
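
The server-side accumulation step could be sketched roughly as follows. This is only an illustration under assumptions: the class name `ClickAggregator`, its methods and the "top k most-clicked sentences" rule are hypothetical choices, since the post does not say how the summary is chosen.

```python
from collections import Counter


class ClickAggregator:
    """Accumulates per-sentence clicks for one article (hypothetical helper)."""

    def __init__(self, sentences):
        self.sentences = list(sentences)
        self.clicks = Counter()  # sentence index -> number of clicks

    def record_click(self, sentence_index):
        """Record one user's click on a sentence."""
        if not 0 <= sentence_index < len(self.sentences):
            raise IndexError("sentence index out of range")
        self.clicks[sentence_index] += 1

    def summary(self, k=2):
        """Return up to k most-clicked sentences, in article order.

        Assumption: ties are broken in favour of earlier sentences, and
        sentences with no clicks are never included.
        """
        ranked = sorted(range(len(self.sentences)),
                        key=lambda i: (-self.clicks[i], i))
        top = sorted(i for i in ranked[:k] if self.clicks[i] > 0)
        return [self.sentences[i] for i in top]
```

The same click counts could drive the heatmap overlay directly; returning the chosen sentences in article order is just one way to make the synopsis readable.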

Who are involved:
1. User: Provides a single click, but benefits from the summaries of others. One way of addressing information overload.
2. Newspapers: Can understand their users better and create synopses and daily digests much faster.
3. Third-party: APIs can be provided to anyone that needs these summaries
4. Researchers: Automatic summarization systems can benefit from such data

What do we do with such data:
1. Improve existing summarization systems for the web
2. Build iPhone apps that can provide summarized versions of existing blogs
3. Create better RSS feeds with summaries and smarter digests for paid subscription users
4. Provide SEO support for user-generated social media by guiding AdSense programs towards relevant text
5. Kindle could use the data to provide "highlights" for newspaper subscriptions


And all it takes is a single click from the user. At the risk of sounding cliché: "A single click a day can keep the information overload away".

Tuesday, June 7, 2011

Crowdsourcing Translation and Commercial Models

Existing Market:
http://www.smartling.com/
http://www.cloudwords.com/
http://www.foxtranslate.com/
http://www.speaklike.com/
http://mygengo.com
MyGengo translates thousands of texts daily in 14 languages for clients such as ShapeUp, Evernote, Youversion and Producteev.

Potential business:
- Typing scanned documents (ULIB style)
- Localization of Websites (Can we directly substitute it into the website and make it available for download?)
- Translating Advertisements
- Subtitle translation for movie DVDs
- Translating Polls and Surveys

Sunday, June 5, 2011

Language Learning Games

Commercial Websites:
http://voxy.com/
(Incorporates the learner's current environment into language learning)

http://www.babbel.com/
(Generates reports of which words went wrong ... should be useful in the word alignment game)

http://eurotalk.com/us/
(A cool set of iPhone apps associated with the site)

Non-profit Outfits:
http://www.internetpolyglot.com/: More details and an interesting log of activity in the project at the author's blog - http://internetpolyglot.blogspot.com/

http://freerice.com : Donate rice by playing games, some of which are language learning flash-card style games.

http://www.digitaldialects.com/: Interesting categorization and flash games.


Research Projects:
Lingua Mechanica: Eric Horvitz's group at Microsoft. Sample games: Word Tetris

Duolingo: A project from Luis von Ahn, CMU.