Instructors: Prof. Dr. Christian Biemann; Seid Muhie Yimam
Event type:
Lecture
Displayed in timetable as:
NLP - VL
Hours per week:
2
Language of instruction:
German/English
Min. | Max. participants:
- | 20
Comments/contents:
The web contains more than 10 billion indexable pages, which can be retrieved via keyword search queries. The lecture presents natural language processing (NLP) methods for automatically processing large amounts of unstructured text from the web, and it analyzes the use of web data as a resource for other NLP tasks.
Key topics:
- Processing unstructured web content
- NLP basics: tokenization, part-of-speech tagging, stemming, lemmatization, chunking
- UIMA: principles and applications
- Web contents and their characteristics, incl. diverse genres such as personal web sites, news sites, blogs, forums, wikis
- The web as a corpus – innovative use of the web as a very large, distributed, interlinked, growing, and multilingual corpus
- NLP applications for the web
- Introduction to information retrieval
- Web information retrieval and natural language interfaces
- Web-based question answering
- Mining Web 2.0 sites such as Wikipedia and Wiktionary
- Quality assessment of web contents
- Multilingualism
- Internet of services: service retrieval
- Sentiment analysis and community mining
- Paraphrases, synonyms, semantic relatedness
Learning objectives:
After attending this course, students will be able to
- understand and differentiate between methods and approaches for processing unstructured text,
- trace and explain how web search engines operate,
- construct and analyze exemplary NLP applications for web data,
- analyze and evaluate the potential of using web contents to enhance NLP applications.