Advanced Natural Language Processing Techniques with NLTK
In this lesson, we will explore advanced NLTK features such as POS tagging, named entity recognition, and syntax parsing.
1. Part-of-Speech Tagging
A part of speech (POS) refers to the grammatical role of a word in a sentence.
For example, in "I am a student.", "I" is a pronoun, "am" is a verb, "a" is an article, and "student" is a noun.
POS tagging involves analyzing each word in a sentence to determine its part of speech.
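import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag
# Tokenizer and tagger models; recent NLTK releases use the "_tab"/"_eng" resource names
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_eng')
text = "NLTK provides powerful NLP tools."
tokens = word_tokenize(text)  # split the sentence into word and punctuation tokens
tagged = pos_tag(tokens)      # list of (word, tag) tuples
print(tagged)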
In the output, each token is paired with a tag indicating its part of speech, such as NNP (proper noun), VBZ (verb, 3rd person singular present), and JJ (adjective). These tags come from the Penn Treebank tag set, which NLTK's default tagger uses.
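Penn Treebank tags are fine-grained, but related tags share a prefix (NN, NNS, NNP for nouns; VB, VBZ, VBD for verbs). A minimal sketch, assuming you only need broad categories, can group the (word, tag) pairs that pos_tag returns by prefix. The helpers `coarse_pos` and `group_by_category` and the prefix table are hypothetical, not part of NLTK, and the tagged list is written by hand for illustration rather than copied from real pos_tag output.

```python
# Hypothetical prefix table mapping Penn Treebank tag families to broad categories.
# This is an illustrative assumption, not an NLTK feature.
COARSE_PREFIXES = {
    "NN": "noun",       # NN, NNS, NNP, NNPS
    "VB": "verb",       # VB, VBD, VBG, VBN, VBP, VBZ
    "JJ": "adjective",  # JJ, JJR, JJS
    "RB": "adverb",     # RB, RBR, RBS
    "PRP": "pronoun",   # PRP, PRP$
}


def coarse_pos(tag):
    """Return a broad category for a Penn Treebank tag, or 'other'."""
    for prefix, category in COARSE_PREFIXES.items():
        if tag.startswith(prefix):
            return category
    return "other"


def group_by_category(tagged):
    """Group (word, tag) pairs, in the format pos_tag returns, by coarse category."""
    groups = {}
    for word, tag in tagged:
        groups.setdefault(coarse_pos(tag), []).append(word)
    return groups


# Hand-written input in pos_tag's (word, tag) format; real tagger output may differ.
tagged = [("NLTK", "NNP"), ("provides", "VBZ"), ("powerful", "JJ"),
          ("NLP", "NNP"), ("tools", "NNS"), (".", ".")]
print(group_by_category(tagged))
# {'noun': ['NLTK', 'NLP', 'tools'], 'verb': ['provides'], 'adjective': ['powerful'], 'other': ['.']}
```

Prefix matching keeps the table short, at the cost of lumping together distinctions (such as singular vs. plural nouns) that the full tag preserves.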
2. Named Entity Recognition (NER)
Named Entity Recognition (NER) is the process of identifying specific entities such as people, organizations, and locations in a text.
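import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag
from nltk.chunk import ne_chunk
# Assumes the tokenizer/tagger resources from the previous example are downloaded.
# ne_chunk's pretrained maxent model also requires numpy to be installed (no import needed).
nltk.download('maxent_ne_chunker')
nltk.download('maxent_ne_chunker_tab')  # resource name in recent NLTK releases
nltk.download('words')
sentence = "I live in California."
tokens = word_tokenize(sentence)
tagged = pos_tag(tokens)
ner_tree = ne_chunk(tagged)  # returns an nltk Tree with entities as labeled subtrees
print(ner_tree)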
The output appears as follows:
(S I/PRP live/VBP in/IN (GPE California/NNP) ./.)
Here, GPE indicates a geopolitical entity, and NNP signifies a proper noun.
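To use the recognized entities in a program, you typically walk the tree that ne_chunk returns: entities appear as labeled subtrees, while ordinary tokens remain (word, tag) tuples. The sketch below assumes the tree shape printed above. `extract_entities` is a hypothetical helper, not an NLTK function, and the tree is built by hand with `nltk.tree.Tree` so the example runs without downloading the chunker model.

```python
from nltk.tree import Tree


def extract_entities(tree):
    """Return (entity_text, label) pairs for each labeled chunk in an ne_chunk-style tree."""
    entities = []
    for node in tree:
        # Entity chunks are subtrees; plain tokens are (word, tag) tuples.
        if isinstance(node, Tree):
            words = [word for word, tag in node.leaves()]
            entities.append((" ".join(words), node.label()))
    return entities


# Hand-built tree with the same shape as the ne_chunk output shown above.
ner_tree = Tree("S", [("I", "PRP"), ("live", "VBP"), ("in", "IN"),
                      Tree("GPE", [("California", "NNP")]), (".", ".")])
print(extract_entities(ner_tree))  # [('California', 'GPE')]
```

Joining the leaves with spaces means multi-word entities such as a two-token person name come back as a single string.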
How About Other Languages?
NLTK is primarily focused on English, so its support for languages like Korean is limited.
For processing Korean, it is common to use libraries such as spaCy or KoNLPy alongside NLTK.
from konlpy.tag import Okt  # KoNLPy requires a Java runtime to be installed
okt = Okt()
text = "파이썬은 자연어 처리를 쉽게 만들어 줍니다."  # "Python makes natural language processing easy."
print(okt.morphs(text))  # Morphological analysis
print(okt.nouns(text))   # Extracting nouns
print(okt.pos(text))     # POS tagging
This code allows you to extract morphemes, identify nouns, and tag parts of speech in a Korean sentence.
While NLTK is excellent for English natural language processing, using other libraries is advisable for handling languages like Korean.