title
Natural Language Processing With Python and NLTK p.1 Tokenizing words and Sentences

description
Natural language processing (NLP) is the task of getting computers to read and understand (process) written text (natural language). One of the most popular toolkits for natural language processing is the Natural Language Toolkit (NLTK) for the Python programming language. The NLTK module comes packed with everything from trained algorithms that identify parts of speech to unsupervised machine learning algorithms that help you train your own models to understand a specific body of text. NLTK also comes with a large collection of corpora: data sets containing things like chat logs, movie reviews, journals, and much more. Bottom line: if you're going to be doing natural language processing, you should definitely look into NLTK. Playlist link: https://www.youtube.com/watch?v=FLZvOKSCkxY&list=PLQVvvaa0QuDf2JswnfiGkliBInZnIC4HL&index=1 Sample code: http://pythonprogramming.net http://hkinsley.com https://twitter.com/sentdex http://sentdex.com http://seaofbtc.com

detail
{'title': 'Natural Language Processing With Python and NLTK p.1 Tokenizing words and Sentences', 'heatmap': [{'end': 457.156, 'start': 418.667, 'weight': 0.881}, {'end': 671.791, 'start': 653.156, 'weight': 0.938}, {'end': 960.862, 'start': 936.185, 'weight': 0.939}], 'summary': 'Tutorial series covers python nltk module for nlp and sentiment analysis, acquiring and utilizing nltk for text processing, and the powerful capabilities of nltk for tokenizing, part-of-speech tagging, and supporting multiple languages, saving hours of manual work with regular expressions.', 'chapters': [{'end': 185.454, 'segs': [{'end': 67.696, 'src': 'embed', 'start': 3.055, 'weight': 0, 'content': [{'end': 10.2, 'text': 'Hello everybody and welcome to a Python programming tutorial for the NLTK or Natural Language Toolkit module.', 'start': 3.055, 'duration': 7.145}, {'end': 15.684, 'text': 'The Natural Language Toolkit module is for natural language processing or NLP.', 'start': 10.621, 'duration': 5.063}, {'end': 24.731, 'text': 'So what is that? 
So natural language processing is the process of getting a computer to understand natural language.', 'start': 16.145, 'duration': 8.586}, {'end': 30.795, 'text': 'Now, usually this is in the form of written language And sometimes it can be in the form of spoken language.', 'start': 24.791, 'duration': 6.004}, {'end': 35.119, 'text': 'but usually spoken language gets converted to written language, then to numbers.', 'start': 30.795, 'duration': 4.324}, {'end': 38.501, 'text': "But sometimes it doesn't, it just gets straight converted to numbers as well.", 'start': 35.199, 'duration': 3.302}, {'end': 45.687, 'text': 'So it is the process of converting some form of language to something that the computer can understand, which is numbers.', 'start': 39.262, 'duration': 6.425}, {'end': 48.128, 'text': 'So what can we do with that?', 'start': 46.367, 'duration': 1.761}, {'end': 56.051, 'text': 'So NLTK is actually the first module that I ever worked with and is actually the reason why I chose the Python programming language,', 'start': 48.228, 'duration': 7.823}, {'end': 63.954, 'text': 'because really no other programming language has any sort of API or module, or whatever you want to call it for natural language processing.', 'start': 56.051, 'duration': 7.903}, {'end': 66.055, 'text': 'So this here is my example.', 'start': 64.513, 'duration': 1.542}, {'end': 67.696, 'text': 'This is my personal example.', 'start': 66.115, 'duration': 1.581}], 'summary': 'Nltk module for nlp in python, converts language to numbers.', 'duration': 64.641, 'max_score': 3.055, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FLZvOKSCkxY/pics/FLZvOKSCkxY3055.jpg'}, {'end': 111.665, 'src': 'embed', 'start': 87.518, 'weight': 2, 'content': [{'end': 93.74, 'text': 'so, for example, we do sentiment analysis for finance stuff, so this would be for stocks.', 'start': 87.518, 'duration': 6.222}, {'end': 101.483, 'text': 'so we could choose Apple, for example, and this 
is the sentiment analysis for Apple, and we can actually see the sentiment has been going up pretty,', 'start': 93.74, 'duration': 7.743}, {'end': 104.104, 'text': "pretty strong, and today's actually a pretty good day for Apple as well.", 'start': 101.483, 'duration': 2.621}, {'end': 105.644, 'text': "So there's that.", 'start': 105.084, 'duration': 0.56}, {'end': 106.724, 'text': 'Then we have like politics.', 'start': 105.664, 'duration': 1.06}, {'end': 108.725, 'text': 'So we measure sentiment on political issues.', 'start': 106.764, 'duration': 1.961}, {'end': 111.665, 'text': "We've got about 50 different political issues.", 'start': 109.345, 'duration': 2.32}], 'summary': 'Sentiment analysis for finance includes 50 political issues.', 'duration': 24.147, 'max_score': 87.518, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FLZvOKSCkxY/pics/FLZvOKSCkxY87518.jpg'}, {'end': 161.194, 'src': 'embed', 'start': 130.669, 'weight': 3, 'content': [{'end': 132.452, 'text': 'And then also we have geographical sentiment.', 'start': 130.669, 'duration': 1.783}, {'end': 135.534, 'text': 'This is probably my favorite one that I have so far.', 'start': 132.972, 'duration': 2.562}, {'end': 140.859, 'text': 'But based on what people are saying and where they are from, I plot it up on a globe.', 'start': 136.055, 'duration': 4.804}, {'end': 142.961, 'text': 'We go basically by city.', 'start': 141.64, 'duration': 1.321}, {'end': 145.643, 'text': 'So this gets as granular as per city.', 'start': 143.021, 'duration': 2.622}, {'end': 147.185, 'text': 'And so we can do this.', 'start': 146.284, 'duration': 0.901}, {'end': 150.087, 'text': 'We can get the last 30 days of sentiment globally.', 'start': 147.225, 'duration': 2.862}, {'end': 153.83, 'text': 'But also we can find out what are the popular topics that people are talking about.', 'start': 150.127, 'duration': 3.703}, {'end': 158.493, 'text': "So for the United States, or North America rather, 
you've got Love, YouTube.", 'start': 154.751, 'duration': 3.742}, {'end': 161.194, 'text': "YouTube's probably just because people are linking to YouTube videos.", 'start': 158.533, 'duration': 2.661}], 'summary': 'Geographical sentiment analysis tracks sentiments globally by city, with insights into popular topics like love and youtube in north america over the last 30 days.', 'duration': 30.525, 'max_score': 130.669, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FLZvOKSCkxY/pics/FLZvOKSCkxY130669.jpg'}], 'start': 3.055, 'title': 'Python nltk module for nlp and sentiment analysis', 'summary': 'Introduces the python nltk module for nlp, explaining its purpose in converting natural language to numbers for computer understanding. it also discusses the capabilities of nltk in natural language processing, including sentiment analysis for finance, politics, and geographical sentiment, with examples of sentiment trends and popular topics globally.', 'chapters': [{'end': 45.687, 'start': 3.055, 'title': 'Python nltk module for nlp', 'summary': 'Introduces the python nltk module for nlp, explaining its purpose in converting natural language to numbers for computer understanding.', 'duration': 42.632, 'highlights': ['Natural Language Toolkit module is for NLP, which is the process of getting a computer to understand natural language.', 'Natural language processing involves converting some form of language to numbers for computer understanding.']}, {'end': 185.454, 'start': 46.367, 'title': 'Nltk and sentiment analysis in natural language processing', 'summary': 'Discusses the capabilities of nltk in natural language processing, including sentiment analysis for finance, politics, and geographical sentiment, with examples of sentiment trends and popular topics globally.', 'duration': 139.087, 'highlights': ['NLTK is the first module the speaker worked with and was the reason for choosing Python programming language. 
The speaker chose Python programming language due to NLTK being the first module they worked with for natural language processing.', 'Sentiment analysis is conducted for finance, politics, and geographical sentiment, with specific examples such as sentiment analysis for stocks and political issues. The speaker provides examples of sentiment analysis for stocks, political issues, and geographical sentiment, showcasing the wide application of sentiment analysis.', "Geographical sentiment analysis involves plotting sentiments on a globe based on people's locations, allowing the identification of popular topics and opinions globally. The speaker highlights the capability to plot geographical sentiments globally, identifying popular topics and opinions based on location."]}], 'duration': 182.399, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FLZvOKSCkxY/pics/FLZvOKSCkxY3055.jpg', 'highlights': ['Natural language processing involves converting some form of language to numbers for computer understanding.', 'Natural Language Toolkit module is for NLP, which is the process of getting a computer to understand natural language.', 'Sentiment analysis is conducted for finance, politics, and geographical sentiment, with specific examples such as sentiment analysis for stocks and political issues.', "Geographical sentiment analysis involves plotting sentiments on a globe based on people's locations, allowing the identification of popular topics and opinions globally.", 'NLTK is the first module the speaker worked with and was the reason for choosing Python programming language.']}, {'end': 812.856, 'segs': [{'end': 374.817, 'src': 'embed', 'start': 338.068, 'weight': 0, 'content': [{'end': 341.27, 'text': "So you can grab this if you don't know how to use pip.", 'start': 338.068, 'duration': 3.202}, {'end': 350.636, 'text': "Otherwise, once you have NLTK and you've got it installed, or at least you think you do, you're ready to move on to the next 
part.", 'start': 342.371, 'duration': 8.265}, {'end': 353.898, 'text': "So I'm going to go ahead and just minimize this stuff because we don't need this right now.", 'start': 350.876, 'duration': 3.022}, {'end': 363.668, 'text': "So the next step that you're going to want to do is you're going to want to make sure you can go import NLTK in IDLE or whatever IDE you use.", 'start': 356.161, 'duration': 7.507}, {'end': 365.929, 'text': 'So this is just where you type your code.', 'start': 364.168, 'duration': 1.761}, {'end': 367.391, 'text': 'So I like to use IDLE.', 'start': 366.01, 'duration': 1.381}, {'end': 370.694, 'text': 'Everyone has their own favorite and everyone thinks everyone should use theirs.', 'start': 367.511, 'duration': 3.183}, {'end': 371.935, 'text': 'I like IDLE.', 'start': 371.334, 'duration': 0.601}, {'end': 374.817, 'text': 'You can use whatever you want, PyCharm or whatever.', 'start': 372.235, 'duration': 2.582}], 'summary': 'Installing nltk with pip and setting up in idle or preferred ide.', 'duration': 36.749, 'max_score': 338.068, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FLZvOKSCkxY/pics/FLZvOKSCkxY338068.jpg'}, {'end': 457.156, 'src': 'heatmap', 'start': 418.667, 'weight': 0.881, 'content': [{'end': 420.147, 'text': "It's taking a second.", 'start': 418.667, 'duration': 1.48}, {'end': 421.008, 'text': 'And boom.', 'start': 420.608, 'duration': 0.4}, {'end': 423.248, 'text': 'So the import of nltk worked.', 'start': 421.368, 'duration': 1.88}, {'end': 430.811, 'text': 'Great Now what we want to do is go ahead and do nltk.download and then empty parameters.', 'start': 423.769, 'duration': 7.042}, {'end': 431.871, 'text': 'Run that.', 'start': 431.271, 'duration': 0.6}, {'end': 434.992, 'text': 'And you should get a pop-up window, not just this one, but another one.', 'start': 432.031, 'duration': 2.961}, {'end': 436.904, 'text': "I'm waiting for it.", 'start': 436.081, 'duration': 0.823}, {'end': 438.99, 
'text': "Sometimes it doesn't pop up as it should.", 'start': 437.084, 'duration': 1.906}, {'end': 442.199, 'text': "It's not here yet, but it'll show hopefully soon.", 'start': 439.912, 'duration': 2.287}, {'end': 444.708, 'text': 'so it popped under.', 'start': 443.567, 'duration': 1.141}, {'end': 445.889, 'text': 'here it is so.', 'start': 444.708, 'duration': 1.181}, {'end': 448.991, 'text': "this is a window you'll get now if you are operating headless.", 'start': 445.889, 'duration': 3.102}, {'end': 452.053, 'text': "so say you're operating via shell or something like that.", 'start': 448.991, 'duration': 3.062}, {'end': 453.353, 'text': 'you can still do this.', 'start': 452.053, 'duration': 1.3}, {'end': 457.156, 'text': "you don't actually need a gui or x or whatever.", 'start': 453.353, 'duration': 3.803}], 'summary': 'Using nltk, download and run with empty parameters to get a pop-up window for headless operation.', 'duration': 38.489, 'max_score': 418.667, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FLZvOKSCkxY/pics/FLZvOKSCkxY418667.jpg'}, {'end': 519.73, 'src': 'embed', 'start': 489.926, 'weight': 2, 'content': [{'end': 498.532, 'text': "Okay, so once you have everything downloaded, what you're gonna wanna do is maybe see like a real basic and quick example of what NLTK can do for you.", 'start': 489.926, 'duration': 8.606}, {'end': 508.678, 'text': 'So NLTK and natural language processing obviously is an interest for a lot of people that want to have computers to read or understand text or speech or whatever.', 'start': 498.612, 'duration': 10.066}, {'end': 519.73, 'text': "So what is the first step that you might do when you wanna pull apart a body of text, let's say? 
Well, you're going to want to organize it somehow.", 'start': 509.059, 'duration': 10.671}], 'summary': 'NLTK can help with natural language processing for text and speech understanding.', 'duration': 29.804, 'max_score': 489.926, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FLZvOKSCkxY/pics/FLZvOKSCkxY489926.jpg'}, {'end': 653.136, 'src': 'embed', 'start': 599.134, 'weight': 3, 'content': [{'end': 601.635, 'text': "So generally you're going to have two forms of tokenizers.", 'start': 599.134, 'duration': 2.501}, {'end': 612.138, 'text': "You're going to have word tokenizers and then you're going to have sentence tokenizers, and what they do is a word tokenizer just separates by words.", 'start': 601.635, 'duration': 10.503}, {'end': 617.02, 'text': 'A sentence tokenizer separates by sentence. Easy.', 'start': 612.138, 'duration': 4.882}, {'end': 618.34, 'text': 'So just keep that in mind.', 'start': 617.02, 'duration': 1.32}, {'end': 619.621, 'text': "That's what tokenizing is.", 'start': 618.34, 'duration': 1.281}, {'end': 622.882, 'text': "You're also going to hear the terms lexicon and corpora.", 'start': 619.621, 'duration': 3.261}, {'end': 624.724, 'text': 'What the heck.', 'start': 623.584, 'duration': 1.14}, {'end': 629.325, 'text': 'So a corpus is just a body of text.', 'start': 624.724, 'duration': 4.601}, {'end': 634.327, 'text': 'So think about a corpus that might be a body of medical journals.', 'start': 629.325, 'duration': 5.002}, {'end': 638.148, 'text': 'So an example would be medical journals.', 'start': 634.327, 'duration': 3.821}, {'end': 642.389, 'text': "So this is kind of like a body of text where they're all kind of around the same thing.", 'start': 638.148, 'duration': 4.241}, {'end': 643.589, 'text': 'So you might have medical journals.', 'start': 642.389, 'duration': 1.2}, {'end': 650.235, 'text': 'You might have an example of, maybe, presidential speeches.', 'start': 643.589, 'duration': 6.646}, 
{'end': 653.136, 'text': 'That was another one, stuff like that.', 'start': 650.235, 'duration': 2.901}], 'summary': 'Tokenizers separate text into words or sentences. Corpora are bodies of text, like medical journals or presidential speeches.', 'duration': 54.002, 'max_score': 599.134, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FLZvOKSCkxY/pics/FLZvOKSCkxY599134.jpg'}, {'end': 685.441, 'src': 'heatmap', 'start': 653.156, 'weight': 0.938, 'content': [{'end': 658.419, 'text': "Then you've got lexicon, and also a corpus would be anything in the English language.", 'start': 653.156, 'duration': 5.263}, {'end': 660.6, 'text': "That's another example of a corpus.", 'start': 659.139, 'duration': 1.461}, {'end': 667.944, 'text': "Then you've got lexicon, and you can just think of a lexicon like a dictionary, okay? It is the words and their meanings.", 'start': 661.28, 'duration': 6.664}, {'end': 671.791, 'text': 'Now again, this varies, right.', 'start': 669.85, 'duration': 1.941}, {'end': 675.334, 'text': 'So for the English language, that would be like the English dictionary.', 'start': 671.791, 'duration': 3.543}, {'end': 685.441, 'text': 'But consider, for example, the difference between investor speak and regular English speak.', 'start': 675.334, 'duration': 10.107}], 'summary': 'A lexicon contains words and their meanings, like a dictionary of the English language.', 'duration': 32.285, 'max_score': 653.156, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FLZvOKSCkxY/pics/FLZvOKSCkxY653156.jpg'}], 'start': 186.181, 'title': 'Acquiring and utilizing nltk for text processing', 'summary': 'Covers the process of acquiring and installing nltk, including acquiring python, using pip to install nltk, and downloading nltk resources. 
it also introduces the basics of nltk, emphasizing the value of nltk for natural language processing and the importance of organizing text by paragraph and sentence.', 'chapters': [{'end': 485.804, 'start': 186.181, 'title': 'Acquiring and installing nltk', 'summary': 'Explains the process of acquiring and installing nltk, including acquiring python, using pip to install nltk, and downloading nltk resources, which can take a few minutes to hours depending on the internet connection.', 'duration': 299.623, 'highlights': ["Acquiring Python and using pip to install NLTK To install NLTK, users need to acquire Python from Python.org, choose the latest version, and then use pip to install NLTK, which can be a user's first exposure to Python.", 'Downloading NLTK resources After installing NLTK, users need to run nltk.download() and download all resources, which can take a varying amount of time depending on the internet connection.', 'Accessing NLTK in an IDE Users can access NLTK in their preferred IDE by importing NLTK and running nltk.download() to acquire resources, providing a seamless way to utilize NLTK for natural language processing tasks.']}, {'end': 812.856, 'start': 489.926, 'title': 'Nltk basics and tokenizing', 'summary': 'Introduces the basics of nltk, including tokenizing and the importance of organizing text by paragraph and sentence, emphasizing the value of nltk for natural language processing.', 'duration': 322.93, 'highlights': ['NLTK and natural language processing are introduced, emphasizing the importance of organizing text by paragraph and sentence. NLTK and natural language processing are introduced, emphasizing the importance of organizing text by paragraph and sentence.', 'The significance of tokenizing, specifically word and sentence tokenizers, is explained for organizing text. 
The significance of tokenizing, specifically word and sentence tokenizers, is explained for organizing text.', 'The concept of corpora as a body of text and lexicon as words and their meanings is discussed, providing examples such as medical journals and differences in investor speak and regular English speak. The concept of corpora as a body of text and lexicon as words and their meanings is discussed, providing examples such as medical journals and differences in investor speak and regular English speak.']}], 'duration': 626.675, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FLZvOKSCkxY/pics/FLZvOKSCkxY186181.jpg', 'highlights': ["Acquiring Python and using pip to install NLTK To install NLTK, users need to acquire Python from Python.org, choose the latest version, and then use pip to install NLTK, which can be a user's first exposure to Python.", 'Accessing NLTK in an IDE Users can access NLTK in their preferred IDE by importing NLTK and running nltk.download() to acquire resources, providing a seamless way to utilize NLTK for natural language processing tasks.', 'Downloading NLTK resources After installing NLTK, users need to run nltk.download() and download all resources, which can take a varying amount of time depending on the internet connection.', 'The concept of corpora as a body of text and lexicon as words and their meanings is discussed, providing examples such as medical journals and differences in investor speak and regular English speak.', 'NLTK and natural language processing are introduced, emphasizing the importance of organizing text by paragraph and sentence.', 'The significance of tokenizing, specifically word and sentence tokenizers, is explained for organizing text.']}, {'end': 1187.577, 'segs': [{'end': 943.774, 'src': 'embed', 'start': 915.525, 'weight': 0, 'content': [{'end': 921.65, 'text': "but it would be a pretty significant one to get as much as NLTK is going to get, because that's basically how NLTK 
does it.", 'start': 915.525, 'duration': 6.125}, {'end': 931.66, 'text': "So we're going to utilize NLTK to split this by sentence and by word, and at least show you how powerful NLTK is and save you like hours of writing your own regular expressions.", 'start': 921.65, 'duration': 10.01}, {'end': 936.125, 'text': "Okay, so first let's do by sentence.", 'start': 932.381, 'duration': 3.744}, {'end': 943.774, 'text': "So let's print sent_tokenize, and we want to sent_tokenize the example text.", 'start': 936.185, 'duration': 7.589}], 'summary': 'Utilize NLTK to split text by sentence and word, saving hours of regex writing.', 'duration': 28.249, 'max_score': 915.525, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FLZvOKSCkxY/pics/FLZvOKSCkxY915525.jpg'}, {'end': 960.862, 'src': 'heatmap', 'start': 936.185, 'weight': 0.939, 'content': [{'end': 943.774, 'text': "So let's print sent_tokenize, and we want to sent_tokenize the example text.", 'start': 936.185, 'duration': 7.589}, {'end': 946.356, 'text': "So I'm just going to copy and paste example text right in there.", 'start': 943.794, 'duration': 2.562}, {'end': 947.598, 'text': "So let's print that.", 'start': 946.857, 'duration': 0.741}, {'end': 950.635, 'text': 'And it creates a list.', 'start': 949.754, 'duration': 0.881}, {'end': 953.997, 'text': 'So this is just denoting that this is a Python list.', 'start': 951.015, 'duration': 2.982}, {'end': 960.862, 'text': 'So the first element, hello, Mr. Smith, how are you doing today? So it did not fail or fall for this.', 'start': 954.418, 'duration': 6.444}], 'summary': "Using sent_tokenize in python, a list is created with the first element being 'hello, mr. 
smith, how are you doing today?'", 'duration': 24.677, 'max_score': 936.185, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FLZvOKSCkxY/pics/FLZvOKSCkxY936185.jpg'}, {'end': 1085.903, 'src': 'embed', 'start': 1061.472, 'weight': 2, 'content': [{'end': 1067.417, 'text': 'but telling sentences not so easy and telling words, Surprisingly enough, not so easy.', 'start': 1061.472, 'duration': 5.945}, {'end': 1069.737, 'text': 'Now, of course, this is just the real basic stuff.', 'start': 1067.877, 'duration': 1.86}, {'end': 1075.139, 'text': 'This is more of pre-processing of anything rather than any sort of analysis or anything like that.', 'start': 1069.777, 'duration': 5.362}, {'end': 1083.902, 'text': "But as we go on, we'll see that NLTK can do really powerful things, like part of speech tagging, where it recognizes what part of speech things are,", 'start': 1075.519, 'duration': 8.383}, {'end': 1084.383, 'text': 'and all that.', 'start': 1083.902, 'duration': 0.481}, {'end': 1085.903, 'text': "It's a lot more complex.", 'start': 1084.843, 'duration': 1.06}], 'summary': 'Introduction to nltk for basic pre-processing, with potential for powerful part of speech tagging and complex analysis.', 'duration': 24.431, 'max_score': 1061.472, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FLZvOKSCkxY/pics/FLZvOKSCkxY1061472.jpg'}], 'start': 813.597, 'title': 'Nltk for text processing', 'summary': 'Introduces nltk for text processing, demonstrating its power by tokenizing, part-of-speech tagging, and supporting multiple languages, with the ability to create custom tokenizers, saving hours of manual work with regular expressions.', 'chapters': [{'end': 1013.408, 'start': 813.597, 'title': 'Nltk text tokenization', 'summary': 'Demonstrates using nltk to tokenize sentences and words from an example text, highlighting the challenges of accurately splitting sentences and words and the benefits of utilizing nltk, saving 
hours of writing own regular expressions.', 'duration': 199.811, 'highlights': ["NLTK is utilized to tokenize sentences and words, accurately capturing complex sentence structures and punctuation. NLTK is used to tokenize sentences and words, accurately capturing complex sentence structures and punctuation, such as 'Hello, Mr. Smith' and 'Mr. period' as its own word, saving time and effort in writing regular expressions.", "The challenges of accurately splitting sentences and words are discussed, where simple methods like splitting by space and capital letters may not be accurate, especially with complex sentence structures. The challenges of accurately splitting sentences and words are discussed, where simple methods like splitting by space and capital letters may not be accurate, especially with complex sentence structures such as 'Hello, Mr. Smith' and 'Mr. period' as its own word.", 'The benefits of utilizing NLTK are highlighted, as it can save significant time in writing regular expressions for tokenization. The benefits of utilizing NLTK are highlighted, as it can save significant time in writing regular expressions for tokenization, providing an efficient and powerful alternative to custom regular expressions.']}, {'end': 1187.577, 'start': 1014.248, 'title': 'Nltk and text processing', 'summary': 'Introduces nltk for text processing, demonstrating its power by tokenizing, part-of-speech tagging, and supporting multiple languages, with the ability to create custom tokenizers. 
nltk can perform powerful tasks that would take hours with regular expressions.', 'duration': 173.329, 'highlights': ['The chapter introduces NLTK for text processing, demonstrating its power by tokenizing, part-of-speech tagging, and supporting multiple languages, with the ability to create custom tokenizers.', 'NLTK can perform powerful tasks that would take hours with regular expressions.', 'NLTK works with the English language by default but also supports other languages, with the potential to create custom trainers for any language.', 'The chapter provides a quick example of how NLTK can be utilized to pull apart text, sentences, and paragraphs, showcasing its power in just a few lines of code.']}], 'duration': 373.98, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/FLZvOKSCkxY/pics/FLZvOKSCkxY813597.jpg', 'highlights': ["NLTK is utilized to tokenize sentences and words, accurately capturing complex sentence structures and punctuation, such as 'Hello, Mr. Smith' and 'Mr. period' as its own word, saving time and effort in writing regular expressions.", 'The benefits of utilizing NLTK are highlighted, as it can save significant time in writing regular expressions for tokenization, providing an efficient and powerful alternative to custom regular expressions.', 'The chapter introduces NLTK for text processing, demonstrating its power by tokenizing, part-of-speech tagging, and supporting multiple languages, with the ability to create custom tokenizers.', 'NLTK can perform powerful tasks that would take hours with regular expressions.', 'The chapter provides a quick example of how NLTK can be utilized to pull apart text, sentences, and paragraphs, showcasing its power in just a few lines of code.']}], 'highlights': ['NLTK can perform powerful tasks that would take hours with regular expressions.', "NLTK is utilized to tokenize sentences and words, accurately capturing complex sentence structures and punctuation, such as 'Hello, Mr. 
Smith' and 'Mr. period' as its own word, saving time and effort in writing regular expressions.", 'The chapter introduces NLTK for text processing, demonstrating its power by tokenizing, part-of-speech tagging, and supporting multiple languages, with the ability to create custom tokenizers.', 'The benefits of utilizing NLTK are highlighted, as it can save significant time in writing regular expressions for tokenization, providing an efficient and powerful alternative to custom regular expressions.', 'NLTK and natural language processing are introduced, emphasizing the importance of organizing text by paragraph and sentence.', "Acquiring Python and using pip to install NLTK To install NLTK, users need to acquire Python from Python.org, choose the latest version, and then use pip to install NLTK, which can be a user's first exposure to Python.", 'Accessing NLTK in an IDE Users can access NLTK in their preferred IDE by importing NLTK and running nltk.download() to acquire resources, providing a seamless way to utilize NLTK for natural language processing tasks.', 'Downloading NLTK resources After installing NLTK, users need to run nltk.download() and download all resources, which can take a varying amount of time depending on the internet connection.', 'The concept of corpora as a body of text and lexicon as words and their meanings is discussed, providing examples such as medical journals and differences in investor speak and regular English speak.', 'Sentiment analysis is conducted for finance, politics, and geographical sentiment, with specific examples such as sentiment analysis for stocks and political issues.']}
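
The tokenizing walkthrough in the transcript above can be sketched roughly as follows. This is a minimal illustration, not the video's exact code. The sample text is an assumption (only its first sentence echoes the "Hello, Mr. Smith" example), `naive_sentences` and `nltk_tokens` are hypothetical helper names, and the regular expression is just a stand-in for the kind of homemade rule the speaker warns about. The NLTK calls `sent_tokenize` and `word_tokenize` come from `nltk.tokenize` and need the tokenizer data fetched through `nltk.download()`.

```python
import re

# Sample text is an assumption; only its first sentence echoes the transcript.
EXAMPLE_TEXT = (
    "Hello Mr. Smith, how are you doing today? "
    "The weather is great and Python is awesome."
)


def naive_sentences(text):
    """Hypothetical homemade rule: split after ., ! or ? when it is
    followed by whitespace and a capital letter."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text)


def nltk_tokens(text):
    """Return (sentences, words) from NLTK, or None if NLTK or its
    tokenizer data is not available on this machine."""
    try:
        from nltk.tokenize import sent_tokenize, word_tokenize
        return sent_tokenize(text), word_tokenize(text)
    except (ImportError, LookupError):
        # NLTK raises LookupError when the tokenizer data has not been
        # downloaded yet; nltk.download() fetches it.
        return None


if __name__ == "__main__":
    # The naive rule wrongly breaks the text after "Mr."
    print(naive_sentences(EXAMPLE_TEXT))

    result = nltk_tokens(EXAMPLE_TEXT)
    if result is None:
        print("NLTK or its tokenizer data is missing; run nltk.download() first.")
    else:
        sentences, words = result
        print(sentences)
        print(words)
```

The naive rule splits the text after "Mr.", which is the failure mode the transcript describes. With the tokenizer data installed, `sent_tokenize` should keep "Mr." inside the first sentence, matching the speaker's remark that it "did not fail or fall for this".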