title
Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Training | Edureka

description
🔥 NLP Using Python (Use Code "YOUTUBE20") - https://www.edureka.co/python-natural-language-processing-course This Edureka video will provide you with a comprehensive and detailed knowledge of Natural Language Processing, popularly known as NLP. You will also learn about the different steps involved in processing the human language like Tokenization, Stemming, Lemmatization and much more along with a demo on each one of the topics. The following topics covered in this video : 1. The Evolution of Human Language 2. What is Text Mining? 3. What is Natural Language Processing? 4. Applications of NLP 5. NLP Components and Demo Do subscribe to our channel and hit the bell icon to never miss an update from us in the future: https://goo.gl/6ohpTV --------------------------------------------------------------------------------------------------------- Facebook: https://www.facebook.com/edurekaIN/ Twitter: https://twitter.com/edurekain LinkedIn: https://www.linkedin.com/company/edureka Instagram: https://www.instagram.com/edureka_learning/ --------------------------------------------------------------------------------------------------------- - - - - - - - - - - - - - - How it Works? 1. This is 21 hrs of Online Live Instructor-led course. Weekend class: 7 sessions of 3 hours each. 2. We have a 24x7 One-on-One LIVE Technical Support to help you with any problems you might face or any clarifications you may require during the course. 3. At the end of the training you will have to undergo a 2-hour LIVE Practical Exam based on which we will provide you a Grade and a Verifiable Certificate! - - - - - - - - - - - - - - About the Course Edureka's Natural Language Processing using Python Training focuses on step by step guide to NLP and Text Analytics with extensive hands-on using Python Programming Language. It has been packed up with a lot of real-life examples, where you can apply the learnt content to use. 
Features such as Semantic Analysis, Text Processing, Sentiment Analytics and Machine Learning have been discussed. This course is for anyone who works with data and text – with good analytical background and little exposure to Python Programming Language. It is designed to help you understand the important concepts and techniques used in Natural Language Processing using Python Programming Language. You will be able to build your own machine learning model for text classification. Towards the end of the course, we will be discussing various practical use cases of NLP in python programming language to enhance your learning experience. -------------------------- Who Should go for this course ? Edureka's NLP Training is a good fit for the below professionals: From a college student having exposure to programming to a technical architect/lead in an organisation Developers aspiring to be a 'Data Scientist' Analytics Managers who are leading a team of analysts Business Analysts who want to understand Text Mining Techniques 'Python' professionals who want to design automatic predictive models on text data "This is apt for everyone" --------------------------------- Why Learn Natural Language Processing or NLP? Natural Language Processing (or Text Analytics/Text Mining) applies analytic tools to learn from collections of text data, like social media, books, newspapers, emails, etc. The goal can be considered to be similar to humans learning by reading such material. However, using automated algorithms we can learn from massive amounts of text, very much more than a human can. It is bringing a new revolution by giving rise to chatbots and virtual assistants to help one system address queries of millions of users. NLP is a branch of artificial intelligence that has many important implications on the ways that computers and humans interact. 
Human language, developed over thousands and thousands of years, has become a nuanced form of communication that carries a wealth of information that often transcends the words alone. NLP will become an important technology in bridging the gap between human communication and digital data. --------------------------------- For more information, please write back to us at sales@edureka.co or call us at IND: 9606058406 / US: 18338555775 (toll-free).
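
As a rough illustration of the tokenization step named in the description, here is a minimal sketch in plain Python, so it runs without installing anything or downloading corpora. The regex-based `tokenize` helper and the `ngrams` helper are illustrative assumptions, not NLTK's implementation; in the NLTK setting the video uses, the closest counterparts would be `word_tokenize`, `FreqDist`, and `nltk.util.ngrams`.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into word tokens and single punctuation tokens.

    This is a simplified stand-in for a real tokenizer: runs of word
    characters become one token, and every other non-space character
    becomes its own token.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "Tokenization is the first step in NLP."
tokens = tokenize(sentence)

# Lowercase before counting so "The" and "the" are not treated as different.
freq = Counter(token.lower() for token in tokens)

bigrams = ngrams(tokens, 2)   # pairs of consecutive tokens
trigrams = ngrams(tokens, 3)  # triples of consecutive tokens

print(tokens)
print(freq.most_common(3))
print(bigrams[:2])
```

Note that this sketch counts the final full stop as a token of its own, so token counts can differ from a tokenizer that drops or merges punctuation.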

detail
{'title': 'Natural Language Processing (NLP) & Text Mining Tutorial Using NLTK | NLP Training | Edureka', 'heatmap': [{'end': 1339.029, 'start': 1302.785, 'weight': 1}], 'summary': 'Tutorial covers text mining, nlp basics, nltk usage for tokenization, stemming, pos tagging, and text classification, demonstrating practical applications and the importance of nlp in handling unstructured textual data.', 'chapters': [{'end': 48.847, 'segs': [{'end': 48.847, 'src': 'embed', 'start': 11.92, 'weight': 0, 'content': [{'end': 16.522, 'text': 'Hello everyone and welcome to this interesting session on text mining and NLP.', 'start': 11.92, 'duration': 4.602}, {'end': 20.784, 'text': "So before moving forward, let's have a quick look at the agenda for today's session.", 'start': 17.263, 'duration': 3.521}, {'end': 24.967, 'text': "I'll start off by explaining the importance of language and its evolution.", 'start': 21.425, 'duration': 3.542}, {'end': 28.548, 'text': "Then we'll understand what is text mining and moving forward.", 'start': 25.487, 'duration': 3.061}, {'end': 31.49, 'text': "We'll see how text mining and NLP are connected.", 'start': 28.588, 'duration': 2.902}, {'end': 35.939, 'text': 'Now NLP here stands for natural language processing and moving forward.', 'start': 32.176, 'duration': 3.763}, {'end': 43.503, 'text': "We'll see the various application of NLP in the industry and the different components of NLP, along with the demo and finally, in this video,", 'start': 35.959, 'duration': 7.544}, {'end': 47.386, 'text': 'with an end-to-end demo where we will use NLP along with machine learning.', 'start': 43.503, 'duration': 3.883}, {'end': 48.847, 'text': "So let's get started.", 'start': 47.926, 'duration': 0.921}], 'summary': 'Session covers text mining, nlp, and nlp applications in industry with a demo.', 'duration': 36.927, 'max_score': 11.92, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA11920.jpg'}], 'start': 11.92, 'title': 'Text mining and nlp', 'summary': 'Discusses the agenda for a session on text mining and nlp, covering the importance of language evolution, the connection between text mining and nlp, applications and components of nlp, and an end-to-end demo using nlp and machine learning.', 'chapters': [{'end': 48.847, 'start': 11.92, 'title': 'Text mining and nlp', 'summary': 'Discusses the agenda for a session on text mining and nlp, covering the importance of language evolution, the connection between text mining and nlp, applications and components of nlp, and an end-to-end demo using nlp and machine learning.', 'duration': 36.927, 'highlights': ['The session covers the importance of language and its evolution, along with the connection between text mining and NLP.', 'The various applications of NLP in the industry and the different components of NLP are discussed.', 'The session concludes with an end-to-end demo using NLP along with machine learning.']}], 'duration': 36.927, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA11920.jpg', 'highlights': ['The session covers the importance of language and its evolution, along with the connection between text mining and NLP.', 'The various applications of NLP in the industry and the different components of NLP are discussed.', 'The session concludes with an end-to-end demo using NLP along with machine learning.']}, {'end': 658.543, 'segs': [{'end': 88.258, 'src': 'embed', 'start': 49.668, 'weight': 2, 'content': [{'end': 54.811, 'text': 'Now the success of human race is because of the ability to communicate and share information.', 'start': 49.668, 'duration': 5.143}, {'end': 58.38, 'text': 'Now that is where the concept of language comes in.', 'start': 55.518, 'duration': 2.862}, {'end': 67.528, 'text': 'However, many such standards came up, 
resulting in many such language, with each language having its own set of basic shapes called alphabets,', 'start': 58.881, 'duration': 8.647}, {'end': 73.433, 'text': 'and the combination of alphabets resulted in words, and the combination of these words, arranged meaningfully,', 'start': 67.528, 'duration': 5.905}, {'end': 75.395, 'text': 'resulted in the formation of a sentence.', 'start': 73.433, 'duration': 1.962}, {'end': 84.322, 'text': 'Now each language has a set of rules that is used while developing these sentences and these set of rules are also known as grammar.', 'start': 76.395, 'duration': 7.927}, {'end': 88.258, 'text': "Now coming to today's world, that is, the 21st century.", 'start': 85.456, 'duration': 2.802}], 'summary': 'Human success due to communication. languages use alphabets, words, sentences, and grammar rules. 21st century context.', 'duration': 38.59, 'max_score': 49.668, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA49668.jpg'}, {'end': 135.142, 'src': 'embed', 'start': 108.985, 'weight': 1, 'content': [{'end': 112.908, 'text': 'Now in order to produce significant and actionable insights from the text data.', 'start': 108.985, 'duration': 3.923}, {'end': 116.85, 'text': 'It is important to get acquainted with the techniques of text analysis.', 'start': 113.328, 'duration': 3.522}, {'end': 120.993, 'text': "So let's understand what is text analysis or text mining now.", 'start': 117.41, 'duration': 3.583}, {'end': 126.136, 'text': 'It is the process of deriving meaningful information from natural language text,', 'start': 121.273, 'duration': 4.863}, {'end': 130.219, 'text': 'and text mining usually involves the process of structuring the input text,', 'start': 126.136, 'duration': 4.083}, {'end': 135.142, 'text': 'deriving patterns within the structure data and finally evaluating the interpreted output.', 'start': 130.219, 'duration': 4.923}], 'summary': 'Text analysis 
extracts insights from natural language text.', 'duration': 26.157, 'max_score': 108.985, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA108985.jpg'}, {'end': 192.135, 'src': 'embed', 'start': 167.899, 'weight': 5, 'content': [{'end': 174.824, 'text': 'So NLP refers to the artificial intelligence method of communicating with an intelligent system using natural language.', 'start': 167.899, 'duration': 6.925}, {'end': 180.287, 'text': 'by utilizing NLP and its components, one can organize the massive chunks of textual data,', 'start': 174.824, 'duration': 5.463}, {'end': 189.633, 'text': 'perform numerous automated tasks and solve a wide range of problems, such as automatic summarization, machine translation, name entity recognition,', 'start': 180.287, 'duration': 9.346}, {'end': 192.135, 'text': 'speech recognition and topic segmentation.', 'start': 189.633, 'duration': 2.502}], 'summary': 'Nlp enables organizing and processing massive textual data for various automated tasks and problem-solving.', 'duration': 24.236, 'max_score': 167.899, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA167899.jpg'}, {'end': 254.451, 'src': 'embed', 'start': 225.902, 'weight': 0, 'content': [{'end': 231.553, 'text': 'So now if we have a look at the various applications of NLP, First of all, we have sentimental analysis.', 'start': 225.902, 'duration': 5.651}, {'end': 234.315, 'text': 'Now, this is a field where NLP is used heavily.', 'start': 231.873, 'duration': 2.442}, {'end': 236.517, 'text': 'We have speech recognition now here.', 'start': 234.635, 'duration': 1.882}, {'end': 241.601, 'text': 'We are also talking about the voice assistants, like Google Assistant, Cortana and the Siri now.', 'start': 236.557, 'duration': 5.044}, {'end': 244.743, 'text': 'next we have the implementation of chatbot, as I discussed earlier.', 'start': 241.601, 'duration': 3.142}, 
{'end': 249.427, 'text': 'just now now you might have used the customer care chat services of any app.', 'start': 244.743, 'duration': 4.684}, {'end': 254.451, 'text': 'It also uses NLP to process the data entered and provide the response based on the input.', 'start': 249.907, 'duration': 4.544}], 'summary': 'Nlp applications include sentiment analysis, speech recognition, voice assistants, and chatbots in customer care services.', 'duration': 28.549, 'max_score': 225.902, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA225902.jpg'}, {'end': 311.054, 'src': 'embed', 'start': 284.736, 'weight': 6, 'content': [{'end': 289.059, 'text': 'and one of the coolest application of NLP is advertisement matching now here.', 'start': 284.736, 'duration': 4.323}, {'end': 292.681, 'text': 'What we mean is basically recommendation of the ads based on your history.', 'start': 289.259, 'duration': 3.422}, {'end': 296.183, 'text': 'Now NLP is divided into two major components.', 'start': 293.661, 'duration': 2.522}, {'end': 303.489, 'text': 'that is, the natural language understanding, which is also known as NLU, and we have the natural language generation, which is also known as NLG.', 'start': 296.183, 'duration': 7.306}, {'end': 311.054, 'text': 'The understanding involves tasks like mapping the given input into natural language into useful representations,', 'start': 304.249, 'duration': 6.805}], 'summary': 'Nlp in advertisement matching recommends ads based on user history.', 'duration': 26.318, 'max_score': 284.736, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA284736.jpg'}, {'end': 356.742, 'src': 'embed', 'start': 325.962, 'weight': 9, 'content': [{'end': 329.506, 'text': 'Now NLU is usually considered harder than NLG.', 'start': 325.962, 'duration': 3.544}, {'end': 333.37, 'text': 'Now, you might be thinking that even a small child can understand a language.', 
'start': 329.947, 'duration': 3.423}, {'end': 339.197, 'text': "So let's see what are the difficulties a machine faces while understanding any particular languages.", 'start': 333.931, 'duration': 5.266}, {'end': 342.169, 'text': 'Now, understanding a new language is very hard.', 'start': 339.787, 'duration': 2.382}, {'end': 347.174, 'text': 'Taking our English into consideration, there are a lot of ambiguity and that too in different levels.', 'start': 342.289, 'duration': 4.885}, {'end': 351.697, 'text': 'We have lexical ambiguity, syntactical ambiguity and referential ambiguity.', 'start': 347.714, 'duration': 3.983}, {'end': 356.742, 'text': 'So lexical ambiguity is the presence of two or more possible meanings within a single word.', 'start': 352.218, 'duration': 4.524}], 'summary': 'Nlu is challenging due to language ambiguities; english has lexical, syntactical, and referential ambiguities.', 'duration': 30.78, 'max_score': 325.962, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA325962.jpg'}, {'end': 495.511, 'src': 'embed', 'start': 470.464, 'weight': 7, 'content': [{'end': 476.37, 'text': 'We can use it to perform functions like classification tokenization stemming tagging and much more.', 'start': 470.464, 'duration': 5.906}, {'end': 481.218, 'text': 'Now once you install the NLTK library, you will see an NLTK downloader.', 'start': 477.074, 'duration': 4.144}, {'end': 487.824, 'text': 'It is a pop-up window which will come up and in that you have to select the all option and press the download button.', 'start': 481.638, 'duration': 6.186}, {'end': 495.511, 'text': 'It will download all the required files the corpora the models and all the different packages which are available in the NLTK.', 'start': 488.485, 'duration': 7.026}], 'summary': "Nltk library enables various functions like classification, tokenization, stemming, tagging. 
select 'all' in nltk downloader to download required files, corpora, models, packages.", 'duration': 25.047, 'max_score': 470.464, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA470464.jpg'}, {'end': 537.806, 'src': 'embed', 'start': 512.381, 'weight': 8, 'content': [{'end': 517.424, 'text': 'Now, tokenization involves three steps, which is breaking a complex sentence into words,', 'start': 512.381, 'duration': 5.043}, {'end': 524.689, 'text': 'understanding the importance of each words with the respect to the sentence and finally, produce a structural description on an input sentence.', 'start': 517.424, 'duration': 7.265}, {'end': 531.943, 'text': 'So if we have a look at the example here, considering this sentence Tokenization is the first step in NLP.', 'start': 525.45, 'duration': 6.493}, {'end': 537.806, 'text': 'Now when we divide it into tokens as you can see here, we have 1 2 3 4 5 6 and 7 tokens here.', 'start': 532.343, 'duration': 5.463}], 'summary': 'Tokenization breaks a sentence into 7 tokens in nlp.', 'duration': 25.425, 'max_score': 512.381, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA512381.jpg'}], 'start': 49.668, 'title': 'Language evolution and nlp basics', 'summary': 'Discusses the significance of language in human success, highlighting language diversity and communication rules. 
it also emphasizes the importance of nlp in handling unstructured textual data, with only 21% of data being structured, and its applications in deriving insights and solving various problems.', 'chapters': [{'end': 88.258, 'start': 49.668, 'title': 'Evolution of language', 'summary': "Discusses the importance of language in the success of the human race, the diversity of languages, and the rules and structure of language, emphasizing the significance of communication and information sharing for the human race's success.", 'duration': 38.59, 'highlights': ['The success of the human race is attributed to the ability to communicate and share information.', 'Diversity in languages has resulted in the development of many standards and languages, each with its own set of alphabets and grammar rules.', 'Language plays a crucial role in the 21st century for communication and information sharing.']}, {'end': 658.543, 'start': 88.258, 'title': 'Text analysis and nlp basics', 'summary': 'Discusses the importance of text mining, natural language processing (nlp), its applications, components, and challenges, emphasizing the need for nlp in handling unstructured textual data, with only 21% of available data being structured, and the process of deriving valuable insights and solving a wide range of problems through nlp.', 'duration': 570.285, 'highlights': ['Text mining involves deriving meaningful information from natural language text, with the majority of data being unstructured, and only 21% of available data being structured.', 'NLP aims to communicate with intelligent systems using natural language, organizing textual data, automating tasks, and solving problems, including automatic summarization, machine translation, and speech recognition.', 'NLP applications include sentiment analysis, speech recognition, chatbot implementation, machine translation, spell checking, keyword search, information extraction, and advertisement matching.', 'NLP is divided into natural 
language understanding (NLU) and natural language generation (NLG), with NLU involving mapping input into useful representations and analyzing different language aspects, while NLG focuses on producing meaningful natural language phrases and sentences.', 'Challenges in NLP include lexical ambiguity, syntactical ambiguity, and referential ambiguity, which arise from the complexities of language understanding.', 'The NLTK library is essential for NLP, providing tools for working with human language data, performing functions like classification, tokenization, stemming, and tagging.', 'Tokenization is a key process in NLP, involving breaking strings into tokens, understanding the importance of each word in a sentence, and producing a structural description of the input sentence.']}], 'duration': 608.875, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA49668.jpg', 'highlights': ['NLP applications include sentiment analysis, speech recognition, chatbot implementation, machine translation, spell checking, keyword search, information extraction, and advertisement matching.', 'Text mining involves deriving meaningful information from natural language text, with the majority of data being unstructured, and only 21% of available data being structured.', 'The success of the human race is attributed to the ability to communicate and share information.', 'Diversity in languages has resulted in the development of many standards and languages, each with its own set of alphabets and grammar rules.', 'Language plays a crucial role in the 21st century for communication and information sharing.', 'NLP aims to communicate with intelligent systems using natural language, organizing textual data, automating tasks, and solving problems, including automatic summarization, machine translation, and speech recognition.', 'NLP is divided into natural language understanding (NLU) and natural language generation (NLG), with NLU involving 
mapping input into useful representations and analyzing different language aspects, while NLG focuses on producing meaningful natural language phrases and sentences.', 'The NLTK library is essential for NLP, providing tools for working with human language data, performing functions like classification, tokenization, stemming, and tagging.', 'Tokenization is a key process in NLP, involving breaking strings into tokens, understanding the importance of each word in a sentence, and producing a structural description of the input sentence.', 'Challenges in NLP include lexical ambiguity, syntactical ambiguity, and referential ambiguity, which arise from the complexities of language understanding.']}, {'end': 927.903, 'segs': [{'end': 684.859, 'src': 'embed', 'start': 658.543, 'weight': 4, 'content': [{'end': 666.37, 'text': 'and this is where NLTK comes into picture and it helps a lot of programmers to learn about the different features and the different application of language processing.', 'start': 658.543, 'duration': 7.827}, {'end': 671.152, 'text': 'So here I have created a paragraph on artificial intelligence.', 'start': 667.35, 'duration': 3.802}, {'end': 673.033, 'text': 'So let me just execute it.', 'start': 671.772, 'duration': 1.261}, {'end': 677.235, 'text': 'Not this AI is of the string type.', 'start': 674.254, 'duration': 2.981}, {'end': 679.937, 'text': 'So it will be easier for us to tokenize it.', 'start': 677.415, 'duration': 2.522}, {'end': 684.859, 'text': 'Nonetheless, any of the files can be used to tokenize for simplicity here.', 'start': 680.577, 'duration': 4.282}], 'summary': 'Nltk aids programmers in learning language processing with various applications, including tokenization.', 'duration': 26.316, 'max_score': 658.543, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA658543.jpg'}, {'end': 747.782, 'src': 'embed', 'start': 701.258, 'weight': 0, 'content': [{'end': 706.281, 'text': "So 
here I'm considering ai_tokens and I'm using the word_tokenize function on it.", 'start': 701.258, 'duration': 5.023}, {'end': 709.463, 'text': "Let's see what's the output of this ai_tokens.", 'start': 707.042, 'duration': 2.421}, {'end': 716.308, 'text': 'So as you can see here, it has divided all the input which was provided here into the tokens.', 'start': 710.964, 'duration': 5.344}, {'end': 720.951, 'text': "Now let's have a look at the number of tokens we have here.", 'start': 718.169, 'duration': 2.782}, {'end': 723.853, 'text': 'So in total, we have 273 tokens.', 'start': 722.152, 'duration': 1.701}, {'end': 737.697, 'text': 'Now these tokens are a list of words and special characters, each a separate item of the list. Now, in order to find the frequency of the distinct elements in the given paragraph,', 'start': 726.112, 'duration': 11.585}, {'end': 743.38, 'text': 'we are going to import the FreqDist function, which falls under nltk.probability.', 'start': 737.737, 'duration': 5.643}, {'end': 747.782, 'text': "So let's create an fdist in which we have the function here, FreqDist.", 'start': 744.1, 'duration': 3.682}], 'summary': 'Using ai_tokens, 273 tokens created, frequency analysis performed with nltk.', 'duration': 46.524, 'max_score': 701.258, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA701258.jpg'}, {'end': 839.303, 'src': 'embed', 'start': 770.589, 'weight': 2, 'content': [{'end': 775.394, 'text': 'so as to avoid the probability of considering a word with uppercase and lowercase as different.', 'start': 770.589, 'duration': 4.805}, {'end': 779.558, 'text': 'Now suppose we were to select the top 10 tokens with the highest frequency.', 'start': 776.135, 'duration': 3.423}, {'end': 791.186, 'text': 'So here you can see that we have the comma 30 times, "the" 13 times, "of" 12 times and "and" 12 times, whereas the meaningful words, which are 
intelligence,', 'start': 780.279, 'duration': 10.907}, {'end': 793.988, 'text': 'which is 6 times, and intelligent, 16..', 'start': 791.186, 'duration': 2.802}, {'end': 796.89, 'text': 'Now there is another type of tokenizer which is the blank tokenizer.', 'start': 793.988, 'duration': 2.902}, {'end': 803.215, 'text': "Now let's use the blank tokenizer over the same string to tokenize the paragraph with respect to the blank string.", 'start': 797.371, 'duration': 5.844}, {'end': 805.74, 'text': 'Now the output here is 9.', 'start': 803.815, 'duration': 1.925}, {'end': 812.185, 'text': 'Now this 9 indicates how many paragraphs we have and what all paragraphs are separated by a new line.', 'start': 805.74, 'duration': 6.445}, {'end': 815.667, 'text': 'Although it might seem like one paragraph,', 'start': 812.705, 'duration': 2.962}, {'end': 819.01, 'text': 'it is not; the original structure of the data remains intact.', 'start': 815.707, 'duration': 3.303}, {'end': 824.894, 'text': 'Now, other important key terms in tokenization are bigrams, trigrams and n-grams.', 'start': 819.73, 'duration': 5.164}, {'end': 831.321, 'text': 'Now, what does this mean? 
Now, bigrams refers to tokens of two consecutive words known as a bigram.', 'start': 825.72, 'duration': 5.601}, {'end': 839.303, 'text': 'Similarly, tokens of three consecutive written words are known as trigram, and similarly, we have ngrams for the n consecutive written words.', 'start': 831.781, 'duration': 7.522}], 'summary': 'Tokenization involves identifying top 10 tokens with frequencies, and using different types of tokenizers like blank tokenizer, bigrams, trigrams, and ngrams.', 'duration': 68.714, 'max_score': 770.589, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA770589.jpg'}], 'start': 658.543, 'title': 'Nltk and tokenization in ai', 'summary': 'Covers nltk usage for tokenizing an ai paragraph, resulting in 273 tokens, and demonstrates token frequency analysis, including distinct elements identification, case sensitivity avoidance, and utilization of different tokenizers. it also discusses tokenization and n-grams creation in nltk, illustrating the application of bigrams, trigrams, and n-grams in natural language processing.', 'chapters': [{'end': 723.853, 'start': 658.543, 'title': 'Nltk and tokenization in ai', 'summary': 'Demonstrates the usage of nltk for tokenizing a paragraph on artificial intelligence, resulting in 273 tokens.', 'duration': 65.31, 'highlights': ['By using NLTK, programmers can learn about different features and applications of language processing.', 'The transcript showcases the process of tokenizing a paragraph on artificial intelligence, resulting in 273 tokens.', 'The word underscore tokenized function is used to tokenize all the words in the paragraph.']}, {'end': 815.667, 'start': 726.112, 'title': 'Token frequency analysis', 'summary': 'Demonstrates the process of analyzing token frequency in a given paragraph using nltk probability, including identifying distinct elements, converting tokens to lowercase to avoid case sensitivity, and utilizing different 
tokenizers to tokenize the paragraph.', 'duration': 89.555, 'highlights': ['The chapter demonstrates the process of analyzing token frequency in a given paragraph using NLTK probability, including identifying distinct elements, converting tokens to lowercase to avoid case sensitivity, and utilizing different tokenizers to tokenize the paragraph.', 'The paragraph contains 30 commas, 9 full stops, and 1 accomplishment, demonstrating the frequency of punctuation and specific words.', 'The analysis also involves converting tokens into lowercase to avoid considering uppercase and lowercase as different, ensuring accurate frequency counts.', "The top 10 tokens with the highest frequency include 'comma' 30 times, 'the' 13 times, 'of' 12 times, and 'and' 12 times, showcasing the most frequently occurring words in the paragraph.", 'The chapter introduces the concept of a blank tokenizer and its application in tokenizing the paragraph, resulting in 9 separate paragraphs being identified.', 'The output of using the blank tokenizer indicates the presence of 9 separate paragraphs, despite them appearing as a single paragraph, highlighting the functionality of the tokenizer in identifying paragraph breaks.']}, {'end': 927.903, 'start': 815.707, 'title': 'Tokenization and n-grams in nltk', 'summary': 'Discusses tokenization and n-grams in nltk, demonstrating the creation of bigrams, trigrams, and n-grams from a given string and their application in natural language processing.', 'duration': 112.196, 'highlights': ['The process of tokenization involves breaking a text into individual words or terms, and bigrams, trigrams, and ngrams refer to tokens of two, three, and n consecutive written words respectively, with examples demonstrated using NLTK. (Relevance: 5)', 'Demonstrated the creation of bigrams, trigrams, and n-grams from a given string using NLTK, with examples of tokens in pairs, three words, and a specified number of consecutive words. 
(Relevance: 4)', "Applied NLTK's functions to split a given sentence into tokens, create bigrams, trigrams, and n-grams, and showcase the tokens in pairs and three words. (Relevance: 3)"]}], 'duration': 269.36, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA658543.jpg', 'highlights': ['The transcript showcases the process of tokenizing a paragraph on artificial intelligence, resulting in 273 tokens.', 'The chapter demonstrates the process of analyzing token frequency in a given paragraph using NLTK probability, including identifying distinct elements, converting tokens to lowercase to avoid case sensitivity, and utilizing different tokenizers to tokenize the paragraph.', "The top 10 tokens with the highest frequency include 'comma' 30 times, 'the' 13 times, 'of' 12 times, and 'and' 12 times, showcasing the most frequently occurring words in the paragraph.", 'The output of using the blank tokenizer indicates the presence of 9 separate paragraphs, despite them appearing as a single paragraph, highlighting the functionality of the tokenizer in identifying paragraph breaks.', 'By using NLTK, programmers can learn about different features and applications of language processing.', 'The process of tokenization involves breaking a text into individual words or terms, and bigrams, trigrams, and ngrams refer to tokens of two, three, and n consecutive written words respectively, with examples demonstrated using NLTK.']}, {'end': 1182.892, 'segs': [{'end': 973.365, 'src': 'embed', 'start': 929.071, 'weight': 0, 'content': [{'end': 933.453, 'text': 'So as you can see we have the output in the form of four tokens.', 'start': 929.071, 'duration': 4.382}, {'end': 938.054, 'text': 'Now once we have the tokens we need to make some changes to the tokens.', 'start': 934.593, 'duration': 3.461}, {'end': 946.618, 'text': 'So for that we have stemming. Now, stemming usually refers to normalizing words into their base form or the 
root form.', 'start': 939.175, 'duration': 7.443}, {'end': 954.361, 'text': 'So, if we have a look at the words here, we have affectation, affects, affections, affected, affection and affecting.', 'start': 947.498, 'duration': 6.863}, {'end': 957.362, 'text': 'So, as you might have guessed, the root word here is affect.', 'start': 954.361, 'duration': 3.001}, {'end': 963.076, 'text': 'So one thing to keep in mind here is that the result may not be the root word always.', 'start': 958.391, 'duration': 4.685}, {'end': 967.96, 'text': 'A stemming algorithm works by cutting off the end or the beginning of the word,', 'start': 964.197, 'duration': 3.763}, {'end': 973.365, 'text': 'taking into account a list of common prefixes and suffixes that can be found in an inflected word.', 'start': 967.96, 'duration': 5.405}], 'summary': 'Stemming reduces words to root form, 4 tokens output.', 'duration': 44.294, 'max_score': 929.071, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA929071.jpg'}, {'end': 1070.612, 'src': 'embed', 'start': 1039.409, 'weight': 3, 'content': [{'end': 1043.713, 'text': 'Now the use of each of these stemmers depends on the type of task that you want to perform.', 'start': 1039.409, 'duration': 4.304}, {'end': 1050.638, 'text': 'For example, if you want to check how many times the word giv is used above, you can use the Lancaster stemmer,', 'start': 1043.772, 'duration': 6.866}, {'end': 1053.461, 'text': 'and for other purposes you have the Porter stemmer as well.', 'start': 1050.638, 'duration': 2.823}, {'end': 1056.443, 'text': 'But there are a lot of stemmers.', 'start': 1055.242, 'duration': 1.201}, {'end': 1063.609, 'text': 'There is one Snowball stemmer also present where you need to specify the language which you are using and then use the Snowball stemmer.', 'start': 1056.463, 'duration': 7.146}, {'end': 1070.612, 'text': 'Now, as we discussed, a stemming algorithm works by cutting off the 
end or the beginning of the word.', 'start': 1064.77, 'duration': 5.842}], 'summary': 'Various stemmers like lancaster, porter, and snowball are used for different tasks, with the snowball stemmer requiring language specification, and stemming algorithms work by cutting off word endings or beginnings.', 'duration': 31.203, 'max_score': 1039.409, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA1039409.jpg'}, {'end': 1119.465, 'src': 'embed', 'start': 1076.794, 'weight': 4, 'content': [{'end': 1084.337, 'text': 'Now in order to do so it is necessary to have a detailed dictionary which the algorithm can look into to link the form back to its lemma.', 'start': 1076.794, 'duration': 7.543}, {'end': 1090.691, 'text': 'What lemmatization does is group together the different inflected forms of a word under a common base form, which is called the lemma.', 'start': 1085.426, 'duration': 5.265}, {'end': 1094.734, 'text': 'It is somewhat similar to stemming as it maps several words into a common root.', 'start': 1091.111, 'duration': 3.623}, {'end': 1100.8, 'text': 'Now one of the most important things here to consider is that the output of lemmatization is a proper word.', 'start': 1095.355, 'duration': 5.445}, {'end': 1106.605, 'text': 'That is unlike stemming, where we got the output as giv, and giv is not a word.', 'start': 1101.38, 'duration': 5.225}, {'end': 1107.846, 'text': "It's just a stem.", 'start': 1106.945, 'duration': 0.901}, {'end': 1119.465, 'text': 'Now, for example, if lemmatization works on go, going, and went, they all map to go, because that is the root of all three words here.', 'start': 1108.655, 'duration': 10.81}], 'summary': "Lemmatization groups word forms and, unlike stemming, produces proper words, e.g., 'go' for 'going', 'went'.", 'duration': 42.671, 'max_score': 1076.794, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA1076794.jpg'}], 'start': 929.071, 
'title': 'Stemming and tokenization in nlp', 'summary': 'Covers tokenization and stemming in nlp, illustrating word normalization into base form with examples, and discusses the process of stemming and lemmatization, comparing various stemmers and emphasizing the significance of lemmatization in producing proper words.', 'chapters': [{'end': 973.365, 'start': 929.071, 'title': 'Stemming and tokenization in nlp', 'summary': "Discusses the process of tokenization and stemming in nlp, demonstrating how words are normalized into their base form or root form, with an example of the word 'affect' and its variations, showcasing the application of stemming algorithms.", 'duration': 44.294, 'highlights': ['A stemming algorithm works by cutting off the end or the beginning of the word, taking into account a list of common prefixes and suffixes that can be found in an inflected word.', "The chapter demonstrates the process of tokenization, showing the output in the form of four tokens and then discusses the application of stemming in normalizing words into their base form using the example of the word 'affect' and its variations.", "The root word 'affect' is identified from the words 'affectation, affects, affections, affected, affection, and affecting' as an example of the application of stemming in NLP."]}, {'end': 1182.892, 'start': 974.066, 'title': 'Stemming and lemmatization in nlp', 'summary': 'Discusses the process of stemming and lemmatization in nlp, comparing different types of stemmers such as porter, lancaster, and snowball, and highlighting the differences in their aggressiveness and use cases, as well as the importance of lemmatization in producing proper words from root forms.', 'duration': 208.826, 'highlights': ['The use of each of these stemmers depends on the type of task that you want to perform.', 'Lemmatization groups together the different inflected forms of a word under a common base form, called the lemma.', 'The output of lemmatization is a proper word.']}], 
'duration': 253.821, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA929071.jpg', 'highlights': ["The root word 'affect' is identified from the words 'affectation, affects, affections, affected, affection, and affecting' as an example of the application of stemming in NLP.", "The chapter demonstrates the process of tokenization, showing the output in the form of four tokens and then discusses the application of stemming in normalizing words into their base form using the example of the word 'affect' and its variations.", 'A stemming algorithm works by cutting off the end or the beginning of the word, taking into account a list of common prefixes and suffixes that can be found in an inflected word.', 'The use of each of these stemmers depends on the type of task that you want to perform.', 'Lemmatization groups together the different inflected forms of a word under a common base form, called the lemma.', 'The output of lemmatization is a proper word.']}, {'end': 1785.498, 'segs': [{'end': 1209.109, 'src': 'embed', 'start': 1184.292, 'weight': 6, 'content': [{'end': 1191.757, 'text': "So, as you can see here, the lemmatizer has kept the words as they are, and this is because we haven't assigned any POS tags here,", 'start': 1184.292, 'duration': 7.465}, {'end': 1195.419, 'text': 'and hence it has assumed all the words as nouns.', 'start': 1191.757, 'duration': 3.662}, {'end': 1197.381, 'text': 'Now you might be wondering what POS tags are.', 'start': 1195.419, 'duration': 1.962}, {'end': 1200.963, 'text': "Well, I'll tell you what POS tags are later in this video.", 'start': 1198.001, 'duration': 2.962}, {'end': 1209.109, 'text': "So for now, let's keep it simple: POS tags usually tell us what exactly the given word is.", 'start': 1201.644, 'duration': 7.465}], 'summary': 'The lemmatizer left the words unchanged because, without pos tags, it treated them all as nouns.', 'duration': 24.817, 'max_score': 1184.292, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA1184292.jpg'}, {'end': 1339.029, 'src': 'heatmap', 'start': 1302.785, 'weight': 1, 'content': [{'end': 1311.97, 'text': 'What you can see here is that, except intelligent and intelligence, most of the words are either punctuation or stop words and hence can be removed.', 'start': 1302.785, 'duration': 9.185}, {'end': 1318.258, 'text': "Now we'll use compile from the re module to create a pattern that matches any digit or special character,", 'start': 1312.515, 'duration': 5.743}, {'end': 1320.999, 'text': "and then we'll see how we can remove the stop words.", 'start': 1318.258, 'duration': 2.741}, {'end': 1329.444, 'text': 'So if we have a look at the output of the post punctuation, you can see there are no stop words here in the particular given output.', 'start': 1321.72, 'duration': 7.724}, {'end': 1339.029, 'text': "And if you have a look at the output of the length of the post punctuation, it's 233, compared to 273, the length of the AI underscore tokens.", 'start': 1330.264, 'duration': 8.765}], 'summary': 'Removing stop words resulted in a post punctuation length of 233 compared to 273 for ai tokens.', 'duration': 36.244, 'max_score': 1302.785, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA1302785.jpg'}, {'end': 1346.837, 'src': 'embed', 'start': 1318.258, 'weight': 2, 'content': [{'end': 1320.999, 'text': "and then we'll see how we can remove the stop words.", 'start': 1318.258, 'duration': 2.741}, {'end': 1329.444, 'text': 'So if we have a look at the output of the post punctuation, you can see there are no stop words here in the particular given output.', 'start': 1321.72, 'duration': 7.724}, {'end': 1339.029, 'text': "And if you have a look at the output of the length of the post punctuation, it's 233, compared to 273, the length of the AI underscore tokens.", 'start': 1330.264, 'duration': 8.765}, 
{'end': 1346.837, 'text': 'Now, this is very necessary in language processing as it removes all the unnecessary words, which do not hold much meaning.', 'start': 1340.074, 'duration': 6.763}], 'summary': 'Demonstrates stop word removal, with the token list length dropping from 273 to 233.', 'duration': 28.579, 'max_score': 1318.258, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA1318258.jpg'}, {'end': 1442.249, 'src': 'embed', 'start': 1408.763, 'weight': 0, 'content': [{'end': 1413.247, 'text': 'Now we can use POS tags as a statistical NLP task.', 'start': 1408.763, 'duration': 4.484}, {'end': 1421.293, 'text': 'It distinguishes the sense of the word, which is very helpful in text realization, and it is easy to evaluate, as in how many tags are correct,', 'start': 1413.687, 'duration': 7.606}, {'end': 1424.296, 'text': 'and you can also infer semantic information from the given text.', 'start': 1421.293, 'duration': 3.003}, {'end': 1427.158, 'text': "So let's have a look at some of the examples of POS.", 'start': 1424.736, 'duration': 2.422}, {'end': 1432.643, 'text': 'So take the sentence: the dog killed the bat.', 'start': 1427.779, 'duration': 4.864}, {'end': 1442.249, 'text': 'So here, the is a determiner, dog is a noun, killed is a verb, and again the and bat are a determiner and a noun, respectively.', 'start': 1433.74, 'duration': 8.509}], 'summary': 'Pos tagging aids in text understanding and evaluation, such as determining correct tags and inferring semantic information.', 'duration': 33.486, 'max_score': 1408.763, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA1408763.jpg'}, {'end': 1701.655, 'src': 'embed', 'start': 1657.242, 'weight': 4, 'content': [{'end': 1659.825, 'text': 'Now, this is only possible because of the POS tagging.', 'start': 1657.242, 'duration': 2.583}, {'end': 1665.31, 'text': 'Without the POS tagging, it would be very 
hard to detect the named entities of the given tokens.', 'start': 1659.885, 'duration': 5.425}, {'end': 1670.995, 'text': 'Now that we have understood what named entity recognition, or NER, is,', 'start': 1666.571, 'duration': 4.424}, {'end': 1676.66, 'text': "let's go ahead and understand one of the most important topics in NLP and text mining, which is syntax.", 'start': 1670.995, 'duration': 5.665}, {'end': 1679.003, 'text': 'So what is syntax?', 'start': 1678.102, 'duration': 0.901}, {'end': 1689.267, 'text': 'So, in linguistics, syntax is the set of rules, principles and the processes that govern the structure of a given sentence in a given language.', 'start': 1680.502, 'duration': 8.765}, {'end': 1694.411, 'text': 'The term syntax is also used to refer to the study of such principles and processes.', 'start': 1689.708, 'duration': 4.703}, {'end': 1701.655, 'text': 'So what we have here are certain rules as to what part of the sentence should come at what position.', 'start': 1695.131, 'duration': 6.524}], 'summary': 'Pos tagging enables named entity recognition and syntax analysis in nlp.', 'duration': 44.413, 'max_score': 1657.242, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA1657242.jpg'}], 'start': 1184.292, 'title': 'Pos tagging and nltk stop words', 'summary': "Delves into the significance of pos tags in sentence structure, emphasizing their role in clarity and demonstrating the impact of not assigning pos tags. 
it also explores nltk's stop words, showcasing their removal and the relevance of pos tagging in nlp, with practical examples and applications in ner and syntax analysis.", 'chapters': [{'end': 1227.78, 'start': 1184.292, 'title': 'Pos tags and their importance', 'summary': 'Explains the importance of pos tags in determining the parts of speech of words in a sentence, highlighting their role in sentence formation and clarity, and mentioning the impact of not assigning pos tags, emphasizing the need for understanding pos tags for effective sentence structure.', 'duration': 43.488, 'highlights': ["The chapter emphasizes the importance of POS tags in determining the parts of speech of words in a sentence, highlighting their role in sentence formation and clarity. It mentions how words like 'for' and 'above' are crucial for sentence formation and understanding.", 'It explains the impact of not assigning POS tags, stating that the lemmatizer treated all the words as nouns due to the absence of POS tags, demonstrating the need for understanding POS tags for effective sentence structure.', 'It mentions the intention to explain POS tags later in the video, indicating a forthcoming detailed explanation on the concept.']}, {'end': 1785.498, 'start': 1228.57, 'title': 'Nltk stop words and pos tagging', 'summary': "Discusses nltk's list of stop words, demonstrating the removal of stop words and the importance of pos tagging in nlp, including examples and its applications in ner and syntax analysis.", 'duration': 556.928, 'highlights': ["The words in NLTK's list of 179 stop words are helpful in forming sentences but not in processing the language.", 'The importance of removing stop words for language processing is demonstrated with an example, resulting in a reduction in the number of tokens from 273 to 233.', 'The significance of POS tagging is underscored by its role in distinguishing word senses, aiding text realization, and providing semantic information.', 'The process and 
types of named entity recognition (NER) are explained, emphasizing the role of POS tagging in enabling the detection of named entities.', 'The concept of syntax is defined, and its role in governing the structure of sentences and creating syntax trees is explained.']}], 'duration': 601.206, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA1184292.jpg', 'highlights': ['The significance of POS tagging in sentence formation and clarity is emphasized.', 'The impact of not assigning POS tags is demonstrated, highlighting the need for understanding POS tags.', 'The importance of removing stop words for language processing is demonstrated with an example, resulting in a reduction in the number of tokens from 273 to 233.', 'The role of POS tagging in distinguishing word senses, aiding text realization, and providing semantic information is underscored.', 'The process and types of named entity recognition (NER) are explained, emphasizing the role of POS tagging in enabling the detection of named entities.', 'The concept of syntax is defined, and its role in governing the structure of sentences and creating syntax trees is explained.', 'The intention to explain POS tags later in the video is mentioned, indicating a forthcoming detailed explanation on the concept.']}, {'end': 2398.687, 'segs': [{'end': 1948.541, 'src': 'embed', 'start': 1916.838, 'weight': 4, 'content': [{'end': 1919.079, 'text': 'All the fresh is an adjective and cheese is a noun.', 'start': 1916.838, 'duration': 2.241}, {'end': 1922.121, 'text': 'It has considered a noun phrase of these two words.', 'start': 1919.66, 'duration': 2.461}, {'end': 1926.384, 'text': 'So this is how you execute chunking in NLTK library.', 'start': 1923.122, 'duration': 3.262}, {'end': 1934.288, 'text': "So by now we have learned almost all the important steps in text processing and let's apply them all in building a machine learning classifier.", 'start': 1927.324, 
'duration': 6.964}, {'end': 1934.848, 'text': 'on the movie', 'start': 1934.288, 'duration': 0.56}, {'end': 1937.03, 'text': 'reviews from the NLTK corpora.', 'start': 1934.848, 'duration': 2.182}, {'end': 1944.294, 'text': 'So for that, first let me import all the libraries, which are pandas and the numpy library.', 'start': 1938.17, 'duration': 6.124}, {'end': 1948.541, 'text': 'Now, these are the basic libraries needed in any machine learning algorithm.', 'start': 1945.418, 'duration': 3.123}], 'summary': 'Introduction to nltk chunking and machine learning for text processing.', 'duration': 31.703, 'max_score': 1916.838, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA1916838.jpg'}, {'end': 1985.995, 'src': 'embed', 'start': 1957.772, 'weight': 5, 'content': [{'end': 1964.279, 'text': 'So again, if we have a look at the different elements of the corpora, as we saw earlier in the beginning of our session,', 'start': 1957.772, 'duration': 6.507}, {'end': 1966.862, 'text': 'we have so many files in the given NLTK corpora.', 'start': 1964.279, 'duration': 2.583}, {'end': 1972.004, 'text': "Let's now access the movie reviews corpus under the NLTK corpora, as you can see here.", 'start': 1967.68, 'duration': 4.324}, {'end': 1974.166, 'text': 'We have the movie reviews.', 'start': 1972.524, 'duration': 1.642}, {'end': 1979.49, 'text': 'For that, we are going to import movie_reviews from the NLTK corpora.', 'start': 1974.166, 'duration': 5.324}, {'end': 1985.995, 'text': 'So if we have a look at the different categories of the movie reviews, we have two categories which are the negative and the positive.', 'start': 1979.49, 'duration': 6.505}], 'summary': 'Nltk corpora contains many movie reviews categorized as negative and positive.', 'duration': 28.223, 'max_score': 1957.772, 'thumbnail': 
'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA1957772.jpg'}, {'end': 2124.426, 'src': 'embed', 'start': 2096.311, 'weight': 1, 'content': [{'end': 2099.793, 'text': 'if we have a look at the length of the review list, it should be 2000.', 'start': 2096.311, 'duration': 3.482}, {'end': 2102.314, 'text': 'that is good.', 'start': 2099.793, 'duration': 2.521}, {'end': 2108.317, 'text': 'now let us now create the targets before creating the few features for our classifiers.', 'start': 2102.314, 'duration': 6.003}, {'end': 2110.378, 'text': 'so while creating the targets,', 'start': 2108.317, 'duration': 2.061}, {'end': 2117.701, 'text': 'we are using the negative reviews here we are denoting it as zero and for the positive reviews we are converting it into one,', 'start': 2110.378, 'duration': 7.323}, {'end': 2124.426, 'text': "and also we will create an empty list and we'll add thousand zeros, followed by thousand ones, into the empty list.", 'start': 2118.644, 'duration': 5.782}], 'summary': 'Preparing 2000 reviews for classification, using 1000 negative and 1000 positive reviews.', 'duration': 28.115, 'max_score': 2096.311, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA2096311.jpg'}, {'end': 2175.02, 'src': 'embed', 'start': 2146.883, 'weight': 2, 'content': [{'end': 2150.146, 'text': 'So as you can see it is thousand zeros, which will followed by thousand one.', 'start': 2146.883, 'duration': 3.263}, {'end': 2152.248, 'text': 'So the first five inputs are all zeros.', 'start': 2150.206, 'duration': 2.042}, {'end': 2158.553, 'text': 'Now we can start creating features using the count vectorizer or the bag of words for that.', 'start': 2153.349, 'duration': 5.204}, {'end': 2160.274, 'text': 'We need to import the count vectorizer.', 'start': 2158.593, 'duration': 1.681}, {'end': 2165.459, 'text': 'Now once we have initialized the vectorizer and now we need to fit 
it onto the rev list.', 'start': 2161.295, 'duration': 4.164}, {'end': 2170.383, 'text': 'Now, let us now have a look at the dimensions of this particular vector.', 'start': 2166.159, 'duration': 4.224}, {'end': 2175.02, 'text': "So as you can see, it's 2000 by 16228.", 'start': 2170.683, 'duration': 4.337}], 'summary': 'Using count vectorizer, created a 2000x16228 dimensional vector from input data.', 'duration': 28.137, 'max_score': 2146.883, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA2146883.jpg'}, {'end': 2257.237, 'src': 'embed', 'start': 2214.529, 'weight': 0, 'content': [{'end': 2222.111, 'text': 'Now what we are going to do with the data frame is split it into training and testing sets, and let us now examine the training and the test sets as well.', 'start': 2214.529, 'duration': 7.582}, {'end': 2224.732, 'text': 'So as you can see the size here we have defined as 0.25.', 'start': 2222.931, 'duration': 1.801}, {'end': 2232.498, 'text': "That is the test size, which is 25%. The training set will have 75% of the particular data frame.", 'start': 2224.732, 'duration': 7.766}, {'end': 2239.24, 'text': 'So if you have a look at the shape of the X train, we have 1,500.', 'start': 2233.239, 'duration': 6.001}, {'end': 2244.642, 'text': 'And if you have a look at the dimension of X test, this is 500.', 'start': 2239.241, 'duration': 5.401}, {'end': 2246.043, 'text': 'So now our data is split.', 'start': 2244.643, 'duration': 1.4}, {'end': 2251.185, 'text': "We'll use the naive Bayes classifier for text classification over the training and testing sets.", 'start': 2246.743, 'duration': 4.442}, {'end': 2257.237, 'text': 'So now most of you guys might already be aware of what a naive Bayes classifier is.', 'start': 2252.791, 'duration': 4.446}], 'summary': 'Data split into 75% training set and 25% testing set, with 1,500 records in training and 500 in testing. 
naive bayes classifier used for text classification.', 'duration': 42.708, 'max_score': 2214.529, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA2214529.jpg'}], 'start': 1785.618, 'title': 'Text processing and classification', 'summary': 'Discusses chunking in nlp with nltk library, including building a machine learning classifier for movie reviews and implementing naive bayes algorithm with 100% accuracy for text classification.', 'chapters': [{'end': 2049.013, 'start': 1785.618, 'title': 'Chunking in nltk library', 'summary': 'Discusses the concept of chunking in nlp and text mining, demonstrating how to implement chunking using the nltk library. it also covers the process of building a machine learning classifier for movie reviews from the nltk corpora, including importing libraries and accessing movie reviews.', 'duration': 263.395, 'highlights': ['The chapter discusses the concept of chunking in NLP and text mining, demonstrating how to implement chunking using the NLTK library.', 'The chapter covers the process of building a machine learning classifier for movie reviews from the NLTK corpora, including importing libraries and accessing movie reviews.']}, {'end': 2398.687, 'start': 2049.634, 'title': 'Text classification with naive bayes', 'summary': 'Covers creating a dataset with 2000 reviews, using count vectorizer to create features, splitting the data into training and testing sets, and implementing naive bayes algorithm with 100% accuracy for text classification.', 'duration': 349.053, 'highlights': ['Creating a dataset with 2000 reviews', 'Using Count Vectorizer to create features', 'Implementing Naive Bayes algorithm with 100% accuracy', 'Splitting the data into training and testing sets']}], 'duration': 613.069, 'thumbnail': 'https://coursnap.oss-ap-southeast-1.aliyuncs.com/video-capture/05ONoGfmKvA/pics/05ONoGfmKvA1785618.jpg', 'highlights': ['Implementing Naive Bayes algorithm with 100% 
accuracy', 'Creating a dataset with 2000 reviews', 'Using Count Vectorizer to create features', 'Splitting the data into training and testing sets', 'Demonstrating how to implement chunking using the NLTK library', 'Covering the process of building a machine learning classifier for movie reviews from the NLTK corpora']}], 'highlights': ['NLP applications include sentiment analysis, speech recognition, chatbot implementation, machine translation, spell checking, keyword search, information extraction, and advertisement matching.', 'The session covers the importance of language and its evolution, along with the connection between text mining and NLP.', 'The various applications of NLP in the industry and the different components of NLP are discussed.', 'The session concludes with an end-to-end demo using NLP along with machine learning.', 'Tokenization is a key process in NLP, involving breaking strings into tokens, understanding the importance of each word in a sentence, and producing a structural description of the input sentence.', 'The NLTK library is essential for NLP, providing tools for working with human language data, performing functions like classification, tokenization, stemming, and tagging.', 'The transcript showcases the process of tokenizing a paragraph on artificial intelligence, resulting in 273 tokens.', 'The chapter demonstrates the process of analyzing token frequency in a given paragraph using NLTK probability, including identifying distinct elements, converting tokens to lowercase to avoid case sensitivity, and utilizing different tokenizers to tokenize the paragraph.', "The root word 'affect' is identified from the words 'affectation, affects, affections, affected, affection, and affecting' as an example of the application of stemming in NLP.", 'The significance of POS tagging in sentence formation and clarity is emphasized.']}