Javatpoint Logo
Javatpoint Logo

Python Newspaper Module

Most of us are often not interested in reading the complete newspaper or even a complete article. In such cases, we only want to know about keywords, titles, or many such smaller things of the article so that we don't have to spend so much reading a complete article. This also becomes useful when we only want to read selected articles, but we don't know how we can select useful articles for us. We all must be aware of what web scraping is and how it works. We also know that how much web scrapping is important and how it is helpful for us to extract all the useful information from a source website. We can also perform this operation on the newspaper websites too from where we can take a link of an article and scrap and curate useful information from that article. It is possible for us to perform all this by using a Python program, and for this task, Python provides us with a very useful module, i.e., Newspaper Module. We are going to learn about this Newspaper Module of Python in this tutorial, and we will learn how we can use this module to perform newspaper scrapping and curation using a Python program.

Newspaper Module in Python

Newspaper in a Python Module, which is basically designed to curate useful information from a newspaper article. Therefore, we can use Python's Newspaper Module to perform scrapping and curation from an article by problem its web link in the Python program. We can retrieve all the useful information from an article, such as title, keywords, etc., by using the functions of the Newspaper Module. The newspaper Module of Python uses advanced algorithms with web scrapping features so that all the useful text can be extracted from a newspaper website. The Newspaper Module works very amazingly on the online newspaper websites we generally use in our daily life.

Note: We should note that the Newspaper Module performs a web scrapping process on the online newspaper websites. That's why, if we make multiple requests simultaneously from a website, it may lead to blocking from that website. Therefore, we have to use this module accordingly whenever we actually need to use it.

Newspaper Module: Installation

Newspaper Module is not a built-in module in Python, and therefore, we have to install this module first in our system, and only after that can we curate useful information from an article's web link. We can install this Newspaper Module from multiple sources and by using multiple methods, but the method we suggest is using the pip installer. Through pip installer, we can install the Newspaper Module very easily by using the following command in the command prompt terminal:

Once we write the command given above in the terminal shell of our device, we should press enter key to start the installation process, and then we have to wait for a while for the completion of the installation process. Once the installation process of this Newspaper Module is completed, it will show us the following success installation message window in the terminal shell:

Python Newspaper Module
Python Newspaper Module

We can see, Newspaper Module of Python is successfully installed in our system, and now, we can use functions from Newspaper Module by importing it into a Python program to perform newspaper scrapping.

Newspaper Module: Languages Supported

The Newspaper Module of Python supports it, and that's why it becomes even more popular because one can scrap news articles from their choice of language. Following languages are supported in the Newspaper Module, and their input codes are mentioned with them:

Sr. No. Language Input code for the language
1 Arabic ar
2 Chinese zh
3 Greek el
4 Danish da
5 Italian it
6 German de
7 ... and many more ....

We have to provide the input code of a language while we create an instance for an article's web link in the program. The language code we provide in the program will help the newspaper module to execute and use its specific set of algorithms for a particular language to scrape and curate from the article.

Newspaper Module: Implementation

We have installed the newspaper module in our system, and now we all want to perform its implementation so that we can understand how this module works. Implementation of the newspaper module will also help us to learn how to curate multiple keywords and useful information from the article. But, before we use this newspaper module will in a Python program, we should note that we have to create an instance first for the article link. The instance for the article we create will fetch all the information from the article by using functions of the newspaper module. Therefore, first, we should learn about the syntax of the instance of the article and what parameters are used in it.

Syntax for Creating an Instance:

Following syntax, we have to use in the program to create an instance for the article:

As we can see in the syntax written above, we have used the following two parameters:

  1. urlOfArticle: Here, we have to provide the article's web link from where we will curate the useful information from the article.
  2. Language: We have to provide the input code of the language in which the article is written.

We have now learned the syntax for creating an article's instance, and now we can move forward with the implementation part of the newspaper module. We will use the following example program to understand the implementation of this Newspaper Module.

NLTK Module: The nltk module is also very important while performing the newspaper scrapping, and we have to use this module with the newspaper module in order to perform the newspaper scrapping on an article successfully. The nltk module is used to perform NLP on the article's link, and without performing NLP, we cannot curate useful information from the article. Therefore, while using the newspaper module in the example, we also have to use the nltk module with it and download the nltk data using the download('punkt') function in the program. We should also make sure that the ntlk module is installed in our system, and if it is not installed in our system, we can use the following command to install it:


Python Newspaper Module

After completing the installation process of the ntlk module, we can move ahead with the implementation part of the Newspaper Module and use it as an example program to perform newspaper scrapping.

Example 1: Look at the following Python program where we have used an article of TOI to perform newspaper scrapping with Newspaper Module:

Output:

Python Newspaper Module

We have successfully performed scrapping from the piece of article's link given in the code!

Explanation:

We have first imported the article from the newspaper module and nltk module in the program so that we can use the functions of both these modules to perform newspaper scrapping. After that, we gave an article's link (a TOI's latest article) inside the URL variable. Then, we initialized the instance of the article by using the article() function and gave initialized URL as an argument in it. Also, we have provided the language input code with the URL variable in the article() function. After that, we used the download() function to download the article in the program, and then we parsed the article using the parse() function. After that, we performed natural language processing (NLP) on the parsed article piece with the help of the nlp() function. Now, after performing NLP on the parsed article, we were able to print useful information from the article. Therefore, first, we used the ".title" function to print the article's title, and then, we printed text from the article using the ".text" function. Next, we printed the summary of the article using the ".summary" function, and after that, we printed important keywords of the article using the ".keywords" function.

Note: We should note that TOI removes some of their articles from time to time, and therefore, this article link of TOI given in the example may not work in the future. So, while using this example, we have to use a new link of another article.

Newspaper Module: Some Useful Functions

When we performed the scrapping from the article's link using the newspaper module, we used some important functions to complete that task successfully. These functions are very useful to perform newspaper scrapping & curation and to print important information in the output. Here, in this section, we will learn about such important functions of the Newspaper Module, which are given below:

Sr No Function name Function's Working
1 article() First, we have to create an article's instance to curate any useful information from the article's link given in the example, and we can use the article() function to create one.
2 download() With the help of the download() function, we can download the article which URL we provided in the program.
3 parse() After downloading the article, we have to parse the article, and we can do this using the parse() function.
4 nlp() We also have to perform NLP on the parsed article before we curate any useful information from the article, and we can do this using the nlp() function.
5 instanceName.title It is used to print the title of the article.
6 instanceName.text We can use this function if we want to print the text of the article.
7 instanceName.keywords This function is very helpful as it prints all the important keywords from the article.
8 instanceName.summary If we want to print the summary of the article, we can use this function.

These are all important functions of the Newspaper Module, and we can use them according to our choice of what type of information (like keywords, title, etc.) we want from the article.

Conclusion

It is not possible for all of us to read the complete newspaper and therefore, we want only useful information from the article. Newspaper Module provides us the option from where we can only get useful information of an article by performing newspaper scrapping on the article. We can perform newspaper scrapping using the functions of the Newspaper Module in a Python program and print all the useful information from an article's link in the output.







Youtube For Videos Join Our Youtube Channel: Join Now

Feedback


Help Others, Please Share

facebook twitter pinterest

Learn Latest Tutorials


Preparation


Trending Technologies


B.Tech / MCA