Let's start coding

The following steps build an RSS feed parser.

  1. Import the feedparser library in your Python file.
    import feedparser
  2. Define a function, rss_feed_parser, that takes the URL of an RSS feed (feed_link) and returns the information parsed from it as a Python list (feed_result).
    def rss_feed_parser(feed_link):
        feed_result = []
        return feed_result
    
  3. Use feedparser to parse the feed and get information about all the items in it. This gives us a Python list, items, in which each element holds information about one post.
    def rss_feed_parser(feed_link):
        feed_result = []
        data = feedparser.parse(feed_link)
        items = data.entries
        return feed_result
  4. Iterate over the items list and store the necessary information about each item in feed_result. Each entry of the items list is a dictionary.
    def rss_feed_parser(feed_link):
        data = feedparser.parse(feed_link)
        items = data.entries
        feed_result = []
        for item in items:
            # Keys available on this particular entry
            feed_keys = list(item.keys())
            result = {'title': item['title'], 'link': item['link']}
            # Optional fields: copy each one only if the entry has it
            if 'content' in feed_keys:
                # content is a list; keep the text of its first element
                result['content'] = item['content'][0]['value']
            if 'published' in feed_keys:
                result['date_publication'] = item['published']
            if 'authors' in feed_keys:
                result['authors'] = item['authors']
            if 'tags' in feed_keys:
                # Keep only the name (term) of each tag
                tags = []
                for tag in item['tags']:
                    tags.append(tag['term'])
                result['tags'] = tags
            if 'enclosures' in feed_keys:
                result['enclosures'] = item['enclosures']
            feed_result.append(result)
        return feed_result
    feed_keys contains all the keys in the item dictionary. Each key holds a piece of information about the post, as discussed in the previous section. We use feed_keys to check whether the keys we want are present. For example, some RSS files carry no tag information, so the tags key is missing. We store the title, link, content, publication date, authors, tags and enclosures of each item in the result dictionary, then append this dictionary to our feed_result list. feed_result now contains the extracted information for each post in the RSS file, and we return it.
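
As a minimal sketch of the key checks described above, the snippet below applies the same pattern to plain dictionaries that stand in for feedparser entries, so it runs without network access. The helper name extract_fields and both sample entries are made up for illustration and are not part of feedparser.

```python
def extract_fields(item):
    """Copy the fields of interest from one entry-like dictionary."""
    # title and link are assumed to be present, as in rss_feed_parser.
    result = {'title': item['title'], 'link': item['link']}
    # Optional keys are copied only when the entry actually has them.
    if 'published' in item:
        result['date_publication'] = item['published']
    if 'tags' in item:
        result['tags'] = [tag['term'] for tag in item['tags']]
    return result


# Hypothetical entries: the second one has no tags or publication date.
full_entry = {
    'title': 'First post',
    'link': 'https://example.com/first',
    'published': 'Mon, 01 Jan 2024 00:00:00 GMT',
    'tags': [{'term': 'python'}, {'term': 'rss'}],
}
bare_entry = {'title': 'Second post', 'link': 'https://example.com/second'}

print(extract_fields(full_entry))
print(extract_fields(bare_entry))
```

The entry without tags or a publication date simply yields a smaller dictionary instead of raising a KeyError, which is the reason for checking each optional key first.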

Now, let's test this on the Hacker Noon RSS feed and see the results.

print(rss_feed_parser('https://hackernoon.com/feed'))

We get a list of dictionaries, one for each post currently present in the Hacker Noon RSS feed :)
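
Printing the raw list can be hard to read, so here is one possible way to turn it into a short summary. This is only a sketch: format_summary is a hypothetical helper, and sample_result is made-up data shaped like the output of rss_feed_parser rather than real Hacker Noon posts.

```python
def format_summary(feed_result):
    """Build a readable, numbered summary from a list of post dictionaries."""
    lines = []
    for number, post in enumerate(feed_result, start=1):
        # tags is optional, so fall back to a placeholder when it is missing.
        tags = ', '.join(post.get('tags', [])) or 'no tags'
        lines.append(f"{number}. {post['title']} [{tags}]")
        lines.append(f"   {post['link']}")
    return '\n'.join(lines)


# Made-up data in the same shape as the list returned by rss_feed_parser.
sample_result = [
    {'title': 'First post', 'link': 'https://example.com/first',
     'tags': ['python', 'rss']},
    {'title': 'Second post', 'link': 'https://example.com/second'},
]

print(format_summary(sample_result))
```

With a live feed, the same helper could be called as print(format_summary(rss_feed_parser('https://hackernoon.com/feed'))).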
