Extract data from news articles using ChatGPT in Python

Afzal Muhammad
5 min read · Jul 7, 2023

Introduction

ChatGPT, short for Chat Generative Pre-trained Transformer, is a large language model created by the artificial intelligence research lab OpenAI. Since its announcement it has been in the limelight and has attracted tremendous interest from, quite literally, everyone! Many enterprises have already built its use cases into their applications to gain a competitive edge, and many more are in the process of integrating it.

The ChatGPT API is offered by OpenAI and provides access to OpenAI's advanced language models, such as GPT-3.5 and GPT-4. It is disruptive and enables developers to build conversational AI applications. It uses Natural Language Processing (NLP) to understand and generate human-like responses. It's an ideal platform for building chatbots, generating content, analyzing sentiment, summarizing and extracting data, and building other conversational applications.

In this article, I describe how easy it is to extract key information from a piece of content. I also explain how to retrieve the API key from an Azure OpenAI resource and how you can build your own AI-driven application using the ChatGPT API.

Implementation

Perform the following steps to implement this use case.

Get the API Key

To use the ChatGPT API, you need an API key. You can obtain one from OpenAI's website (the Overview page of the OpenAI API) by selecting View API keys after logging in.

However, in this article I have used an Azure OpenAI resource. Log in to the Azure portal and open your Azure OpenAI resource.

Click Keys and Endpoint, then copy one of the keys and the endpoint URL. The code in app.py needs both.
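Rather than pasting the key into source code, one option is to read it from environment variables. The following is a minimal sketch of that idea, not part of the original example; the variable names AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_KEY and the helper load_azure_openai_settings are hypothetical names chosen for illustration.

```python
import os


def load_azure_openai_settings(env=None):
    """Read the Azure OpenAI endpoint and key from environment variables.

    AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_KEY are hypothetical names
    used in this sketch; pick whatever names suit your setup.
    """
    if env is None:
        env = os.environ
    endpoint = env.get("AZURE_OPENAI_ENDPOINT")
    api_key = env.get("AZURE_OPENAI_KEY")

    # Collect the names of any settings that are absent or empty.
    missing = [
        name
        for name, value in (
            ("AZURE_OPENAI_ENDPOINT", endpoint),
            ("AZURE_OPENAI_KEY", api_key),
        )
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"api_base": endpoint, "api_key": api_key}
```

In app.py, the returned values could then be assigned to openai.api_base and openai.api_key in place of the hard-coded strings.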

Python code

Let’s start coding!!!

In this example I have used Flask, a very lightweight Python framework for building web applications. I used Visual Studio Code for coding, created a Python virtual environment, and installed the relevant libraries within that environment. How to use Visual Studio Code with Flask is beyond the scope of this article; you can find the details in the Python and Flask Tutorial in Visual Studio Code.

The following files are needed to complete this example.

1- requirements.txt

Flask==2.0.2
openai<1.0  # app.py uses the pre-1.0 openai.ChatCompletion interface
gunicorn

2- app.py

import openai

from flask import Flask, redirect, render_template, request, url_for

app = Flask(__name__)


def get_prompt():
    # Instructions plus a sample of the expected JSON shape.
    # The article text is appended after this prompt.
    return '''Retrieve the company name, revenue, net income and earnings per share (EPS)
from the following news article.
Return the response as a JSON string. The format of the string should be:
{
  "company": "Microsoft",
  "ticker": "MSFT",
  "revenue": "143015",
  "revenue period ended": "2020",
  "revenue growth": "14.3",
  "net income": "61271",
  "net income growth": "13.6",
  "earning per share": "8.05",
  "earning per share growth": "15.4"
}
News Article:
=================
'''


@app.route('/')
def index():
    print('Request for index page received')
    return render_template('chatgpt.html')


@app.route('/chatgptresponse', methods=['POST'])
def chatgptresponse():
    openai.api_type = "azure"
    # replace with the endpoint of your own Azure OpenAI resource
    openai.api_base = "https://openaichatgptdemo12345.openai.azure.com/"
    openai.api_version = "2023-03-15-preview"
    # provide your api key
    openai.api_key = "xxxxxxxxxxxxxxxxxxxxxxxxxx"

    strText = request.form.get('txtchatgpt')
    if not strText:
        # nothing was submitted, so go back to the input form
        return redirect(url_for('index'))

    print('Chat GPT Response for %s' % strText)

    strText = get_prompt() + strText
    response = openai.ChatCompletion.create(
        engine="demo",  # name of your model deployment in Azure OpenAI
        messages=[
            {"role": "user", "content": strText},
        ],
        temperature=0.7,
        max_tokens=800,
        top_p=0.95,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None)

    # concatenate the text of every returned choice
    result = ''
    for choice in response.choices:
        result += choice.message.content

    return render_template('chatgptresponse.html', result=result)


if __name__ == '__main__':
    app.run()

3- chatgpt.html

<!doctype html>
<html>
<head>
    <title>Hello Azure - Python Quickstart</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='bootstrap/css/bootstrap.min.css') }}">
    <link rel="shortcut icon" href="{{ url_for('static', filename='favicon.ico') }}">
</head>
<body>
    <main>
        <div class="px-4 py-3 my-2 text-center">
            <img class="d-block mx-auto mb-4" src="{{ url_for('static', filename='images/azure-icon.svg') }}" alt="Azure Logo" width="192" height="192"/>
            <!-- <img src="/docs/5.1/assets/brand/bootstrap-logo.svg" alt="" width="72" height="57"> -->
            <h1 class="display-6 fw-bold text-primary">Welcome to Azure ChatGPT Demo</h1>
        </div>
        <form method="post" action="{{url_for('chatgptresponse')}}">
            <div class="col-md-6 mx-auto text-center">
                <label for="txtchatgpt" class="form-label fw-bold fs-5">Type your text here</label>

                <!-- <p class="lead mb-2">Could you please tell me your name?</p> -->
                <div class="d-grid gap-2 d-sm-flex justify-content-sm-center align-items-center my-1">
                    <textarea class="form-control" id="txtchatgpt" name="txtchatgpt" rows="5" cols="20"></textarea>
                </div>
                <div class="d-grid gap-2 d-sm-flex justify-content-sm-center my-2">
                    <button type="submit" class="btn btn-primary btn-lg px-4 gap-3">Submit text</button>
                </div>
            </div>
        </form>
    </main>
</body>
</html>

4- chatgptresponse.html

<!doctype html>
<html>
<head>
    <title>Hello Azure - Python Quickstart</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='bootstrap/css/bootstrap.min.css') }}">
    <link rel="shortcut icon" href="{{ url_for('static', filename='favicon.ico') }}">
</head>
<body>
    <main>
        <div class="px-4 py-3 my-2 text-center">
            <img class="d-block mx-auto mb-4" src="{{ url_for('static', filename='images/azure-icon.svg') }}" alt="Azure Logo" width="192" height="192"/>
            <!-- <img src="/docs/5.1/assets/brand/bootstrap-logo.svg" alt="" width="72" height="57"> -->

            <p class="fs-5">
                {{result}}
            </p>

            <a href="{{ url_for('index') }}" class="btn btn-primary btn-lg px-4 gap-3">Back</a>
        </div>
    </main>
</body>
</html>
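The result string that app.py passes to chatgptresponse.html is whatever text the model returned, so it may contain extra sentences around the JSON. If you want to work with the data rather than just display it, a small helper along these lines could pull the JSON out. This is a sketch under my own assumptions, not part of the original app; extract_json is a hypothetical name, and it simply takes everything between the first "{" and the last "}".

```python
import json


def extract_json(reply):
    """Parse the text between the first '{' and the last '}' of a reply.

    Returns the parsed dict, or None when no valid JSON object is found.
    """
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        # The model produced something that only looks like JSON.
        return None


# Example with a made-up reply that wraps the JSON in extra prose.
sample_reply = 'Here is the data:\n{"company": "Microsoft", "ticker": "MSFT"}\nHope this helps.'
data = extract_json(sample_reply)
```

Returning None instead of raising keeps the Flask view simple: it can show the raw text when parsing fails.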

Once you have set up the code in Visual Studio Code, run the following in the VS Code terminal.

py -m venv .venv
.venv/Scripts/Activate.ps1
pip install -r requirements.txt

flask run

flask run starts a web server on localhost at the default port, 5000. Remember that this development server is not recommended for production use.

Launch the Application

To open the rendered page in your default browser, Ctrl+click the http://127.0.0.1:5000/ URL in the terminal, or launch a browser and type the URL http://127.0.0.1:5000.

You will see the input page with a text box. I copied a financial summary from a financial website and pasted it into the text box.

Click Submit text. The ChatGPT API extracts the data as specified by the prompt in the get_prompt function in the Python code.
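The sample format in get_prompt asks for every value as a string, so figures such as revenue come back as text. If you need real numbers, a conversion step along the following lines could help. This is only a sketch: to_number and normalize are hypothetical helpers, and the handling of commas, "%" and "$" is my assumption about what the model might return.

```python
# Keys taken from the sample JSON in get_prompt that hold numeric values.
NUMERIC_FIELDS = (
    "revenue",
    "revenue growth",
    "net income",
    "net income growth",
    "earning per share",
    "earning per share growth",
)


def to_number(value):
    """Convert text such as "143,015" or "14.3%" to a float, else None."""
    if value is None:
        return None
    cleaned = str(value).replace(",", "").replace("%", "").replace("$", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def normalize(record):
    """Return a copy of record with the numeric fields converted to floats."""
    return {
        key: to_number(value) if key in NUMERIC_FIELDS else value
        for key, value in record.items()
    }
```

Fields the model could not fill in sensibly (for example "n/a") become None, which makes gaps easy to spot before the data goes anywhere else.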

Conclusion

This is a very basic example and may not be perfect, but it offers a good conceptual start for implementing a data extraction use case with ChatGPT. The output of this example can also be loaded into a database to generate reports across many companies' data; this could be an ETL task within a workflow. The opportunities are endless. ChatGPT makes even daunting tasks easier, often with a good level of accuracy!
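To make the database idea concrete, here is a minimal sketch that stores one extracted record with Python's built-in sqlite3 module. The table name financials, its columns and the save_record helper are assumptions made for illustration; the record keys are the ones from the sample JSON in get_prompt.

```python
import sqlite3


def save_record(conn, record):
    """Insert one extracted record into a 'financials' table (created if absent)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS financials ("
        "company TEXT, ticker TEXT, period TEXT, "
        "revenue TEXT, net_income TEXT, eps TEXT)"
    )
    conn.execute(
        "INSERT INTO financials VALUES (?, ?, ?, ?, ?, ?)",
        (
            record.get("company"),
            record.get("ticker"),
            record.get("revenue period ended"),
            record.get("revenue"),
            record.get("net income"),
            record.get("earning per share"),
        ),
    )
    conn.commit()


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
save_record(conn, {
    "company": "Microsoft",
    "ticker": "MSFT",
    "revenue": "143015",
    "revenue period ended": "2020",
    "net income": "61271",
    "earning per share": "8.05",
})
rows = conn.execute("SELECT company, revenue FROM financials").fetchall()
print(rows)  # [('Microsoft', '143015')]
```

Using record.get means a field the model left out is stored as NULL rather than raising an error.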

Afzal Muhammad

Innovative and transformative cross domain cloud solution architect @Microsoft (& xCisco). Helping companies to digitally transform!!!