
I have the following code block:

from jira import JIRA
import pandas as pd

cert_path = 'C:\\cert.crt'

start_date = '2020-10-01'
end_date = '2020-10-31'

a_session = JIRA(server='https://jira.myinstance-A.com', options={'verify': cert_path}, kerberos=True)

b_session = JIRA(server='https://jira.myinstance-B.com', options={'verify': cert_path}, kerberos=True)

c_session = JIRA(server='https://jira.myinstance-C.com', options={'verify': cert_path}, kerberos=True)



query_1 = 'project = "Test Project 1" and issuetype = Incident and resolution = Resolved and updated >= {} and updated <= {}'.format(start_date, end_date)

query_2 = 'project = "Test Project 2" and issuetype = Incident and resolution = Resolved and updated >= {} and updated <= {}'.format(start_date, end_date)

query_3 = 'project = "Test Project 3" and issuetype = Defect and resolution = Resolved and releasedate >= {} and releasedate <= {}'.format(start_date, end_date)

query_4 = 'project = "Test Project 4" and issuetype = Enhancement and resolution = Done and completed >= {} and completed <= {}'.format(start_date, end_date)

block_size = 100
block_num = 0

all_issues = []
while True:
    start = block_num * block_size
    issues = a_session.search_issues(query_1, start, block_size)
    if len(issues) == 0:
        break
    block_num += 1
    for issue in issues:
        all_issues.append(issue)

issues = pd.DataFrame()

for issue in all_issues:
    d = {
        'key' : issue.key,
        'type' : issue.fields.type,
        'creator' : issue.fields.creator,
        'resolution' : issue.fields.resolution
    }

    issues = issues.append(d, ignore_index=True)

This code runs fine and allows me to:

  1. retrieve data associated with only query_1 (which connects to a_session)
  2. save that data into a Pandas dataframe

Now, I would like to be able to:

a. retrieve the data associated with query_2 (which also connects to a_session) and save it to the issues dataframe

b. retrieve the data associated with query_3 (which connects to b_session) and save it to the issues dataframe

c. retrieve the data associated with query_4 (which connects to c_session) and save it to the issues dataframe

Notice that the structure of query_3 and query_4 is different from that of query_1 and query_2 (the field names are different, among other things).

I could write one GIANT script (which would probably work). But, I'm sure there is a more elegant way of approaching this (perhaps with a nested loop).

What's the best way of adapting this code block so that it handles cases a, b, and c above?

Any help would be much appreciated by this Python novice! Thanks in advance!



UPDATE 1:

I used the (very elegant) solution suggested by @Nick ODell. The code runs fine, but for whatever reason, I get a None result. I spent the past few hours trying to debug this and my leading theory is that the field names are not passed (as they are in d in the original code block I posted).

I tried to amend the get_all_issues function as follows:

def get_all_issues(session, query):
    start = 0
    all_issues = []
    while True:
        issues = session.search_issues(query, start, block_size)
        if len(issues) == 0:
            # No more issues
            break
        start += len(issues)
        for issue in issues:
            all_issues.append(issue)

    issues = pd.DataFrame

    for issue in all_issues:
        d = {
            'key' : issue.key,
            'type' : issue.fields.type,
            'creator' : issue.fields.creator,
            'resolution' : issue.fields.resolution
             }

    issues = issues.append(d, ignore_index=True)

But, now there is an error message saying:

ValueError: All objects passed were None.

How would we amend the get_all_issues() function so that we can nest the following for loop and pass in the field names, as follows?

for issue in all_issues:
    d = {
        'key' : issue.key,
        'type' : issue.fields.type,
        'creator' : issue.fields.creator,
        'resolution' : issue.fields.resolution
    }

    issues = issues.append(d, ignore_index=True)


UPDATE 2:

Instead of using pd.json_normalize(issues), I used pd.DataFrame(issues) and added a dictionary of field names. The following code works **because all fields exist in a_session, b_session, and c_session**:

def get_all_issues(session, query):

    block_size = 50
    block_num = 0
    
    start = 0
    all_issues = []
    while True:
        issues = session.search_issues(query, start, block_size)
        if len(issues) == 0:
            # No more issues
            break
        start += len(issues)
        for issue in issues:
            all_issues.append(issue)

    issues = pd.DataFrame(issues)

    for issue in all_issues:
        d = {
            'key' : issue.key,
            'type' : issue.fields.type,
            'creator' : issue.fields.creator,
            'resolution' : issue.fields.resolution
             }

        issues = issues.append(d, ignore_index=True)

    return issues

Then, I added 3 new custom fields to the dictionary:

    for issue in all_issues:
        d = {
            'key' : issue.key,
            'type' : issue.fields.type,
            'creator' : issue.fields.creator,
            'resolution' : issue.fields.resolution,
            'system_change' : issue.fields.customfield_123,
            'system_resources' : issue.fields.customfield_456,
            'system_backup' : issue.fields.customfield_789
             }

Custom field 123 exists in a_session and b_session, but not in c_session. Custom field 456 exists only in c_session. And, custom field 789 exists in b_session and c_session.

Doing so results in the following error: AttributeError: type object 'PropertyHolder' has no attribute 'customfield_123'.

Can anyone suggest an elegant solution to handle this? (i.e. the ability to have a dictionary with any number of fields, and the code 'understands' which fields relate to a given session) Thanks!
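In case it helps to see what I'm imagining: one idea I'm experimenting with is Python's getattr() with a default, which returns None instead of raising AttributeError when a field is missing. A minimal sketch (the SimpleNamespace is a made-up stand-in for issue.fields, with customfield_456 deliberately absent):

```python
from types import SimpleNamespace

# Made-up stand-in for issue.fields; customfield_456 is deliberately missing,
# mimicking a custom field that doesn't exist on a given Jira instance
fields = SimpleNamespace(customfield_123='changed', customfield_789='nightly')

# getattr with a default of None never raises AttributeError
d = {
    'system_change': getattr(fields, 'customfield_123', None),
    'system_resources': getattr(fields, 'customfield_456', None),
    'system_backup': getattr(fields, 'customfield_789', None),
}
print(d)
# {'system_change': 'changed', 'system_resources': None, 'system_backup': 'nightly'}
```

The missing field would then just come through as None (NaN once it reaches the dataframe) rather than crashing.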

2 Answers


Here's how I would approach your problem. I don't have a Jira instance to test against, so this code is untested.

First, define a function to fetch all issues from a given session for a given query:

def get_all_issues(session, query):
    start = 0
    all_issues = []
    while True:
        issues = session.search_issues(query, start, block_size)
        if len(issues) == 0:
            # No more issues
            break
        start += len(issues)
        for issue in issues:
            all_issues.append(issue)
    # Flatten JSON
    # This idea is from
    # https://levelup.gitconnected.com/jira-api-with-python-and-pandas-c1226fd41219
    return pd.json_normalize(issues)

Next, I would make a list of queries, and the corresponding backend.

queries = [
    (a_session, query_1),
    (a_session, query_2),
    (b_session, query_3),
    (c_session, query_4),
]

Next, I would loop over each pair of session and query, calling the function I just defined, and save the dataframe I get each time.

dataframes = []

for session, query in queries:
    dataframe = get_all_issues(session, query)
    dataframes.append(dataframe)

Now, the field names for each of these dataframes won't be the same. However, Pandas is tolerant of this: if a column is present in one dataframe but not in another, Pandas will fill the missing values with NaN. So, just concatenate the rows from each dataframe together:

all_results = pd.concat(dataframes, ignore_index=True)  # 'all' would shadow the built-in

... and that's it!
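To illustrate that NaN-filling behaviour with plain pandas (no Jira needed; df1 and df2 are toy frames standing in for two query results with partially overlapping columns):

```python
import pandas as pd

# Toy results with partially overlapping columns
df1 = pd.DataFrame({'key': ['A-1'], 'resolution': ['Resolved']})
df2 = pd.DataFrame({'key': ['C-9'], 'system_resources': ['8 GB']})

combined = pd.concat([df1, df2], ignore_index=True)
print(combined)
# Columns absent from one frame are filled with NaN in its rows
```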


5 Comments

For whatever reason, using the approach suggested by @Nick ODell results in an empty set. My leading theory is that the field names are not passed (hence the None result). How would we use pd.DataFrame instead of pd.json_normalize and pass the field names (as I've done above)? Thanks in advance!
@equanimity Not sure. I can't actually test this code, so it's hard for me to say what might be wrong. Could you give me a copy of the contents of the issues variable? e.g. print(repr(issues)) (Assuming that you're allowed to post it.)
when I print(repr(issues)), I get: NameError: name 'issues' is not defined. I have the code working, but without using pd.json_normalize(issues). Now, the problem I'm facing is that some fields exist in certain sessions, but not in others. I'll post another update above.
I see you changed issues to all_issues in your version of the code. Therefore, can you post the contents of print(repr(all_issues))?
when I print(repr(all_issues)), I get: NameError: name 'all_issues' is not defined.

Instead of making separate variables for the queries, put them in a list:

queries = [
    'query 1 here...',
    'query 2 here...',
]

And then iterate over the list:

for query in queries:
    process(query)

1 Comment

How would I deal with the fact that 3 of the 4 queries reference different <x>_session variables?
