0

I have data in a txt file and I need to separate a sentence from a value. Every line of the file has the form <Sentence> <number>. I need to read the sentence and the value into two different columns, but the sentences can contain numbers, dots, and any other characters, since they are just arbitrary sentences. The numeric value in question, though, is always at the end of the line. For example:

This coffee is bad. -1

How can I do this in Python?

3
  • So you need regex. Python lib re. Commented Jun 28, 2022 at 20:47
  • If the format is ...anything here... then ". ##" that will be fairly simple. But the end of the sentence is the key. Is it always a "." followed by space(s)? Commented Jun 28, 2022 at 20:47
  • No, it's not always a dot followed by spaces. Sometimes the dot is forgotten, sometimes there are 3 dots, sometimes it's a comma or anything else you might write, like a parenthesis. The only thing that's always true is that the value is at the end of the sentence, separated by 3 spaces. Commented Jun 29, 2022 at 8:44

3 Answers

1

If the line always follows the format <sentence><space><number><end>, then something like:

sent, _, num = input_str.rstrip().rpartition(' ')
sent = sent.rstrip()  # drop leftover spaces when the separator is wider than one space
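
A minimal sketch of applying the same idea to several lines, assuming the value is always the last whitespace-separated token and is an integer. The helper name `split_line` and the sample lines are hypothetical. It uses `rsplit(None, 1)` instead of `rpartition(' ')` so that a run of several spaces before the number counts as one separator.

```python
def split_line(line):
    """Split a line into (sentence, value) at the last run of whitespace."""
    # With sep=None, rsplit treats any run of whitespace as one separator
    # and ignores trailing whitespace such as the newline.
    sentence, number = line.rsplit(None, 1)
    return sentence, int(number)  # assumes the value is an integer


# Hypothetical sample lines, with three spaces before the value.
lines = ["This coffee is bad.   -1", "Room 101 was fine...   3"]
rows = [split_line(line) for line in lines]
print(rows)
```

For a real file, the list of sample lines could be replaced by iterating over the open file object.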

0

Here is a solution using pandas to load the file as a DataFrame with a regex separator:

import pandas as pd

# Split at the last run of whitespace: the lookahead requires that only the final token follows.
df = pd.read_csv('file.csv', sep=r'\s+(?=\S+$)', engine='python',
                 header=None, names=['sentence', 'value'])

Output:

              sentence  value
0  This coffee is bad.     -1
1        other example    123

You can then easily convert to lists:

df.to_dict('list')

Output:

{'sentence': ['This coffee is bad.', 'other example'],
 'value': [-1, 123]}

Used text input:

This coffee is bad. -1
other example 123
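
To see how a lookahead separator of this kind behaves when several spaces precede the value, here is a small sketch using `re.split` directly. The sample line is an assumption based on the three-space format mentioned in the question's comments. A single `\s` splits only at the last space and leaves the other spaces attached to the sentence, while `\s+` consumes the whole run.

```python
import re

line = "This coffee is bad.   -1"  # three spaces before the value

# \s matches one whitespace character; the lookahead (?=\S+$) requires that
# only the final token follows, so the split happens at the last space.
single = re.split(r"\s(?=\S+$)", line)

# \s+ consumes the entire run of whitespace before the final token.
run = re.split(r"\s+(?=\S+$)", line)

print(single)
print(run)
```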

2 Comments

This worked smoothly. Just to understand, how am I supposed to know whether this separator works for multiple spaces?
I don't get the question. This regex works only for 2 columns as described: a sentence plus a single "word"/digits at the end.

0

There are many ways to do it.

A simple, quick-and-dirty solution is as follows:

  • Run a regex pattern to extract the numbers, then take the last one as the second column.
  • Remove that last match from the line; what remains is the first column.

This code should give you an idea:

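import re

sample = "This coffee 5656 is bad. -134 -454"

# Match optionally signed integers so the minus sign stays with the value.
result = re.findall(r'-?[0-9]+', sample)

second_column = result[-1]
# Cut at the last occurrence only; str.replace would remove every occurrence.
first_column = sample[:sample.rfind(second_column)].rstrip()

print(f'First Column: {first_column}')
print(f'Second Column: {second_column}')

Output:

First Column: This coffee 5656 is bad. -134
Second Column: -454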
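
An alternative sketch, not the answerer's code: a single anchored regex can capture the sentence and the trailing value in one pass, which keeps a leading minus sign and leaves earlier numbers in the sentence untouched. The names `LINE_RE` and `parse` are hypothetical, and the value is assumed to be an integer.

```python
import re

# Lazy sentence, then whitespace, then an optionally signed integer at the end.
LINE_RE = re.compile(r"^(.*?)\s+(-?\d+)\s*$")


def parse(line):
    """Return (sentence, value), or None if the line has no trailing integer."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(parse("This coffee 5656 is bad. -134 -454"))
```

Returning `None` for lines that do not match makes malformed lines easy to skip or log instead of raising an error.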
