507

In order to test some functionality I would like to create a DataFrame from a string. Let's say my test data looks like:

TESTDATA="""col1;col2;col3
1;4.4;99
2;4.5;200
3;4.7;65
4;3.2;140
"""

What is the simplest way to read that data into a Pandas DataFrame?

7 Answers 7

852

A simple way to do this is to use StringIO.StringIO (python2) or io.StringIO (python3) and pass that to the pandas.read_csv function. E.g:

import sys
if sys.version_info[0] < 3: 
    from StringIO import StringIO
else:
    from io import StringIO

import pandas as pd

TESTDATA = StringIO("""col1;col2;col3
    1;4.4;99
    2;4.5;200
    3;4.7;65
    4;3.2;140
    """)

df = pd.read_csv(TESTDATA, sep=";")
Sign up to request clarification or add additional context in comments.

8 Comments

FYI - pd.read_table() is an equivalent function, just slightly better nomenclature: df = pd.read_table(TESTDATA, sep=";").
@AntonvBR Noted that one could use pandas.compat.StringIO. That way we don't have to import StringIO separately. However the pandas.compat package is considered private according to pandas.pydata.org/pandas-docs/stable/api.html?highlight=compat so leaving the answer as is for now.
I receive 'Error tokenizing data. C error: Expected 2 fields in line 26, saw 12\n',)
@gdm The "Expected fields error" is usually due to inconsistent number of fields in each row, or wrong sep/delimiter.
@Asclepius the suggestion to use pandas.compat only works for versions before pandas 0.25, but will now raise ImportError.
|
74

In one line, but first import io

import pandas as pd
import io   

TESTDATA="""col1;col2;col3
1;4.4;99
2;4.5;200
3;4.7;65
4;3.2;140
"""

df = pd.read_csv(io.StringIO(TESTDATA), sep=";")
print(df)

2 Comments

What is the difference between this and the accepted answer? Except you move io operation to read_csv, which makes no difference... Please always check if similar answer is not posted already, redundancy is unnecessary.
I like this answer because it is explained with one example in one line.
63

Split Method

data = input_string
df = pd.DataFrame([x.split(';') for x in data.split('\n')])
print(df)

4 Comments

If you want the first line to be used for column names, change the 2nd line to this: df = pd.DataFrame([x.split(';') for x in data.split('\n')[1:]], columns=[x for x in data.split('\n')[0].split(';')])
This is wrong, since on CSV files the newline (\n) character can be part of a field.
This is not very robust, and most people would be better with the accepted answer. There is a very partial list of things that can go wrong with this at thomasburette.com/blog/2014/05/25/…
This is a weak and unnecessary approach. This circumvents the built-in pd.read_csv() call which handles any CSV and has been battle-tested in favor of a naive couple of string splits that don't handle quoting and are liable to fail in many cases. It's also more verbose. Please use the CSV parser Pandas kindly gives you!
26

A quick and easy solution for interactive work is to copy-and-paste the text by loading the data from the clipboard.

Select the content of the string with your mouse:

Copy data for pasting into a Pandas dataframe

In the Python shell use read_clipboard()

>>> pd.read_clipboard()
  col1;col2;col3
0       1;4.4;99
1      2;4.5;200
2       3;4.7;65
3      4;3.2;140

Use the appropriate separator:

>>> pd.read_clipboard(sep=';')
   col1  col2  col3
0     1   4.4    99
1     2   4.5   200
2     3   4.7    65
3     4   3.2   140

>>> df = pd.read_clipboard(sep=';') # save to dataframe

1 Comment

Not good for reproducibility, but otherwise a pretty neat solution!
9

This answer applies when a string is manually entered, not when it's read from somewhere.

A traditional variable-width CSV is unreadable for storing data as a string variable. Especially for use inside a .py file, consider fixed-width pipe-separated data instead. Various IDEs and editors may have a plugin to format pipe-separated text into a neat table.

Using read_csv

Store the following in a utility module, e.g. util/pandas.py. An example is included in the function's docstring.

import io
import re

import pandas as pd


def read_psv(str_input: str, **kwargs) -> pd.DataFrame:
    """Read a Pandas object from a pipe-separated table contained within a string.

    Input example:
        | int_score | ext_score | eligible |
        |           | 701       | True     |
        | 221.3     | 0         | False    |
        |           | 576       | True     |
        | 300       | 600       | True     |

    The leading and trailing pipes are optional, but if one is present,
    so must be the other.

    `kwargs` are passed to `read_csv`. They must not include `sep`.

    In PyCharm, the "Pipe Table Formatter" plugin has a "Format" feature that can 
    be used to neatly format a table.

    Ref: https://stackoverflow.com/a/46471952/
    """

    substitutions = [
        ('^ *', ''),  # Remove leading spaces
        (' *$', ''),  # Remove trailing spaces
        (r' *\| *', '|'),  # Remove spaces between columns
    ]
    if all(line.lstrip().startswith('|') and line.rstrip().endswith('|') for line in str_input.strip().split('\n')):
        substitutions.extend([
            (r'^\|', ''),  # Remove redundant leading delimiter
            (r'\|$', ''),  # Remove redundant trailing delimiter
        ])
    for pattern, replacement in substitutions:
        str_input = re.sub(pattern, replacement, str_input, flags=re.MULTILINE)
    return pd.read_csv(io.StringIO(str_input), sep='|', **kwargs)

Non-working alternatives

The code below doesn't work properly because it adds an empty column on both the left and right sides.

df = pd.read_csv(io.StringIO(df_str), sep=r'\s*\|\s*', engine='python')

As for read_fwf, it doesn't actually use so many of the optional kwargs that read_csv accepts and uses. As such, it shouldn't be used at all for pipe-separated data.

1 Comment

I found (by trial&error) that read_fwf takes more of read_csvs arguments than is documented, but it's true that some have no effect.
4

Object: Take string make dataframe.

Solution

def str2frame(estr, sep = ',', lineterm = '\n', set_header = True):
    dat = [x.split(sep) for x in estr.strip(lineterm).split(lineterm)]
    df = pd.DataFrame(dat)
    if set_header:
        df = df.T.set_index(0, drop = True).T # flip, set ix, flip back
    return df

Example

estr = """
sym,date,strike,genus
APPLE,20MAY20,50.0,Malus
ORANGE,22JUL20,50.0,Rutaceae
"""

df = str2frame(estr)

print(df)
0     sym     date strike     genus
1   APPLE  20MAY20   50.0     Malus
2  ORANGE  22JUL20   50.0  Rutaceae

4 Comments

Please use descriptive variable names and not estr, dat, cdf.
The style is intentional; feel free to copy this answer and use your own notation.
Same remark as this answer. Please don't reinvent the wheel, just use pd.read_csv().
If you have one, a better solution depending only on pandas would be appreciated.
0

Emample:

text = [ ['This is the NLP TASKS ARTICLE written by Anjum**'] ,['IN this article I”ll be explaining various DATA-CLEANING techniques '], ['So stay tuned for FURther More && '],['Nah I dont think he goes to usf ; he lives around']]
df = pd.DataFrame({'text':text})

Output enter image description here

3 Comments

This has nothing to do with the question. OP wants to read a CSV from a string, completely missing here.
@ggorlen. First see the question: "Create Pandas DataFrame from a string" and then see my answer. lol
Sorry, I did that and don't see what you mean. There's no CSV string in your answer. You've hardcoded a list, completely bypassing a conversion step that all of the other answers here address. The question has nothing to do with converting lists to dataframes. Even if you were going to hardcode lists, this is a poor solution because it adds extra single-element inner list wrappers on each of the strings for no apparent reason, making the text column an object rather than a string datatype.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.