Pandas Split Dataframe into two Dataframes at a specific column

Question

I have a pandas DataFrame that I built with concat. Each row consists of 96 values, and I would like to split the DataFrame after the 72nd value, so that the first 72 values of each row are stored in Dataframe1 and the remaining 24 values in Dataframe2.

I create my DF as follows:

temps = DataFrame(myData)
datasX = concat(
[temps.shift(72), temps.shift(71), temps.shift(70), temps.shift(69), temps.shift(68), temps.shift(67),
 temps.shift(66), temps.shift(65), temps.shift(64), temps.shift(63), temps.shift(62), temps.shift(61),
 temps.shift(60), temps.shift(59), temps.shift(58), temps.shift(57), temps.shift(56), temps.shift(55),
 temps.shift(54), temps.shift(53), temps.shift(52), temps.shift(51), temps.shift(50), temps.shift(49),
 temps.shift(48), temps.shift(47), temps.shift(46), temps.shift(45), temps.shift(44), temps.shift(43),
 temps.shift(42), temps.shift(41), temps.shift(40), temps.shift(39), temps.shift(38), temps.shift(37),
 temps.shift(36), temps.shift(35), temps.shift(34), temps.shift(33), temps.shift(32), temps.shift(31),
 temps.shift(30), temps.shift(29), temps.shift(28), temps.shift(27), temps.shift(26), temps.shift(25),
 temps.shift(24), temps.shift(23), temps.shift(22), temps.shift(21), temps.shift(20), temps.shift(19),
 temps.shift(18), temps.shift(17), temps.shift(16), temps.shift(15), temps.shift(14), temps.shift(13),
 temps.shift(12), temps.shift(11), temps.shift(10), temps.shift(9), temps.shift(8), temps.shift(7),
 temps.shift(6), temps.shift(5), temps.shift(4), temps.shift(3), temps.shift(2), temps.shift(1), temps,
 temps.shift(-1), temps.shift(-2), temps.shift(-3), temps.shift(-4), temps.shift(-5), temps.shift(-6),
 temps.shift(-7), temps.shift(-8), temps.shift(-9), temps.shift(-10), temps.shift(-11), temps.shift(-12),
 temps.shift(-13), temps.shift(-14), temps.shift(-15), temps.shift(-16), temps.shift(-17), temps.shift(-18),
 temps.shift(-19), temps.shift(-20), temps.shift(-21), temps.shift(-22), temps.shift(-23)], axis=1)

The question is: how can I split them? :)

Please edit the question to specify that you want to split vertically along columns and not horizontally along rows.
– Nikhil VJ, Commented Jun 5, 2020 at 12:01

zabop · Accepted Answer · 2020-07-25 16:23:58Z

134

Use `iloc`:

df1 = datasX.iloc[:, :72]
df2 = datasX.iloc[:, 72:]

(iloc docs)
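As a minimal sketch of the same idea on stand-in data: the frame below is rebuilt with a list comprehension over the shift offsets (an assumed, shorter way to write the concat call from the question), and `temps` is filled with made-up numbers. If `temps` has a single column, every column of the concatenated frame carries the same label, which is one reason positional slicing with `iloc` is convenient here.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the asker's data: one column of 200 values
temps = pd.DataFrame(np.arange(200.0))

# Offsets 72, 71, ..., 1, 0, -1, ..., -23 give 96 shifted copies side by side
datasX = pd.concat([temps.shift(i) for i in range(72, -24, -1)], axis=1)

# Positional split: columns 0..71 on the left, 72..95 on the right
df1 = datasX.iloc[:, :72]
df2 = datasX.iloc[:, 72:]

print(datasX.shape, df1.shape, df2.shape)  # (200, 96) (200, 72) (200, 24)
```

Because `iloc` works on positions, the split does not depend on the column labels being unique.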

edited Jul 25, 2020 at 16:23

zabop


answered Jan 12, 2017 at 22:36

piRSquared



1 Comment

mcp Over a year ago

Note this comment. If you were trying to split on rows it would just be df[:72].

MaxU - stand with Ukraine · Accepted Answer · 2017-01-12 22:38:49Z

84

use np.split(..., axis=1):

Demo:

In [255]: df = pd.DataFrame(np.random.rand(5, 6), columns=list('abcdef'))

In [256]: df
Out[256]:
          a         b         c         d         e         f
0  0.823638  0.767999  0.460358  0.034578  0.592420  0.776803
1  0.344320  0.754412  0.274944  0.545039  0.031752  0.784564
2  0.238826  0.610893  0.861127  0.189441  0.294646  0.557034
3  0.478562  0.571750  0.116209  0.534039  0.869545  0.855520
4  0.130601  0.678583  0.157052  0.899672  0.093976  0.268974

In [257]: dfs = np.split(df, [4], axis=1)

In [258]: dfs[0]
Out[258]:
          a         b         c         d
0  0.823638  0.767999  0.460358  0.034578
1  0.344320  0.754412  0.274944  0.545039
2  0.238826  0.610893  0.861127  0.189441
3  0.478562  0.571750  0.116209  0.534039
4  0.130601  0.678583  0.157052  0.899672

In [259]: dfs[1]
Out[259]:
          e         f
0  0.592420  0.776803
1  0.031752  0.784564
2  0.294646  0.557034
3  0.869545  0.855520
4  0.093976  0.268974

np.split() is pretty flexible - let's split an original DF into 3 DFs at columns with indexes [2,3]:

In [260]: dfs = np.split(df, [2,3], axis=1)

In [261]: dfs[0]
Out[261]:
          a         b
0  0.823638  0.767999
1  0.344320  0.754412
2  0.238826  0.610893
3  0.478562  0.571750
4  0.130601  0.678583

In [262]: dfs[1]
Out[262]:
          c
0  0.460358
1  0.274944
2  0.861127
3  0.116209
4  0.157052

In [263]: dfs[2]
Out[263]:
          d         e         f
0  0.034578  0.592420  0.776803
1  0.545039  0.031752  0.784564
2  0.189441  0.294646  0.557034
3  0.534039  0.869545  0.855520
4  0.899672  0.093976  0.268974

answered Jan 12, 2017 at 22:38

MaxU - stand with Ukraine


3 Comments

Steve Scott Over a year ago

note np.split has been deprecated and will give a FutureWarning when it is used.

Skippy le Grand Gourou Jun 20 at 9:27

@SteveScott np.split doesn’t seem deprecated in Numpy 2.3, works pretty well without any warning.

Skippy le Grand Gourou Jun 20 at 9:40

Oh, however on Pandas DataFrames it uses a deprecated Pandas component (swapaxes) and no fix is planned, so it should be used on the index to ensure future compatibility, see stackoverflow.com/a/77858849/812102.

Leo Ufimtsev · Accepted Answer · 2020-08-24 23:52:35Z

20

I generally use array_split because its syntax is simpler and it scales better to more than 2 partitions.

import numpy as np
partitions = 2
dfs = np.array_split(df, partitions)

np.split(df, [100,200,300], axis=0) wants explicit index numbers, which may or may not be desirable.

answered Aug 24, 2020 at 23:52

Leo Ufimtsev


1 Comment

neves Over a year ago

array_split is deprecated too :-( It uses np.split

Skippy le Grand Gourou · Accepted Answer · 2025-06-20 09:52:12Z

A bit too long for a comment: note that np.split also accepts a number of sections instead of the indices, but it raises an error if it cannot split into equal-length DataFrames:

>>> len(df)
20

>>> a = np.split(df, 4)
>>> [len(u) for u in a]
[5, 5, 5, 5]

>>> a = np.split(df, 3)
ValueError: array split does not result in an equal division

np.array_split does the same but uses a best fit instead of raising an error:

>>> a = np.array_split(df, 4)
>>> [len(u) for u in a]
[5, 5, 5, 5]

>>> a = np.array_split(df, 3)
>>> [len(u) for u in a]
[7, 7, 6]

WARNING: as noted in the comments above, when applied to pandas DataFrames, these functions call DataFrame.swapaxes, which is deprecated, and this won't be fixed. Apply them to the DataFrame index instead and select the rows with the resulting chunks. A convenient function:

def split(df, num_chunks):
    return [
        df.loc[chunk_idx]
        for chunk_idx in np.array_split(df.index, num_chunks)
    ]

>>> a = split(df, 4)
>>> [len(u) for u in a]
[5, 5, 5, 5]
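The same index-based trick can be sketched for columns, which is what the question asks about. This is only an illustration: `split_columns` is a hypothetical helper name, and it chunks column positions rather than labels so that it also works when labels repeat.

```python
import numpy as np
import pandas as pd

def split_columns(df, num_chunks):
    """Split df into num_chunks column blocks by position (hypothetical helper)."""
    # array_split runs on a plain integer array, so DataFrame.swapaxes is never called
    positions = np.arange(df.shape[1])
    return [df.iloc[:, chunk] for chunk in np.array_split(positions, num_chunks)]

df = pd.DataFrame(np.random.rand(5, 6), columns=list("abcdef"))
parts = split_columns(df, 4)
print([p.shape[1] for p in parts])  # [2, 2, 1, 1]
```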

mathfux · Accepted Answer · 2025-09-23 05:51:34Z

Regarding the deprecated DataFrame.swapaxes call under the hood of numpy.split: the relevant source code is equivalent to:

import numpy as np
def split_df(df, sections, axis=0):
    Nsections = len(sections) + 1
    div_points = [0] + list(sections) + [df.shape[axis]]
    sub_arrays = []
    sary = np.swapaxes(df, axis, 0) #throws warning of DataFrame.swapaxes deprecation
    for i in range(Nsections):
        st = div_points[i]
        end = div_points[i + 1]
        sub_arrays.append(np.swapaxes(sary[st:end], axis, 0))
    return sub_arrays

It looks like np.swapaxes is used to move the requested axis to the front, so that a single slice can be taken, and then to move it back. So it can be replaced with positional indexing of df via iloc:

def split_df(df, sections, axis=0):
    Nsections = len(sections) + 1
    div_points = [0] + list(sections) + [df.shape[axis]]
    sub_arrays = []
    for i in range(Nsections):
        st = div_points[i]
        end = div_points[i + 1]
        if axis == 0:
            sub_arrays.append(df.iloc[st:end, :])
        elif axis == 1:
            sub_arrays.append(df.iloc[:, st:end])
    return sub_arrays

or, more idiomatically, with list comprehensions:

def split_df(df, sections, axis=0):
    div_points = [0] + list(sections) + [df.shape[axis]]
    if axis == 0:
        sub_arrays = [df.iloc[st:end, :] for st, end in zip(div_points[:-1], div_points[1:])]
    elif axis == 1:
        sub_arrays = [df.iloc[:, st:end] for st, end in zip(div_points[:-1], div_points[1:])]
    return sub_arrays

import pandas as pd
df = pd.DataFrame(np.random.rand(5, 6), columns=list('abcdef'))
split_df(df, [2, 3], axis=0)
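One possible way to avoid repeating the comprehension for each axis is to build the indexer as a tuple of slices. The sketch below assumes a 2-D DataFrame and uses a hypothetical name, `split_by_positions`; it is a variation on the functions above, not something taken from numpy or pandas.

```python
import numpy as np
import pandas as pd

def split_by_positions(df, sections, axis=0):
    """Split df at the given positions along axis (hypothetical helper)."""
    bounds = [0, *sections, df.shape[axis]]
    pieces = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        # Take everything on every axis, then narrow the requested one
        indexer = [slice(None)] * df.ndim
        indexer[axis] = slice(start, stop)
        pieces.append(df.iloc[tuple(indexer)])
    return pieces

df = pd.DataFrame(np.random.rand(5, 6), columns=list("abcdef"))
by_cols = split_by_positions(df, [2, 3], axis=1)
by_rows = split_by_positions(df, [2, 3], axis=0)
print([p.shape for p in by_cols])  # [(5, 2), (5, 1), (5, 3)]
print([p.shape for p in by_rows])  # [(2, 6), (1, 6), (2, 6)]
```

For the frame in the question, `split_by_positions(datasX, [72], axis=1)` would return the 72-column and 24-column pieces.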

41 72 6c · Accepted Answer · 2025-09-23 08:42:03Z

If you just want to split by a column position, use iloc with a split index (also mentioned in the first answer) and call .copy() on each piece to avoid a SettingWithCopyWarning when you modify it later:

k = 72
df_left  = datasX.iloc[:, :k].copy() # for the first 72 columns
df_right = datasX.iloc[:, k:].copy() # for the remaining 24 columns

If you prefer to define the split by the last 24 columns, compute k from the shape:

k = datasX.shape[1] - 24
df_left  = datasX.iloc[:, :k].copy()
df_right = datasX.iloc[:, k:].copy()

You could also wrap it in a small helper to make it reusable:

def split_at(df, k):
    """Return two DataFrames split at column index k."""
    return df.iloc[:, :k].copy(), df.iloc[:, k:].copy()

df1, df2 = split_at(datasX, 72)

This keeps the original index and column names on both outputs.
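A small check of that behaviour, on made-up data with a labelled index (the frame and its labels are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

def split_at(df, k):
    """Return two DataFrames split at column index k."""
    return df.iloc[:, :k].copy(), df.iloc[:, k:].copy()

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=list("xyz"), columns=list("abcd"))
left, right = split_at(df, 3)

# Both pieces keep the row index; each keeps its own column labels
print(list(left.index), list(left.columns))    # ['x', 'y', 'z'] ['a', 'b', 'c']
print(list(right.index), list(right.columns))  # ['x', 'y', 'z'] ['d']

# Because of .copy(), writing to a piece leaves the original untouched
left.loc["x", "a"] = -1
print(df.loc["x", "a"])  # 0
```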
