Confusing title, let me explain. I have 2 dataframes like this:
dataframe named df1: Looks like this (with million of rows in original):
id ` text c1
1 Hello world how are you people 1
2 Hello people I am fine people 1
3 Good Morning people -1
4 Good Evening -1
Dataframe named df2 looks like this:
Word count Points Percentage
hello 2 2 100
world 1 1 100
how 1 1 100
are 1 1 100
you 1 1 100
people 3 1 33.33
I 1 1 100
am 1 1 100
fine 1 1 100
Good 2 -2 -100
Morning 1 -1 -100
Evening 1 -1 -100
-1
df2 columns explaination:
count means the total number of times that word appeared in df1
points is points given to each word by some kind of algorithm
percentage = points/count*100
Now, I want to add 40 new columns in df1, according to the point & percentage. They will look like this:
perc_-90_2 perc_-80_2 perc_-70_2 perc_-60_2 perc_-50_2 perc_-40_2 perc_-20_2 perc_-10_2 perc_0_2 perc_10_2 perc_20_2 perc_30_2 perc_40_2 perc_50_2 perc_60_2 perc_70_2 perc_80_2 perc_90_2
perc_-90_1 perc_-80_1 perc_-70_1 perc_-60_1 perc_-50_1 perc_-40_1 perc_-20_1 perc_-10_1 perc_0_1 perc_10_1 perc_20_1 perc_30_1 perc_40_1 perc_50_1 perc_60_ perc_70_1 perc_80_1 perc_90_1
Let me break it down. The column name contain 3 parts:
1.) perc just a string, means nothing
2.) Numbers from range -90 to +90. For example, Here -90 means, the percentage is -90 in df2. Now for example, If a word has percentage value in range 81-90, then there will be a value of 1 in that row, and column named prec_-80_xx. The xx is the third part.
3.) The third part is the count. Here I want two type of counts. 1 and 2. As the example given in point 2, If the word count is in range of 0 to 1, then the value will be 1 in prec_-80_1 column. If the word count is 2 or more, then the value will be 1 in prec_-80_2 column.
I hope it is not very on confusing.
33.33is valueperc_30_2?33.33will be inperc_30_2