I have a list of keywords:

val mykeywords = Array("Glass", "Bone China", "Ceramic", "Clock")

I have a DataFrame:

val df = Seq("Title1", "Title2", "glass", "CloCK").toDF("Title")

I want to generate this DataFrame:

Title Flag
Title1 0
Title2 0
glass 1
CloCK 1

My current code:

val mykeywords: Array[String] = Array("Glass", "Bone China", "Ceramic", "Clock").map(_.toLowerCase())
val df2 = df.withColumn("Flag", lower(col("Title")).rlike(mykeywords.mkString("|")).cast(IntegerType))

This currently does not work properly for some string matches. Please point out if there is a better way.

  • Can you describe the error or the scenario in which your code does not work? Commented Dec 4, 2019 at 6:56
  • It returns true when "wailed" is matched against "led". Commented Dec 4, 2019 at 7:37
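
The behavior described in the comment above can be reproduced without Spark. The following is a minimal sketch in plain Scala, assuming a hypothetical keyword list that contains "led" (it is not in the list from the question). rlike succeeds when the pattern is found anywhere in the value, which is what `findFirstIn` does here; `SubstringMatchDemo` and its helper names are illustrative only.

```scala
// Plain-Scala sketch of the substring problem; no Spark required.
object SubstringMatchDemo {
  // Hypothetical keyword list containing "led", as in the comment above.
  val keywords: Seq[String] = Seq("led", "clock")

  // Unanchored alternation, built the same way as mykeywords.mkString("|").
  private val unanchored = keywords.mkString("|").r

  // True when any keyword occurs anywhere inside the title (substring search).
  def containsKeyword(title: String): Boolean =
    unanchored.findFirstIn(title.toLowerCase).isDefined

  // True only when the whole title equals one of the keywords.
  def equalsKeyword(title: String): Boolean =
    keywords.contains(title.toLowerCase)

  def main(args: Array[String]): Unit = {
    println(containsKeyword("wailed")) // true: "wailed" ends with "led"
    println(equalsKeyword("wailed"))   // false
    println(equalsKeyword("CloCK"))    // true
  }
}
```

If the flag should be set only when the whole title equals a keyword, an exact comparison or an anchored pattern avoids the false positive.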

3 Answers

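Convert mykeywords into a DataFrame too and left join df with it on the lowercased values, so that every title is kept. Titles without a matching keyword get null on the keyword side; replace a null match with 0 and a non-null match with 1.

// Keyword DataFrame with a single "Title" column
val mykeywordsDf = mykeywords.toSeq.toDF("Title")

df
.join(right = mykeywordsDf, joinType = "left", joinExprs = lower(df("Title")).equalTo(lower(mykeywordsDf("Title"))))
.withColumn(
   "Flag", 
   when(mykeywordsDf("Title").isNull, 0)
    .when(mykeywordsDf("Title").isNotNull, 1)
)
// Drop the keyword-side column so only the original Title and Flag remain
.drop(mykeywordsDf("Title"))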

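Your approach is nearly correct. You just need to make the pattern in rlike case-insensitive and anchor it so that it matches the whole title, like this:

df.withColumn("Flag", col("Title").rlike("(?i)^(" + mykeywords.mkString("|") + ")$").cast("int")).show()

The parentheses are needed so that ^ and $ apply to every keyword, not only to the first and the last one. The code above should work.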
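
One detail worth checking in an anchored alternation: `|` has lower precedence than `^` and `$`, so the anchors cover every keyword only when the alternatives are wrapped in a group. The sketch below illustrates this with plain `java.util.regex`, the same regex dialect rlike relies on. `AnchoredPatternDemo`, the `matches` helper, and the use of `Pattern.quote` to escape metacharacters in keywords are assumptions added for illustration, not part of the answer.

```scala
import java.util.regex.Pattern

// Plain-Scala sketch comparing grouped and ungrouped anchored patterns.
object AnchoredPatternDemo {
  val keywords: Seq[String] = Seq("Glass", "Bone China", "Ceramic", "Clock")

  // Pattern.quote escapes regex metacharacters inside each keyword.
  // The surrounding group makes ^ and $ apply to every alternative.
  val grouped: String =
    keywords.map(k => Pattern.quote(k)).mkString("(?i)^(", "|", ")$")

  // Without a group, ^ binds only to "Glass" and $ only to "Clock".
  val ungrouped: String = "(?i)^" + keywords.mkString("|") + "$"

  // Mirrors rlike semantics: true when the pattern is found in the title.
  def matches(pattern: String, title: String): Boolean =
    Pattern.compile(pattern).matcher(title).find()

  def main(args: Array[String]): Unit = {
    println(matches(grouped, "CloCK"))           // true
    println(matches(grouped, "Ceramic tiles"))   // false
    println(matches(ungrouped, "Ceramic tiles")) // true: "Ceramic" is unanchored
  }
}
```

The `grouped` string can be passed to rlike in place of the hand-built pattern if the keywords might contain characters such as `.` or `+`.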

1 Comment

If this solves your problem, please accept the answer so that others can benefit from it.

You can also use the isin function to check whether the column value is contained in your mykeywords list:

df.withColumn("Flag", 
              lower(col("Title")).isin(mykeywords.map(_.toLowerCase): _*).cast("int")
             ).show()

Output:

+------+----+
| Title|Flag|
+------+----+
|Title1|   0|
|Title2|   0|
| glass|   1|
| CloCK|   1|
+------+----+
