I have a Spark DataFrame, which looks like this:
+--------------------+------+----------------+-----+--------+
| Name | Sex| Ticket |Cabin|Embarked|
+--------------------+------+----------------+-----+--------+
|Braund, Mr. Owen ...| male| A/5 21171| null| S|
|Cumings, Mrs. Joh...|female| PC 17599| C85| C|
|Heikkinen, Miss. ...|female|STON/O2. 3101282| null| S|
|Futrelle, Mrs. Ja...|female| 113803| C123| S|
|Palsson, Master. ...| male| 349909| null| S|
+--------------------+------+----------------+-----+--------+
Now I need to transform the 'Name' column so that it contains only the title, i.e. Mr., Mrs., Miss. or Master. The resulting DataFrame would look like this:
+--------------------+------+----------------+-----+--------+
| Name | Sex| Ticket |Cabin|Embarked|
+--------------------+------+----------------+-----+--------+
|Mr. | male| A/5 21171| null| S|
|Mrs. |female| PC 17599| C85| C|
|Miss. |female|STON/O2. 3101282| null| S|
|Mrs. |female| 113803| C123| S|
|Master. | male| 349909| null| S|
+--------------------+------+----------------+-----+--------+
I tried filtering with isin against a list of titles:
List<String> list = Arrays.asList("Mr.", "Mrs.", "Miss.", "Master.");
Dataset<Row> categoricalDF2 = categoricalDF.filter(col("Name").isin(list.stream().toArray(String[]::new)));
but that only keeps rows whose whole Name value exactly equals one of the titles, which is not what I need, and it seems it's not that easy to do in Java. How can I do it in Java? Please note that I'm using Spark 2.2.0.
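From what I can tell, something along these lines with withColumn and regexp_extract might be closer to what I want (the regex is just my guess at the "Surname, Title. Given names" format, so it may well not be the right or idiomatic way):

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.regexp_extract;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Replace the Name column with only the captured title (group 1).
// Assumes names look like "Surname, Title. Given names", so the regex
// takes the word ending in '.' immediately after the comma.
Dataset<Row> titlesDF = categoricalDF.withColumn(
        "Name",
        regexp_extract(col("Name"), ",\\s*(\\w+\\.)", 1));

Is this the right direction, or is there a cleaner way to do it with the Java API?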