In a data.frame, I have a categorical variable for the language of a text. Most texts are in only one language, but some are in several. In my data, the languages appear in the same column, separated by commas:
text = c("Text1", "Text2", "Text3")
lang = c("fr", "en", "fr,en")
d = data.frame(text, lang)
Visually:
   text  lang
1 Text1    fr
2 Text2    en
3 Text3 fr,en
I'd like to plot the number of texts in each language, with Text3 being counted both in fr and in en.
I found how to split the column with:
d$lang <- strsplit(d$lang, ",")
But then I can't find a way to plot it correctly, e.g. with a qplot barplot like this one:
qplot(lang, data=d)
Am I doing it right? Is there a better approach?
... qplot like that, and its default plot is a scatter plot. Try
qplot(x=unlist(strsplit(as.character(d$lang), ",")), geom="bar")
or, for a non-ggplot answer,
barplot(table(unlist(strsplit(as.character(d$lang), ","))))

... or unlist while maintaining other columns of data? In the above example, let's say I also have a third column that I want to keep aligned with lang. Is there a way? Maybe by duplicating observations?
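
One possible way to do what the last comment asks, sketched in base R: split lang, then repeat each row once per language it contains, so the other columns stay aligned. The third column (called year here) and its values are hypothetical, made up only for illustration.

```r
# Example data from the question, plus a hypothetical third column (year).
d <- data.frame(
  text = c("Text1", "Text2", "Text3"),
  lang = c("fr", "en", "fr,en"),
  year = c(2001, 2005, 2010),  # assumption: illustrative values only
  stringsAsFactors = FALSE
)

# Split each lang entry on commas; the result is a list with one vector per row.
langs <- strsplit(as.character(d$lang), ",")

# Repeat each row index once per language found in that row.
idx <- rep(seq_len(nrow(d)), times = lengths(langs))

# Build the "long" data frame: duplicated rows, one language per row.
d_long <- d[idx, ]
d_long$lang <- unlist(langs)
rownames(d_long) <- NULL

# Count texts per language; Text3 contributes to both fr and en.
counts <- table(d_long$lang)
barplot(counts)
```

Here d_long has one row per text and language pair, so it can also be handed to qplot(lang, data=d_long, geom="bar"). If tidyr is available, separate_rows(d, lang, sep=",") should give an equivalent result.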