I want to strip some characters from the strings in a vector of maps, where each map holds a text under the key :text.

This should be part of a bigger program which counts all the words in a list of texts.

The input-vector looks like this:

[{:text "bla. Bla! Blabla, foo"}
   {:text "hello foo? bla Foo, blabla"}
   {:text "bla blub Foo Bla blub"}]

The output should look like this and should be sorted by the values:

{:bla 3 :Bla 2 :blub 2 :foo 2 :Foo 2 ... } 

But first I want to clean some characters out of the strings.

I tried it with map, but I don't understand why this code is not working correctly:

(defn clean-texts []
  (map (fn [x] (clojure.string/replace x #"[.,]" "")) (:text texts)))

The whole code looks like this:

(ns keyword-finder.core
  (:gen-class))

(def texts
  [{:text "bla. Bla! Blabla, foo"}
   {:text "hello foo? bla Foo, blabla"}
   {:text "bla blub Foo Bla blub"}])

(defn clean-texts []
  (map (fn [x] (clojure.string/replace x #"[.,]" "")) (:text texts))
)
  • When you pose a question about code that does not work, it helps immensely if you take the time to describe what you expect to happen, and what's happening instead. Commented Oct 12, 2014 at 13:59
  • Sorry, I have edited it now. Commented Oct 12, 2014 at 14:11
  • What results do you get from running clean-texts and why are they incorrect? Commented Oct 12, 2014 at 14:20
  • I get an empty lazy seq. Commented Oct 12, 2014 at 14:25

2 Answers

5

What you want is something like this:

(defn tokenize [s]
  (-> s
    (.replaceAll "[^a-zA-Z\\s]" "")
    (clojure.string/split #" ")))

This removes every character that is neither a letter nor whitespace and then splits the result on spaces, so when applied to "bla. blah, blah" it will give you ["bla" "blah" "blah"].
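Because the split above uses a single space, input with doubled spaces or tabs would produce empty tokens. A minimal sketch of a more tolerant variant follows; the name tokenize-ws is hypothetical and not part of the original answer, and it assumes clojure.string is required under the alias str:

```clojure
(require '[clojure.string :as str])

;; Hypothetical variant of tokenize: same character filter, but it trims the
;; string and splits on runs of whitespace, so repeated spaces or tabs
;; between words do not yield empty tokens.
(defn tokenize-ws [s]
  (-> s
      (str/replace #"[^a-zA-Z\s]" "")
      str/trim
      (str/split #"\s+")))

(tokenize-ws "bla.  Bla!\tBlabla, foo")
;; => ["bla" "Bla" "Blabla" "foo"]
```

It can be dropped into word-counts in place of tokenize if the real texts contain irregular whitespace.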

(defn word-counts [texts]
  (let [tokens
    (->> texts
        (map (comp tokenize :text))
        (apply concat)
        (map keyword))]
   (frequencies tokens)))

This function extracts the value for the key :text from each map, applies tokenize to every resulting string, concatenates the results into one list of words, converts the words into keywords, and finally returns the keyword counts using the built-in function frequencies.

(word-counts texts)

produces {:bla 3, :Bla 2, :Blabla 1, :foo 2, :hello 1, :Foo 2, :blabla 1, :blub 2}
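The question also asks for the counts sorted by value. A regular Clojure map does not keep any particular order, so one possible sketch is to sort the entries into a sequence of pairs instead. It assumes descending order is wanted; the names counts and sorted-pairs are only for illustration:

```clojure
;; Assumption: `counts` stands in for the map returned by (word-counts texts).
(def counts
  {:bla 3, :Bla 2, :Blabla 1, :foo 2, :hello 1, :Foo 2, :blabla 1, :blub 2})

;; sort-by with val as the key function and > as the comparator yields a
;; sequence of [keyword count] entries, highest count first.
(def sorted-pairs (sort-by val > counts))

(first sorted-pairs)
;; => [:bla 3]
```

No custom comparator is needed for this; the order among entries with equal counts is not specified here.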

2 Comments

Yes but the regex should be "[a-zA-z]\\s". But still thank you
And to make it perfect: how can I sort it by values while still returning the same result? Do I have to write my own comparator?
4

You're applying map to the wrong sequence:

(:text texts)

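returns nil, because :text is looked up on the texts vector itself rather than on the maps inside it. Mapping over nil then gives you the empty sequence you are seeing.

What you probably wanted to do was to map the inner function over the whole texts vector, while extracting :text from each element: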

(defn clean-texts []
     (map (fn [x] (clojure.string/replace (:text x) #"[.,]" "")) texts))
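To see the difference at the REPL, here is a small sketch using the texts vector from the question. It assumes clojure.string is required under the alias str, and the names broken and cleaned are only for illustration:

```clojure
(require '[clojure.string :as str])

(def texts
  [{:text "bla. Bla! Blabla, foo"}
   {:text "hello foo? bla Foo, blabla"}
   {:text "bla blub Foo Bla blub"}])

;; Looking up a keyword on the vector itself finds nothing:
(:text texts)
;; => nil

;; Mapping over nil produces an empty sequence, which is the original symptom.
(def broken (map (fn [x] (str/replace x #"[.,]" "")) (:text texts)))
;; broken => ()

;; Looking up :text on each element instead gives the cleaned strings.
(def cleaned (map (fn [x] (str/replace (:text x) #"[.,]" "")) texts))
;; cleaned => ("bla Bla! Blabla foo" "hello foo? bla Foo blabla" "bla blub Foo Bla blub")
```

Note that the pattern #"[.,]" only removes periods and commas, so "!" and "?" are still present in the result.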
