Determine the number of texts in which a word occurs
-
Texts:
text1 = "Шла Саша по шоссе"
text2 = "Ехал Грека через реку"
text3 = "Где труд там и счастье"
text4 = "Доброта и труд рядом живут"
text5 = "Без труда не выловишь и рыбку из пруда"
The result should be a list of dictionaries of the following form:
result = [{"word": "труд", "count": 3, "id": [2, 3, 4]}, {"word": "доброта", "count": 1, "id": [3]}, ..... ]
Please suggest how to count the number of occurrences of each word and the texts in which it is used, given that in the end the words should be in their base form (i.e., not "труда" but "труд").
Here is an example function:
def f(texts, word):
    res = {"word": word, "count": 0, "id": []}
    for key, value in texts.items():
        n = value.count(word)
        if n:
            res["count"] += n
            res["id"].append(key)
    return res
But this only counts the literal string it is given, which means that two dictionaries would be created for the word "труд": one for "труд" and one for "труда".
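To make the limitation concrete, here is a small check of how the function above behaves. It assumes `texts` is a dict mapping a text number to the text (the question does not show the actual input shape, so the keys 1 to 5 are an assumption). Note that `str.count` matches substrings and is case-sensitive.

```python
def f(texts, word):
    res = {"word": word, "count": 0, "id": []}
    for key, value in texts.items():
        n = value.count(word)  # substring match, case-sensitive
        if n:
            res["count"] += n
            res["id"].append(key)
    return res


# Assumed input shape: text number -> text.
texts = {
    1: "Шла Саша по шоссе",
    2: "Ехал Грека через реку",
    3: "Где труд там и счастье",
    4: "Доброта и труд рядом живут",
    5: "Без труда не выловишь и рыбку из пруда",
}

by_base_form = f(texts, "труд")    # also matches inside "труда"
by_inflected = f(texts, "труда")   # a separate entry for the inflected form
by_lowercase = f(texts, "доброта") # misses "Доброта" because of the capital letter

print(by_base_form)
print(by_inflected)
print(by_lowercase)
```

So calling the function once per distinct word form gives separate entries for "труд" and "труда", and a lower-case query finds nothing for a capitalized word.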
-
Here is one way to do it (deliberately not shortened):
texts = [
    "Шла Саша по шоссе",
    "Ехал Грека через реку",
    "Где труд там и счастье",
    "Доброта и труд рядом живут",
    "Без труда не выловишь и рыбку из пруда",
]

tmp = dict()
for text in enumerate(texts):
    # text is an (index, string) pair
    for word in text[1].split():
        if word not in tmp:
            tmp[word] = {"word": word, "count": 1, "id": [text[0]]}
        else:
            tmp[word]["count"] += 1
            tmp[word]["id"] += [text[0]]

# Collect the per-word dictionaries into a list
res = []
for key in tmp:
    tmp[key]["id"].sort()
    res.append(tmp[key])

print(res)
A slightly shorter version:
tmp = dict()
for text in enumerate(texts):
    # text is an (index, string) pair
    for word in text[1].split():
        if word not in tmp:
            tmp[word] = {"word": word, "count": 1, "id": [text[0]]}
        else:
            tmp[word]["count"] += 1
            tmp[word]["id"] = sorted(tmp[word]["id"] + [text[0]])

res = list(tmp.values())
print(res)
If you need to sort the result by count in descending order, you can use this code:
res = sorted(list(tmp.values()), key=lambda obj: -obj['count'])
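The code above counts each surface form separately, so "труд" and "труда" still get separate entries. Below is a minimal sketch of adding a normalization step before counting. The `LEMMAS` table and the `normalize` and `build_index` names are hypothetical and exist only for this illustration; the table is a stand-in for a real lemmatizer (for Russian, a morphological analyzer such as pymorphy2 could be plugged in as `normalize` instead). Unlike the code above, this sketch records each text index only once per word, which is an assumption about the desired output.

```python
texts = [
    "Шла Саша по шоссе",
    "Ехал Грека через реку",
    "Где труд там и счастье",
    "Доброта и труд рядом живут",
    "Без труда не выловишь и рыбку из пруда",
]

# Hypothetical lookup table standing in for a real lemmatizer;
# it covers only a few inflected forms from the sample texts.
LEMMAS = {"труда": "труд", "пруда": "пруд", "рыбку": "рыбка", "реку": "река"}


def normalize(word):
    """Lower-case the word and map it to its base form if it is in LEMMAS."""
    word = word.lower()
    return LEMMAS.get(word, word)


def build_index(texts, normalize=normalize):
    """Count normalized words and record the indices of the texts they occur in."""
    index = {}
    for i, text in enumerate(texts):
        for raw in text.split():
            word = normalize(raw)
            entry = index.setdefault(word, {"word": word, "count": 0, "id": []})
            entry["count"] += 1
            if i not in entry["id"]:  # keep each text index once
                entry["id"].append(i)
    # Most frequent words first; ties keep first-seen order (sorted is stable)
    return sorted(index.values(), key=lambda e: -e["count"])


result = build_index(texts)
print(result[0])
```

With this table, "труд" and "труда" are merged into a single entry, and "Доброта" is stored as "доброта".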