# Split text into sentences while keeping the delimiter

• Can the split be limited to the end-of-sentence condition:

"start letter" or "or"?

For example:

"Hi! I'm a simple text. Can you separate me?"

['Hi!', "I'm a simple text.", 'Can you separate me?']

I made an attempt, but it turned out badly:

re.split(r'\w[.!?]+\s+[А-Я]', "Hello! I'm John. Are you OK? fine... and so")
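A small sketch of why that attempt falls short, using only the sample strings from this thread: `re.split` discards everything the pattern matches, so the letter before the punctuation and the capital letter after it are lost, and `[А-Я]` matches only Cyrillic capitals, so the English string is not split at all.

```python
import re

# The pattern from the attempt above: it consumes the last letter,
# the punctuation, the whitespace and the following capital letter.
pattern = r'\w[.!?]+\s+[А-Я]'

english = "Hello! I'm John. Are you OK? fine... and so"
russian = "Привет! Я простой текст. Ты сможешь разделить меня?"

english_parts = re.split(pattern, english)
russian_parts = re.split(pattern, russian)

print(english_parts)  # one element: no Cyrillic capital, so no match
print(russian_parts)  # split, but the matched characters are gone
```

For the Russian string the pieces come back as `['Приве', ' простой текс', 'ы сможешь разделить меня?']`, which is why the delimiter has to be kept out of the consumed match.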

• It splits on a space, but uses a lookbehind (https://ru.wikipedia.org/wiki/%D0%A0%D0%B5%D0%B3%D1%83%D0%BB%D1%8F%D1%80%D0%BD%D1%8B%D0%B5_%D0%B2%D1%8B%D1%80%D0%B0%D0%B6%D0%B5%D0%BD%D0%B8%D1%8F#.D0.9F.D1.80.D0.BE.D1.81.D0.BC.D0.BE.D1.82.D1.80_.D0.B2.D0.BF.D0.B5.D1.80.D1.91.D0.B4_.D0.B8_.D0.BD.D0.B0.D0.B7.D0.B0.D0.B4) to make sure the space is preceded by a letter and...

import re

result = re.split(r'(?<=\w[.!?]) ', "Hello! I'm John. Are you OK? fine... and so")
print(result)

result = re.split(r'(?<=\w[.!?]) ', "Привет! Я простой текст. Ты сможешь разделить меня?")
print(result)

Result:

['Hello!', "I'm John.", 'Are you OK?', 'fine... and so']
['Привет!', 'Я простой текст.', 'Ты сможешь разделить меня?']
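The same idea can be wrapped in a small helper. This is only a sketch: the name `split_sentences` is hypothetical, and using `\s+` instead of a single space (so that double spaces and newlines also count as a break) is my own assumption, not part of the answer above.

```python
import re

# Lookbehind: the whitespace must be preceded by a word character
# followed by one of . ! ? (fixed width, as Python's re requires).
# Only the whitespace itself is consumed, so the punctuation stays.
SENTENCE_BREAK = re.compile(r'(?<=\w[.!?])\s+')

def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation."""
    return SENTENCE_BREAK.split(text)

print(split_sentences("Hello!  I'm John.\nAre you OK?"))
print(split_sentences("Mr. Smith arrived."))
```

Note the second call: an abbreviation such as `Mr.` also looks like a sentence end to this pattern, so it gives `['Mr.', 'Smith arrived.']`.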

P.S. I didn't check it with Unicode. Tested at https://repl.it/languages/python3

UPD: \w should perhaps be replaced with an explicit list of permitted characters, since it matches letters, digits, and the underscore character.
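A sketch of what that replacement might look like, assuming only Latin and Cyrillic letters should count as the end of a sentence. The character class is my assumption and should be adjusted to the actual text.

```python
import re

# Original version: \w also accepts digits and the underscore.
word_break = re.compile(r'(?<=\w[.!?]) ')

# Assumed alternative: an explicit class with Latin and Cyrillic letters only.
letter_break = re.compile(r'(?<=[A-Za-zА-Яа-яЁё][.!?]) ')

text = "It costs 5. Maybe more. Agreed?"

with_word_class = word_break.split(text)
with_letters_only = letter_break.split(text)

print(with_word_class)
print(with_letters_only)
```

With `\w` the text is split after `5.`, while the letters-only class keeps `It costs 5. Maybe more.` together as one piece; which behaviour is right depends on the data.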
