Scrapy + Splash returns garbled text when parsing Russian
-
When I parse Russian text, the result comes back as garbled escape sequences.
This is what gets saved to JSON:
[ {"name": "3-\\u043a\\u043e\\u043c\\u043d. \\u043a\\u0432\\u0430\\u0440\\u0442\\u0438\\u0440\\u0430, 150 \\u043c\\u00b2"} ]
It is saved the same way to CSV:
,name
0,"3-\u043a\u043e\u043c\u043d. \u043a\u0432\u0430\u0440\u0442\u0438\u0440\u0430, 150 \u043c\u00b2"
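The doubled backslash in the JSON output (`\\u043a` rather than `\u043a`) is a useful clue. Below is a minimal check using only the standard `json` module; the two sample strings are hypothetical. The assumption is that Scrapy's JSON exporter behaves roughly like `json.dumps(..., ensure_ascii=False)` once `FEED_EXPORT_ENCODING` is set. Real Cyrillic letters are then written as-is, and even with ASCII escaping each letter becomes a single `\uXXXX`. Only a string that already contains a literal backslash followed by `u043a` comes out as `\\u043a`. That suggests the scraped value itself holds escape sequences rather than Cyrillic letters.

```python
import json

# Hypothetical sample values for illustration.
real_cyrillic = "\u043a\u0432"      # the actual letters "кв" (2 characters)
literal_escape = "\\u043a\\u0432"   # backslash, 'u', hex digits (12 ASCII characters)

# ensure_ascii=False keeps real Cyrillic letters unchanged.
as_utf8 = json.dumps(real_cyrillic, ensure_ascii=False)     # '"кв"'
# ensure_ascii=True escapes each letter once: a single backslash per letter.
as_ascii = json.dumps(real_cyrillic, ensure_ascii=True)     # '"\\u043a\\u0432"'
# A literal backslash is always escaped, which produces the doubled "\\u".
doubled = json.dumps(literal_escape, ensure_ascii=False)

print(doubled)  # prints: "\\u043a\\u0432"
```

If the exported file shows the last form, changing the export encoding will not help. The string has to be fixed before it is exported.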
(I tried using .decode and .encode, but the result is the same.)
I also set this in the Scrapy settings:
FEED_EXPORT_ENCODING = 'utf-8'
but it doesn't help. English text is parsed fine; this only happens with Russian.
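Here's the code:

```python
import scrapy
import pandas as pd
from scrapy_splash import SplashRequest


class LinkSpider(scrapy.Spider):
    # 'link' placeholders stand in for the real site URL and domain.
    url = 'link'
    name = 'link'
    allowed_domains = ['link']
    start_urls = ['link']

    # Lua script run by Splash: load the page, wait 3 s,
    # then return a table with a screenshot and the rendered HTML.
    script = '''
    function main(splash, args)
        splash.private_mode_enabled = false
        assert(splash:go(args.url))
        assert(splash:wait(3))
        splash:set_viewport_full()
        return {splash:png(), splash:html()}
    end
    '''

    def start_requests(self):
        # self.url, not a bare `url`, which would raise NameError.
        yield SplashRequest(url=self.url, callback=self.parse,
                            endpoint='execute',
                            args={'lua_source': self.script})

    def parse(self, response):
        name = response.xpath('//h1/text()').get()
        df = pd.DataFrame({'name': [name]})
        df.to_csv("result.csv")
        yield {"name": name}
```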
-
print(u"3-\u043a\u043e\u043c\u043d. \u043a\u0432\u0430\u0440\u0442\u0438\u0440\u0430, 150 \u043c\u00b2")
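The `print` above shows Russian text because Python interprets `\u` escapes inside a string literal. If the value coming out of the spider really contains the escapes as plain text, one way to turn them back into letters is the standard `unicode_escape` codec. This is a sketch: the helper name `unescape_unicode` is hypothetical. It also assumes the text contains no other backslash sequences (such as `\n`) that should stay literal, because the codec would interpret those too.

```python
def unescape_unicode(text: str) -> str:
    """Turn literal '\\uXXXX' sequences into the characters they name.

    Characters that are already non-ASCII are first re-escaped with
    'backslashreplace', so they survive the round trip unchanged.
    """
    return text.encode("ascii", "backslashreplace").decode("unicode_escape")


scraped = ("3-\\u043a\\u043e\\u043c\\u043d. "
           "\\u043a\\u0432\\u0430\\u0440\\u0442\\u0438\\u0440\\u0430, 150 \\u043c\\u00b2")
print(unescape_unicode(scraped) == "3-комн. квартира, 150 м²")  # True
```

In the spider, this would be applied to `name` in `parse` (when the XPath matched) before the value is yielded and written with pandas.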
Encoding: iso-8859
Or this may help:
https://ru.stackoverflow.com/questions/1328837/%D0%9A%D0%B0%D0%BA-%D0%BF%D0%B5%D1%80%D0%B5%D0%BA%D0%BE%D0%B4%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D1%82%D1%8C-%D1%82%D0%B5%D0%BA%D1%81%D1%82-%D1%81%D0%B0%D0%B9%D1%82%D0%B0-%D0%B2-%D0%BA%D0%BE%D0%B4%D0%B8%D1%80%D0%BE%D0%B2%D0%BA%D0%B5-cp1251-%D1%87%D1%82%D0%BE%D0%B1%D1%8B-%D0%BE%D0%BD-%D0%B1%D1%8B%D0%BB-%D1%87%D0%B8%D1%82%D0%B0%D0%B5%D0%BC%D1%8B%D0%BC
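The iso-8859 hint and the linked question deal with a different failure. There, a page served in a Cyrillic encoding such as cp1251 is decoded as ISO-8859-1, which produces Latin-looking mojibake rather than `\u` escapes. Below is a minimal sketch of reversing that. It assumes the wrong decoding was exactly ISO-8859-1 and the correct one cp1251; both should be checked against the page's headers. `fix_mojibake` is a hypothetical helper name.

```python
def fix_mojibake(text: str, wrong: str = "iso-8859-1", right: str = "cp1251") -> str:
    """Re-encode with the codec that was wrongly used, then decode with the right one."""
    return text.encode(wrong).decode(right)


original = "квартира"
# Simulate the bug: cp1251 bytes decoded as ISO-8859-1.
garbled = original.encode("cp1251").decode("iso-8859-1")
print(garbled == original)                # False
print(fix_mojibake(garbled) == original)  # True
```

ISO-8859-1 maps every byte to a character, so re-encoding with it recovers the original bytes exactly before they are decoded as cp1251.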