Scrapy + Splash: gibberish when parsing Russian text



  • When I parse Russian text, I get gibberish (escaped Unicode sequences) instead of readable text.

    Here is the result when saved to JSON:

    [ {"name": "3-\\u043a\\u043e\\u043c\\u043d. \\u043a\\u0432\\u0430\\u0440\\u0442\\u0438\\u0440\\u0430, 150 \\u043c\\u00b2"} ]

    The same happens when saving to CSV:

    ,name
    0,"3-\u043a\u043e\u043c\u043d. \u043a\u0432\u0430\u0440\u0442\u0438\u0440\u0430, 150 \u043c\u00b2"

    (I tried using .decode and .encode, but the result is the same.)

    I also set FEED_EXPORT_ENCODING = 'utf-8' in the Scrapy settings, but it doesn't help.

    With English text everything works fine; the problem only occurs with Russian text.

    Here's the code.

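    import pandas as pd
    import scrapy
    from scrapy_splash import SplashRequest


    class LinkSpider(scrapy.Spider):
        url = 'link'
        name = 'link'
        allowed_domains = ['link']
        start_urls = ['link']
        # Lua script run by Splash: load the page, wait, return a screenshot and the HTML
        script = '''
               function main(splash, args)
                 splash.private_mode_enabled = false
                 assert(splash:go(args.url))
                 assert(splash:wait(3))
                 splash:set_viewport_full()
                 return {splash:png(), splash:html()}
               end
           '''

        def start_requests(self):
            # Send the start URL through Splash's "execute" endpoint with the script above
            yield SplashRequest(url=self.url, callback=self.parse,
                                endpoint='execute', args={'lua_source': self.script})

        def parse(self, response):
            # Extract the page heading, save it to CSV, and yield it as an item
            name = response.xpath('//h1/text()').get()
            df = pd.DataFrame({'name': [name]})
            df.to_csv("result.csv")
            yield {
                "name": name,
            }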



  • print(u"3-\u043a\u043e\u043c\u043d. \u043a\u0432\u0430\u0440\u0442\u0438\u0440\u0430, 150 \u043c\u00b2")
    

    The encoding is ISO-8859.

    Or this may help:
    https://ru.stackoverflow.com/questions/1328837/%D0%9A%D0%B0%D0%BA-%D0%BF%D0%B5%D1%80%D0%B5%D0%BA%D0%BE%D0%B4%D0%B8%D1%80%D0%BE%D0%B2%D0%B0%D1%82%D1%8C-%D1%82%D0%B5%D0%BA%D1%81%D1%82-%D1%81%D0%B0%D0%B9%D1%82%D0%B0-%D0%B2-%D0%BA%D0%BE%D0%B4%D0%B8%D1%80%D0%BE%D0%B2%D0%BA%D0%B5-cp1251-%D1%87%D1%82%D0%BE%D0%B1%D1%8B-%D0%BE%D0%BD-%D0%B1%D1%8B%D0%BB-%D1%87%D0%B8%D1%82%D0%B0%D0%B5%D0%BC%D1%8B%D0%BC
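
The output in the question contains literal backslash-u sequences, which is how JSON escapes non-ASCII characters by default. As a minimal sketch, assuming the scraped value really holds such literal escapes (and no unescaped double quotes or control characters), it can be decoded by parsing it as a JSON string literal. The helper name `decode_json_escapes` is hypothetical and not part of Scrapy or Splash:

```python
import json

# Literal escape text as it appears in the exported files
# (a raw string, so the backslashes are kept as-is).
escaped = r"3-\u043a\u043e\u043c\u043d. \u043a\u0432\u0430\u0440\u0442\u0438\u0440\u0430, 150 \u043c\u00b2"


def decode_json_escapes(text: str) -> str:
    """Decode JSON-style unicode escapes by parsing text as a JSON string literal.

    Assumes the text has no unescaped double quotes or control characters.
    """
    return json.loads('"' + text + '"')


decoded = decode_json_escapes(escaped)
print(decoded)

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
# which reproduces the escaped form seen in the question.
assert json.dumps(decoded) == '"' + escaped + '"'

# With ensure_ascii=False the Cyrillic characters are written as-is.
print(json.dumps(decoded, ensure_ascii=False))
```

This only repairs the symptom. One likely cause, stated here as an assumption rather than a confirmed diagnosis, is that the Lua script returns a table, which Splash serializes as JSON, so the XPath runs over JSON-escaped text. Returning the page under an `html` key, for example `return {html = splash:html(), png = splash:png()}`, is worth trying, since scrapy-splash can then use that value as the response body.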


