Python 3.x errors



  • Hello, everyone. I decided to write a parser in Python for a single table. The page's "meta" tag declares UTF-8 encoding. I read all the data I need from this table, but the Russian characters come out as unreadable gibberish. Here's the code:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # vim:fileencoding=utf-8
    import lxml.html as html
    import requests
    page = requests.get('https://org.mephi.ru/pupil-rating/get-rating/entity/4575/original/no')
    tree = html.fromstring(page.content)
    range_list = tree.xpath('//tr[@class="trPosBen"]/td[1]/text()')
    unique_list = tree.xpath('//tr[@class="trPosBen"]/td[3]/text()')
    fio_list = tree.xpath('//tr[@class="trPosBen"]/td[4]/text()')
    hostel_list = tree.xpath('//tr[@class="trPosBen"]/td[5]/text()')
    score_list = tree.xpath('//tr[@class="trPosBen"]/td[6]/span[1]/text()')
    sum_score_list = tree.xpath('//tr[@class="trPosBen"]/td[7]/text()')
    docs_list = tree.xpath('//tr[@class="trPosBen"]/td[8]/text()')
    

    Then I combine all these lists into 'result_list' to get the table. When I print it, everything runs without error, but the Russian characters come out garbled. When I try to write this table to a text file, I get an encoding error:

    Traceback (most recent call last):
      File "C:/Users/Vasiiil/PycharmProjects/untitled/HelloWorld.py", line 55, in <module>
        f.write(str(i[j]) + " ")
      File "C:\Program Files (x86)\Python35-32\lib\encodings\cp1251.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-7: character maps to <undefined>
    

    After adding the encoding='utf-8' parameter when opening the file, the error goes away. But the same gibberish is written to the file. Please help. I've been searching the Internet for three days for a solution to this problem, but I haven't found anything.



  • Try:

    html.fromstring(page.content.decode('utf-8'))
    

    or

    page.encoding = 'utf-8'
    html.fromstring(page.text)
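
    To see why decoding explicitly can help, here is a minimal sketch that needs neither the network nor lxml. It assumes the page bytes really are UTF-8 and that somewhere along the way they were decoded with a single-byte codec. That is only one plausible cause of the symptoms in the question, not a confirmed diagnosis. The sample word and the file name result.txt are hypothetical stand-ins.

```python
import os
import tempfile

# Hypothetical stand-in for page.content: the UTF-8 bytes of a Russian word.
raw_bytes = "Привет".encode("utf-8")

# Decoding with the wrong single-byte codec turns every byte into its own
# character, which is the kind of gibberish described in the question.
garbled = raw_bytes.decode("latin-1")

# Such characters have no mapping in cp1251 (the codec in the traceback),
# so encoding them fails the same way the file write did.
try:
    garbled.encode("cp1251")
    cp1251_failed = False
except UnicodeEncodeError:
    cp1251_failed = True

# Decoding with the codec the page declares restores the text.
correct = raw_bytes.decode("utf-8")

# Write with an explicit encoding so the platform default is not used.
path = os.path.join(tempfile.mkdtemp(), "result.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(correct + " ")

# Read the file back to confirm the text survived the round trip.
with open(path, encoding="utf-8") as f:
    restored = f.read()
```

    If this is what happens, encoding='utf-8' on the file alone cannot fix it: the text is already garbled before it is written, so the decoding step has to be corrected first.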
    


