Python 3.x errors
-
Hello, everyone. I decided to write a parser in Python for a single table. The page declares utf-8 encoding in its "meta" tag. I read all the data I need from this table, but the Russian characters come out as unreadable gibberish. Here's the code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# vim:fileencoding=utf-8
import lxml.html as html
import requests

page = requests.get('https://org.mephi.ru/pupil-rating/get-rating/entity/4575/original/no')
tree = html.fromstring(page.content)
range_list = tree.xpath('//tr[@class="trPosBen"]/td[1]/text()')
unique_list = tree.xpath('//tr[@class="trPosBen"]/td[3]/text()')
fio_list = tree.xpath('//tr[@class="trPosBen"]/td[4]/text()')
hostel_list = tree.xpath('//tr[@class="trPosBen"]/td[5]/text()')
score_list = tree.xpath('//tr[@class="trPosBen"]/td[6]/span[1]/text()')
sum_score_list = tree.xpath('//tr[@class="trPosBen"]/td[7]/text()')
docs_list = tree.xpath('//tr[@class="trPosBen"]/td[8]/text()')
Then I combine all these lists into 'result_list' to get the table. When I print the result, everything runs without errors, but the Russian characters come out garbled. When I try to write this table to a text file, I get an encoding error:
Traceback (most recent call last):
  File "C:/Users/Vasiiil/PycharmProjects/untitled/HelloWorld.py", line 55, in <module>
    f.write(str(i[j]) + " ")
  File "C:\Program Files (x86)\Python35-32\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-7: character maps to <undefined>
After adding the encoding='utf-8' parameter when opening the file, the error goes away, but the file still contains the same gibberish. Please help. I've been searching the Internet for three days for a solution to this problem and haven't found anything.
-
Try:
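# page.raw can only be read if the request was made with stream=True,
# so decode the already-downloaded bytes in page.content instead
html.fromstring(page.content.decode('utf-8'))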
or
page.encoding = 'utf-8'
html.fromstring(page.text)
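
The idea in both variants is to decode the response bytes as UTF-8 yourself instead of letting the parser guess. Here is a minimal sketch of what probably goes wrong, using only the standard library. The sample name `Иванов` is a hypothetical stand-in for the bytes in `page.content`, and the fallback to Latin-1 is an assumption about what the parser does, not something confirmed for this page.

```python
import os
import tempfile

# Hypothetical sample: UTF-8 bytes of a Russian name, standing in for page.content.
original = "Иванов"
raw_bytes = original.encode("utf-8")

# Assumption: the bytes get decoded with a single-byte codec such as Latin-1.
# Each two-byte UTF-8 sequence then turns into two unrelated characters.
garbled = raw_bytes.decode("latin-1")

# Those characters do not exist in cp1251 (the default file encoding on a
# Russian Windows install), which matches the UnicodeEncodeError in the question.
try:
    garbled.encode("cp1251")
    cp1251_failed = False
except UnicodeEncodeError:
    cp1251_failed = True

# Decoding with the correct codec gives the real text.
decoded = raw_bytes.decode("utf-8")

# Write the text with an explicit encoding and read it back the same way.
path = os.path.join(tempfile.mkdtemp(), "rating.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(decoded + " ")
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(garbled != original, cp1251_failed, round_trip == original + " ")
```

This would also explain why encoding='utf-8' on the file removed the error but not the gibberish: the strings were already wrong before they were written, so the file just stored the wrong characters faithfully. The fix has to happen at parse time.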