Import HTML in Excel (Com-object, Powershell)
-
There's HTML-Table. How to import HTML in Excel using COM?
-
It's possible either through Regex (which is more convenient and faster) or through the Com facility. Through the COM, we need to read the whole table and every line, take it to the mass and load it to the ssv. If your table looks like:
<H3>Заголовок</H3> <TABLE ID="table6" BORDER=1 CELLPADDING=2> <TR><TH><B>Condition</B></TH><TH><B>\LogicalDisk(*)\Disk Transfers/sec</B></TH><TH><B>Min</B></TH><TH><B>Avg</B></TH><TH><B>Max</B></TH><TH><B>Hourly Trend</B></TH><TH><B>Std Deviation</B></TH><TH><B>10% of Outliers Removed</B></TH><TH><B>20% of Outliers Removed</B></TH><TH><B>30% of Outliers Removed</B></TH></TR> <TR><TD>No Thresholds</TD><TD>MACHINENAME/C:</TD><TD>1</TD><TD>7</TD><TD>310</TD><TD>0</TD><TD>11</TD><TD>5</TD><TD>5</TD><TD>5</TD></TR> <TR><TD>No Thresholds</TD><TD>MACHINENAME/D:</TD><TD>0</TD><TD>0</TD><TD>0</TD><TD>0</TD><TD>0</TD><TD>0</TD><TD>0</TD><TD>0</TD></TR> <TR><TD>No Thresholds</TD><TD>MACHINENAME/E:</TD><TD>0</TD><TD>24</TD><TD>164</TD><TD>-1</TD><TD>11</TD><TD>22</TD><TD>21</TD><TD>20</TD></TR> <TR><TD>No Thresholds</TD><TD>MACHINENAME/HarddiskVolume5</TD><TD>0</TD><TD>0</TD><TD>2</TD><TD>0</TD><TD>0</TD><TD>0</TD><TD>0</TD><TD>0</TD></TR> </TABLE>
That COM code would look like:
# С помощью IE создаем HTML файл. $oIE = New-Object -ComObject InternetExplorer.Application $oIE.Navigate("file.html") $oHtmlDoc = $oIE.Document
Читаем таблицу по её ID.
$oTable = $oHtmlDoc.getElementByID("table6")
Выгружаем строки в массив.
$oTbody = $oTable.childNodes | Where-Object { $.tagName -eq "tbody" }
$cTrs = $oTbody.childNodes | Where-Object { $.tagName -eq "tr" }Создаем массив из заголовков таблицы
$cThs = $cTrs[0].childNodes | Where-Object { $.tagName -eq "th" }
$cHeaders = @()
foreach ($oTh in $cThs) {
$cHeaders += `
($oTh.childNodes | Where-Object { $.tagName -eq "b" }).innerHTML
}Конвертим строки в массив PS объектов и экспортируем в CSV
$cCsv = @()
foreach ($oTr in $cTrs) {
$cTds = $oTr.childNodes | Where-Object { $_.tagName -eq "td" }
# Пропускаем первую строку (заголовки).
if ([String]::IsNullOrEmpty($cTds)) { continue }
$oRow = New-Object PSObject
for ($i = 0; $i -lt $cHeaders.Count; $i++) {
$oRow | Add-Member -MemberType NoteProperty -Name $cHeaders[$i] `
-Value $cTds[$i].innerHTML
}
$cCsv += $oRow
}Закрываем IE
$oIE.Quit()
Экспорт в CSV.
$cCsv | Export-Csv -Path "file.csv" -NoTypeInformation