Parsers
First
Returns only the first element matching the specified XPath. This is the default parser of Element.
<html>
  <body>
    <p>
      AAA
      <br>
      BBB
      <br>
      CCCC
    </p>
  </body>
</html>
el = Element(xpath='//html/body/p/text()', parser=First)
text = el.parse(html)
assert 'AAA' == text
All
Converts every element matching the XPath to text and returns the results as a list.
<html>
  <body>
    <p>
      AAA
      <br>
      BBB
      <br>
      CCCC
    </p>
  </body>
</html>
el = Element(xpath='//html/body/p/text()', parser=All)
texts = el.parse(html)
assert ['AAA', 'BBB', 'CCCC'] == texts
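To make the difference between First and All concrete, the sketch below reproduces both behaviours with only the Python standard library. It is not this library's implementation: `TextCollector`, `all_texts` and `first_text` are hypothetical names introduced for illustration, and the sketch assumes that text nodes are stripped of surrounding whitespace and that an empty match yields `None`.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the non-empty text nodes found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        stripped = data.strip()
        if self.in_p and stripped:
            self.texts.append(stripped)


def all_texts(html):
    """Like All: return every matching text node as a list."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return collector.texts


def first_text(html):
    """Like First: return only the first matching text node (assumed None if empty)."""
    texts = all_texts(html)
    return texts[0] if texts else None


html = "<html><body><p>AAA<br>BBB<br>CCCC</p></body></html>"
print(first_text(html))  # AAA
print(all_texts(html))   # ['AAA', 'BBB', 'CCCC']
```

The only difference between the two helpers is whether the full list or its first item is returned, which mirrors how the two parsers are described above.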
ParseTable
Parses a basic table and returns its data rows as a list of lists.
<html>
  <body>
    <table>
      <tr>
        <th>Company</th>
        <th>Contact</th>
        <th>Country</th>
      </tr>
      <tr>
        <td>Alfreds Futterkiste</td>
        <td>Maria Anders</td>
        <td>Germany</td>
      </tr>
      <tr>
        <td>Centro comercial Moctezuma</td>
        <td>Francisco Chang</td>
        <td>Mexico</td>
      </tr>
    </table>
  </body>
</html>
el = Element(xpath='//html/body/table', parser=ParseTable())
data = el.parse(html)
assert [
    ['Alfreds Futterkiste', 'Maria Anders', 'Germany'],
    ['Centro comercial Moctezuma', 'Francisco Chang', 'Mexico'],
] == data
If the table has a header row, passing has_header=True returns each row as a dict keyed by the header values.
el = Element(xpath='//html/body/table', parser=ParseTable(has_header=True))
data = el.parse(html)
assert [
    {
        'Company': 'Alfreds Futterkiste',
        'Contact': 'Maria Anders',
        'Country': 'Germany',
    },
    {
        'Company': 'Centro comercial Moctezuma',
        'Contact': 'Francisco Chang',
        'Country': 'Mexico',
    },
] == data
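The mapping from header cells to dict keys can be sketched with the standard library alone. This is an illustration of the idea, not ParseTable's actual code: `parse_table` is a hypothetical helper, and it assumes a well-formed table whose first `<th>` row is the header.

```python
import xml.etree.ElementTree as ET


def parse_table(table_html, has_header=False):
    """Return data rows as lists, or as dicts keyed by the header row."""
    table = ET.fromstring(table_html)
    header, rows = [], []
    for tr in table.iter("tr"):
        th_cells = [(th.text or "").strip() for th in tr.findall("th")]
        td_cells = [(td.text or "").strip() for td in tr.findall("td")]
        if th_cells and not header:
            header = th_cells  # the first row of <th> cells is the header
        if td_cells:
            rows.append(td_cells)
    if has_header:
        # Pair each header cell with the cell in the same column.
        return [dict(zip(header, row)) for row in rows]
    return rows


table_html = """
<table>
  <tr><th>Company</th><th>Contact</th><th>Country</th></tr>
  <tr><td>Alfreds Futterkiste</td><td>Maria Anders</td><td>Germany</td></tr>
</table>
"""
print(parse_table(table_html))
# [['Alfreds Futterkiste', 'Maria Anders', 'Germany']]
print(parse_table(table_html, has_header=True))
# [{'Company': 'Alfreds Futterkiste', 'Contact': 'Maria Anders', 'Country': 'Germany'}]
```

Pairing by column position with `zip` keeps the sketch short; a row with fewer cells than the header would simply produce a dict with fewer keys.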
ParseList
Parses list elements such as <ul> and <ol> and returns their items as a list.
<html>
  <body>
    <ol>
      <li>Coffee</li>
      <li>Tea</li>
      <li>Milk</li>
    </ol>
  </body>
</html>
el = Element(xpath='//html/body/ol', parser=ParseList())
data = el.parse(html)
assert ['Coffee', 'Tea', 'Milk'] == data
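As a rough illustration of what ParseList produces, the following standard-library sketch collects the text of each direct `<li>` child. `parse_list` is a hypothetical helper, not part of this library, and it assumes well-formed markup and that nested markup inside an item is flattened to text.

```python
import xml.etree.ElementTree as ET


def parse_list(list_html):
    """Return the text of each direct <li> child of a <ul> or <ol>."""
    root = ET.fromstring(list_html)
    # itertext() also picks up text inside nested inline tags such as <b>.
    return ["".join(li.itertext()).strip() for li in root.findall("li")]


print(parse_list("<ol><li>Coffee</li><li>Tea</li><li>Milk</li></ol>"))
# ['Coffee', 'Tea', 'Milk']
```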
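ParseDefinitionList
Parses a <dl> element and returns it as a dict. A term with a single <dd> maps to a string; a term with several <dd> entries maps to a list of strings.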
<html>
  <body>
    <dl>
      <dt>Coffee</dt>
      <dd>black hot drink</dd>
      <dt>Milk</dt>
      <dd>white cold drink</dd>
      <dd>white hot drink</dd>
    </dl>
  </body>
</html>
el = Element(xpath='//html/body/dl', parser=ParseDefinitionList())
data = el.parse(html)
assert {
    'Coffee': 'black hot drink',
    'Milk': [
        'white cold drink',
        'white hot drink',
    ]
} == data
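The way a single definition stays a string while repeated definitions become a list can be sketched as follows. This is an assumption-laden illustration using only the standard library, not ParseDefinitionList's real implementation; `parse_definition_list` is a hypothetical name, and the sketch assumes each `<dd>` belongs to the nearest preceding `<dt>`.

```python
import xml.etree.ElementTree as ET


def parse_definition_list(dl_html):
    """Map each <dt> term to its <dd> text, or to a list when there are several."""
    dl = ET.fromstring(dl_html)
    result = {}
    term = None
    for child in dl:
        text = "".join(child.itertext()).strip()
        if child.tag == "dt":
            term = text
        elif child.tag == "dd" and term is not None:
            if term not in result:
                result[term] = text  # first definition: keep a plain string
            elif isinstance(result[term], list):
                result[term].append(text)
            else:
                # second definition: promote the string to a list
                result[term] = [result[term], text]
    return result


dl_html = """
<dl>
  <dt>Coffee</dt>
  <dd>black hot drink</dd>
  <dt>Milk</dt>
  <dd>white cold drink</dd>
  <dd>white hot drink</dd>
</dl>
"""
print(parse_definition_list(dl_html))
# {'Coffee': 'black hot drink', 'Milk': ['white cold drink', 'white hot drink']}
```

Callers that want a uniform shape may prefer to always store a list per term; the mixed string/list shape here simply follows the example output shown above.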