Guide

Based on the original documentation, version 1.4.3

PyQuery documentation

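pyquery allows you to make jquery queries on xml documents. The API is as similar to jquery as possible. pyquery uses lxml for fast xml and html manipulation.

This is not (or at least not yet) a library to produce or interact with javascript code. I just liked the jquery API and I missed it in python, so I told myself "Hey, let's make jquery in python". This is the result.

The project is being actively developed in a git repository on Github. My policy is to give push access to anyone who wants it and then review what they do. So if you want to contribute, just email me.

Please report bugs on the github issue tracker.

I've spent hours maintaining this software, with love. Please consider tipping if you like it: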

  • BTC: 1PruQAwByDndFZ7vTeJhyWefAghaZx9RZg
  • ETH: 0xb6418036d8E06c60C4D91c17d72Df6e1e5b15CE6
  • LTC: LY6CdZcDbxnBX9GFBJ45TqVj8NykBBqsmT

Quickstart

You can use the PyQuery class to load an xml document from a string, an lxml document, a file or a url:

>>> from pyquery import PyQuery as pq
>>> from lxml import etree
>>> from urllib.request import urlopen
>>> d = pq("<html></html>")
>>> d = pq(etree.fromstring("<html></html>"))
>>> d = pq(url=your_url)
>>> d = pq(url=your_url,
...        opener=lambda url, **kw: urlopen(url).read())
>>> d = pq(filename=path_to_html_file)

Now d is like the $ in jquery:

>>> d("#hello")
[<p#hello.hello>]
>>> p = d("#hello")
>>> print(p.html())
Hello world !
>>> p.html("you know <a href='http://python.org/'>Python</a> rocks")
[<p#hello.hello>]
>>> print(p.html())
you know <a href="http://python.org/">Python</a> rocks
>>> print(p.text())
you know Python rocks

You can use some of the pseudo classes that are available in jQuery but are not standard in css, such as :first :last :even :odd :eq :lt :gt :checked :selected :file:

>>> d('p:first')
[<p#hello.hello>]

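Attributes

You can select specific tags by their attributes. In attribute selectors, the value should be a valid CSS identifier or be quoted as a string: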

>>> d = pq("<option value='1'><option value='2'>")
>>> d('option[value="1"]')
[<option>]

You can work with attributes using the jquery API:

>>> p = pq('<p id="hello" class="hello"></p>')('p')
>>> p.attr("id")
'hello'
>>> p.attr("id", "plop")
[<p#plop.hello>]
>>> p.attr("id", "hello")
[<p#hello.hello>]

Or in a more Pythonic way:

>>> p.attr.id = "plop"
>>> p.attr.id
'plop'
>>> p.attr["id"] = "ola"
>>> p.attr["id"]
'ola'
>>> p.attr(id='hello', class_='hello2')
[<p#hello.hello2>]
>>> p.attr.class_
'hello2'
>>> p.attr.class_ = 'hello'

css

You can work with css classes:

>>> p.addClass("toto")
[<p#hello.hello.toto>]
>>> p.toggleClass("titi toto")
[<p#hello.hello.titi>]
>>> p.removeClass("titi")
[<p#hello.hello>]

Or with css styles:

>>> p.css("font-size", "15px")
[<p#hello.hello>]
>>> p.attr("style")
'font-size: 15px'
>>> p.css({"font-size": "17px"})
[<p#hello.hello>]
>>> p.attr("style")
'font-size: 17px'

The same works in the Pythonic way ('_' characters are translated to '-'):

>>> p.css.font_size = "16px"
>>> p.attr.style
'font-size: 16px'
>>> p.css['font-size'] = "15px"
>>> p.attr.style
'font-size: 15px'
>>> p.css(font_size="16px")
[<p#hello.hello>]
>>> p.attr.style
'font-size: 16px'
>>> p.css = {"font-size": "17px"}
>>> p.attr.style
'font-size: 17px'

Using pseudo classes

:button

Matches all button input elements and the button element:

>>> from pyquery import PyQuery
>>> d = PyQuery(('<div><input type="button"/>'
...      '<button></button></div>'))
>>> d(':button')
[<input>, <button>]

:checkbox

Matches all checkbox input elements:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><input type="checkbox"/></div>')
>>> d('input:checkbox')
[<input>]

:checked

Matches all checked input elements:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><input checked="checked"/></div>')
>>> d('input:checked')
[<input>]

:child

right is an immediate child of left.

:contains

Matches all elements that contain the given text:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><h1/><h1 class="title">title</h1></div>')
>>> d('h1:contains("title")')
[<h1.title>]

:descendant

right is a child, grandchild or further descendant of left.

:disabled

Matches all elements that are disabled:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><input disabled="disabled"/></div>')
>>> d('input:disabled')
[<input>]

:empty

Matches all elements that do not contain other elements:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><h1><span>title</span></h1><h2/></div>')
>>> d(':empty')
[<h2>]

:enabled

Matches all elements that are enabled:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><input value="foo" /></div>')
>>> d('input:enabled')
[<input>]

:eq()

Matches a single element by its index:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><h1 class="first"/><h1 class="last"/></div>')
>>> d('h1:eq(0)')
[<h1.first>]
>>> d('h1:eq(1)')
[<h1.last>]

:even

Matches even elements, zero-indexed:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><p></p><p class="last"></p></div>')
>>> d('p:even')
[<p>]

:file

Matches all input elements of type file:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><input type="file"/></div>')
>>> d('input:file')
[<input>]

:first

Matches the first selected element:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><p class="first"></p><p></p></div>')
>>> d('p:first')
[<p.first>]

:gt()

Matches all elements with an index above the given one:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><h1 class="first"/><h1 class="last"/></div>')
>>> d('h1:gt(0)')
[<h1.last>]

:has()

Matches elements which contain at least one element that matches the specified selector. See https://api.jquery.com/has-selector/

>>> from pyquery import PyQuery
>>> d = PyQuery('<div class="foo"><div class="bar"></div></div>')
>>> d('.foo:has(".baz")')
[]
>>> d('.foo:has(".foo")')
[]
>>> d('.foo:has(".bar")')
[<div.foo>]
>>> d('.foo:has(div)')
[<div.foo>]

:header

Matches all header elements (h1, …, h6):

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><h1>title</h1></div>')
>>> d(':header')
[<h1>]

:hidden

Matches all hidden input elements:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><input type="hidden"/></div>')
>>> d('input:hidden')
[<input>]

:image

Matches all image input elements:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><input type="image"/></div>')
>>> d('input:image')
[<input>]

:input

Matches all input elements:

>>> from pyquery import PyQuery
>>> d = PyQuery(('<div><input type="file"/>'
...         '<textarea></textarea></div>'))
>>> d(':input')
[<input>, <textarea>]

:last

Matches the last selected element:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><p></p><p class="last"></p></div>')
>>> d('p:last')
[<p.last>]

:lt()

Matches all elements with an index below the given one:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><h1 class="first"/><h1 class="last"/></div>')
>>> d('h1:lt(1)')
[<h1.first>]

:odd

Matches odd elements, zero-indexed:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><p></p><p class="last"></p></div>')
>>> d('p:odd')
[<p.last>]

:parent

Matches all elements that contain other elements:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><h1><span>title</span></h1><h1/></div>')
>>> d('h1:parent')
[<h1>]

:password

Matches all password input elements:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><input type="password"/></div>')
>>> d('input:password')
[<input>]

:pseudo

Translate a pseudo-element.

Defaults to not supporting pseudo-elements at all, but can be overridden by subclasses.

:radio

Matches all radio input elements:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><input type="radio"/></div>')
>>> d('input:radio')
[<input>]

:reset

Matches all reset input elements:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><input type="reset"/></div>')
>>> d('input:reset')
[<input>]

:selected

Matches all elements that are selected:

>>> from pyquery import PyQuery
>>> d = PyQuery('<select><option selected="selected"/></select>')
>>> d('option:selected')
[<option>]

:submit

Matches all submit input elements:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><input type="submit"/></div>')
>>> d('input:submit')
[<input>]

:text

Matches all text input elements:

>>> from pyquery import PyQuery
>>> d = PyQuery('<div><input type="text"/></div>')
>>> d('input:text')
[<input>]

Manipulating

You can also add content to the end of tags:

>>> d = pq('<p class="hello" id="hello">you know Python rocks</p>')
>>> d('p').append(' check out <a href="http://reddit.com/r/python"><span>reddit</span></a>')
[<p#hello.hello>]
>>> print(d)
<p class="hello" id="hello">you know Python rocks check out <a href="http://reddit.com/r/python"><span>reddit</span></a></p>

Or to the beginning:

>>> p = d('p')
>>> p.prepend('check out <a href="http://reddit.com/r/python">reddit</a>')
[<p#hello.hello>]
>>> print(p.html())
check out <a href="http://reddit.com/r/python">reddit</a>you know ...

Prepend or append an element into another:

>>> d = pq('<html><body><div id="test"><a href="http://python.org">python</a> !</div></body></html>')
>>> p.prependTo(d('#test'))
[<p#hello.hello>]
>>> print(d('#test').html())
<p class="hello" ...

Insert an element after another:

>>> p.insertAfter(d('#test'))
[<p#hello.hello>]
>>> print(d('#test').html())
<a href="http://python.org">python</a> !

Or before:

>>> p.insertBefore(d('#test'))
[<p#hello.hello>]
>>> print(d('body').html())
<p class="hello" id="hello">...

Do something for each element:

>>> p.each(lambda i, e: pq(e).addClass('hello2'))
[<p#hello.hello.hello2>]

Remove an element:

>>> d = pq('<html><body><p id="id">Yeah!</p><p>python rocks !</p></body></html>')
>>> d.remove('p#id')
[<html>]
>>> d('p#id')
[]

Empty the content of an element:

>>> d('p').empty()
[<p>]

And you can get back the modified html:

>>> print(d)
<html><body><p/></body></html>

You can generate html:

>>> from pyquery import PyQuery as pq
>>> print(pq('<div>Yeah !</div>').addClass('myclass') + pq('<b>cool</b>'))
<div class="myclass">Yeah !</div><b>cool</b>

Remove all namespaces:

>>> d = pq('<foo xmlns="http://example.com/foo"></foo>')
>>> d
[<{http://example.com/foo}foo>]
>>> d.remove_namespaces()
[<foo>]

Traversing

Some jQuery traversal methods are supported. Here are a few examples.

You can filter the selection list using a string selector:

>>> d = pq('<p id="hello" class="hello"><a/></p><p id="test"><a/></p>')
>>> d('p').filter('.hello')
[<p#hello.hello>]

You can select a single element with eq:

>>> d('p').eq(0)
[<p#hello.hello>]

You can find nested elements:

>>> d('p').find('a')
[<a>, <a>]
>>> d('p').eq(1).find('a')
[<a>]

Breaking out of a level of traversal is also supported, using end:

>>> d('p').find('a').end()
[<p#hello.hello>, <p#test>]
>>> d('p').eq(0).end()
[<p#hello.hello>, <p#test>]
>>> d('p').filter(lambda i: i == 1).end()
[<p#hello.hello>, <p#test>]

If you want to select an element whose id contains a dot, you need to escape the dot:

>>> d = pq('<p id="hello.you"><a/></p><p id="test"><a/></p>')
>>> d(r'#hello\.you')
[<p#hello.you>]
<div class="document">
  <div class="documentwrapper">
    <div class="bodywrapper">
      <div class="body" role="main">

pyquery – PyQuery complete API

class pyquery.pyquery.PyQuery(*args, **kwargs)[source]

The main class

class Fn[source]

Hook for defining custom function (like the jQuery.fn):

>>> fn = lambda: this.map(lambda i, el: PyQuery(this).outerHtml())
>>> PyQuery.fn.listOuterHtml = fn
>>> S = PyQuery(
...   '<ol>   <li>Coffee</li>   <li>Tea</li>   <li>Milk</li>   </ol>')
>>> S('li').listOuterHtml()
['<li>Coffee</li>', '<li>Tea</li>', '<li>Milk</li>']
addClass(value)

Alias for add_class()

add_class(value)[source]

Add a css class to elements:

>>> d = PyQuery('<div></div>')
>>> d.add_class('myclass')
[<div.myclass>]
>>> d.addClass('myclass')
[<div.myclass>]
after(value)[source]

add value after nodes

append(value)[source]

append value to each node

appendTo(value)

Alias for append_to()

append_to(value)[source]

append nodes to value

property base_url

Return the url of current html document or None if not available.

before(value)[source]

insert value before nodes

children(selector=None)[source]

Filter elements that are direct children of self using optional selector:

>>> d = PyQuery('<span><p class="hello">Hi</p><p>Bye</p></span>')
>>> d
[<span>]
>>> d.children()
[<p.hello>, <p>]
>>> d.children('.hello')
[<p.hello>]
clone()[source]

return a copy of nodes

closest(selector=None)[source]
>>> d = PyQuery(
...  '<div class="hello"><p>This is a '
...  '<strong class="hello">test</strong></p></div>')
>>> d('strong').closest('div')
[<div.hello>]
>>> d('strong').closest('.hello')
[<strong.hello>]
>>> d('strong').closest('form')
[]
contents()[source]

Return contents (with text nodes):

>>> d = PyQuery('hello <b>bold</b>')
>>> d.contents()  
['hello ', <Element b at ...>]
each(func)[source]

apply func to each node

empty()[source]

remove nodes content

property encoding

return the xml encoding of the root element

end()[source]

Break out of a level of traversal and return to the parent level.

>>> m = '<p><span><em>Whoah!</em></span></p><p><em> there</em></p>'
>>> d = PyQuery(m)
>>> d('p').eq(1).find('em').end().end()
[<p>, <p>]
eq(index)[source]

Return PyQuery of only the element with the provided index:

>>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p><div></div>')
>>> d('p').eq(0)
[<p.hello>]
>>> d('p').eq(1)
[<p>]
>>> d('p').eq(2)
[]
extend(other)[source]

Extend with another PyQuery object

filter(selector)[source]

Filter elements in self using selector (string or function):

>>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p>')
>>> d('p')
[<p.hello>, <p>]
>>> d('p').filter('.hello')
[<p.hello>]
>>> d('p').filter(lambda i: i == 1)
[<p>]
>>> d('p').filter(lambda i: PyQuery(this).text() == 'Hi')
[<p.hello>]
>>> d('p').filter(lambda i, this: PyQuery(this).text() == 'Hi')
[<p.hello>]
find(selector)[source]

Find elements using selector traversing down from self:

>>> m = '<p><span><em>Whoah!</em></span></p><p><em> there</em></p>'
>>> d = PyQuery(m)
>>> d('p').find('em')
[<em>, <em>]
>>> d('p').eq(1).find('em')
[<em>]
hasClass(name)

Alias for has_class()

has_class(name)[source]

Return True if element has class:

>>> d = PyQuery('<div class="myclass"></div>')
>>> d.has_class('myclass')
True
>>> d.hasClass('myclass')
True
height(value=<NoDefault>)[source]

set/get height of element

hide()[source]

Add display:none to elements style:

>>> print(PyQuery('<div style="display:none;"/>').hide())
<div style="display: none"/>
html(value=<NoDefault>, **kwargs)[source]

Get or set the html representation of sub nodes.

Get the html value:

>>> d = PyQuery('<div><span>toto</span></div>')
>>> print(d.html())
<span>toto</span>

Extra args are passed to lxml.etree.tostring:

>>> d = PyQuery('<div><span></span></div>')
>>> print(d.html())
<span/>
>>> print(d.html(method='html'))
<span></span>

Set the html value:

>>> d.html('<span>Youhou !</span>')
[<div>]
>>> print(d)
<div><span>Youhou !</span></div>
insertAfter(value)

Alias for insert_after()

insertBefore(value)

Alias for insert_before()

insert_after(value)[source]

insert nodes after value

insert_before(value)[source]

insert nodes before value

is_(selector)[source]

Returns True if selector matches at least one current element, else False:

>>> d = PyQuery('<p class="hello"><span>Hi</span></p><p>Bye</p>')
>>> d('p').eq(0).is_('.hello')
True
>>> d('p').eq(0).is_('span')
False
>>> d('p').eq(1).is_('.hello')
False
items(selector=None)[source]

Iter over elements. Return PyQuery objects:

>>> d = PyQuery('<div><span>foo</span><span>bar</span></div>')
>>> [i.text() for i in d.items('span')]
['foo', 'bar']
>>> [i.text() for i in d('span').items()]
['foo', 'bar']
>>> list(d.items('a')) == list(d('a').items())
True

make_links_absolute(base_url=None)[source]

Make all links absolute.

map(func)[source]

Returns a new PyQuery after transforming current items with func.

func should take two arguments - ‘index’ and ‘element’. Elements can also be referred to as ‘this’ inside of func:

>>> d = PyQuery('<p class="hello">Hi there</p><p>Bye</p><br />')
>>> d('p').map(lambda i, e: PyQuery(e).text())
['Hi there', 'Bye']

>>> d('p').map(lambda i, e: len(PyQuery(this).text()))
[8, 3]

>>> d('p').map(lambda i, e: PyQuery(this).text().split())
['Hi', 'there', 'Bye']

nextAll(selector=None)

Alias for next_all()

next_all(selector=None)[source]
>>> h = '<span><p class="hello">Hi</p><p>Bye</p><img scr=""/></span>'
>>> d = PyQuery(h)
>>> d('p:last').next_all()
[<img>]
>>> d('p:last').nextAll()
[<img>]
not_(selector)[source]

Return elements that don’t match the given selector:

>>> d = PyQuery('<p class="hello">Hi</p><p>Bye</p><div></div>')
>>> d('p').not_('.hello')
[<p>]
outerHtml(method='html')

Alias for outer_html()

outer_html(method='html')[source]

Get the html representation of the first selected element:

>>> d = PyQuery('<div><span class="red">toto</span> rocks</div>')
>>> print(d('span'))
<span class="red">toto</span> rocks
>>> print(d('span').outer_html())
<span class="red">toto</span>
>>> print(d('span').outerHtml())
<span class="red">toto</span>

>>> S = PyQuery('<p>Only <b>me</b> & myself</p>')
>>> print(S('b').outer_html())
<b>me</b>

parents(selector=None)[source]
>>> d = PyQuery('<span><p class="hello">Hi</p><p>Bye</p></span>')
>>> d('p').parents()
[<span>]
>>> d('.hello').parents('span')
[<span>]
>>> d('.hello').parents('p')
[]
prepend(value)[source]

prepend value to nodes

prependTo(value)

Alias for prepend_to()

prepend_to(value)[source]

prepend nodes to value

prevAll(selector=None)

Alias for prev_all()

prev_all(selector=None)[source]
>>> h = '<span><p class="hello">Hi</p><p>Bye</p><img scr=""/></span>'
>>> d = PyQuery(h)
>>> d('p:last').prev_all()
[<p.hello>]
>>> d('p:last').prevAll()
[<p.hello>]
remove(expr=<NoDefault>)[source]

Remove nodes:

>>> h = (
... '<div>Maybe <em>she</em> does <strong>NOT</strong> know</div>'
... )
>>> d = PyQuery(h)
>>> d('strong').remove()
[<strong>]
>>> print(d)
<div>Maybe <em>she</em> does   know</div>
removeAttr(name)

Alias for remove_attr()

removeClass(value)

Alias for remove_class()

remove_attr(name)[source]

Remove an attribute:

>>> d = PyQuery('<div id="myid"></div>')
>>> d.remove_attr('id')
[<div>]
>>> d.removeAttr('id')
[<div>]
remove_class(value)[source]

Remove a css class from elements:

>>> d = PyQuery('<div class="myclass"></div>')
>>> d.remove_class('myclass')
[<div>]
>>> d.removeClass('myclass')
[<div>]
remove_namespaces()[source]

Remove all namespaces:

>>> doc = PyQuery('<foo xmlns="http://example.com/foo"></foo>')
>>> doc
[<{http://example.com/foo}foo>]
>>> doc.remove_namespaces()
[<foo>]
replaceAll(expr)

Alias for replace_all()

replaceWith(value)

Alias for replace_with()

replace_all(expr)[source]

replace nodes by expr

replace_with(value)[source]

replace nodes by value:

>>> doc = PyQuery("<html><div /></html>")
>>> node = PyQuery("<span />")
>>> child = doc.find('div')
>>> child.replace_with(node)
[<div>]
>>> print(doc)
<html><span/></html>
property root

return the xml root element

serialize()[source]

Serialize form elements as a URL-encoded string.

>>> h = (
... '<form><input name="order" value="spam">'
... '<input name="order2" value="baked beans"></form>'
... )
>>> d = PyQuery(h)
>>> d.serialize()
'order=spam&order2=baked%20beans'
serializeArray()

Alias for serialize_array()

serializeDict()

Alias for serialize_dict()

serializePairs()

Alias for serialize_pairs()

serialize_array()[source]

Serialize form elements as an array of dictionaries, whose structure mirrors that produced by the jQuery API. Notably, it does not handle the deprecated keygen form element.

>>> d = PyQuery('<form><input name="order" value="spam"></form>')
>>> d.serialize_array() == [{'name': 'order', 'value': 'spam'}]
True
>>> d.serializeArray() == [{'name': 'order', 'value': 'spam'}]
True
serialize_dict()[source]

Serialize form elements as an ordered dictionary. Multiple values corresponding to the same input name are concatenated into one list.

>>> d = PyQuery('''<form>
...             <input name="order" value="spam">
...             <input name="order" value="eggs">
...             <input name="order2" value="ham">
...             </form>''')
>>> d.serialize_dict()
OrderedDict([('order', ['spam', 'eggs']), ('order2', 'ham')])
>>> d.serializeDict()
OrderedDict([('order', ['spam', 'eggs']), ('order2', 'ham')])
serialize_pairs()[source]

Serialize form elements as an array of 2-tuples conventional for typical URL-parsing operations in Python.

>>> d = PyQuery('<form><input name="order" value="spam"></form>')
>>> d.serialize_pairs()
[('order', 'spam')]
>>> d.serializePairs()
[('order', 'spam')]
show()[source]

Add display:block to elements style:

>>> print(PyQuery('<div />').show())
<div style="display: block"/>
siblings(selector=None)[source]
>>> h = '<span><p class="hello">Hi</p><p>Bye</p><img scr=""/></span>'
>>> d = PyQuery(h)
>>> d('.hello').siblings()
[<p>, <img>]
>>> d('.hello').siblings('img')
[<img>]
text(value=<NoDefault>, **kwargs)[source]

Get or set the text representation of sub nodes.

Get the text value:

>>> doc = PyQuery('<div><span>toto</span><span>tata</span></div>')
>>> print(doc.text())
tototata
>>> doc = PyQuery('''<div><span>toto</span>
...               <span>tata</span></div>''')
>>> print(doc.text())
toto tata

Get the text value, without squashing newlines:

>>> doc = PyQuery('''<div><span>toto</span>
...               <span>tata</span></div>''')
>>> print(doc.text(squash_space=False))
toto
tata

Set the text value:

>>> doc.text('Youhou !')
[<div>]
>>> print(doc)
<div>Youhou !</div>
toggleClass(value)

Alias for toggle_class()

toggle_class(value)[source]

Toggle a css class on elements:

>>> d = PyQuery('<div></div>')
>>> d.toggle_class('myclass')
[<div.myclass>]
>>> d.toggleClass('myclass')
[<div>]
val(value=<NoDefault>)[source]

Set the attribute value:

>>> d = PyQuery('<input />')
>>> d.val('Youhou')
[<input>]

Get the attribute value:

>>> d.val()
'Youhou'

Set the selected values for a select element with the multiple attribute:

>>> d = PyQuery('''
...             <select multiple>
...                 <option value="you"><option value="hou">
...             </select>
...             ''')
>>> d.val(['you', 'hou'])
[<select>]

Get the selected values for a select element with the multiple attribute:

>>> d.val()
['you', 'hou']
width(value=<NoDefault>)[source]

set/get width of element

wrap(value)[source]

A string of HTML that will be created on the fly and wrapped around each target:

>>> d = PyQuery('<span>youhou</span>')
>>> d.wrap('<div></div>')
[<div>]
>>> print(d)
<div><span>youhou</span></div>
wrapAll(value)

Alias for wrap_all()

wrap_all(value)[source]

Wrap all the elements in the matched set into a single wrapper element:

>>> d = PyQuery('<div><span>Hey</span><span>you !</span></div>')
>>> print(d('span').wrap_all('<div id="wrapper"></div>'))
<div id="wrapper"><span>Hey</span><span>you !</span></div>

>>> d = PyQuery('<div><span>Hey</span><span>you !</span></div>')
>>> print(d('span').wrapAll('<div id="wrapper"></div>'))
<div id="wrapper"><span>Hey</span><span>you !</span></div>

xhtml_to_html()[source]

Remove xhtml namespace:

>>> doc = PyQuery(
...         '<html xmlns="http://www.w3.org/1999/xhtml"></html>')
>>> doc
[<{http://www.w3.org/1999/xhtml}html>]
>>> doc.xhtml_to_html()
[<html>]
      </div>
    </div>
  </div>
Scraping

PyQuery is able to load an html document from a url:

>>> pq(your_url)
[<html>]

By default it uses python's urllib.

If requests is installed, it will be used instead. This allows you to use most of the requests parameters:

>>> pq(your_url, headers={'user-agent': 'pyquery'})
[<html>]

>>> pq(your_url, {'q': 'foo'}, method='post', verify=True)
[<html>]

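Timeout

The default timeout is 60 seconds. You can change it by setting the timeout parameter, which is forwarded to the underlying urllib or requests library.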

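Session

When using the requests library you can instantiate a Session object which keeps state between http calls (for example, to keep cookies). You can set the session parameter to use this session object.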

Tips

Making links absolute

You can make links absolute, which can be useful for screen scraping:

>>> d = pq(url=your_url, parser='html')
>>> d('form').attr('action')
'/form-submit'
>>> d.make_links_absolute()
[<html>]

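Using a different parser

By default pyquery uses the lxml xml parser and, if that does not work, goes on to try the html parser from lxml.html. The xml parser can sometimes be problematic when parsing xhtml pages, because the parser will not raise an error but will give an unusable tree (on w3c.org, for example).

You can also choose which parser to use explicitly: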

>>> pq('<html><body><p>toto</p></body></html>', parser='xml')
[<html>]
>>> pq('<html><body><p>toto</p></body></html>', parser='html')
[<html>]
>>> pq('<html><body><p>toto</p></body></html>', parser='html_fragments')
[<p>]

Testing

If you want to run the tests that you can see above you should do:

$ git clone git://github.com/gawel/pyquery.git
$ cd pyquery
$ python bootstrap.py
$ bin/buildout install tox
$ bin/tox

You can build the Sphinx documentation by doing:

$ cd docs
$ make html

Future

History

More documentation

<div class="document">
  <div class="documentwrapper">
    <div class="bodywrapper">
      <div class="body" role="main">

pyquery: a jquery-like library for python

pyquery allows you to make jquery queries on xml documents. The API is as similar to jquery as possible. pyquery uses lxml for fast xml and html manipulation.

This is not (or at least not yet) a library to produce or interact with javascript code. I just liked the jquery API and I missed it in python so I told myself “Hey let’s make jquery in python”. This is the result.

The project is being actively developed in a git repository on Github. My policy is to give push access to anyone who wants it and then review what they do. So if you want to contribute, just email me.

Please report bugs on the github issue tracker.

I’ve spent hours maintaining this software, with love. Please consider tipping if you like it:

BTC: 1PruQAwByDndFZ7vTeJhyWefAghaZx9RZg

ETH: 0xb6418036d8E06c60C4D91c17d72Df6e1e5b15CE6

LTC: LY6CdZcDbxnBX9GFBJ45TqVj8NykBBqsmT

Quickstart

You can use the PyQuery class to load an xml document from a string, an lxml document, a file or a url:

>>> from pyquery import PyQuery as pq
>>> from lxml import etree
>>> from urllib.request import urlopen
>>> d = pq("<html></html>")
>>> d = pq(etree.fromstring("<html></html>"))
>>> d = pq(url=your_url)
>>> d = pq(url=your_url,
...        opener=lambda url, **kw: urlopen(url).read())
>>> d = pq(filename=path_to_html_file)

Now d is like the $ in jquery:

>>> d("#hello")
[<p#hello.hello>]
>>> p = d("#hello")
>>> print(p.html())
Hello world !
>>> p.html("you know <a href='http://python.org/'>Python</a> rocks")
[<p#hello.hello>]
>>> print(p.html())
you know <a href="http://python.org/">Python</a> rocks
>>> print(p.text())
you know Python rocks

You can use some of the pseudo classes that are available in jQuery but that are not standard in css such as :first :last :even :odd :eq :lt :gt :checked :selected :file:

>>> d('p:first')
[<p#hello.hello>]

Full documentation

More documentation

First there is the Sphinx documentation here. Then for more documentation about the API you can use the jquery website. The reference I’m now using for the API is … the color cheat sheet. Then you can always look at the code.

      </div>
    </div>
  </div>