Sourcecode: wapiti

BeautifulSoup::ICantBelieveItsBeautifulSoup Class Reference

Inheritance diagram for BeautifulSoup::ICantBelieveItsBeautifulSoup (base classes, nearest first): BeautifulSoup::BeautifulSoup, BeautifulSoup::BeautifulStoneSoup, BeautifulSoup::Tag, BeautifulSoup::PageElement.


Detailed Description

The BeautifulSoup class is oriented towards skipping over
common HTML errors like unclosed tags. However, sometimes it makes
errors of its own. For instance, consider this fragment:

 <b>Foo<b>Bar</b></b>

This is perfectly valid (if bizarre) HTML. However, the
BeautifulSoup class will implicitly close the first b tag when it
encounters the second 'b'. It will think the author wrote
"<b>Foo<b>Bar", and didn't close the first 'b' tag, because
there's no real-world reason to bold something that's already
bold. When it encounters '</b></b>' it will close two more 'b'
tags, for a grand total of three tags closed instead of two. This
can throw off the rest of your document structure. The same is
true of a number of other tags, listed below.

It's much more common for someone to forget to close a 'b' tag
than to actually use nested 'b' tags, and the BeautifulSoup class
handles the common case. This class handles the not-so-common
case: where you can't believe someone wrote what they did, but
it's valid HTML and BeautifulSoup screwed up by assuming it
wouldn't be.
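To see how a fragment such as `<b>Foo<b>Bar</b></b>` gets flattened, here is a minimal sketch of a toy tree builder on top of Python's standard `html.parser` module. It is not Beautiful Soup's implementation: `ToyTreeBuilder`, its `reset_nesting` parameter, and the `render` and `parse` helpers are hypothetical names, and the "implicitly close an already-open tag of the same name" rule is a simplified stand-in for BeautifulSoup's real nesting logic. With 'b' treated as non-nestable the second `<b>` closes the first, so the tags become siblings; with nesting allowed (the situation this class exists for) the structure survives.

```python
from html.parser import HTMLParser


class ToyTreeBuilder(HTMLParser):
    """Hypothetical, simplified tree builder (not Beautiful Soup's code).

    Tags named in `reset_nesting` cannot nest: opening one while another
    of the same name is open implicitly closes the open one first, which
    roughly mimics BeautifulSoup's assumption about tags like 'b'.
    """

    def __init__(self, reset_nesting=()):
        super().__init__()
        self.reset_nesting = set(reset_nesting)
        self.root = ("[document]", [])   # (name, children)
        self.stack = [self.root]         # currently open elements

    def _pop_to(self, name):
        # Close open elements down to and including the most recent
        # `name`; an end tag with no matching open tag is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == name:
                del self.stack[i:]
                return

    def handle_starttag(self, tag, attrs):
        if tag in self.reset_nesting:
            self._pop_to(tag)            # implicit close of an open tag
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        self._pop_to(tag)

    def handle_data(self, data):
        self.stack[-1][1].append(data)


def render(node):
    """Serialize the toy tree back to markup so the structure is visible."""
    if isinstance(node, str):
        return node
    name, children = node
    inner = "".join(render(child) for child in children)
    return inner if name == "[document]" else f"<{name}>{inner}</{name}>"


def parse(markup, reset_nesting=()):
    builder = ToyTreeBuilder(reset_nesting)
    builder.feed(markup)
    builder.close()
    return render(builder.root)


fragment = "<b>Foo<b>Bar</b></b>"
print(parse(fragment, reset_nesting={"b"}))  # prints <b>Foo</b><b>Bar</b>
print(parse(fragment))                       # prints <b>Foo<b>Bar</b></b>
```

In the first call the trailing `</b>` finds no open 'b' and is simply dropped, so the nesting the author intended is lost without any error being raised.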

Definition at line 1425 of file BeautifulSoup.py.

Public Member Functions

def __getattr__
def __init__
def __init__
def endData
def extract
def findAllNext
def findAllPrevious
def findNext
def findNextSibling
def findNextSiblings
def findParent
def findParents
def findPrevious
def findPreviousSibling
def findPreviousSiblings
def handle_charref
def handle_comment
def handle_data
def handle_decl
def handle_entityref
def handle_pi
def insert
def isSelfClosingTag
def nextGenerator
def nextSiblingGenerator
def parentGenerator
def parse_declaration
def popTag
def previousGenerator
def previousSiblingGenerator
def pushTag
def replaceWith
def reset
def setup
def start_meta
def substituteEncoding
def toEncoding
def unknown_endtag
def unknown_starttag

Public Attributes


Static Public Attributes

 CHARSET_RE = re.compile("((^|;)\s*charset=)([^;]*)")
 fetchNextSiblings = findNextSiblings
 fetchParents = findParents
 fetchPrevious = findAllPrevious
 fetchPreviousSiblings = findPreviousSiblings
string HTML_ENTITIES = "html"
list NESTABLE_BLOCK_TAGS = ['blockquote', 'div', 'fieldset', 'ins', 'del']
list NON_NESTABLE_BLOCK_TAGS = ['address', 'form', 'p', 'pre']
dictionary QUOTE_TAGS = {'script': None}
string ROOT_TAG_NAME = u'[document]'
string XML_ENTITIES = "xml"
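CHARSET_RE matches the charset parameter of a Content-Type value. The member list includes start_meta and substituteEncoding, which suggests (an assumption, not confirmed by this page) that the pattern is used when a META tag declares an encoding. The sketch below only shows what the pattern itself matches, using Python's standard `re` module; the sample `content` string and the replacement encoding "utf-8" are made up for illustration.

```python
import re

# Same pattern as the CHARSET_RE attribute listed above, written as a raw
# string so that "\s" is unambiguously a regex escape.
CHARSET_RE = re.compile(r"((^|;)\s*charset=)([^;]*)")

content = "text/html; charset=ISO-8859-1"   # illustrative META content value
match = CHARSET_RE.search(content)
print(match.group(1))   # prints "; charset=" (the prefix that is kept)
print(match.group(3))   # prints ISO-8859-1 (the declared encoding)

# Hypothetical rewrite: keep the prefix (group 1) and swap in another
# encoding name; "utf-8" is only an example value.
rewritten = CHARSET_RE.sub(lambda m: m.group(1) + "utf-8", content)
print(rewritten)        # prints text/html; charset=utf-8
```

Capturing the prefix separately in group 1 is what lets a substitution replace only the encoding name while leaving the rest of the attribute value untouched.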

The documentation for this class was generated from the following file:

BeautifulSoup.py

Generated by Doxygen 1.6.0