October 2009
Whatpm::HTML - An HTML Parser and Serializer
rdfa_parser | gemcutter | awesome gem hosting
Yields each triple, or generates an in-memory graph
pyparsing
ONLamp.com: Building Recursive Descent Parsers with Python
What is "parsing"? Parsing is processing a series of symbols to extract their meaning. Typically, this means reading the words of a sentence and drawing information from them. When application programs need to process data that is provided as text, they must use some form of parsing logic. This logic scans the text characters and character groups (words) and recognizes patterns of groups to extract the underlying commands or information.
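The scanning and pattern recognition described above can be sketched as a tiny hand-written recursive descent parser. This is only an illustration, not code from the ONLamp article or from pyparsing: the grammar (integer arithmetic with parentheses) and the names `tokenize`, `Parser`, and `evaluate` are all assumptions made for the example.

```python
import re

# One token per match: either a run of digits or any single other character.
TOKEN = re.compile(r"\s*(?:(\d+)|(.))")


def tokenize(text):
    """Scan the characters and group them into (kind, value) tokens."""
    tokens = []
    for number, op in TOKEN.findall(text):
        if number:
            tokens.append(("NUM", int(number)))
        elif op.strip():  # skip stray whitespace matched by '.'
            tokens.append(("OP", op))
    return tokens


class Parser:
    """Recursive descent: one method per grammar rule.

    expr   := term (('+' | '-') term)*
    term   := factor (('*' | '/') factor)*
    factor := NUM | '(' expr ')'
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("END", None)

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in (("OP", "+"), ("OP", "-")):
            _, op = self.take()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self):
        value = self.factor()
        while self.peek() in (("OP", "*"), ("OP", "/")):
            _, op = self.take()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self):
        kind, val = self.take()
        if kind == "NUM":
            return val
        if (kind, val) == ("OP", "("):
            value = self.expr()
            if self.take() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {val!r}")


def evaluate(text):
    """Parse and evaluate an arithmetic expression in one pass."""
    parser = Parser(tokenize(text))
    result = parser.expr()
    if parser.peek()[0] != "END":
        raise SyntaxError("trailing input")
    return result
```

Calling `evaluate("2 + 3 * (4 - 1)")` walks the three rules in turn, so multiplication binds tighter than addition without any explicit precedence table.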
August 2009
Character encoding detection for external scripts
This is (EF BB BF) C3 B6 3D 22 21 22 loaded into browsers under various labels. That happens to be properly formed ECMAScript code for all the encodings used. The bogus results for Opera9 can easily be reproduced in the context of the testing script, but probably not individually from a clean cache; what's going on there is unknown. I also noted in running these tests that Opera claims "Opera supports the entire ECMA-262 2nd and 3rd standards with no exceptions" while in fact their implementation does not: the parser rejects code that follows the IdentifierStart :: UnicodeEscapeSequence production of ECMA-262 section 7.6. Instead it implements Opera-only extensions, like comma-free arrays à la [ 1 2 3 ]. Other fun facts include: IE does not implement onload for iframes and cannot modify the innerHTML of tr elements; Firefox ignores "tags" when setting the innerHTML of dynamically created tr elements with no ownerElement... Oh, and Opera again needs </th> "tags" so it won't nest adjacent th elements when setting innerHTML.
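To see why the label matters, here is a minimal sketch that decodes the byte sequence quoted above under two different labels. The excerpt does not say which encodings the test used, so `utf-8` and `iso-8859-1` are assumptions picked for illustration, and the helper name `decode_as` is hypothetical.

```python
# The bytes from the excerpt: an optional UTF-8 BOM followed by six bytes.
raw = bytes.fromhex("EFBBBFC3B63D222122")

BOM = b"\xef\xbb\xbf"
# Strip the BOM if present so both decodings see the same payload.
body = raw[len(BOM):] if raw.startswith(BOM) else raw


def decode_as(label):
    """Decode the payload under the given encoding label (hypothetical helper)."""
    return body.decode(label)


as_utf8 = decode_as("utf-8")         # C3 B6 is one character here
as_latin1 = decode_as("iso-8859-1")  # C3 and B6 are two characters here

print(repr(as_utf8))
print(repr(as_latin1))
```

The same six bytes yield a five-character string under one label and a six-character string under the other, which is the kind of difference a script can use to report which encoding the browser actually applied.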
RDFa Fragment Parser
Paste a chunk of XHTML RDFa below, and click "Parse."
Make sure you do the right thing for RDFa validation when you eventually place this chunk inside a web page.
July 2009
Sparkles everywhere, CubicWeb gets fizzy (CubicWeb's Forge)
Fyzz parses the SPARQL query and generates something we decided to call an AST although it's still a bit rough for now. Fyzz understands simple triples, distincts, limits, offsets and other basic functionalities.
fyzz (fyzz is a sparkling Python parser for the Sparql query language) (Logilab.org)
fyzz is a sparkling Python parser for the Sparql query language
John Resig - HTML 5 Parsing
If you're interested in giving the new parser a try (it's doubtful that you'll see many obvious changes - but any help in hunting down bugs would be appreciated) you can download a nightly of Firefox, open about:config, and set html5.enable to true.
May 2009
Python Package Index : pyWxSVG 0.1
Views and prints SVG files or SVG content, and converts SVG to raster graphics. Partial support for the SVG format. Tested with Python 2.5 and wxPython 2.8.9.2. Drawing uses the wx.GraphicsContext class. The path parser comes from Enable (the SVGPathParser class).
March 2009
RFC (2)822 & 3696 Email Address Parser in PHP
The test suite shows results for each parser, based on these test definitions. These are borrowed from Dominic Sayers who has a similar parser. We are still arguing over certain tests ;)
February 2009
HTML5 Parsers - La Tortue Cynique / The Cynical Turtle
In short, a dedicated parser is therefore needed (after 30 years of working with generic GML and SGML parsers),
January 2009
November 2008
PHP Simple HTML DOM Parser
- An HTML DOM parser written in PHP 5+ that lets you manipulate HTML in a very easy way!
- Requires PHP 5+.
- Supports invalid HTML.
- Find tags on an HTML page with selectors just like jQuery.
- Extract contents from HTML in a single line.
