Rather than receive the verbose output of each robot in your assembly, you can choose to receive only the parts of the
result(s) that you require by passing a map parameter within the step assembly. map is a dict of key/value pairs where:
key = the key you want used in the returned results
value = the JSON pointer to the value you wish to extract and assign to key
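As a rough illustration of how a JSON pointer selects a value, the Python sketch below walks a nested structure one path segment at a time. The resolve_pointer helper and the sample item (including its "matches" key) are hypothetical and are not part of the ExtractBot API; the sketch assumes RFC 6901-style pointers.

```python
def resolve_pointer(document, pointer):
    """Walk `document` along a JSON pointer such as "/attr/style".

    Hypothetical helper for illustration only; assumes RFC 6901 syntax.
    """
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"pointer must start with '/': {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # RFC 6901 escapes: "~1" stands for "/", "~0" stands for "~"
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # list elements are addressed by index
        else:
            current = current[token]       # dict members are addressed by key
    return current


# Made-up sample item, shaped loosely like a single robot result.
item = {
    "text": "Pricing",
    "attr": {"style": "color:#0f0e0d"},
    "matches": [{"0": "color:#0f0e0d", "1": "#0f0e0d"}],
}

print(resolve_pointer(item, "/text"))         # Pricing
print(resolve_pointer(item, "/attr/style"))   # color:#0f0e0d
print(resolve_pointer(item, "/matches/0/1"))  # #0f0e0d
```

Note that the final segment of "/matches/0/1" is the string key "1" of a dict, while the preceding "0" is a list index; the pointer syntax is the same for both.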
For example, the following template extracts H2 tags from our homepage and passes each tag's attr->style
string to a regex robot. We can map the output to reduce it to just the H2 text from the xpath robot and
the colour hex code from the regex robot:
{
  "auth": {"key": "your-api-key"},
  "url": "http://www.extractbot.com/",
  "steps": [{
    "name": "your-step-identifier",
    "robot": "xpath",
    "value": "//h2",
    "map": {
      "text": "/text",
      "colour": "/next/your-sub-step-identifier/result/0/1"
    },
    "steps": [{
      "use": "/attr/style",
      "name": "your-sub-step-identifier",
      "robot": "regex",
      "value": ".*(#.*?)$"
    }]
  }]
}

The response then contains only the mapped keys for each result:

{
  "your-step-identifier": [
    { "colour": "#989d91", "text": "Meet our robots" },
    { "colour": "#989d91", "text": "Content extraction" },
    { "colour": "#989d91", "text": "Super fast API" },
    { "colour": "#0f0e0d", "text": "Pricing" },
    { "colour": "#989d91", "text": "Try it out!" }
  ]
}
Compare this to the original, unmapped response and you will begin to see how useful map can be:
{
  "your-step-identifier": {
    "robot": "xpath",
    "value": "//h2",
    "result": [
      {
        "html": "<h2 style=\"color:#989d91\">Meet our robots</h2>",
        "inner_html": "Meet our robots",
        "text": "Meet our robots",
        "attr": { "style": "color:#989d91" },
        "next": {
          "your-sub-step-identifier": {
            "robot": "regex",
            "value": ".*(#.*?)$",
            "result": [ { "0": "color:#989d91", "1": "#989d91" } ]
          }
        }
      },
      {
        "html": "<h2 style=\"color:#989d91\">Content extraction</h2>",
        "inner_html": "Content extraction",
        "text": "Content extraction",
        "attr": { "style": "color:#989d91" },
        "next": {
          "your-sub-step-identifier": {
            "robot": "regex",
            "value": ".*(#.*?)$",
            "result": [ { "0": "color:#989d91", "1": "#989d91" } ]
          }
        }
      },
      {
        "html": "<h2 style=\"color:#989d91\">Super fast API</h2>",
        "inner_html": "Super fast API",
        "text": "Super fast API",
        "attr": { "style": "color:#989d91" },
        "next": {
          "your-sub-step-identifier": {
            "robot": "regex",
            "value": ".*(#.*?)$",
            "result": [ { "0": "color:#989d91", "1": "#989d91" } ]
          }
        }
      },
      {
        "html": "<h2 style=\"color:#0f0e0d\">Pricing</h2>",
        "inner_html": "Pricing",
        "text": "Pricing",
        "attr": { "style": "color:#0f0e0d" },
        "next": {
          "your-sub-step-identifier": {
            "robot": "regex",
            "value": ".*(#.*?)$",
            "result": [ { "0": "color:#0f0e0d", "1": "#0f0e0d" } ]
          }
        }
      },
      {
        "html": "<h2 style=\"color:#989d91\">Try it out!</h2>",
        "inner_html": "Try it out!",
        "text": "Try it out!",
        "attr": { "style": "color:#989d91" },
        "next": {
          "your-sub-step-identifier": {
            "robot": "regex",
            "value": ".*(#.*?)$",
            "result": [ { "0": "color:#989d91", "1": "#989d91" } ]
          }
        }
      }
    ]
  }
}
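To make the relationship between the two responses concrete, the Python sketch below applies the same map to a trimmed copy of the verbose response on the client side. It is only an approximation of what the service appears to do, based on the example above: it assumes each pointer is resolved against an individual result item. The pick and apply_map helpers are hypothetical names, not part of the API, and the verbose data is cut down to two items and to the fields the map reads.

```python
def pick(item, pointer):
    """Follow a slash-separated pointer through nested dicts and lists."""
    current = item
    for token in pointer.strip("/").split("/"):
        # Lists are indexed by position, dicts by key.
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current


def apply_map(results, mapping):
    """Return one reduced dict per result item, keyed by the map's keys."""
    return [
        {key: pick(item, pointer) for key, pointer in mapping.items()}
        for item in results
    ]


# Two items from the verbose response, trimmed to the fields the map reads.
verbose = {
    "your-step-identifier": {
        "robot": "xpath",
        "value": "//h2",
        "result": [
            {
                "text": "Meet our robots",
                "attr": {"style": "color:#989d91"},
                "next": {
                    "your-sub-step-identifier": {
                        "result": [{"0": "color:#989d91", "1": "#989d91"}]
                    }
                },
            },
            {
                "text": "Pricing",
                "attr": {"style": "color:#0f0e0d"},
                "next": {
                    "your-sub-step-identifier": {
                        "result": [{"0": "color:#0f0e0d", "1": "#0f0e0d"}]
                    }
                },
            },
        ],
    }
}

# The same map used in the template.
mapping = {
    "text": "/text",
    "colour": "/next/your-sub-step-identifier/result/0/1",
}

# Replace each step's verbose body with its list of reduced items.
reduced = {name: apply_map(step["result"], mapping) for name, step in verbose.items()}
print(reduced)
```

The pointer "/next/your-sub-step-identifier/result/0/1" reads the first regex match ("0") and then its first capture group ("1"), which is why the colour value is the bare hex code and not the full "color:#..." string.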