Mapping output

Rather than receiving the verbose output of every robot in your assembly, you can return only the parts of the results you need by passing a map parameter on a step. map is an object of key/value pairs where:

  • key = the key you want used in the returned results
  • value = the JSON pointer to the value you want to extract and assign to that key, evaluated against each item of the step's result

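The mapping of a single result item can be pictured as follows. This is a minimal Python sketch, not the service's implementation: resolve_pointer and apply_map are hypothetical names, the resolver follows the JSON Pointer rules of RFC 6901 (tokens separated by "/", with "~1" and "~0" unescaped to "/" and "~"), and it assumes, as the example below suggests, that each pointer is evaluated against one item of the step's result array.

```python
from typing import Any


def resolve_pointer(doc: Any, pointer: str) -> Any:
    """Resolve an RFC 6901 JSON pointer such as "/attr/style" against doc."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"invalid JSON pointer: {pointer!r}")
    current = doc
    for raw in pointer[1:].split("/"):
        # Unescape per RFC 6901: "~1" -> "/" first, then "~0" -> "~".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # array index
        else:
            current = current[token]  # object member
    return current


def apply_map(item: dict, mapping: dict) -> dict:
    """Hypothetical: build an object whose keys are the map keys and whose
    values are what the corresponding pointers select from one result item."""
    return {key: resolve_pointer(item, ptr) for key, ptr in mapping.items()}


# One entry of the xpath step's "result" array from the original response
# further down, trimmed to the fields the map uses.
item = {
    "text": "Meet our robots",
    "attr": {"style": "color:#989d91"},
    "next": {
        "your-sub-step-identifier": {
            "result": [{"0": "color:#989d91", "1": "#989d91"}]
        }
    },
}

mapping = {
    "text": "/text",
    "colour": "/next/your-sub-step-identifier/result/0/1",
}

print(apply_map(item, mapping))
# {'text': 'Meet our robots', 'colour': '#989d91'}
```

In this sketch a pointer that names a missing key raises KeyError; how the service handles a pointer that matches nothing is not described here.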
For example, the following template extracts the H2 tags from our homepage and passes each heading's style attribute (/attr/style) to a regex robot. A map on the xpath step then reduces the output to just the H2 text from the xpath robot and the colour hex code captured by the regex robot:

Request

{
    "auth": {"key": "your-api-key"},
    "url": "http://www.extractbot.com/",
    "steps": [{
        "name": "your-step-identifier",
        "robot": "xpath",
        "value": "//h2",
        "map": {
            "text": "/text",
            "colour": "/next/your-sub-step-identifier/result/0/1"
        },
        "steps": [{
            "use": "/attr/style",
            "name": "your-sub-step-identifier",
            "robot": "regex",
            "value": ".*(#.*?)$"
        }]
    }]
}
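The colour pointer ends in /result/0/1 because of how the regex robot reports a match. The sketch below runs the same pattern through Python's re module; regex_result is a hypothetical helper, and it assumes the robot returns one entry per match (as re.finditer does) with keys numbered the way most regex engines number groups ("0" for the whole match, "1" onward for capture groups). That is consistent with the original response further down, but Python's regex flavour may differ from the service's in other respects.

```python
import re


def regex_result(pattern: str, subject: str) -> list:
    """Hypothetical: one dict per match, keyed by group number as a string,
    mirroring the "0"/"1" keys in the regex robot's result entries."""
    return [
        # group(0) is the whole match; groups 1..n are the capture groups.
        {str(i): m.group(i) for i in range(len(m.groups()) + 1)}
        for m in re.finditer(pattern, subject)
    ]


# The sub-step's input is the xpath item's /attr/style string. The greedy ".*"
# backtracks to the last "#", so group 1 captures the hex code after it.
print(regex_result(r".*(#.*?)$", "color:#989d91"))
# [{'0': 'color:#989d91', '1': '#989d91'}]
```

Reading that output with the pointer /result/0/1 selects entry 0, key "1", which is the bare hex code.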

Response

{
    "your-step-identifier": [
        {
            "colour": "#989d91",
            "text": "Meet our robots"
        },
        {
            "colour": "#989d91",
            "text": "Content extraction"
        },
        {
            "colour": "#989d91",
            "text": "Super fast API"
        },
        {
            "colour": "#0f0e0d",
            "text": "Pricing"
        },
        {
            "colour": "#989d91",
            "text": "Try it out!"
        }
    ]
}

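Compare this with the original, unmapped response below, in which every H2 carries its raw HTML, inner HTML, text, attributes and the full output of the nested regex step, and you will begin to see how useful map can be.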

Original response

{
  "your-step-identifier": {
    "robot": "xpath",
    "value": "//h2",
    "result": [
      {
        "html": "<h2 style=\"color:#989d91\">Meet our robots</h2>",
        "inner_html": "Meet our robots",
        "text": "Meet our robots",
        "attr": {
          "style": "color:#989d91"
        },
        "next": {
          "your-sub-step-identifier": {
            "robot": "regex",
            "value": ".*(#.*?)$",
            "result": [
              {
                "0": "color:#989d91",
                "1": "#989d91"
              }
            ]
          }
        }
      },
      {
        "html": "<h2 style=\"color:#989d91\">Content extraction</h2>",
        "inner_html": "Content extraction",
        "text": "Content extraction",
        "attr": {
          "style": "color:#989d91"
        },
        "next": {
          "your-sub-step-identifier": {
            "robot": "regex",
            "value": ".*(#.*?)$",
            "result": [
              {
                "0": "color:#989d91",
                "1": "#989d91"
              }
            ]
          }
        }
      },
      {
        "html": "<h2 style=\"color:#989d91\">Super fast API</h2>",
        "inner_html": "Super fast API",
        "text": "Super fast API",
        "attr": {
          "style": "color:#989d91"
        },
        "next": {
          "your-sub-step-identifier": {
            "robot": "regex",
            "value": ".*(#.*?)$",
            "result": [
              {
                "0": "color:#989d91",
                "1": "#989d91"
              }
            ]
          }
        }
      },
      {
        "html": "<h2 style=\"color:#0f0e0d\">Pricing</h2>",
        "inner_html": "Pricing",
        "text": "Pricing",
        "attr": {
          "style": "color:#0f0e0d"
        },
        "next": {
          "your-sub-step-identifier": {
            "robot": "regex",
            "value": ".*(#.*?)$",
            "result": [
              {
                "0": "color:#0f0e0d",
                "1": "#0f0e0d"
              }
            ]
          }
        }
      },
      {
        "html": "<h2 style=\"color:#989d91\">Try it out!</h2>",
        "inner_html": "Try it out!",
        "text": "Try it out!",
        "attr": {
          "style": "color:#989d91"
        },
        "next": {
          "your-sub-step-identifier": {
            "robot": "regex",
            "value": ".*(#.*?)$",
            "result": [
              {
                "0": "color:#989d91",
                "1": "#989d91"
              }
            ]
          }
        }
      }
    ]
  }
}
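Comparing the two responses, the map also changes the shape of the step's output: the robot, value and result wrapper is replaced by a plain list holding one mapped object per result item. The Python sketch below reproduces that transformation on a trimmed excerpt of the original response. map_step is a hypothetical name, and the transformation is inferred from the example above rather than documented behaviour.

```python
import json
from typing import Any


def resolve_pointer(doc: Any, pointer: str) -> Any:
    """Minimal RFC 6901 resolver: "" is the whole document, "/a/0" walks
    object members and array indices ("-" is not supported)."""
    current = doc
    for raw in pointer.split("/")[1:]:
        token = raw.replace("~1", "/").replace("~0", "~")
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current


def map_step(step_output: dict, mapping: dict) -> list:
    """Hypothetical: turn one step's {robot, value, result} output into a list
    with one mapped object per item of its result array."""
    return [
        {key: resolve_pointer(item, ptr) for key, ptr in mapping.items()}
        for item in step_output["result"]
    ]


# Two items from the original response above, trimmed to the fields used.
original = {
    "your-step-identifier": {
        "robot": "xpath",
        "value": "//h2",
        "result": [
            {"text": "Meet our robots",
             "next": {"your-sub-step-identifier": {
                 "result": [{"0": "color:#989d91", "1": "#989d91"}]}}},
            {"text": "Pricing",
             "next": {"your-sub-step-identifier": {
                 "result": [{"0": "color:#0f0e0d", "1": "#0f0e0d"}]}}},
        ],
    }
}

mapping = {"text": "/text", "colour": "/next/your-sub-step-identifier/result/0/1"}
mapped = {"your-step-identifier": map_step(original["your-step-identifier"], mapping)}
print(json.dumps(mapped, indent=4))
```

The sketch emits text before colour because it follows the order of the map; member order in a JSON object is not significant, so this is equivalent to the mapped response shown earlier.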