
Webgrph

Site structure mapping, crawl, and hierarchy



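Webgrph maps the full structure of a website by crawling from a starting URL and building a page hierarchy. It supports artifact chaining: pass the `artifact_id` of an earlier result to link a new crawl to it.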

crawl

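result = client.webgrph.crawl(
    "https://example.com",
    max_depth=3,  # follow links at most 3 levels below the start URL
    max_pages=500,  # cap on the number of pages crawled
    artifact_id=page.artifact_id,  # chain this crawl to an earlier artifact
)
print(result.total_pages, result.job_id)

const result = await client.webgrph.crawl('https://example.com', {
  maxDepth: 3,
  artifactId: page.artifactId, // chain this crawl to an earlier artifact
});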
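The `max_depth` and `max_pages` limits can be pictured as a bounded breadth-first traversal. The sketch below runs over a hypothetical in-memory link map instead of fetching real pages. It assumes that depth counts link hops from the start URL and that a page at `max_depth` is kept but not expanded; it illustrates the idea and is not Webgrph's actual crawler.

```python
from collections import deque


def bounded_crawl(links, root, max_depth, max_pages):
    """Breadth-first traversal over an in-memory link graph.

    `links` maps each URL to the URLs it links to (a hypothetical stand-in
    for fetching and parsing real pages). Returns (url, depth) pairs in
    visit order, stopping at max_depth hops or after max_pages pages.
    """
    visited = {root}
    queue = deque([(root, 0)])
    order = []
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append((url, depth))
        if depth == max_depth:
            continue  # keep this page, but do not follow its links
        for nxt in links.get(url, []):
            if nxt not in visited:  # skip pages already queued or visited
                visited.add(nxt)
                queue.append((nxt, depth + 1))
    return order


links = {  # hypothetical site: each URL maps to the URLs it links to
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
}
print(bounded_crawl(links, "/", max_depth=2, max_pages=500))
```

Breadth-first order means that when `max_pages` cuts the crawl short, the pages closest to the root are the ones kept.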

crawl_all (auto-pagination)

for p in client.webgrph.crawl_all("https://example.com", max_depth=3):
    print(p.url, p.depth)
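Because `crawl_all` yields pages one at a time, results can be aggregated as they arrive. A minimal sketch, assuming only that each yielded page exposes `url` and `depth` as in the loop above; the helper `urls_by_depth` and the sample data are hypothetical stand-ins for a live call.

```python
from collections import defaultdict
from types import SimpleNamespace


def urls_by_depth(pages):
    """Bucket pages by their depth, preserving iteration order.

    Works with any iterable of objects exposing .url and .depth, and
    iterates only once, so a lazily paginated iterator is fine.
    """
    buckets = defaultdict(list)
    for p in pages:
        buckets[p.depth].append(p.url)
    return dict(buckets)


# Hypothetical sample standing in for client.webgrph.crawl_all(...).
sample = [
    SimpleNamespace(url="https://example.com", depth=0),
    SimpleNamespace(url="https://example.com/docs", depth=1),
    SimpleNamespace(url="https://example.com/blog", depth=1),
    SimpleNamespace(url="https://example.com/docs/api", depth=2),
]
print(urls_by_depth(sample))
```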

crawlStream (SSE)

for await (const event of client.webgrph.crawlStream('https://example.com')) {
  console.log(event.type, event.url, event.depth);
}

get_hierarchy

tree = client.webgrph.get_hierarchy("https://example.com")
print(tree.root.url, len(tree.root.children))
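The tree returned by `get_hierarchy` can be walked like any nested structure. This sketch assumes each node exposes `url`, `depth`, and a `children` list (the PageNode fields); `walk`, `outline`, and the `SimpleNamespace` sample tree are hypothetical and stand in for a real response.

```python
from types import SimpleNamespace


def walk(node):
    """Yield every node in the tree depth-first (pre-order).

    Uses an explicit stack instead of recursion, so very deep sites
    cannot hit Python's recursion limit.
    """
    stack = [node]
    while stack:
        current = stack.pop()
        yield current
        # Push children in reverse so they are visited in their original order.
        stack.extend(reversed(current.children))


def outline(root):
    """Render the hierarchy as URLs indented by their depth."""
    return "\n".join("  " * n.depth + n.url for n in walk(root))


def node(url, depth, children=()):
    """Build a hypothetical node with the PageNode attribute names."""
    return SimpleNamespace(url=url, depth=depth, children=list(children))


root = node("https://example.com", 0, [
    node("https://example.com/docs", 1, [node("https://example.com/docs/api", 2)]),
    node("https://example.com/blog", 1),
])
print(sum(1 for _ in walk(root)), "pages")
print(outline(root))
```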

PageNode fields

| Field | Type | Description |
|-------|------|-------------|
| url | string | Page URL |
| depth | integer | Depth from root |
| status_code | integer | HTTP status code |
| title | string or null | Page title |
| children | list of PageNode | Child pages |
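To work with hierarchy data offline or in tests, these fields can be mirrored locally. The `PageNode` dataclass here is a hypothetical stand-in, not the SDK's own type, and `audit` is an illustrative helper that flags pages with an HTTP error status (400 or above) or a null title.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PageNode:
    """Hypothetical local mirror of the PageNode fields (not the SDK's class)."""
    url: str
    depth: int
    status_code: int
    title: Optional[str] = None
    children: list[PageNode] = field(default_factory=list)


def audit(root: PageNode) -> dict[str, list[str]]:
    """Collect URLs with HTTP error statuses (>= 400) or no title."""
    errors, untitled = [], []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.status_code >= 400:
            errors.append(node.url)
        if node.title is None:
            untitled.append(node.url)
        stack.extend(reversed(node.children))  # keep original child order
    return {"errors": errors, "untitled": untitled}


# Hypothetical sample tree.
site = PageNode("https://example.com", 0, 200, "Home", [
    PageNode("https://example.com/docs", 1, 200, "Docs"),
    PageNode("https://example.com/old", 1, 404),
])
print(audit(site))
```

A tree from `get_hierarchy` could be audited the same way if its nodes expose these attribute names.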
