HTML转JSON字符串怎么做？

酷盾叔 • 2025年6月17日 12:42 • 前端开发 • 阅读 0

使用JavaScript解析HTML字符串为DOM对象，遍历节点提取数据构建JavaScript对象，再通过JSON.stringify()方法转换为JSON字符串。

要将HTML转换为JSON字符串，需通过JavaScript解析DOM结构并递归映射为JSON对象,以下是详细步骤和代码示例：

核心实现原理

DOM解析
使用浏览器内置的DOMParser解析HTML字符串,生成结构化DOM树。
递归遍历节点
将每个DOM节点转换为JSON对象，包含标签名、属性、子节点和文本内容。
处理特殊节点
区分元素节点、文本节点、注释节点,确保完整转换。

完整代码实现

function htmlToJson(htmlString) {
  // 1. 解析HTML字符串为DOM文档
  const parser = new DOMParser();
  const doc = parser.parseFromString(htmlString, "text/html");
  // 2. 递归转换DOM节点为JSON
  function convertNode(node) {
    // 处理元素节点
    if (node.nodeType === Node.ELEMENT_NODE) {
      const attributes = {};
      // 提取所有属性
      for (const attr of node.attributes) {
        attributes[attr.name] = attr.value;
      }
      // 递归处理子节点
      const children = [];
      node.childNodes.forEach(child => {
        children.push(convertNode(child));
      });
      return {
        tag: node.tagName.toLowerCase(),
        attributes,
        children
      };
    }
    // 处理文本节点（非空文本）
    else if (node.nodeType === Node.TEXT_NODE && node.textContent.trim()) {
      return { text: node.textContent.trim() };
    }
    // 处理注释节点
    else if (node.nodeType === Node.COMMENT_NODE) {
      return { comment: node.textContent };
    }
    // 忽略其他节点类型
    return null;
  }
  // 3. 从根节点开始转换
  const rootNode = doc.body.firstChild;
  return rootNode ? convertNode(rootNode) : null;
}
// 使用示例
const html = `
  <div class="container" id="main">
    <h1>标题</h1>
    <p>段落文本</p>
    <!-- 注释 -->
    <ul>
      <li>列表1</li>
      <li>列表2</li>
    </ul>
  </div>
`;
const jsonData = htmlToJson(html);
const jsonString = JSON.stringify(jsonData, null, 2); // 格式化输出
console.log(jsonString);

关键步骤解析

DOMParser解析HTML
parser.parseFromString()将字符串转换为可操作的DOM文档对象。
节点类型判断
- nodeType === 1：元素节点（如<div>）
- nodeType === 3：文本节点
- nodeType === 8：注释节点
属性提取
遍历node.attributes集合,将属性存入键值对对象。
子节点递归
对每个子节点递归调用convertNode(),生成嵌套结构。
处理
用.trim()移除空白字符,避免无效文本节点。

输出JSON结构示例

{
  "tag": "div",
  "attributes": {
    "class": "container",
    "id": "main"
  },
  "children": [
    {
      "tag": "h1",
      "attributes": {},
      "children": [{ "text": "标题" }]
    },
    {
      "tag": "p",
      "attributes": {},
      "children": [{ "text": "段落文本" }]
    },
    { "comment": "注释" },
    {
      "tag": "ul",
      "attributes": {},
      "children": [
        {
          "tag": "li",
          "attributes": {},
          "children": [{ "text": "列表1" }]
        },
        {
          "tag": "li",
          "attributes": {},
          "children": [{ "text": "列表2" }]
        }
      ]
    }
  ]
}

注意事项

浏览器环境限制
需在浏览器或支持DOMParser的环境（如Puppeteer）中运行。
复杂结构处理
- 自闭合标签（如<img>）会正常解析为无子节点元素
- 脚本标签<script>内容会以文本节点保留
性能优化
深层嵌套HTML建议分块处理,避免递归栈溢出。
数据安全
避免直接解析用户输入的HTML，防止XSS攻击（需先消毒）。

应用场景

服务端渲染（SSR）数据脱水/注水结构化存储
爬虫数据提取
富文本编辑器内容转换

引用说明：本文代码基于MDN Web Docs的DOMParser API和Node接口标准实现，遵循W3C DOM规范。

原创文章，发布者：酷盾叔，转转请注明出处：https://www.kd.cn/ask/28095.html

HTML转JSON字符串怎么做？

核心实现原理

完整代码实现

关键步骤解析

输出JSON结构示例

注意事项

应用场景

发表回复

联系我们

400-880-8834

HTML转JSON字符串怎么做？

核心实现原理

完整代码实现

关键步骤解析

输出JSON结构示例

注意事项

应用场景

相关推荐

HTML怎样快速创建表格？高效技巧一网打尽！

怎样消除HTML页面中的上下滑动效果？

如何设置HTML页脚居左？

HTML如何轻松添加6像素边框？

ASP.NET如何输出html

发表回复

联系我们

400-880-8834