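Searching Nested Child Documents

This section exposes potential techniques that can be used for searching deeply nested documents, showcasing how more complex queries can be constructed using some of Solr's query parsers and document transformers.

These features require _root_ and _nest_path_ to be declared in the schema. Please refer to Indexing Nested Documents for details about schema and index configuration.

This section does not cover faceting on nested documents. For nested document faceting, please refer to the Block Join Faceting section.

Query Examples

For the upcoming examples, we'll assume an index containing the same documents covered in Indexing Nested Documents: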

[{ "id": "P11!prod",
   "name_s": "Swingline Stapler",
   "description_t": "The Cadillac of office staplers ...",
   "skus": [ { "id": "P11!S21",
               "color_s": "RED",
               "price_i": 42,
               "manuals": [ { "id": "P11!D41",
                              "name_s": "Red Swingline Brochure",
                              "pages_i":1,
                              "content_t": "..."
                            } ]
             },
             { "id": "P11!S31",
               "color_s": "BLACK",
               "price_i": 3
             } ],
   "manuals": [ { "id": "P11!D51",
                  "name_s": "Quick Reference Guide",
                  "pages_i":1,
                  "content_t": "How to use your stapler ..."
                },
                { "id": "P11!D61",
                  "name_s": "Warranty Details",
                  "pages_i":42,
                  "content_t": "... lifetime guarantee ..."
                } ]
 },
 { "id": "P22!prod",
   "name_s": "Mont Blanc Fountain Pen",
   "description_t": "A Premium Writing Instrument ...",
   "skus": [ { "id": "P22!S22",
               "color_s": "RED",
               "price_i": 89,
               "manuals": [ { "id": "P22!D42",
                              "name_s": "Red Mont Blanc Brochure",
                              "pages_i":1,
                              "content_t": "..."
                            } ]
             },
             { "id": "P22!S32",
               "color_s": "BLACK",
               "price_i": 67
             } ],
   "manuals": [ { "id": "P22!D52",
                  "name_s": "How To Use A Pen",
                  "pages_i":42,
                  "content_t": "Start by removing the cap ..."
                } ]
 } ]
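Block join relies on each document's descendants being indexed as a contiguous block ahead of it. The following minimal Python sketch flattens the example documents into such an order (depth-first, children before their parent). This order is consistent with the {!child} results shown further down. It is a simplified model, not Solr's indexing code. The names `DOCS` and `flatten` are hypothetical, only a few fields are kept, and `_nest_path_` is assumed to be the chain of field names (such as /skus/manuals), which is the form the queries below use.

```python
# Simplified model (an assumption for illustration, not Solr's indexing code):
# each descendant gets the chain of field names leading to it as `_nest_path_`,
# every doc in a block gets the root id as `_root_`, and children are emitted
# before their parent (depth-first, post-order).

DOCS = [
    {"id": "P11!prod", "name_s": "Swingline Stapler",
     "skus": [{"id": "P11!S21", "color_s": "RED", "price_i": 42,
               "manuals": [{"id": "P11!D41", "pages_i": 1}]},
              {"id": "P11!S31", "color_s": "BLACK", "price_i": 3}],
     "manuals": [{"id": "P11!D51", "pages_i": 1},
                 {"id": "P11!D61", "pages_i": 42}]},
    {"id": "P22!prod", "name_s": "Mont Blanc Fountain Pen",
     "skus": [{"id": "P22!S22", "color_s": "RED", "price_i": 89,
               "manuals": [{"id": "P22!D42", "pages_i": 1}]},
              {"id": "P22!S32", "color_s": "BLACK", "price_i": 67}],
     "manuals": [{"id": "P22!D52", "pages_i": 42}]},
]


def flatten(doc, root_id=None, path=""):
    """Return `doc` and all its descendants as flat dicts, children first."""
    root_id = root_id or doc["id"]
    out, fields = [], {}
    for key, value in doc.items():
        if isinstance(value, list) and value and isinstance(value[0], dict):
            for child in value:  # a list of dicts is a nested child field
                out.extend(flatten(child, root_id, f"{path}/{key}"))
        else:
            fields[key] = value
    fields["_root_"] = root_id
    if path:  # root documents have no _nest_path_
        fields["_nest_path_"] = path
    out.append(fields)
    return out


index = [flat for doc in DOCS for flat in flatten(doc)]
for flat in index:
    print(flat["id"], flat.get("_nest_path_", "(root)"))
```

Each block ends with its root product document. The queries below select parents and children purely by position relative to the documents matched by a "mask" query.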

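Child Doc Transformer

By default, documents that match a query do not include any of their nested children in the response. The [child] doc transformer can be used to enrich query results with the documents' descendants.

For a detailed explanation of this transformer, and specifics on its syntax and limitations, please refer to the section [child - ChildDocTransformerFactory].

A simple query matching all documents with a description that includes "staplers":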

$ curl 'https://127.0.0.1:8983/solr/gettingstarted/select?omitHeader=true&q=description_t:staplers'
{
  "response":{"numFound":1,"start":0,"maxScore":0.30136836,"numFoundExact":true,"docs":[
      {
        "id":"P11!prod",
        "name_s":"Swingline Stapler",
        "description_t":"The Cadillac of office staplers ...",
        "_version_":1672933224035123200}]
  }}

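The same query with the addition of the [child] transformer is shown below. Note that numFound has not changed. We are still matching the same set of documents, but when returning those documents the nested children are also returned as pseudo-fields. (The -g option stops curl from interpreting the [ and ] in the URL as a glob pattern.)

$ curl -g 'https://127.0.0.1:8983/solr/gettingstarted/select?omitHeader=true&q=description_t:staplers&fl=*,[child]'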
{
  "response":{"numFound":1,"start":0,"maxScore":0.30136836,"numFoundExact":true,"docs":[
      {
        "id":"P11!prod",
        "name_s":"Swingline Stapler",
        "description_t":"The Cadillac of office staplers ...",
        "_version_":1672933224035123200,
        "skus":[
          {
            "id":"P11!S21",
            "color_s":"RED",
            "price_i":42,
            "_version_":1672933224035123200,
            "manuals":[
              {
                "id":"P11!D41",
                "name_s":"Red Swingline Brochure",
                "pages_i":1,
                "content_t":"...",
                "_version_":1672933224035123200}]},

          {
            "id":"P11!S31",
            "color_s":"BLACK",
            "price_i":3,
            "_version_":1672933224035123200}],
        "manuals":[
          {
            "id":"P11!D51",
            "name_s":"Quick Reference Guide",
            "pages_i":1,
            "content_t":"How to use your stapler ...",
            "_version_":1672933224035123200},

          {
            "id":"P11!D61",
            "name_s":"Warranty Details",
            "pages_i":42,
            "content_t":"... lifetime guarantee ...",
            "_version_":1672933224035123200}]}]
  }}
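When these requests are built from code rather than typed into curl, percent-encoding the parameter values avoids a problem. Unless given -g (--globoff), curl treats unencoded [ and ] in a URL as a range pattern. The following minimal Python sketch uses only the standard library. `select_url` is a hypothetical helper, and the base URL is copied from the examples above.

```python
from urllib.parse import urlencode

# Base URL copied from the examples above.
BASE = "https://127.0.0.1:8983/solr/gettingstarted/select"


def select_url(**params: str) -> str:
    """Build a select URL, percent-encoding every parameter value."""
    return f"{BASE}?{urlencode(params)}"


plain = select_url(omitHeader="true", q="description_t:staplers")
# Same query plus fl=*,[child]: the match set (numFound) is unchanged,
# only the returned fields gain the nested children.
with_children = select_url(omitHeader="true", q="description_t:staplers",
                           fl="*,[child]")
print(with_children)
# https://127.0.0.1:8983/solr/gettingstarted/select?omitHeader=true&q=description_t%3Astaplers&fl=%2A%2C%5Bchild%5D
```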

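Child Query Parser

The {!child} query parser can be used to search for the descendant documents of parent documents matching a wrapped query. For a detailed explanation of this parser, see the section Block Join Children Query Parser.

Let's consider again the description_t:staplers query used above. If we wrap that query in a {!child} query parser, then instead of matching and returning the product-level documents, we match all of the descendant child documents of the documents matching the original query: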

$ curl 'https://127.0.0.1:8983/solr/gettingstarted/select' -d 'omitHeader=true' -d 'q={!child of="*:* -_nest_path_:*"}description_t:staplers'
{
  "response":{"numFound":5,"start":0,"maxScore":0.30136836,"numFoundExact":true,"docs":[
      {
        "id":"P11!D41",
        "name_s":"Red Swingline Brochure",
        "pages_i":1,
        "content_t":"...",
        "_version_":1672933224035123200},
      {
        "id":"P11!S21",
        "color_s":"RED",
        "price_i":42,
        "_version_":1672933224035123200},
      {
        "id":"P11!S31",
        "color_s":"BLACK",
        "price_i":3,
        "_version_":1672933224035123200},
      {
        "id":"P11!D51",
        "name_s":"Quick Reference Guide",
        "pages_i":1,
        "content_t":"How to use your stapler ...",
        "_version_":1672933224035123200},
      {
        "id":"P11!D61",
        "name_s":"Warranty Details",
        "pages_i":42,
        "content_t":"... lifetime guarantee ...",
        "_version_":1672933224035123200}]
  }}

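In this example we've used *:* -_nest_path_:* as our of parameter to indicate that we want to consider all documents that don't have a nest path, i.e., all "root" level documents, as the set of possible parents.

By changing the of parameter to match ancestors at specific _nest_path_ levels, we can narrow down the list of children we return. In the query below, we search for all descendants of skus with a price_i of at most 50. The of parameter identifies all documents that do not have a _nest_path_ with the prefix /skus/*.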

$ curl 'https://127.0.0.1:8983/solr/gettingstarted/select' -d 'omitHeader=true' --data-urlencode 'q={!child of="*:* -_nest_path_:\\/skus\\/*"}(+price_i:[* TO 50] +_nest_path_:\/skus)'
{
  "response":{"numFound":1,"start":0,"maxScore":1.0,"numFoundExact":true,"docs":[
      {
        "id":"P11!D41",
        "name_s":"Red Swingline Brochure",
        "pages_i":1,
        "content_t":"...",
        "_version_":1675662666752851968}]
  }}
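Both results follow from the block structure. The following minimal Python sketch models the P11 block in index order (children before their parent) and treats each query and each of mask as a plain predicate. `child_join`, `is_root` and `not_under_skus` are hypothetical names, and the sketch is a simplified model, not Solr's implementation. A matching parent's children are taken to be the documents between the previous of-mask document and the parent itself. In this sketch, only documents in the of set are treated as parents.

```python
# Simplified model (assumption): one block in index order, children before
# their parent; queries and masks are Python predicates, not Solr syntax.
BLOCK = [
    {"id": "P11!D41", "_nest_path_": "/skus/manuals", "pages_i": 1},
    {"id": "P11!S21", "_nest_path_": "/skus", "price_i": 42},
    {"id": "P11!S31", "_nest_path_": "/skus", "price_i": 3},
    {"id": "P11!D51", "_nest_path_": "/manuals", "pages_i": 1},
    {"id": "P11!D61", "_nest_path_": "/manuals", "pages_i": 42},
    {"id": "P11!prod", "description_t": "The Cadillac of office staplers ..."},
]


def child_join(docs, parent_query, of_mask):
    """Collect the children of every of_mask doc that matches parent_query.

    A parent's children are the docs between the previous of_mask doc
    and the parent itself.
    """
    results, start = [], 0
    for pos, doc in enumerate(docs):
        if of_mask(doc):
            if parent_query(doc):  # only of-mask docs act as parents here
                results.extend(docs[start:pos])
            start = pos + 1  # the next parent's children begin after this doc
    return results


def is_root(doc):  # of="*:* -_nest_path_:*"
    return "_nest_path_" not in doc


def not_under_skus(doc):  # of="*:* -_nest_path_:\\/skus\\/*"
    return not doc.get("_nest_path_", "").startswith("/skus/")


staplers = child_join(BLOCK, lambda d: "staplers" in d.get("description_t", ""), is_root)
print([d["id"] for d in staplers])
# ['P11!D41', 'P11!S21', 'P11!S31', 'P11!D51', 'P11!D61']

cheap_skus = child_join(
    BLOCK,
    lambda d: d.get("_nest_path_") == "/skus" and d["price_i"] <= 50,  # price_i:[* TO 50]
    not_under_skus,
)
print([d["id"] for d in cheap_skus])
# ['P11!D41']
```

With the root mask, the whole block below P11!prod counts as its children. With the /skus/-based mask, each sku is itself a mask document. As a result, P11!S21 only claims the manual directly beneath it, and P11!S31 claims nothing.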
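Double Escaping _nest_path_ Slashes in of

Note that in the above example, the / characters in the _nest_path_ were "double escaped" in the of parameter:

  • One level of \ escaping is necessary to prevent the / from being interpreted as a Regex Query.

  • An additional level of "escaping the escape character" is necessary because the of local parameter is a quoted string. A second \ ensures that the first \ is preserved and passed as-is to the query parser.

(The body of the query string needs only a single level of \ escaping, to prevent the regex syntax, because it is not a quoted-string local param.)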
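The two escaping layers can be applied mechanically. The following minimal Python sketch rebuilds the q parameter of the previous example. It applies the first layer to both the of mask and the query body, and the second layer only to the quoted of value. `escape_slashes` and `quote_local_param` are hypothetical helper names.

```python
def escape_slashes(text: str) -> str:
    """Layer 1: escape '/' so the standard parser doesn't see a regex query."""
    return text.replace("/", "\\/")


def quote_local_param(value: str) -> str:
    """Layer 2: escape '\\' so it survives inside a double-quoted local param.

    This sketch only handles backslashes; it does not escape embedded quotes.
    """
    return '"' + value.replace("\\", "\\\\") + '"'


of_mask = "*:* -_nest_path_:" + escape_slashes("/skus/") + "*"
body = "(+price_i:[* TO 50] +_nest_path_:" + escape_slashes("/skus") + ")"
q = "{!child of=" + quote_local_param(of_mask) + "}" + body
print(q)
# {!child of="*:* -_nest_path_:\\/skus\\/*"}(+price_i:[* TO 50] +_nest_path_:\/skus)
```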

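You may find it more convenient to express the same query in a more verbose form. Use parameter references in conjunction with other parsers that do not treat / as a special character: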

$ curl 'https://127.0.0.1:8983/solr/gettingstarted/select' -d 'omitHeader=true' --data-urlencode 'q={!child of=$block_mask}(+price_i:[* TO 50] +{!field f="_nest_path_" v="/skus"})' --data-urlencode 'block_mask=(*:* -{!prefix f="_nest_path_" v="/skus/"})'
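In this form the mask lives in its own parameter, so the only escaping left happens at the HTTP level. The following minimal Python sketch builds the same request body with the standard library. The name `params` is illustrative only. The sketch checks that neither value contains a backslash and that form encoding round-trips both values unchanged, much as each --data-urlencode option encodes its own value.

```python
from urllib.parse import parse_qs, urlencode

params = {
    "omitHeader": "true",
    # $block_mask is dereferenced from the separate block_mask parameter.
    "q": "{!child of=$block_mask}"
         '(+price_i:[* TO 50] +{!field f="_nest_path_" v="/skus"})',
    # {!field} and {!prefix} do not treat '/' as special, so no escaping.
    "block_mask": '(*:* -{!prefix f="_nest_path_" v="/skus/"})',
}

# Form-encode the whole body in one go.
body = urlencode(params)
print(body)
```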

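Parent Query Parser

The inverse of the {!child} query parser is the {!parent} query parser, which lets you search for the ancestor documents of some child documents matching a wrapped query. For a detailed explanation of this parser, see the section Block Join Parent Query Parser.

Let's first consider this example of searching for all "manual" type documents that have exactly 1 page: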

$ curl 'https://127.0.0.1:8983/solr/gettingstarted/select?omitHeader=true&q=pages_i:1'
{
  "response":{"numFound":3,"start":0,"maxScore":1.0,"numFoundExact":true,"docs":[
      {
        "id":"P11!D41",
        "name_s":"Red Swingline Brochure",
        "pages_i":1,
        "content_t":"...",
        "_version_":1676585794196733952},
      {
        "id":"P11!D51",
        "name_s":"Quick Reference Guide",
        "pages_i":1,
        "content_t":"How to use your stapler ...",
        "_version_":1676585794196733952},
      {
        "id":"P22!D42",
        "name_s":"Red Mont Blanc Brochure",
        "pages_i":1,
        "content_t":"...",
        "_version_":1676585794347728896}]
  }}

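We can wrap that query, narrowed to manuals nested under skus (_nest_path_:\/skus\/manuals), in a {!parent} query to return the details of all products that are ancestors of these manuals: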

$ curl 'https://127.0.0.1:8983/solr/gettingstarted/select' -d 'omitHeader=true' --data-urlencode 'q={!parent which="*:* -_nest_path_:*"}(+_nest_path_:\/skus\/manuals +pages_i:1)'
{
  "response":{"numFound":2,"start":0,"maxScore":1.4E-45,"numFoundExact":true,"docs":[
      {
        "id":"P11!prod",
        "name_s":"Swingline Stapler",
        "description_t":"The Cadillac of office staplers ...",
        "_version_":1676585794196733952},
      {
        "id":"P22!prod",
        "name_s":"Mont Blanc Fountain Pen",
        "description_t":"A Premium Writing Instrument ...",
        "_version_":1676585794347728896}]
  }}

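In this example we've used *:* -_nest_path_:* as our which parameter to indicate that we want to consider all documents that don't have a nest path, i.e., all "root" level documents, as the set of possible parents.

By changing the which parameter to match ancestors at specific _nest_path_ levels, we can change the type of ancestors we return. In the query below, we search for skus that are the ancestors of manuals with exactly 1 page. The which parameter identifies all documents that do not have a _nest_path_ with the prefix /skus/*.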

$ curl 'https://127.0.0.1:8983/solr/gettingstarted/select' -d 'omitHeader=true' --data-urlencode 'q={!parent which="*:* -_nest_path_:\\/skus\\/*"}(+_nest_path_:\/skus\/manuals +pages_i:1)'
{
  "response":{"numFound":2,"start":0,"maxScore":1.4E-45,"numFoundExact":true,"docs":[
      {
        "id":"P11!S21",
        "color_s":"RED",
        "price_i":42,
        "_version_":1676585794196733952},
      {
        "id":"P22!S22",
        "color_s":"RED",
        "price_i":89,
        "_version_":1676585794347728896}]
  }}

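Note that in the above example, the / characters in the _nest_path_ were "double escaped" in the which parameter, for the same reasons discussed above regarding the of parameter of the {!child} parser.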
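The same block model accounts for both {!parent} results. The following minimal Python sketch holds both blocks in index order (children before their parent) and treats queries and which masks as plain predicates. `parent_join`, `is_root`, `not_under_skus` and `sku_manual_one_page` are hypothetical names. Each run of matching children is mapped to the nearest following which-mask document, so a parent matched by several children appears once. In this sketch, only documents outside the which set are treated as children. It is a simplified model, not Solr's implementation.

```python
# Simplified model (assumption): both blocks in index order, children before
# their parent; queries and masks are Python predicates, not Solr syntax.
DOCS = [
    {"id": "P11!D41", "_nest_path_": "/skus/manuals", "pages_i": 1},
    {"id": "P11!S21", "_nest_path_": "/skus"},
    {"id": "P11!S31", "_nest_path_": "/skus"},
    {"id": "P11!D51", "_nest_path_": "/manuals", "pages_i": 1},
    {"id": "P11!D61", "_nest_path_": "/manuals", "pages_i": 42},
    {"id": "P11!prod"},
    {"id": "P22!D42", "_nest_path_": "/skus/manuals", "pages_i": 1},
    {"id": "P22!S22", "_nest_path_": "/skus"},
    {"id": "P22!S32", "_nest_path_": "/skus"},
    {"id": "P22!D52", "_nest_path_": "/manuals", "pages_i": 42},
    {"id": "P22!prod"},
]


def parent_join(docs, child_query, which_mask):
    """Return the nearest following which_mask doc for each run of matches."""
    results, pending = [], False
    for doc in docs:
        if which_mask(doc):
            if pending:  # parent of the matches seen since the last mask doc
                results.append(doc)
            pending = False
        elif child_query(doc):  # only non-mask docs act as children here
            pending = True
    return results


def is_root(doc):  # which="*:* -_nest_path_:*"
    return "_nest_path_" not in doc


def not_under_skus(doc):  # which="*:* -_nest_path_:\\/skus\\/*"
    return not doc.get("_nest_path_", "").startswith("/skus/")


def sku_manual_one_page(doc):  # +_nest_path_:\/skus\/manuals +pages_i:1
    return doc.get("_nest_path_") == "/skus/manuals" and doc.get("pages_i") == 1


products = parent_join(DOCS, sku_manual_one_page, is_root)
skus = parent_join(DOCS, sku_manual_one_page, not_under_skus)
print([d["id"] for d in products])  # ['P11!prod', 'P22!prod']
print([d["id"] for d in skus])      # ['P11!S21', 'P22!S22']
```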

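Combining Block Join Query Parsers with the Child Doc Transformer

Combining these two parsers with the [child] transformer makes it possible to build very powerful queries seamlessly.

For example, here is a query where:

  • the (sku) documents returned must have a color of "RED"

  • the (sku) documents returned must be descendants of root level (product) documents that have:

    • immediate child "manuals" documents that have:

      • "lifetime guarantee" in their content

  • each returned (sku) document also includes any descendant (manuals) documents it has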

$ curl 'https://127.0.0.1:8983/solr/gettingstarted/select' -d 'omitHeader=true' -d 'fq=color_s:RED' --data-urlencode 'q={!child of="*:* -_nest_path_:*" filters=$parent_fq}' --data-urlencode 'parent_fq={!parent which="*:* -_nest_path_:*"}(+_nest_path_:"/manuals" +content_t:"lifetime guarantee")' -d 'fl=*,[child]'
{
  "response":{"numFound":1,"start":0,"maxScore":1.4E-45,"numFoundExact":true,"docs":[
      {
        "id":"P11!S21",
        "color_s":"RED",
        "price_i":42,
        "_version_":1676585794196733952,
        "manuals":[
          {
            "id":"P11!D41",
            "name_s":"Red Swingline Brochure",
            "pages_i":1,
            "content_t":"...",
            "_version_":1676585794196733952}]}]
  }}
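The following Python sketch walks through those conditions step by step on an in-memory copy of both blocks. It is a simplified model under stated assumptions, not Solr's implementation. `_root_` identifies each document's block, and the phrase query content_t:"lifetime guarantee" is approximated by a substring test. [child] is approximated by taking the run of deeper-path documents that immediately precede a document in index order. `DOCS`, `path`, `descendants`, `parent_fq`, `children` and `hits` are hypothetical names.

```python
# Simplified model (assumption): both blocks in index order, children before
# their parent, keeping only the fields this query touches.
DOCS = [
    {"id": "P11!D41", "_root_": "P11!prod", "_nest_path_": "/skus/manuals"},
    {"id": "P11!S21", "_root_": "P11!prod", "_nest_path_": "/skus", "color_s": "RED"},
    {"id": "P11!S31", "_root_": "P11!prod", "_nest_path_": "/skus", "color_s": "BLACK"},
    {"id": "P11!D51", "_root_": "P11!prod", "_nest_path_": "/manuals",
     "content_t": "How to use your stapler ..."},
    {"id": "P11!D61", "_root_": "P11!prod", "_nest_path_": "/manuals",
     "content_t": "... lifetime guarantee ..."},
    {"id": "P11!prod", "_root_": "P11!prod"},
    {"id": "P22!D42", "_root_": "P22!prod", "_nest_path_": "/skus/manuals"},
    {"id": "P22!S22", "_root_": "P22!prod", "_nest_path_": "/skus", "color_s": "RED"},
    {"id": "P22!S32", "_root_": "P22!prod", "_nest_path_": "/skus", "color_s": "BLACK"},
    {"id": "P22!D52", "_root_": "P22!prod", "_nest_path_": "/manuals",
     "content_t": "Start by removing the cap ..."},
    {"id": "P22!prod", "_root_": "P22!prod"},
]


def path(doc):
    return doc.get("_nest_path_", "")


def descendants(docs, doc):
    """[child] stand-in: descendants sit directly before a doc in index order."""
    end = docs.index(doc)
    start, prefix = end, path(doc) + "/"
    while start > 0 and path(docs[start - 1]).startswith(prefix):
        start -= 1
    return docs[start:end]


# parent_fq: roots with an immediate /manuals child mentioning "lifetime guarantee"
parent_fq = {d["_root_"] for d in DOCS
             if path(d) == "/manuals" and "lifetime guarantee" in d.get("content_t", "")}
# {!child of=<roots> filters=$parent_fq}: every nested doc in those blocks
children = [d for d in DOCS if path(d) and d["_root_"] in parent_fq]
# fq=color_s:RED
hits = [d for d in children if d.get("color_s") == "RED"]

for hit in hits:
    print(hit["id"], [d["id"] for d in descendants(DOCS, hit)])
# P11!S21 ['P11!D41']
```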