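Transforming and Indexing Custom JSON

If you have JSON documents that you would like to index without transforming them into Solr's structure, you can add them to Solr by including some parameters with the update request.

These parameters tell Solr how to split a single JSON file into multiple Solr documents and how to map fields to Solr's schema. One or more valid JSON documents can be sent to the /update/json/docs path together with these configuration parameters.

Mapping Parameters

These parameters allow you to define how a JSON file should be read to produce multiple Solr documents.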

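split

Optional

Default: none

Defines the path at which to split the input JSON into multiple Solr documents. It is required if a single JSON file contains multiple documents. If the entire JSON makes up a single Solr document, the path must be "/".

Multiple split paths can be passed by separating them with a pipe (|), for example: split=/|/foo|/foo/bar. If one path is a child of another, the documents at the child path automatically become child documents.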

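f

Required

Default: none

Provides multivalued mapping from document field names to Solr field names. The format of the parameter is target-field-name:json-path, as in f=first:/first. The json-path is required. The target-field-name is the Solr document field name and is optional. If it is not specified, it is derived automatically from the input JSON. The default target field name is the fully qualified name of the field.

Wildcards can be used here; see the section Using Wildcards for Field Names below for more information.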
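
To make the target-field-name:json-path format concrete, here is a minimal Python sketch that splits an f value into its two parts. The function name parse_f is hypothetical and this is not Solr's own parser; it only assumes, as described above, that the json-path is required and the target name is optional.

```python
from typing import Optional, Tuple


def parse_f(value: str) -> Tuple[Optional[str], str]:
    """Split an f value of the form target-field-name:json-path.

    Hypothetical helper for illustration only. The target name is
    optional, so a bare path such as "/first" is accepted too.
    """
    target, sep, path = value.partition(":")
    if not sep:
        # No colon present: the whole value is the json-path.
        target, path = "", value
    if not path.startswith("/"):
        raise ValueError(f"json-path must start with '/': {value!r}")
    # An empty target means "derive the name from the input JSON".
    return (target or None), path


print(parse_f("first:/first"))
print(parse_f("/exams/marks"))
```

A value without a colon yields None for the target, which corresponds to letting Solr derive the field name.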

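mapUniqueKeyOnly

Optional

Default: `false`

This parameter is particularly convenient when the fields in the input JSON are not available in the schema and schemaless mode is not enabled. It indexes all fields into the default search field (set with the df parameter), and only the uniqueKey field is mapped to the corresponding field in the schema. If the input JSON has no value for the uniqueKey field, a UUID is generated for it.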

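df

Optional

Default: none

If the mapUniqueKeyOnly flag is used, the update handler needs a field into which the data is indexed. This is the same field that other handlers use as the default search field.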

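srcField

Optional

Default: none

This is the name of the field in which the JSON source is stored. It can only be used if split=/ (that is, you want your JSON input file to be indexed as a single Solr document). Note that atomic updates will cause this field to become out of sync with the document.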

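echo

Optional

Default: `false`

This is for debugging purposes only. Set it to true if you want the documents to be returned in the response. Nothing will be indexed.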
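
Since these are ordinary query-string parameters and f may be repeated, a client can assemble the request URL programmatically. The following Python sketch only builds the URL and sends nothing; build_update_url is a hypothetical helper, and the host, port and collection name are taken from the examples in this section.

```python
from urllib.parse import urlencode


def build_update_url(base, split, mappings, echo=False):
    """Assemble the query string for a custom-JSON update request.

    Hypothetical helper: it only formats the URL, it does not contact Solr.
    """
    params = [("split", split)]
    # f is multivalued: emit one f=... pair per mapping.
    params += [("f", mapping) for mapping in mappings]
    if echo:
        # echo=true asks for the documents back without indexing them.
        params.append(("echo", "true"))
    # Keep path characters readable instead of percent-encoding them.
    return base + "?" + urlencode(params, safe="/:*$|")


url = build_update_url(
    "http://localhost:8983/solr/techproducts/update/json/docs",
    split="/exams",
    mappings=["first:/first", "marks:/exams/marks"],
    echo=True,
)
print(url)
```

Passing a list of pairs rather than a dict is what allows the f key to appear more than once.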

For example, if we have a JSON file that includes two documents, we could define an update request like this:

V1 API
V2 API User-Managed / Single-Node Solr
V2 API SolrCloud

curl 'http://localhost:8983/solr/techproducts/update/json/docs'\
'?split=/exams'\
'&f=first:/first'\
'&f=last:/last'\
'&f=grade:/grade'\
'&f=subject:/exams/subject'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

curl 'http://localhost:8983/api/cores/techproducts/update/json/docs'\
'?split=/exams'\
'&f=first:/first'\
'&f=last:/last'\
'&f=grade:/grade'\
'&f=subject:/exams/subject'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

curl 'http://localhost:8983/api/collections/techproducts/update/json/docs'\
'?split=/exams'\
'&f=first:/first'\
'&f=last:/last'\
'&f=grade:/grade'\
'&f=subject:/exams/subject'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

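With this request, we have defined that "exams" contains multiple documents. In addition, we have mapped several fields from the input document to Solr fields.

When the update request is complete, the following two documents will be added to the index: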

{
  "first":"John",
  "last":"Doe",
  "marks":90,
  "test":"term1",
  "subject":"Maths",
  "grade":8
}
{
  "first":"John",
  "last":"Doe",
  "marks":86,
  "test":"term1",
  "subject":"Biology",
  "grade":8
}
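
The way split=/exams and the f mappings combine to produce these two documents can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions, not Solr's implementation: split_and_map is a hypothetical name, it handles a single split path one level deep, and it expects every mapped field to be present.

```python
def split_and_map(doc, split_path, mappings):
    """Model of split + f: one output document per element at split_path.

    mappings is a list of (target_field, json_path) pairs.
    Simplified: one split level, no wildcards, no missing fields.
    """
    split_key = split_path.strip("/")
    results = []
    for child in doc[split_key]:
        solr_doc = {}
        for target, json_path in mappings:
            parts = json_path.strip("/").split("/")
            if len(parts) == 2 and parts[0] == split_key:
                # Field that lives inside the split element.
                solr_doc[target] = child[parts[1]]
            else:
                # Field copied from the parent into every output document.
                solr_doc[target] = doc[parts[0]]
        results.append(solr_doc)
    return results


source = {
    "first": "John", "last": "Doe", "grade": 8,
    "exams": [
        {"subject": "Maths", "test": "term1", "marks": 90},
        {"subject": "Biology", "test": "term1", "marks": 86},
    ],
}
mappings = [
    ("first", "/first"), ("last", "/last"), ("grade", "/grade"),
    ("subject", "/exams/subject"), ("test", "/exams/test"),
    ("marks", "/exams/marks"),
]
for solr_doc in split_and_map(source, "/exams", mappings):
    print(solr_doc)
```

The parent fields (first, last, grade) are repeated in each output document, which matches the result shown above.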

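In the previous example, every field we wanted to use in Solr had the same name as in the input JSON. When that is the case, we can simplify the request by specifying only the json-path portion of the f parameter, as in this example:

V1 API
V2 API User-Managed / Single-Node Solr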
V2 API SolrCloud

curl 'http://localhost:8983/solr/techproducts/update/json/docs'\
'?split=/exams'\
'&f=/first'\
'&f=/last'\
'&f=/grade'\
'&f=/exams/subject'\
'&f=/exams/test'\
'&f=/exams/marks'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

curl 'http://localhost:8983/api/cores/techproducts/update/json/docs'\
'?split=/exams'\
'&f=/first'\
'&f=/last'\
'&f=/grade'\
'&f=/exams/subject'\
'&f=/exams/test'\
'&f=/exams/marks'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

curl 'http://localhost:8983/api/collections/techproducts/update/json/docs'\
'?split=/exams'\
'&f=/first'\
'&f=/last'\
'&f=/grade'\
'&f=/exams/subject'\
'&f=/exams/test'\
'&f=/exams/marks'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

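In this example, we simply named the field paths (such as /exams/test). Solr will automatically attempt to add the content of that field from the JSON input to the index, in a field with the same name.

Documents will be rejected during indexing if the fields do not exist in the schema before indexing. So, if you are not using schemaless mode, you must pre-create all of the fields. If you are working in schemaless mode, however, fields that do not exist will be created on the fly with Solr's best guess for the field type.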

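Reusing Parameters in Multiple Requests

You can store and reuse parameters with Solr's Request Parameters API.

Say we wanted to define parameters to split documents at the exams field and map several other fields. We could make an API request such as:

V1 API
V2 API User-Managed / Single-Node Solr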
V2 API SolrCloud

 curl http://localhost:8983/solr/techproducts/config/params -H 'Content-type:application/json' -d '{
 "set": {
   "my_params": {
     "split": "/exams",
     "f": ["first:/first","last:/last","grade:/grade","subject:/exams/subject","test:/exams/test"]
 }}}'

curl http://localhost:8983/api/cores/techproducts/config/params -H 'Content-type:application/json' -d '{
 "set": {
   "my_params": {
     "split": "/exams",
     "f": ["first:/first","last:/last","grade:/grade","subject:/exams/subject","test:/exams/test"]
 }}}'

curl http://localhost:8983/api/collections/techproducts/config/params -H 'Content-type:application/json' -d '{
 "set": {
   "my_params": {
     "split": "/exams",
     "f": ["first:/first","last:/last","grade:/grade","subject:/exams/subject","test:/exams/test"]
 }}}'

When we send the documents, we use the useParams parameter with the name of the parameter set we defined:

V1 API
V2 API User-Managed / Single-Node Solr
V2 API SolrCloud

curl 'http://localhost:8983/solr/techproducts/update/json/docs?useParams=my_params' -H 'Content-type:application/json' -d '{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [{
      "subject": "Maths",
      "test": "term1",
      "marks": 90
    },
    {
      "subject": "Biology",
      "test": "term1",
      "marks": 86
    }
  ]
}'

curl 'http://localhost:8983/api/cores/techproducts/update/json?useParams=my_params' -H 'Content-type:application/json' -d '{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [{
      "subject": "Maths",
      "test": "term1",
      "marks": 90
    },
    {
      "subject": "Biology",
      "test": "term1",
      "marks": 86
    }
  ]
}'

curl 'http://localhost:8983/api/collections/techproducts/update/json?useParams=my_params' -H 'Content-type:application/json' -d '{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [{
      "subject": "Maths",
      "test": "term1",
      "marks": 90
    },
    {
      "subject": "Biology",
      "test": "term1",
      "marks": 86
    }
  ]
}'
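
The same two steps can be prepared from a client program. The Python sketch below builds the JSON body for the config/params request and the update URL that references the stored set; it sends nothing. The helper names are hypothetical, while the "set" command structure and the useParams parameter are the ones used in the curl examples above.

```python
import json
from urllib.parse import urlencode


def param_set_command(name, split, mappings):
    """Return the JSON body that stores a named parameter set."""
    return json.dumps({"set": {name: {"split": split, "f": list(mappings)}}})


def update_url_with_params(base, name):
    """Return an update URL that refers to a stored parameter set."""
    return base + "?" + urlencode({"useParams": name})


body = param_set_command(
    "my_params",
    "/exams",
    ["first:/first", "last:/last", "grade:/grade",
     "subject:/exams/subject", "test:/exams/test"],
)
print(body)
print(update_url_with_params(
    "http://localhost:8983/solr/techproducts/update/json/docs", "my_params"))
```

Keeping the mappings in one stored set means each indexing request only needs to carry the documents and the set name.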

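Using Wildcards for Field Names

Instead of specifying all the field names explicitly, it is possible to use wildcards to map fields automatically.

There are two restrictions: wildcards can only be used at the end of the json-path, and the split path cannot use wildcards.

A single asterisk * maps only to direct children, and a double asterisk ** maps recursively to all descendants. The following are example wildcard path mappings:

f=$FQN:/**: maps all fields to the fully qualified name ($FQN) of the JSON field. The fully qualified name is obtained by concatenating all the keys in the hierarchy with a period (.) as a delimiter. This is the default behavior if no f path mappings are specified.
f=/docs/*: maps all the fields under docs, using the names given in the JSON
f=/docs/**: maps all the fields under docs and its children, using the names given in the JSON
f=searchField:/docs/*: maps all fields under /docs to a single field called 'searchField'
f=searchField:/docs/**: maps all fields under /docs and its children to searchField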
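
The difference between * and **, and the way a fully qualified name is built, can be illustrated with a short Python sketch. Both functions are hypothetical models written for this explanation and are not Solr code; fully_qualified_names only walks nested objects and leaves arrays as values.

```python
def matches(pattern, path):
    """Return True if a json-path pattern with a trailing wildcard covers path."""
    if pattern.endswith("/**"):
        prefix = pattern[:-2]          # "/docs/**" -> "/docs/"
        # Double asterisk: any descendant below the prefix.
        return path.startswith(prefix) and len(path) > len(prefix)
    if pattern.endswith("/*"):
        prefix = pattern[:-1]          # "/docs/*" -> "/docs/"
        rest = path[len(prefix):]
        # Single asterisk: direct children only, so no further "/".
        return path.startswith(prefix) and rest != "" and "/" not in rest
    return pattern == path


def fully_qualified_names(node, prefix=""):
    """Flatten nested objects into {dotted.name: value} pairs."""
    flat = {}
    for key, value in node.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(fully_qualified_names(value, name))
        else:
            flat[name] = value
    return flat


print(matches("/docs/*", "/docs/title"))
print(matches("/docs/*", "/docs/author/name"))
print(matches("/docs/**", "/docs/author/name"))
print(fully_qualified_names({"docs": {"author": {"name": "Doe"}}}))
```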

With wildcards we can further simplify our previous example as follows:

V1 API
V2 API User-Managed / Single-Node Solr
V2 API SolrCloud

curl 'http://localhost:8983/solr/techproducts/update/json/docs'\
'?split=/exams'\
'&f=/**'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

curl 'http://localhost:8983/api/cores/techproducts/update/json'\
'?split=/exams'\
'&f=/**'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

curl 'http://localhost:8983/api/collections/techproducts/update/json'\
'?split=/exams'\
'&f=/**'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

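Because we want the fields to be indexed under the field names found in the JSON input, the double wildcard in f=/** maps all fields and their descendants to fields of the same name in Solr.

It is also possible to send all the values to a single field and do a full-text search on that field. This is a good option for blindly indexing and querying JSON documents without worrying about fields and schema.

V1 API
V2 API User-Managed / Single-Node Solr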
V2 API SolrCloud

curl 'http://localhost:8983/solr/techproducts/update/json/docs'\
'?split=/'\
'&f=txt:/**'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

curl 'http://localhost:8983/api/cores/techproducts/update/json'\
'?split=/'\
'&f=txt:/**'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

curl 'http://localhost:8983/api/collections/techproducts/update/json'\
'?split=/'\
'&f=txt:/**'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

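In the example above, we have said that all of the fields should be added to a field in Solr named "txt". This adds multiple values to a single field, so whatever field you choose should be multi-valued.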
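
To see why the target field needs to be multi-valued, the sketch below gathers every leaf value of the sample document in traversal order. It is a rough Python model of what f=txt:/** collects, with the hypothetical name leaf_values; the exact order and representation of values inside Solr are not claimed here.

```python
def leaf_values(node):
    """Yield every scalar value found in a nested JSON structure."""
    if isinstance(node, dict):
        for value in node.values():
            yield from leaf_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from leaf_values(item)
    else:
        yield node


sample = {
    "first": "John", "last": "Doe", "grade": 8,
    "exams": [
        {"subject": "Maths", "test": "term1", "marks": 90},
        {"subject": "Biology", "test": "term1", "marks": 86},
    ],
}
# Nine separate values end up in the one "txt" field.
print({"txt": list(leaf_values(sample))})
```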

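The default behavior is to use the fully qualified name (FQN) of the node. So, if we do not define any field mappings, like this:

V1 API
V2 API User-Managed / Single-Node Solr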
V2 API SolrCloud

curl 'http://localhost:8983/solr/techproducts/update/json/docs?split=/exams'\
    -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

curl 'http://localhost:8983/api/cores/techproducts/update/json?split=/exams'\
    -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

curl 'http://localhost:8983/api/collections/techproducts/update/json?split=/exams'\
    -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test"   : "term1",
      "marks"  : 90},
    {
      "subject": "Biology",
      "test"   : "term1",
      "marks"  : 86}
  ]
}'

The documents would be added to the index with fields that look like this:

{
  "first":"John",
  "last":"Doe",
  "grade":8,
  "exams.subject":"Maths",
  "exams.test":"term1",
  "exams.marks":90},
{
  "first":"John",
  "last":"Doe",
  "grade":8,
  "exams.subject":"Biology",
  "exams.test":"term1",
  "exams.marks":86}

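Multiple Documents in a Single Payload

This functionality supports documents in the JSON Lines format (.jsonl), which specifies one document per line.

For example:

V1 API
V2 API User-Managed / Single-Node Solr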
V2 API SolrCloud

curl 'http://localhost:8983/solr/techproducts/update/json/docs' -H 'Content-type:application/json' -d '
{ "first":"Steve", "last":"Jobs", "grade":1, "subject":"Social Science", "test":"term1", "marks":90}
{ "first":"Steve", "last":"Woz", "grade":1, "subject":"Political Science", "test":"term1", "marks":86}'

curl 'http://localhost:8983/api/cores/techproducts/update/json' -H 'Content-type:application/json' -d '
{ "first":"Steve", "last":"Jobs", "grade":1, "subject":"Social Science", "test":"term1", "marks":90}
{ "first":"Steve", "last":"Woz", "grade":1, "subject":"Political Science", "test":"term1", "marks":86}'

curl 'http://localhost:8983/api/collections/techproducts/update/json' -H 'Content-type:application/json' -d '
{ "first":"Steve", "last":"Jobs", "grade":1, "subject":"Social Science", "test":"term1", "marks":90}
{ "first":"Steve", "last":"Woz", "grade":1, "subject":"Political Science", "test":"term1", "marks":86}'

Or even an array of documents, as in this example:

V1 API
V2 API User-Managed / Single-Node Solr
V2 API SolrCloud

curl 'http://localhost:8983/solr/techproducts/update/json/docs' -H 'Content-type:application/json' -d '[
{"first":"Steve", "last":"Jobs", "grade":1, "subject":"Computer Science", "test":"term1", "marks":90},
{"first":"Steve", "last":"Woz", "grade":1, "subject":"Calculus", "test":"term1", "marks":86}]'

curl 'http://localhost:8983/api/cores/techproducts/update/json' -H 'Content-type:application/json' -d '[
{"first":"Steve", "last":"Jobs", "grade":1, "subject":"Computer Science", "test":"term1", "marks":90},
{"first":"Steve", "last":"Woz", "grade":1, "subject":"Calculus", "test":"term1", "marks":86}]'

curl 'http://localhost:8983/api/collections/techproducts/update/json' -H 'Content-type:application/json' -d '[
{"first":"Steve", "last":"Jobs", "grade":1, "subject":"Computer Science", "test":"term1", "marks":90},
{"first":"Steve", "last":"Woz", "grade":1, "subject":"Calculus", "test":"term1", "marks":86}]'
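
Both payload shapes are easy to produce from a list of documents. The Python sketch below serializes the same two documents once as JSON Lines and once as a JSON array, using only the standard json module; the variable names are illustrative.

```python
import json

docs = [
    {"first": "Steve", "last": "Jobs", "grade": 1,
     "subject": "Computer Science", "test": "term1", "marks": 90},
    {"first": "Steve", "last": "Woz", "grade": 1,
     "subject": "Calculus", "test": "term1", "marks": 86},
]

# JSON Lines: one complete JSON document per line.
jsonl_body = "\n".join(json.dumps(doc) for doc in docs)

# JSON array: a single top-level list holding every document.
array_body = json.dumps(docs)

print(jsonl_body)
print(array_body)
```

Either string could then be used as the request body of the update requests shown above.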

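Tips for Custom JSON Indexing

Schemaless mode: This handles field creation automatically. The field guessing may not be exactly what you expect, but it works. The best approach is to set up a local server in schemaless mode, index a few sample documents, and then create those fields in your real setup with proper field types before indexing.
Pre-created schema: Post your documents to the /update/json/docs endpoint with echo=true. This gives you the list of field names you need to create. Create the fields before you actually index.
No schema, only full-text search: If all you need is full-text search on your JSON, configure Solr as described in the section Setting JSON Defaults.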

Setting JSON Defaults

It is possible to send any JSON to the /update/json/docs endpoint. The default configuration of the component is as follows:

<initParams path="/update/json/docs">
  <lst name="defaults">
    <!-- this ensures that the entire JSON doc will be stored verbatim into one field -->
    <str name="srcField">_src_</str>
    <!-- This means the uniqueKeyField will be extracted from the fields and
         all fields go into the 'df' field. In this config df is already configured to be 'text'
     -->
    <str name="mapUniqueKeyOnly">true</str>
    <!-- The default search field where all the values are indexed to -->
    <str name="df">text</str>
  </lst>
</initParams>

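So, if no parameters are passed, the entire JSON file is indexed into the _src_ field and all the values in the input JSON go into a field named text. If there is a value for the uniqueKey, it is stored; if no value can be obtained from the input JSON, a UUID is created and used as the uniqueKey field value.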
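
The effect of these defaults on one input document can be sketched in Python. This is a simplified model under stated assumptions, not Solr's code: apply_json_defaults is a hypothetical name, the uniqueKey field is assumed to be called id, and whether the key value itself also lands in the text field is not something this sketch settles.

```python
import json
import uuid


def apply_json_defaults(raw_json, unique_key="id", df="text", src_field="_src_"):
    """Model srcField + mapUniqueKeyOnly + df for one JSON object.

    Assumes the top-level JSON value is an object and that the
    uniqueKey field is named "id" (an assumption, not from the config).
    """
    doc = json.loads(raw_json)
    values = []

    def collect(node):
        # Gather every scalar value for the default search field.
        if isinstance(node, dict):
            for value in node.values():
                collect(value)
        elif isinstance(node, list):
            for item in node:
                collect(item)
        else:
            values.append(node)

    collect(doc)
    key_value = doc.get(unique_key)
    if key_value is None:
        # No uniqueKey in the input: generate a UUID instead.
        key_value = str(uuid.uuid4())
    return {unique_key: key_value, df: values, src_field: raw_json}


raw = '{"first": "John", "grade": 8}'
print(apply_json_defaults(raw))
```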

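Alternatively, use the Request Parameters feature to set these parameters, as shown earlier in the section Reusing Parameters in Multiple Requests.

V1 API
V2 API User-Managed / Single-Node Solr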
V2 API SolrCloud

 curl http://localhost:8983/solr/techproducts/config/params -H 'Content-type:application/json' -d '{
"set": {
  "full_txt": {
    "srcField": "_src_",
    "mapUniqueKeyOnly" : true,
    "df": "text"
}}}'

 curl http://localhost:8983/api/cores/techproducts/config/params -H 'Content-type:application/json' -d '{
"set": {
  "full_txt": {
    "srcField": "_src_",
    "mapUniqueKeyOnly" : true,
    "df": "text"
}}}'

 curl http://localhost:8983/api/collections/techproducts/config/params -H 'Content-type:application/json' -d '{
"set": {
  "full_txt": {
    "srcField": "_src_",
    "mapUniqueKeyOnly" : true,
    "df": "text"
}}}'

To use these parameters, send the parameter useParams=full_txt with each request.