Linker API

The Sefaria Linker relies on a POST API, documented below.

Introduction

The Linker takes text as input and returns the location of each citation in the text, along with a Sefaria link for it. The table below details current support by language.

| Language | Support |
| --- | --- |
| English | Basic support. Relies on a regular-expression algorithm that can produce many false positives and false negatives. Works better on simple sources such as Tanakh, Mishnah, and Talmud, and worse on books with complicated titles or citation formats. |
| Hebrew | Relies on machine-learning models designed to limit false positives. Supports ibid citations. Returns multiple options when a citation is ambiguous. |

API

POST /api/find-refs

This endpoint takes text as input and returns the location and a Sefaria link for each citation.

URL parameters

| URL param | Description | Type | Default |
| --- | --- | --- | --- |
| with_text | Return the text for each citation. See the with_text format section for details. | 0 or 1 | 0 |
| debug | Return debug information for each citation. See the debug format section for details. | 0 or 1 | 0 |
| max_segments | When with_text is 1, the maximum number of segments to return for a citation. Limits the size of the response for broad citations such as פרשת בראשית (Parashat Bereshit). 0 means no limit. | int | 0 |
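For illustration, the following Python sketch shows how these parameters might be combined into a request URL. `build_find_refs_url` is a hypothetical helper, not part of any Sefaria library; it uses only the standard library's `urllib.parse.urlencode` and omits parameters left at their defaults, since none of them are required.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.sefaria.org/api/find-refs"


def build_find_refs_url(with_text: bool = False, debug: bool = False,
                        max_segments: int = 0) -> str:
    """Hypothetical helper: build the find-refs URL, omitting default-valued params."""
    if max_segments < 0:
        raise ValueError("max_segments must be >= 0 (0 means no limit)")
    params = {}
    if with_text:
        params["with_text"] = 1
    if debug:
        params["debug"] = 1
    if max_segments:
        # max_segments only matters when with_text is 1
        params["max_segments"] = max_segments
    query = urlencode(params)
    return f"{BASE_URL}?{query}" if query else BASE_URL
```

For instance, `build_find_refs_url(with_text=True, max_segments=5)` yields `https://www.sefaria.org/api/find-refs?with_text=1&max_segments=5`.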

POST body

The POST body should be a serialized JSON object with the following fields:

| Field | Description | Required |
| --- | --- | --- |
| text | Text to search for citations in. See the text format section for details. | Yes |

Response format

The response is JSON in the following format. See the Example section for a complete response.

{
   "title": {
      // data corresponding to citations found in the "title" field of the POST data
      "results": [
         {
            "startChar": <int>,    // start character of the result, relative to the input text
            "endChar": <int>,      // end character (exclusive) of the result, relative to the input text
            "text": <str>,         // text of the citation for this result; corresponds to data.title.substring(startChar, endChar)
            "linkFailed": <bool>,  // true if the citation was located but could not be linked to a source on Sefaria. This may indicate an error in the linker or that the source doesn't exist on Sefaria.
            "refs": [<str>]        // list of references, in Sefaria format, that this result could link to. More than one reference means the citation was considered ambiguous.
         }
      ],
      "refData": {
         // object with a key for every ref in "results"
         <ref>: {
            "heRef": <str>,           // Hebrew form of the reference
            "url": <str>,             // reference in URL form. Build a link to Sefaria as www.sefaria.org/<url>
            "primaryCategory": <str>, // primary category of the reference on Sefaria; corresponds to its location in the table of contents (sefaria.org/texts)
            "he": [<str>],            // Hebrew text of <ref>, one string per segment. Only included if the `with_text` URL param is `1`.
            "en": [<str>],            // English text of <ref>, one string per segment. Only included if the `with_text` URL param is `1`.
            "isTruncated": <bool>     // whether the text was truncated. Only included if `with_text` is `1` and `max_segments` is greater than 0.
         }
      },
      "debugData": [ [ ... ] ]  // 2-D array of debug data; debugData[i] describes results[i]. Only included if the `debug` URL param is `1`. See the "debug format" section.
   },
   "body": {
      // data corresponding to citations found in the "body" field of the POST data
      // same format as the "title" object
   }
}
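A minimal sketch of reading one section ("title" or "body") of this response in Python, assuming the JSON has already been parsed into a dict (for example with `json.loads`). `citation_links` is a hypothetical helper; it flags a result as ambiguous when `refs` has more than one entry, and the `https://` prefix is an assumption layered on the `www.sefaria.org/<url>` pattern described above.

```python
from typing import Any


def citation_links(section: dict[str, Any]) -> list[dict[str, Any]]:
    """Summarize one section of a find-refs response (hypothetical helper).

    Returns one entry per result with its character span, matched text,
    and a full Sefaria URL for every candidate ref found in refData.
    """
    ref_data = section.get("refData", {})
    summary = []
    for result in section.get("results", []):
        urls = [
            # refData[ref]["url"] is appended to www.sefaria.org/
            "https://www.sefaria.org/" + ref_data[ref]["url"]
            for ref in result["refs"]
            if ref in ref_data
        ]
        summary.append({
            "span": (result["startChar"], result["endChar"]),
            "text": result["text"],
            "linkFailed": result["linkFailed"],
            "ambiguous": len(result["refs"]) > 1,
            "urls": urls,
        })
    return summary
```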

with_text format

When the with_text URL param is 1, the following keys are added to each entry of response.title.refData and response.body.refData:

| Field | Description |
| --- | --- |
| he | Hebrew text of <ref>, where <ref> is the key of the refData element. |
| en | English text of <ref>, where <ref> is the key of the refData element. |
| isTruncated | Whether the text was truncated according to the max_segments URL param. |
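The sketch below shows one way a client might read these fields. It assumes, as in the Example output, that `he` and `en` are lists of segment strings; accepting a bare string as well is a defensive assumption, not documented behavior. `preview_text` is a hypothetical name.

```python
from typing import Any


def preview_text(ref_entry: dict[str, Any], lang: str = "en") -> str:
    """Join the segments of one refData entry and mark truncation (hypothetical helper).

    Assumes `he`/`en` are lists of segments, as in the example response;
    a plain string is accepted as a fallback.
    """
    if lang not in ("he", "en"):
        raise ValueError("lang must be 'he' or 'en'")
    segments = ref_entry.get(lang)
    if segments is None:
        # Text fields are only present when with_text=1
        raise KeyError(f"'{lang}' missing; was the request made with with_text=1?")
    if isinstance(segments, str):
        segments = [segments]
    text = " ".join(segments)
    # isTruncated is only present when max_segments > 0; treat absence as False
    if ref_entry.get("isTruncated", False):
        text += " [truncated]"
    return text
```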

debug format

When the debug URL param is 1, a debugData field is added to response.title and response.body. debugData is a 2-D array: the outer array has one element per citation in the corresponding results array (in the same order), and each inner array has one element for each possible way of parsing that citation. Below is an example of a single parse (one element of an inner array), with comments explaining each field:

{
   "orig_part_strs": [  // original input string broken into parts
      "איוב",
      "פרק יז"
   ],
   "orig_part_types": [  // type of each part in `orig_part_strs`
      "NAMED",
      "NUMBERED"
   ],
   "final_part_strs": [  // part strings after processing (often the same as `orig_part_strs`, but they can differ in some cases)
      "איוב",
      "פרק יז"
   ],
   "final_part_types": [  // type of each part in `final_part_strs`
      "NAMED",
      "NUMBERED"
   ],
   "resolved_part_strs": [  // parts that were actually resolved. These matter because parts are sometimes merged during resolution
      "איוב",
      "פרק יז"
   ],
   "resolved_part_types": [  // type of each part in `resolved_part_strs`
      "NAMED",
      "NUMBERED"
   ],
   "resolved_part_classes": [  // class of each part in `resolved_part_strs`. Helps debug cases where context or ibid resolution was used.
      "RawRefPart",
      "RawRefPart"
   ],
   "context_ref": null,  // if a context ref was used to resolve the citation (e.g. for an ibid citation), it appears here
   "context_type": null  // type of context associated with `context_ref`
}
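The sketch below pairs each result with its parses, which can make it easier to spot citations that were resolved through context. It assumes the 2-D layout described above (`debugData[i]` holds the parses for `results[i]`); `describe_parses` is a hypothetical helper and its output format is arbitrary.

```python
from typing import Any


def describe_parses(section: dict[str, Any]) -> list[str]:
    """Pair each result with its debugData parses (hypothetical helper).

    debugData is a 2-D array: debugData[i] holds every parse of results[i].
    """
    results = section.get("results", [])
    debug_data = section.get("debugData")
    if debug_data is None:
        raise KeyError("debugData missing; was the request made with debug=1?")
    lines = []
    for result, parses in zip(results, debug_data):
        for parse in parses:
            context = parse.get("context_ref")
            # Only mention context when one was actually used
            note = f" (context: {context}, {parse.get('context_type')})" if context else ""
            parts = " + ".join(parse["resolved_part_strs"])
            lines.append(f"{result['text']} -> {parts}{note}")
    return lines
```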

text format

The text field of the POST body should be an object formatted as follows:

{
   "title": <str>,  // text of the document's title. May be an empty string when there is no title. Citations found in the title are used as context for citations found in the "body", which is useful, for example, for articles that focus on a single Perek (chapter) of Tanakh.
   "body": <str>  // text of the body of the document
}
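As a sketch, the POST body can be serialized in Python with the standard `json` module. `build_find_refs_body` is a hypothetical helper; passing `ensure_ascii=False` and encoding to UTF-8 is a choice that keeps Hebrew readable in the payload, not a documented requirement of the API.

```python
import json


def build_find_refs_body(body: str, title: str = "") -> bytes:
    """Serialize the POST body for find-refs (hypothetical helper).

    Citations found in `title` serve as context for citations in `body`;
    an empty title is allowed.
    """
    payload = {"text": {"title": title, "body": body}}
    # ensure_ascii=False keeps Hebrew readable; encode to UTF-8 bytes for sending
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")
```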

Example

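Below is an example in cURL that uses all the URL parameters; note that none of them are required. The example shows how a citation found in the "title" field can serve as context for citations in the "body" field: the title "עיון על איוב פרק יז" ("A study of Job chapter 17") supplies the book and chapter, so "בפסוק א" ("in verse 1") in the body resolves to Job 17:1.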

Input

curl -X POST 'https://www.sefaria.org/api/find-refs?debug=1&with_text=1&max_segments=5' --data-raw '{"text":{"body": "ראה מה שכתוב בפסוק א.", "title": "עיון על איוב פרק יז"}}'
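For readers not using cURL, the following Python sketch builds an equivalent request with the standard library's `urllib.request.Request`, without sending it. `make_request` is a hypothetical name; like the cURL command, it sets no explicit Content-Type header.

```python
import json
from urllib.request import Request


def make_request(title: str, body: str) -> Request:
    """Build the same POST request as the cURL example (hypothetical helper; not sent)."""
    url = "https://www.sefaria.org/api/find-refs?debug=1&with_text=1&max_segments=5"
    data = json.dumps({"text": {"body": body, "title": title}},
                      ensure_ascii=False).encode("utf-8")
    # No explicit Content-Type header, mirroring the cURL example
    return Request(url, data=data, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body can then be parsed with `json.loads`.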

Output

{
   "title":{
      "results":[
         {
            "startChar":8,
            "endChar":19,
            "text":"איוב פרק יז",
            "linkFailed":false,
            "refs":[
               "Job 17"
            ]
         }
      ],
      "refData":{
         "Job 17":{
            "heRef":"איוב י״ז",
            "url":"Job.17",
            "primaryCategory":"Tanakh",
            "he":[
               "רוּחִ֣י חֻ֭בָּלָה יָמַ֥י נִזְעָ֗כוּ קְבָרִ֥ים לִֽי׃",
               "אִם־לֹ֣א הֲ֭תֻלִים עִמָּדִ֑י וּ֝בְהַמְּרוֹתָ֗ם תָּלַ֥ן עֵינִֽי׃",
               "שִֽׂימָה־נָּ֭א עׇרְבֵ֣נִי עִמָּ֑ךְ מִ֥י ה֝֗וּא לְיָדִ֥י יִתָּקֵֽעַ׃",
               "כִּֽי־לִ֭בָּם צָפַ֣נְתָּ מִּשָּׂ֑כֶל עַל־כֵּ֝֗ן לֹ֣א תְרֹמֵֽם׃",
               "לְ֭חֵלֶק יַגִּ֣יד רֵעִ֑ים וְעֵינֵ֖י בָנָ֣יו תִּכְלֶֽנָה׃"
            ],
            "en":[
               "My spirit is crushed, my days run out;<br/>The graveyard waits for me.<br/>",
               "Surely mocking men keep me company,<br/>And with their provocations I close my eyes.",
               "Come now, stand surety for me!<br/>Who will give his hand on my behalf?",
               "You have hidden understanding from their minds;<br/>Therefore You must not exalt [them].",
               "He informs on his friends for a share [of their property],<br/>And his children’s eyes pine away.<br/>"
            ],
            "isTruncated":true
         }
      },
      "debugData":[
         [
            {
               "orig_part_strs":[
                  "איוב",
                  "פרק יז"
               ],
               "orig_part_types":[
                  "NAMED",
                  "NUMBERED"
               ],
               "final_part_strs":[
                  "איוב",
                  "פרק יז"
               ],
               "final_part_types":[
                  "NAMED",
                  "NUMBERED"
               ],
               "resolved_part_strs":[
                  "איוב",
                  "פרק יז"
               ],
               "resolved_part_types":[
                  "NAMED",
                  "NUMBERED"
               ],
               "resolved_part_classes":[
                  "RawRefPart",
                  "RawRefPart"
               ],
               "context_ref":null,
               "context_type":null
            }
         ]
      ]
   },
   "body":{
      "results":[
         {
            "startChar":13,
            "endChar":20,
            "text":"בפסוק א",
            "linkFailed":false,
            "refs":[
               "Job 17:1"
            ]
         }
      ],
      "refData":{
         "Job 17:1":{
            "heRef":"איוב י״ז:א׳",
            "url":"Job.17.1",
            "primaryCategory":"Tanakh",
            "he":[
               "רוּחִ֣י חֻ֭בָּלָה יָמַ֥י נִזְעָ֗כוּ קְבָרִ֥ים לִֽי׃"
            ],
            "en":[
               "My spirit is crushed, my days run out;<br/>The graveyard waits for me.<br/>"
            ],
            "isTruncated":false
         }
      },
      "debugData":[
         [
            {
               "orig_part_strs":[
                  "בפסוק א"
               ],
               "orig_part_types":[
                  "NUMBERED"
               ],
               "final_part_strs":[
                  "בפסוק א"
               ],
               "final_part_types":[
                  "NUMBERED"
               ],
               "resolved_part_strs":[
                  "job",
                  "SectionContext(AddressPerek(0), 'Chapter', 17)",
                  "בפסוק א"
               ],
               "resolved_part_types":[
                  "NAMED",
                  "NUMBERED",
                  "NUMBERED"
               ],
               "resolved_part_classes":[
                  "TermContext",
                  "SectionContext",
                  "RawRefPart"
               ],
               "context_ref":"Job 17",
               "context_type":"CURRENT_BOOK"
            }
         ]
      ]
   }
}
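The offsets in this output can be checked against the input: the response-format comments state that a result's text corresponds to substring(startChar, endChar) of the submitted string. A minimal Python check, assuming Python's code-point slicing matches JavaScript's `substring` for this text (they agree when the text has no characters outside the Basic Multilingual Plane, as with the Hebrew here); `check_offsets` is a hypothetical helper.

```python
def check_offsets(text: str, results: list[dict]) -> bool:
    """Return True if every result's text equals text[startChar:endChar].

    Python slices by code point; JavaScript's substring uses UTF-16 code units.
    The two agree for text without characters outside the Basic Multilingual Plane.
    """
    return all(text[r["startChar"]:r["endChar"]] == r["text"] for r in results)
```

Applied to the example, the title result (8 to 19) and the body result (13 to 20) both check out.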

Debug using local webpage

To debug why certain citations aren't linking, use the linker.js plugin, which has built-in debug tools. Below is a skeleton HTML page for testing content against the linker. It embeds linker.js in debug mode, which shows every citation caught, including ones that weren't linked. See here for more information on how the debug option works.

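<!DOCTYPE html>
<html>
    <head>
        <!-- Declare UTF-8 so pasted Hebrew text renders correctly -->
        <meta charset="utf-8">
    </head>
    <body>
        <h1>TITLE HERE</h1>
        <p>BODY HERE</p>
        <!-- Load the linker, then run it in debug mode -->
        <script type="text/javascript" charset="utf-8" src="https://www.sefaria.org/linker.v3.js"></script>
        <script>sefaria.link({debug: true});</script>
    </body>
</html>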
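To test many documents, the skeleton page can be generated rather than edited by hand. The hypothetical Python helper below fills in the title and body using `html.escape`, so characters like `<` and `&` in the content don't break the markup; it also declares a UTF-8 charset so Hebrew text renders correctly.

```python
import html

LINKER_SRC = "https://www.sefaria.org/linker.v3.js"


def debug_page(title: str, body: str) -> str:
    """Fill the debug skeleton with escaped title and body text (hypothetical helper)."""
    # Doubled braces produce literal { } in the f-string output
    return f"""<!DOCTYPE html>
<html>
    <head>
        <meta charset="utf-8">
    </head>
    <body>
        <h1>{html.escape(title)}</h1>
        <p>{html.escape(body)}</p>
        <script type="text/javascript" charset="utf-8" src="{LINKER_SRC}"></script>
        <script>sefaria.link({{debug: true}});</script>
    </body>
</html>
"""
```

The result can be written to a file, e.g. `pathlib.Path("debug.html").write_text(debug_page(title, body), encoding="utf-8")`, and opened in a browser.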