Configuration Options for Parsing from JSON

suppressPackageStartupMessages({
  library(yyjsonr)
})

Overview

This vignette:

  • introduces the opts argument for reading JSON with the read_json_x() family of functions
  • outlines the creation of default options with opts_read_json()
  • provides extended examples of how these options control parsing of JSON

The opts argument - Specifying options when reading JSON

All read_json_x() functions have an opts argument. opts takes a named list of options used to configure the way yyjsonr parses JSON into R objects.

The default argument for opts is an empty list, which internally sets the default options for parsing.

The default options for parsing can also be viewed by running opts_read_json().

The following three function calls are all equivalent ways of calling read_json_str() using the default options:

read_json_str(str)
read_json_str(str, opts = list())
read_json_str(str, opts = opts_read_json())
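
The equivalence of these three calls can be checked directly with identical(). This is only a minimal sketch: the input string is an arbitrary example chosen for illustration, and the variable names are not part of the package.

```r
library(yyjsonr)

# Arbitrary example input (an assumption for illustration)
json <- '{"a": [1, 2, 3], "b": "hello"}'

res_implicit <- read_json_str(json)
res_empty    <- read_json_str(json, opts = list())
res_explicit <- read_json_str(json, opts = opts_read_json())

# All three calls use the default options, so the results should match
all_equal <- identical(res_implicit, res_empty) &&
  identical(res_implicit, res_explicit)
all_equal
```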

Setting arguments to override the default options

Setting a single option (and keeping all other options at their default value) can be done in a number of ways.

The following three function calls are all equivalent:

read_json_str(str, opts = list(str_specials = 'string'))
read_json_str(str, opts = opts_read_json(str_specials = 'string'))
read_json_str(str, str_specials = 'string')
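
Because opts is just a named list, it may be convenient to build a set of options once and reuse it across several calls. The sketch below assumes nothing beyond the functions already introduced; the name my_opts and the input strings are made up for the example.

```r
library(yyjsonr)

# Build the options once (my_opts is a hypothetical name)
my_opts <- opts_read_json(str_specials = 'special')

# Reuse the same options for several inputs
first  <- read_json_str('["a", "NA"]', opts = my_opts)
second <- read_json_str('["NA", "b", null]', opts = my_opts)

is.na(first)
is.na(second)
```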

Option promote_num_to_string - mixtures of numeric and string types

By default, yyjsonr does not promote numeric values to string values i.e. promote_num_to_string = FALSE.

If an array contains mixed types, then an R list will be returned, so that all JSON values retain their original type.

json <- '[1,2,3.1,"apple", null]'
read_json_str(json)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 3.1
#> 
#> [[4]]
#> [1] "apple"
#> 
#> [[5]]
#> NULL

If promote_num_to_string is set to TRUE, then yyjsonr will promote numeric types to strings if the following conditions are met:

  • values are stored in a JSON array
  • the JSON array only contains numerics, strings or the JSON null value

yyjsonr::read_json_str(json, promote_num_to_string = TRUE)
#> [1] "1"        "2"        "3.100000" "apple"    NA
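
The difference between the two results can also be checked programmatically. The following is a small sketch using only base R predicates on the values shown above; the as.numeric() step is an illustration of how promoted numbers might be recovered, not something the package does.

```r
library(yyjsonr)

json <- '[1,2,3.1,"apple", null]'

# Default: mixed types are kept in a list
default_res <- read_json_str(json)
is.list(default_res)

# With promotion: a single character vector, with null read as NA
promoted <- read_json_str(json, promote_num_to_string = TRUE)
is.character(promoted)
is.na(promoted)

# The promoted numbers are strings, but can be converted back if needed
as.numeric(promoted[1:3])
```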

Option df_missing_list_elem - Missing list elements (when parsing data.frames)

When JSON data is being parsed into an R data.frame, some columns become list-columns if there are mixed types in the original JSON.

It is possible that some values are completely missing in the JSON representation, and the df_missing_list_elem option specifies the replacement for such a missing value in the R data.frame. The default value is df_missing_list_elem = NULL.

JSON to data.frame (no list columns needed)

str <- '[{"a":1, "b":2}, {"a":3, "b":4}]'
read_json_str(str)
#>   a b
#> 1 1 2
#> 2 3 4

JSON to data.frame - list-columns required

str <- '[{"a":1, "b":[1,2]}, {"a":3, "b":2}]'
read_json_str(str)
#>   a    b
#> 1 1 1, 2
#> 2 3    2
str <- '[{"a":1, "b":[1,2]}, {"a":2}]'
read_json_str(str)
#>   a    b
#> 1 1 1, 2
#> 2 2 NULL
read_json_str(str, df_missing_list_elem = NA)
#>   a    b
#> 1 1 1, 2
#> 2 2   NA
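
The printed data.frames hide the fact that column b is a list-column. A short sketch for inspecting it, using the same input as above (the names df_null and df_na are chosen for this example):

```r
library(yyjsonr)

json <- '[{"a":1, "b":[1,2]}, {"a":2}]'

df_null <- read_json_str(json)
df_na   <- read_json_str(json, df_missing_list_elem = NA)

# Column 'b' is a list-column in both cases
is.list(df_null$b)
is.list(df_na$b)

# The element for the second row is the configured replacement value
is.null(df_null$b[[2]])
is.na(df_na$b[[2]])

# Element lengths differ: NULL has length 0, NA has length 1
lengths(df_null$b)
lengths(df_na$b)
```

Choosing NA rather than NULL can make later processing simpler, since every element of the list-column then has a length of at least 1.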

Option obj_of_arrs_to_df - Reading JSON as a data.frame

By default (obj_of_arrs_to_df = TRUE), JSON that looks like it represents a data.frame is loaded as one. That is, a JSON {} object which contains only [] arrays, all of equal length, is treated as a data.frame.

If obj_of_arrs_to_df = FALSE then this data will be read in as a named list. In addition, if the [] arrays are not all the same length, then the data will also be read in as a named list as no inference of missing values will be done.

str <- '{"a":[1,2],"b":["apple", "banana"]}'
read_json_str(str)
#>   a      b
#> 1 1  apple
#> 2 2 banana
read_json_str(str, obj_of_arrs_to_df = FALSE)
#> $a
#> [1] 1 2
#> 
#> $b
#> [1] "apple"  "banana"
str_unequal <- '{"a":[1,2],"b":["apple", "banana", "carrot"]}'
read_json_str(str_unequal)
#> $a
#> [1] 1 2
#> 
#> $b
#> [1] "apple"  "banana" "carrot"
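
Since arrays of unequal length are returned as a named list, any padding with missing values has to be done by the user. The following is one possible sketch using base R only; it is not a feature of yyjsonr, and the padding strategy (appending NA to the shorter vectors) is an assumption about what is wanted.

```r
library(yyjsonr)

equal   <- '{"a":[1,2],"b":["apple", "banana"]}'
unequal <- '{"a":[1,2],"b":["apple", "banana", "carrot"]}'

is.data.frame(read_json_str(equal))                             # equal lengths
is.data.frame(read_json_str(equal, obj_of_arrs_to_df = FALSE))  # conversion disabled
is.data.frame(read_json_str(unequal))                           # unequal lengths

# Manual padding: extend every vector to the longest length.
# Assigning to length() fills the new positions with NA.
lst    <- read_json_str(unequal)
n      <- max(lengths(lst))
padded <- lapply(lst, function(x) { length(x) <- n; x })
df     <- as.data.frame(padded)
df
```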

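Option arr_of_objs_to_df - Reading JSON as a data.frame

By default (arr_of_objs_to_df = TRUE), a JSON [] array which contains only {} objects is read in as a data.frame, with one row per object. If arr_of_objs_to_df = FALSE, the data is read in as a list of named lists. When a key is missing from some of the objects, the corresponding data.frame entries are filled with NA.

str <- '[{"a":1, "b":2}, {"a":3, "b":4}]'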
read_json_str(str)
#>   a b
#> 1 1 2
#> 2 3 4
read_json_str(str, arr_of_objs_to_df = FALSE)
#> [[1]]
#> [[1]]$a
#> [1] 1
#> 
#> [[1]]$b
#> [1] 2
#> 
#> 
#> [[2]]
#> [[2]]$a
#> [1] 3
#> 
#> [[2]]$b
#> [1] 4
str <- '[{"a":1, "b":2}, {"a":3, "b":4, "c":99}]'
read_json_str(str)
#>   a b  c
#> 1 1 2 NA
#> 2 3 4 99
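
When conversion to a data.frame is switched off, the individual objects remain available as named lists, which keeps the information about which keys each object actually contained. A minimal sketch (the name rows is chosen for this example):

```r
library(yyjsonr)

json <- '[{"a":1, "b":2}, {"a":3, "b":4, "c":99}]'
rows <- read_json_str(json, arr_of_objs_to_df = FALSE)

# Keys present in each object; "c" only appears in the second
lapply(rows, names)

# Extract one field across all objects
a_values <- vapply(rows, function(row) as.numeric(row$a), numeric(1))
a_values
```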

Option str_specials - Reading string "NA" from JSON

JSON only really has the value null for representing special missing values, and this is converted to an R NA_character_ value when it is encountered in a string-ish context.

When yyjsonr encounters a literal "NA" value in a string-ish context, its conversion to an R value is controlled by the str_specials option.

The possible values for the str_specials argument are:

  • string: read in as the literal character string "NA" (the default behaviour)
  • special: read in as NA_character_

str <- '["hello", "NA", null]'
read_json_str(str) # default: str_specials = 'string'
#> [1] "hello" "NA"    NA
read_json_str(str, str_specials = 'special')
#> [1] "hello" NA      NA
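
The printed output distinguishes the string "NA" from a missing value only by the quotes, so it can be clearer to test with is.na(). A short sketch using the same input:

```r
library(yyjsonr)

json <- '["hello", "NA", null]'

# Default: only the JSON null is missing
na_default <- is.na(read_json_str(json))
na_default
#> [1] FALSE FALSE  TRUE

# 'special': the string "NA" is also treated as missing
na_special <- is.na(read_json_str(json, str_specials = 'special'))
na_special
#> [1] FALSE  TRUE  TRUE
```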

Option num_specials - Reading numeric "NA", "NaN" and "Inf"

JSON only really has the value null for representing special missing values, and this is converted to an R NA_integer_ or NA_real_ value when it is encountered in a number-ish context.

When yyjsonr encounters a literal "NA", "NaN" or "Inf" value in a number-ish context, its conversion to an R value is controlled by the num_specials option.

The possible values for the num_specials argument are:

  • special: read in as an actual numeric NA, NaN or Inf value (the default behaviour)
  • string: read in as the literal character string "NA" etc

str <- '[1.23, "NA", "NaN", "Inf", "-Inf", null]'
read_json_str(str) # default: num_specials = 'special'
#> [1] 1.23   NA  NaN  Inf -Inf   NA
read_json_str(str, num_specials = 'string')
#> [[1]]
#> [1] 1.23
#> 
#> [[2]]
#> [1] "NA"
#> 
#> [[3]]
#> [1] "NaN"
#> 
#> [[4]]
#> [1] "Inf"
#> 
#> [[5]]
#> [1] "-Inf"
#> 
#> [[6]]
#> NULL
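
With the default setting, the result is an ordinary numeric vector, so the usual base R predicates apply to the special values. A brief sketch on the same input:

```r
library(yyjsonr)

x <- read_json_str('[1.23, "NA", "NaN", "Inf", "-Inf", null]')

is.na(x)        # TRUE for NA and NaN (is.na() treats NaN as missing)
is.nan(x)       # TRUE only for NaN
is.infinite(x)  # TRUE for Inf and -Inf
is.finite(x)    # TRUE only for the ordinary number 1.23
```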

Option int64 - large integer support

JSON supports large integers outside the range of R’s 32-bit integer type.

When such a large value is encountered in JSON, the int64 option controls the value’s representation in R.

The possible values for the int64 option are:

  • string: store the JSON integer as a string in R (the default)
  • double: store the JSON integer as a double precision numeric. If the integer is outside the range +/- 2^53, then it may not be stored exactly in the double.
  • bit64: convert to a 64-bit integer as supported by the {bit64} package.

str <- '[1, 274877906944]'

# default: int64 = 'string'
# Since result is a mix of types, a list is returned
read_json_str(str) 
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] "274877906944"
# Read large integer as double
robj <- read_json_str(str, int64 = 'double')
class(robj)
#> [1] "numeric"
robj
#> [1]            1 274877906944
# Read large integer as 'bit64::integer64' type
library(bit64)
read_json_str(str, int64 = 'bit64')
#> integer64
#> [1] 1            274877906944
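
The limits mentioned above can be seen in base R without any JSON involved. The sketch below does not use yyjsonr; it only illustrates why the example value needs special handling and why int64 = 'double' can lose precision beyond 2^53.

```r
# Largest value of R's 32-bit integer type
.Machine$integer.max
#> [1] 2147483647

# The example value (2^38) exceeds it, so it cannot be an R integer
274877906944 > .Machine$integer.max
#> [1] TRUE

# Doubles represent every integer up to 2^53 exactly, but not beyond
2^52 == 2^52 + 1   # still distinguishable
#> [1] FALSE
2^53 == 2^53 + 1   # the two values collapse to the same double
#> [1] TRUE
```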

Option length1_array_asis - distinguishing scalars from length-1 vectors

JSON distinguishes scalar values from arrays, i.e. in JSON the scalar 67 is different from the length-1 array [67]. R does not make this distinction: a scalar in R is simply a vector of length 1.

The length1_array_asis option is for situations where it is important to preserve this distinction in R.

To assist in translating objects from JSON to R and back to JSON, setting length1_array_asis = TRUE will mark JSON arrays of length 1 with the class AsIs. This option defaults to FALSE.

read_json_str('67')   |> str()
#>  int 67
read_json_str('[67]') |> str()
#>  int 67
read_json_str('67'  , length1_array_asis = TRUE) |> str()
#>  int 67
read_json_str('[67]', length1_array_asis = TRUE) |> str() # Has 'AsIs' class
#>  'AsIs' int 67
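
The class can also be tested with inherits(). The sketch below additionally shows that base R's I() function applies the same AsIs class; whether a value marked by hand in this way is treated identically when writing is not covered here.

```r
library(yyjsonr)

scalar <- read_json_str('67',   length1_array_asis = TRUE)
array1 <- read_json_str('[67]', length1_array_asis = TRUE)

inherits(scalar, "AsIs")  # the JSON scalar is not marked
inherits(array1, "AsIs")  # the length-1 JSON array is marked

# Base R's I() applies the same class
inherits(I(67L), "AsIs")
```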

This option is then used together with the auto_unbox option when writing JSON, in order to control how length-1 R vectors are written. As shown below, if a length-1 vector was marked with the AsIs class when reading, then writing it out with auto_unbox = TRUE still produces a JSON array rather than a JSON scalar.

In the following example, only the second value ([67]) is affected by the option length1_array_asis. When the option is TRUE the value is tagged with a class of AsIs. Then when the created R object is subsequently written out to a JSON string, its structure is determined by auto_unbox which understands how to handle this class.

str <- '{"a":67, "b":[67], "c":[1,2]}'

# Length-1 vectors output as JSON arrays
read_json_str(str) |>
  write_json_str(auto_unbox = FALSE) |>
  cat()
#> {"a":[67],"b":[67],"c":[1,2]}
# Length-1 vectors output as JSON scalars
read_json_str(str) |>
  write_json_str(auto_unbox = TRUE) |>
  cat()
#> {"a":67,"b":67,"c":[1,2]}
# Length-1 vectors output as JSON arrays
read_json_str(str, length1_array_asis = TRUE) |>
  write_json_str(auto_unbox = FALSE) |>
  cat()
#> {"a":[67],"b":[67],"c":[1,2]}
# !!!!
# Those values marked with 'AsIs' class when reading are output
# as length-1 JSON arrays
read_json_str(str, length1_array_asis = TRUE) |>
  write_json_str(auto_unbox = TRUE) |>
  cat()
#> {"a":67,"b":[67],"c":[1,2]}
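
Put together, these two options allow a JSON document to survive a read/write round trip unchanged. The sketch below wraps the two steps in a helper; roundtrip() is a hypothetical name introduced for this example, and the comparison assumes that write_json_str() returns a plain character string with no extra whitespace, as the output above suggests.

```r
library(yyjsonr)

# Hypothetical helper: read JSON into R, then write it straight back out
roundtrip <- function(json, asis) {
  read_json_str(json, length1_array_asis = asis) |>
    write_json_str(auto_unbox = TRUE)
}

json <- '{"a":67,"b":[67],"c":[1,2]}'

# With the AsIs marking, "b" stays a length-1 array
roundtrip(json, asis = TRUE)

# Without it, "b" is unboxed and the structure changes
roundtrip(json, asis = FALSE)
```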

Option yyjson_read_flag - internal YYJSON C library options

The yyjson C library supports a number of internal options for reading JSON.

These options are considered advanced, and the user is referred to the yyjson documentation for further explanation on what they control.

Warning: some of these advanced options do not make sense for interfacing with R, or otherwise conflict with how this package converts JSON to R objects.

# A reference list of all the possible YYJSON options
yyjsonr::yyjson_read_flag
#> $YYJSON_READ_NOFLAG
#> [1] 0
#> 
#> $YYJSON_READ_INSITU
#> [1] 1
#> 
#> $YYJSON_READ_STOP_WHEN_DONE
#> [1] 2
#> 
#> $YYJSON_READ_ALLOW_TRAILING_COMMAS
#> [1] 4
#> 
#> $YYJSON_READ_ALLOW_COMMENTS
#> [1] 8
#> 
#> $YYJSON_READ_ALLOW_INF_AND_NAN
#> [1] 16
#> 
#> $YYJSON_READ_NUMBER_AS_RAW
#> [1] 32
#> 
#> $YYJSON_READ_ALLOW_INVALID_UNICODE
#> [1] 64
#> 
#> $YYJSON_READ_BIGNUM_AS_RAW
#> [1] 128
read_json_str(
  "[1, 2, 3, ] // A JSON comment not allowed by the standard",
  opts = opts_read_json(yyjson_read_flag = c(
    yyjson_read_flag$YYJSON_READ_ALLOW_TRAILING_COMMAS,
    yyjson_read_flag$YYJSON_READ_ALLOW_COMMENTS
  ))
)
#> [1] 1 2 3
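
The flag values in the reference list are all powers of two, which is the usual layout for options that are combined as a bitmask. The sketch below uses base R only and does not call yyjsonr. It assumes that a vector of flags is merged with a bitwise OR, which is how the yyjson C library expects its flags to be combined; how yyjsonr merges them internally is not shown in this vignette.

```r
# Values copied from the reference list printed above
flags <- c(
  YYJSON_READ_ALLOW_TRAILING_COMMAS = 4L,
  YYJSON_READ_ALLOW_COMMENTS        = 8L
)

# Each flag occupies its own bit, so a set of flags collapses to one integer
combined <- Reduce(bitwOr, flags)
combined
#> [1] 12

# Testing whether a particular flag is set
bitwAnd(combined, 8L)  != 0   # comments: set
#> [1] TRUE
bitwAnd(combined, 16L) != 0   # Inf and NaN literals: not set
#> [1] FALSE
```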