Scoping / Documentation / Context issues -- How to be specific? #7709

@J-Moravec

Hello,

I have been trying to find good documentation on how to avoid potential scoping issues.

For instance, consider

library("data.table")
dt = data.table(a = 1:3, b = 4:6)
dt[a %in% 1:3]

So far so good: a is interpreted as a column of dt. But what if we also set a = 4:6 outside dt?
The documentation often talks about the . and .. notation, but those are specific to j, not to the i part of a data.table query.

a = 4:6
dt[a %in% a]

This obviously doesn't have the desired effect: both occurrences of a resolve to the column, so the outer variable is never used.
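A minimal sketch to confirm this, comparing the ambiguous query with the intended result. The name outer_a is hypothetical and only there to give the outer variable an unambiguous name.

```r
library(data.table)

dt <- data.table(a = 1:3, b = 4:6)
a <- 4:6        # outer variable that shares its name with column a
outer_a <- a    # hypothetical, unambiguous copy of the outer variable

ambiguous <- dt[a %in% a]        # both `a` resolve to the column
intended  <- dt[a %in% outer_a]  # column a against the outer values

nrow(ambiguous)  # 3: the condition is TRUE for every row
nrow(intended)   # 0: none of 1:3 is in 4:6
```

Renaming the outer variable works here, but it relies on knowing that no column is called outer_a, which is exactly the kind of guarantee I am looking for.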

But the Introduction to data.table doesn't really offer a solution.
https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html

Most Stack Overflow and AI answers are incorrect. They often suggest using get(), but that doesn't work either.

The solution is introduced in https://cran.r-project.org/web/packages/data.table/vignettes/datatable-programming.html: use the env = list(...) argument, for example:

dt[col %in% value, env = list(col = "a", value = a)]
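As a runnable sketch of the above, assuming a data.table version that supports the env argument (1.15.0 or later, as far as I can tell). The placeholder names col and value are arbitrary; they are replaced before the query is evaluated.

```r
library(data.table)

dt <- data.table(a = 1:3, b = 4:6)
a <- 4:6  # outer variable, same name as column a

# col is replaced by the column named "a", value by the outer vector 4:6
res_a <- dt[col %in% value, env = list(col = "a", value = a)]
nrow(res_a)  # 0: column a (1:3) shares no values with 4:6

# Same outer values, but tested against column b
res_b <- dt[col %in% value, env = list(col = "b", value = a)]
nrow(res_b)  # 3: column b is exactly 4:6
```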

However, if the value is a character vector, it must also be wrapped in I(); otherwise it is interpreted as a column name.

dt[] = lapply(dt, as.character)
a = "1"
dt[col %in% value, env = list(col = "a", value = a)] # fails because "1" is not a column
dt[col %in% value, env = list(col = "a", value = I(a))] # works
dt[col %in% value, env = list(col = as.name("a"), value = I(a))] # safest?
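A self-contained sketch of the character case, again assuming a data.table version with the env argument. The table is built directly with character columns here instead of converting it, and tryCatch is only used so the failing call does not stop the script.

```r
library(data.table)

dt <- data.table(a = c("1", "2", "3"), b = c("4", "5", "6"))
a <- "1"  # outer character value

# Without I(), the string "1" is turned into a symbol, which is not a column.
err <- tryCatch(
  dt[col %in% value, env = list(col = "a", value = a)],
  error = function(e) e
)
inherits(err, "error")

# With I(), the string stays a literal value; as.name() makes the column explicit.
res <- dt[col %in% value, env = list(col = as.name("a"), value = I(a))]
nrow(res)  # 1: only the row where column a equals "1"
```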

All these things are IMO required when the user wants to be specific: this variable names a column in a data.table, while this_one names a variable coming from the outer environment. This way wires won't be crossed.
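To illustrate what "being specific" could look like in practice, here is a sketch of a small wrapper. The function filter_in and its arguments are hypothetical names of mine, not part of data.table; it only combines as.name() for the column and I() for the values, as in the calls above.

```r
library(data.table)

# Hypothetical helper: keep rows where the column named `col_name`
# has a value in `values`, with no ambiguity about which is which.
filter_in <- function(dt, col_name, values) {
  dt[col %in% vals,
     env = list(col = as.name(col_name),  # always a column of dt
                vals = I(values))]        # always a literal value
}

dt <- data.table(a = 1:3, b = 4:6)
a <- 4:6  # outer variable, same name as column a

nrow(filter_in(dt, "a", a))  # 0: column a against outer 4:6
nrow(filter_in(dt, "b", a))  # 3: column b against outer 4:6
```

Because I() is applied unconditionally, the same helper should behave the same way for character values.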

This should IMO be in the "Introduction to data.table" as a simple case of the much more complex "Programming on data.table".

Apparently, env is a new interface and get(), mget() etc. were at one point interfaces of data.table but were discontinued (likely because they were buggy).

https://stackoverflow.com/a/54800108


tl;dr: Add a note in "Introduction to data.table" about the env interface and how to pass column names/values as variables.

Labels: programming (parameterizing queries: get, mget, eval, env), question
