DropFields class
Drops fields within a DynamicFrame.
Methods
__call__(frame, paths, transformation_ctx = "", info = "", stageThreshold = 0, totalThreshold = 0)
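Drops nodes within a DynamicFrame.
frame – The DynamicFrame in which to drop the nodes (required).
paths – A list of full paths to the nodes to drop (required).
transformation_ctx – A unique string that is used to identify state information (optional).
info – A string associated with errors in the transformation (optional).
stageThreshold – The maximum number of errors that can occur in the transformation before it errors out (optional; the default is zero).
totalThreshold – The maximum number of errors that can occur overall before processing errors out (optional; the default is zero).
Returns a new DynamicFrame without the specified fields.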
apply(cls, *args, **kwargs)
Inherited from GlueTransform apply.
name(cls)
Inherited from GlueTransform name.
describeArgs(cls)
Inherited from GlueTransform describeArgs.
describeReturn(cls)
Inherited from GlueTransform describeReturn.
describeTransform(cls)
Inherited from GlueTransform describeTransform.
describeErrors(cls)
Inherited from GlueTransform describeErrors.
describe(cls)
Inherited from GlueTransform describe.
Examples
Dataset used for DropFields examples
The following dataset is used for the DropFields examples:
{name: Sally, age: 23, location: {state: WY, county: Fremont}, friends: []}
{name: Varun, age: 34, location: {state: NE, county: Douglas}, friends: [{name: Arjun, age: 3}]}
{name: George, age: 52, location: {state: NY}, friends: [{name: Fred}, {name: Amy, age: 15}]}
{name: Haruki, age: 21, location: {state: AK, county: Denali}}
{name: Sheila, age: 63, friends: [{name: Nancy, age: 22}]}
This dataset has the following schema:
root
|-- name: string
|-- age: int
|-- location: struct
|    |-- state: string
|    |-- county: string
|-- friends: array
|    |-- element: struct
|    |    |-- name: string
|    |    |-- age: int
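The leaf fields in this schema correspond to the dotted strings used in the paths argument in the examples that follow. As a rough illustration only, the sketch below lists those dotted names for one record held as a plain Python dict. The helper leaf_paths is hypothetical and is not part of the AWS Glue API; it assumes records are ordinary dicts and lists rather than a DynamicFrame.

```python
def leaf_paths(value, prefix=""):
    """Return the dotted paths of all leaf fields.

    Hypothetical helper for illustration; not part of AWS Glue.
    Lists are stepped through without adding a path segment.
    """
    if isinstance(value, dict):
        paths = []
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            paths.extend(leaf_paths(child, child_prefix))
        return paths
    if isinstance(value, list):
        paths = []
        for element in value:
            for path in leaf_paths(element, prefix):
                if path not in paths:  # keep each path once, in first-seen order
                    paths.append(path)
        return paths
    return [prefix]


# One record from the example dataset, written as a plain dict.
varun = {
    "name": "Varun",
    "age": 34,
    "location": {"state": "NE", "county": "Douglas"},
    "friends": [{"name": "Arjun", "age": 3}],
}

print(leaf_paths(varun))
```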
Example: Drop a top-level field
Use code similar to the following to drop the age field:
df_no_age = DropFields.apply(df, paths=['age'])
Resulting dataset:
{name: Sally, location: {state: WY, county: Fremont}, friends: []}
{name: Varun, location: {state: NE, county: Douglas}, friends: [{name: Arjun, age: 3}]}
{name: George, location: {state: NY}, friends: [{name: Fred}, {name: Amy, age: 15}]}
{name: Haruki, location: {state: AK, county: Denali}}
{name: Sheila, friends: [{name: Nancy, age: 22}]}
Resulting schema:
root
|-- name: string
|-- location: struct
|    |-- state: string
|    |-- county: string
|-- friends: array
|    |-- element: struct
|    |    |-- name: string
|    |    |-- age: int
Example: Drop a nested field
To drop a nested field, qualify the field name with a '.' character.
df_no_county = DropFields.apply(df, paths=['location.county'])
Resulting dataset:
{name: Sally, age: 23, location: {state: WY}, friends: []}
{name: Varun, age: 34, location: {state: NE}, friends: [{name: Arjun, age: 3}]}
{name: George, age: 52, location: {state: NY}, friends: [{name: Fred}, {name: Amy, age: 15}]}
{name: Haruki, age: 21, location: {state: AK}}
{name: Sheila, age: 63, friends: [{name: Nancy, age: 22}]}
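If you drop the last remaining element of a struct type, the transform removes the entire struct. For example, dropping location.state from the frame that no longer has a county field leaves no location field at all:
df_no_location = DropFields.apply(df_no_county, paths=['location.state'])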
Resulting schema:
root
|-- name: string
|-- age: int
|-- friends: array
|    |-- element: struct
|    |    |-- name: string
|    |    |-- age: int
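The nested-drop behavior described in this example, including the removal of a struct once its last field is gone, can be mimicked on plain Python dicts. This is a minimal sketch under stated assumptions, not how AWS Glue implements the transform: drop_path and _drop are hypothetical helpers, and they work on one record at a time rather than on a DynamicFrame schema.

```python
from copy import deepcopy


def drop_path(record, path):
    """Return a copy of `record` without the field at dotted `path`.

    Hypothetical helper for illustration; not part of the AWS Glue API.
    """
    result = deepcopy(record)
    _drop(result, path.split("."))
    return result


def _drop(node, keys):
    head, rest = keys[0], keys[1:]
    if not isinstance(node, dict) or head not in node:
        return  # nothing to drop on this branch
    if not rest:
        del node[head]
        return
    child = node[head]
    _drop(child, rest)
    # Mirror the documented behavior: a struct left with no fields is removed.
    if isinstance(child, dict) and not child:
        del node[head]


varun = {"name": "Varun", "age": 34, "location": {"state": "NE", "county": "Douglas"}}
george = {"name": "George", "age": 52, "location": {"state": "NY"}}

print(drop_path(varun, "location.county"))   # location keeps its state field
print(drop_path(george, "location.state"))   # location disappears entirely
```

Copying the record first keeps the input unchanged, which loosely matches the transform returning a new DynamicFrame instead of modifying the one passed in.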
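Example: Drop a nested field from an array
No special syntax is needed to drop a field from a struct nested inside an array. For example, we can drop the age field from the friends array with the following syntax: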
df_no_friend_age = DropFields.apply(df, paths=['friends.age'])
Resulting dataset:
{name: Sally, age: 23, location: {state: WY, county: Fremont}}
{name: Varun, age: 34, location: {state: NE, county: Douglas}, friends: [{name: Arjun}]}
{name: George, age: 52, location: {state: NY}, friends: [{name: Fred}, {name: Amy}]}
{name: Haruki, age: 21, location: {state: AK, county: Denali}}
{name: Sheila, age: 63, friends: [{name: Nancy}]}
Resulting schema:
root
|-- name: string
|-- age: int
|-- location: struct
|    |-- state: string
|    |-- county: string
|-- friends: array
|    |-- element: struct
|    |    |-- name: string
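One way to picture why 'friends.age' needs no array-specific syntax is a traversal that applies the remaining path to every element whenever it meets a list. The sketch below is an assumption-laden illustration on plain Python data, not the Glue implementation; drop_in_arrays is a hypothetical name.

```python
from copy import deepcopy


def drop_in_arrays(value, keys):
    """Remove the field named by `keys` in place.

    Hypothetical helper for illustration; not part of the AWS Glue API.
    Lists do not consume a path segment: the same keys are applied
    to each element.
    """
    if isinstance(value, list):
        for element in value:
            drop_in_arrays(element, keys)
    elif isinstance(value, dict) and keys[0] in value:
        if len(keys) == 1:
            del value[keys[0]]
        else:
            drop_in_arrays(value[keys[0]], keys[1:])


george = {
    "name": "George",
    "age": 52,
    "friends": [{"name": "Fred"}, {"name": "Amy", "age": 15}],
}

cleaned = deepcopy(george)
drop_in_arrays(cleaned, "friends.age".split("."))
print(cleaned)
```

Note that the top-level age field is untouched, because the path only matches age underneath friends.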
DropFields example
In the following example, backticks (`) are required around .zip because the column name contains a period (.).
dyf_dropfields = DropFields.apply(frame = dyf_join, paths = "`.zip`")
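To see why the quoting matters, consider a simplified path splitter that treats periods as separators unless they appear between backticks. This is only a sketch of the idea; split_path is a hypothetical function, and the actual parsing rules inside AWS Glue may differ.

```python
def split_path(path):
    """Split a field path on periods, treating backtick-quoted text as one name.

    Hypothetical helper for illustration; not the AWS Glue path parser.
    """
    parts, current, quoted = [], [], False
    for ch in path:
        if ch == "`":
            quoted = not quoted          # enter or leave a quoted name
        elif ch == "." and not quoted:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


print(split_path("location.county"))  # two segments: a nested field
print(split_path(".zip"))             # unquoted: an empty name followed by "zip"
print(split_path("`.zip`"))           # quoted: a single column named ".zip"
```

Without the backticks, the leading period would be read as a separator, so the path would no longer name the .zip column.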