Fatih Pense's Blog
My First Zig Segfault
Saturday, September 16th, 2023
Just a few hours ago, I got my first segfault using Zig while parsing JSON from a file. New badge unlocked!
- Parsing FirstStruct from an embedded string works ("lets see 1").
- Parsing SecondStruct from a file works ("lets see 2").
- Parsing FirstStruct from a file: SEGFAULT.
What is happening? I feel unsafe :P
The third case is a mix of the first two approaches: the struct from the first case, read from a file like the second. That is interesting! Can you spot the bug here?
const std = @import("std");

const embedded_json = @embedFile("first_structure_embedded.json");

const Deeper = struct {
    field_1: []u8,
    field_2: []u8,
    field_3: bool,
    field_4: usize,
    field_5: []u8,
};

const Deep = struct {
    field_1: []u8,
    field_2: []u8,
    field_3: []u8,
    field_4: bool,
    field_5: bool,
    field_6: usize,
    field_7: []u8,
    field_8: ?[]Deeper,
};

const InsideFirst = struct {
    field_1: []u8,
    field_2: []u8,
    field_3: []u8,
    field_4: []Deep,
};

const FirstStruct = struct { segments: std.json.ArrayHashMap(InsideFirst) };

//

const InsideSecond = struct {
    field_1: []u8,
    field_2: []u8,
    field_3: []u8,
    field_4: []u8,
    field_5: usize,
    field_6: bool,
    field_7: usize,
};

const SecondStruct = struct {
    lines: []InsideSecond,
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Case 1: FirstStruct from the embedded string.
    const firstStructureParsedA: std.json.Parsed(FirstStruct) = try std.json.parseFromSlice(FirstStruct, allocator, embedded_json, .{ .ignore_unknown_fields = true });
    defer firstStructureParsedA.deinit();
    const firstStructureA: FirstStruct = firstStructureParsedA.value;
    if (firstStructureA.segments.map.get("ABC")) |*value| {
        std.debug.print("lets see 1: {s}\n", .{value.field_1});
    }

    // Case 2: SecondStruct from a file.
    var secondStructureParsed: std.json.Parsed(SecondStruct) = undefined;
    {
        const file = try std.fs.cwd().openFile("test/second_structure.json", .{});
        defer file.close();
        const size = (try file.stat()).size;
        const source = try file.reader().readAllAlloc(allocator, size);
        defer allocator.free(source);
        secondStructureParsed = try std.json.parseFromSlice(SecondStruct, allocator, source, .{ .ignore_unknown_fields = true });
    }
    defer secondStructureParsed.deinit();
    const secondStructure = secondStructureParsed.value;
    std.debug.print("lets see 2: {s}\n", .{secondStructure.lines[0].field_1});

    // Case 3: FirstStruct from a file.
    var firstStructureParsedB: std.json.Parsed(FirstStruct) = undefined;
    {
        const file = try std.fs.cwd().openFile("test/first_structure_runtime.json", .{});
        defer file.close();
        const size = (try file.stat()).size;
        const source = try file.reader().readAllAlloc(allocator, size);
        defer allocator.free(source);
        firstStructureParsedB = try std.json.parseFromSlice(FirstStruct, allocator, source, .{ .ignore_unknown_fields = true });
    }
    defer firstStructureParsedB.deinit();
    const firstStructureB: FirstStruct = firstStructureParsedB.value;
    if (firstStructureB.segments.map.get("DEF")) |*value| {
        std.debug.print("lets see 3: {s}\n", .{value.field_1});
    }
}
Solution
A pleasant surprise is that when you jump to a definition in the standard library, it opens the source code, and the source code is understandable. Zig’s commitment to simplicity pays off here.
So I open the std.json.ArrayHashMap definition, since it is the only notable difference between the two structs. It uses an allocator:
/// A thin wrapper around `std.StringArrayHashMapUnmanaged` that implements
/// `jsonParse`, `jsonParseFromValue`, and `jsonStringify`.
/// This is useful when your JSON schema has an object with arbitrary data keys
/// instead of comptime-known struct field names.
pub fn ArrayHashMap(comptime T: type) type {
    return struct {
        map: std.StringArrayHashMapUnmanaged(T) = .{},

        pub fn deinit(self: *@This(), allocator: Allocator) void {
            self.map.deinit(allocator);
        }

        pub fn jsonParse(allocator: Allocator, source: anytype, options: ParseOptions) !@This() {
            var map = std.StringArrayHashMapUnmanaged(T){};
            errdefer map.deinit(allocator);
            //...
ParseOptions caught my eye, so I jump to its definition. The allocate field explains everything:
/// Controls how to deal with various inconsistencies between the JSON document and the Zig struct type passed in.
/// For duplicate fields or unknown fields, set options in this struct.
/// For missing fields, give the Zig struct fields default values.
pub const ParseOptions = struct {
    /// Behaviour when a duplicate field is encountered.
    /// The default is to return `error.DuplicateField`.
    duplicate_field_behavior: enum {
        use_first,
        @"error",
        use_last,
    } = .@"error",

    /// If false, finding an unknown field returns `error.UnknownField`.
    ignore_unknown_fields: bool = false,

    /// Passed to `std.json.Scanner.nextAllocMax` or `std.json.Reader.nextAllocMax`.
    /// The default for `parseFromSlice` or `parseFromTokenSource` with a `*std.json.Scanner` input
    /// is the length of the input slice, which means `error.ValueTooLong` will never be returned.
    /// The default for `parseFromTokenSource` with a `*std.json.Reader` is `std.json.default_max_value_len`.
    /// Ignored for `parseFromValue` and `parseFromValueLeaky`.
    max_value_len: ?usize = null,

    /// This determines whether strings should always be copied,
    /// or if a reference to the given buffer should be preferred if possible.
    /// The default for `parseFromSlice` or `parseFromTokenSource` with a `*std.json.Scanner` input
    /// is `.alloc_if_needed`.
    /// The default with a `*std.json.Reader` input is `.alloc_always`.
    /// Ignored for `parseFromValue` and `parseFromValueLeaky`.
    allocate: ?AllocWhen = null,
};
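It is an optimization: if the strings are already loaded into memory in the source, it makes sense to reuse them instead of copying! For parseFromSlice the default is .alloc_if_needed, so a string that needs no unescaping can come back as a slice pointing straight into source. As far as I can tell from the parser source, the []u8 fields are always copied because they are mutable, but the ArrayHashMap keys are not, and the stack trace below does crash in eqlString while comparing keys. In the third case, source is freed at the end of its block, so firstStructureParsedB is left with keys pointing into freed memory. The embedded string lives for the whole program, which is why the first case works.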
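To check the borrowing in isolation, here is a minimal sketch, assuming the Zig 0.11 std.json API quoted above. Entry, Doc, pointsInto, Borrowed, and checkBorrowing are hypothetical names made up for this example, and the heap copy made with dupe stands in for the buffer that readAllAlloc returns in the post. It parses with default options and reports whether the map key and the []u8 field of the result point into the source buffer.

```zig
const std = @import("std");

// Hypothetical stand-ins for InsideFirst and FirstStruct, trimmed to one field.
const Entry = struct { field_1: []u8 };
const Doc = struct { segments: std.json.ArrayHashMap(Entry) };

/// Hypothetical helper: true if `inner` lies entirely inside `outer`.
fn pointsInto(inner: []const u8, outer: []const u8) bool {
    const start = @intFromPtr(outer.ptr);
    const p = @intFromPtr(inner.ptr);
    return p >= start and p + inner.len <= start + outer.len;
}

const Borrowed = struct { key: bool, field: bool };

/// Parses with default options (.alloc_if_needed for a slice input) and
/// reports which strings in the result are slices of `source`.
fn checkBorrowing(allocator: std.mem.Allocator) !Borrowed {
    // Heap copy standing in for the buffer returned by readAllAlloc.
    const source = try allocator.dupe(u8, "{\"segments\": {\"ABC\": {\"field_1\": \"value\"}}}");
    defer allocator.free(source);

    const parsed = try std.json.parseFromSlice(Doc, allocator, source, .{});
    defer parsed.deinit();

    const map = parsed.value.segments.map;
    // Both checks run before the defers free `parsed` and `source`.
    return .{
        // Map keys are []const u8, so the parser may return a slice of `source`.
        .key = pointsInto(map.keys()[0], source),
        // []u8 fields are mutable, so the parser has to copy them.
        .field = pointsInto(map.values()[0].field_1, source),
    };
}

pub fn main() !void {
    const result = try checkBorrowing(std.heap.page_allocator);
    std.debug.print("key borrowed: {}, field borrowed: {}\n", .{ result.key, result.field });
}
```

If my reading of the parser is right, the key should come back borrowed and the field copied, which is exactly the split between the crashing third case and the working second case.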
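So the possible solutions to prevent this segfault are:
- Make the lifetime of source equal to or longer than that of firstStructureParsedB.
- Tell the parser to copy all the strings with .allocate = .alloc_always.
- Parse from a *std.json.Reader (for example with parseFromTokenSource) instead of a slice-backed *std.json.Scanner, since per the doc comment above the Reader default is .alloc_always.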
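The first two fixes can be sketched like this, again assuming the Zig 0.11 std.json API and reusing hypothetical Entry and Doc types in place of FirstStruct. parseCopied frees its source right after parsing, just like the third case, but passes .allocate = .alloc_always so the result should own all of its strings. parseWithArena takes the lifetime route instead: it allocates the source and the parsed value from the same arena with parseFromSliceLeaky, so they are freed together. Both function names are made up for this example.

```zig
const std = @import("std");

// Hypothetical stand-ins for InsideFirst and FirstStruct, trimmed to one field.
const Entry = struct { field_1: []u8 };
const Doc = struct { segments: std.json.ArrayHashMap(Entry) };

/// Copy every string, so `source` can be freed right after parsing.
/// The caller must call deinit() on the result.
fn parseCopied(allocator: std.mem.Allocator, json: []const u8) !std.json.Parsed(Doc) {
    const source = try allocator.dupe(u8, json); // stands in for readAllAlloc
    defer allocator.free(source); // freed early, as in the third case
    return std.json.parseFromSlice(Doc, allocator, source, .{ .allocate = .alloc_always });
}

/// Give `source` and the result the same lifetime: both come from `arena`,
/// and nothing is freed until the caller deinits the arena.
fn parseWithArena(arena: std.mem.Allocator, json: []const u8) !Doc {
    const source = try arena.dupe(u8, json);
    return std.json.parseFromSliceLeaky(Doc, arena, source, .{});
}

pub fn main() !void {
    const json = "{\"segments\": {\"DEF\": {\"field_1\": \"value\"}}}";

    const copied = try parseCopied(std.heap.page_allocator, json);
    defer copied.deinit();
    if (copied.value.segments.map.get("DEF")) |entry| {
        std.debug.print("copied: {s}\n", .{entry.field_1});
    }

    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    defer arena.deinit();
    const doc = try parseWithArena(arena.allocator(), json);
    if (doc.segments.map.get("DEF")) |entry| {
        std.debug.print("arena: {s}\n", .{entry.field_1});
    }
}
```

The copying version is simpler when the parsed value has to outlive the buffer; the arena version avoids the extra copies and frees everything in one call, at the cost of keeping the whole source buffer around.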
Stacktrace
For completeness, I’m also adding the stack trace here. It is very helpful that Zig prints a stack trace when a segfault happens. I’m told some other languages don’t have this feature and only print "segfault"…
PS C:\dev\myproj> zig build run
lets see 1: value
lets see 2: value
Segmentation fault at address 0x1d244380017
C:\opt\zig\lib\std\array_hash_map.zig:48:19: 0x7ff632db5d6e in eqlString (myproj.exe.obj)
return mem.eql(u8, a, b);
^
C:\opt\zig\lib\std\array_hash_map.zig:43:25: 0x7ff632da8be2 in eql (myproj.exe.obj)
return eqlString(a, b);
^
C:\opt\zig\lib\std\array_hash_map.zig:1619:67: 0x7ff632db6197 in getSlotByKey__anon_10201 (myproj.exe.obj)
if (hash_match and checkedEql(ctx, key, keys_array[i], i))
^
C:\opt\zig\lib\std\array_hash_map.zig:964:43: 0x7ff632da8cc1 in getIndexWithHeaderGeneric__anon_9789 (myproj.exe.obj)
const slot = self.getSlotByKey(key, ctx, header, I, indexes) orelse return null;
^
C:\opt\zig\lib\std\array_hash_map.zig:957:61: 0x7ff632d9c335 in getIndexAdapted__anon_9318 (myproj.exe.obj)
.u8 => return self.getIndexWithHeaderGeneric(key, ctx, header, u8),
^
C:\opt\zig\lib\std\array_hash_map.zig:978:47: 0x7ff632d868f3 in getAdapted__anon_8254 (myproj.exe.obj)
const index = self.getIndexAdapted(key, ctx) orelse return null;
^
C:\opt\zig\lib\std\array_hash_map.zig:975:35: 0x7ff632d5458f in getContext (myproj.exe.obj)
return self.getAdapted(key, ctx);
^
C:\opt\zig\lib\std\array_hash_map.zig:972:35: 0x7ff632d5296f in get (myproj.exe.obj)
return self.getContext(key, undefined);
^
C:\dev\myproj\src\main.zig:92:41: 0x7ff632d52666 in main (myproj.exe.obj)
if (firstStructureB.segments.map.get("ABC")) |*value| {
^
C:\opt\zig\lib\std\start.zig:339:65: 0x7ff632d5351c in WinStartup (myproj.exe.obj)
std.os.windows.kernel32.ExitProcess(initEventLoopAndCallMain());
^
???:?:?: 0x7ffa87bb7343 in ??? (KERNEL32.DLL)
???:?:?: 0x7ffa882a26b0 in ??? (ntdll.dll)