Read a Text File Line by Line in R

Jesse John January 30, 2023


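The readLines() function makes it easy to read a plain text file into R. The result is a character vector in which each element is a string holding one line of the text file.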
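As a minimal sketch of this whole-file behavior, the snippet below writes a small sample file to a temporary location and reads it back in one call. The file contents and the names tmp and all_lines are made up for illustration.

```r
# Write a small sample file to a temporary location (hypothetical content).
tmp <- tempfile(fileext = ".txt")
writeLines(c("The first line.", "The second line.", "", "Last line."), tmp)

# readLines() returns a character vector with one element per line.
all_lines <- readLines(tmp)
length(all_lines)   # number of lines read
all_lines[2]        # the second line of the file

# Remove the temporary file.
unlink(tmp)
```

The blank third line is kept as an element of the vector, so the vector has as many elements as the file has lines.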

But when we do not want to load the whole file into memory, we need to read and process the text file one line at a time.

What Is a Connection in R

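In R, a connection is an interface to a file-like object.

When given a file name, functions such as readLines() open a connection, use it, and close it. By default, they read the whole file at once.

To read a file sequentially, we must open a connection explicitly. We will use the file() function for this.

We will use the following arguments.

description is the file name
open = "r" specifies that we only want to read the file
blocking = TRUE, which is the default for file()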
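The arguments listed above can be tried out in a short sketch. It assumes a throwaway two-line file in a temporary location; the names tmp, con, was_open, first, and second are illustrative only.

```r
# Hypothetical two-line sample file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("alpha", "beta"), tmp)

# Create the connection and open it for reading in one step.
con <- file(description = tmp, open = "r", blocking = TRUE)
was_open <- isOpen(con)          # TRUE: the connection is already open

# Because the connection stays open, each call continues where the last one stopped.
first  <- readLines(con, n = 1)  # first line of the file
second <- readLines(con, n = 1)  # second line of the file

# An explicitly opened connection must be closed explicitly.
close(con)
unlink(tmp)
```

As far as I understand the readLines() documentation, the open argument matters here: if the connection is created without being opened, readLines() opens and closes it on every call, so repeated calls with n = 1 would start from the top of the file each time.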

Use the readLines() Function to Read a Text File Line by Line in R

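We will call readLines() repeatedly for as long as there are lines to read. Because we do not know the number of lines in advance, we need a way to tell when we have reached the end of the file.

When there are no more lines in the file, readLines() returns the empty character vector, character(0).

At that point, we exit the loop with break. Note that the loop does not stop at blank lines: a blank line still has an end-of-line marker, so it is read as the empty string "" rather than character(0).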
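The difference between a blank line and the end of the file can be checked directly. This is a small sketch on a hypothetical three-line file whose middle line is blank; the variable names are illustrative.

```r
# Hypothetical sample file with a blank line in the middle.
tmp <- tempfile(fileext = ".txt")
writeLines(c("first", "", "last"), tmp)

con <- file(tmp, open = "r")
line1 <- readLines(con, n = 1)  # "first"
line2 <- readLines(con, n = 1)  # the blank line: an empty string of length 1
line3 <- readLines(con, n = 1)  # "last"
line4 <- readLines(con, n = 1)  # nothing left: a character vector of length 0
close(con)
unlink(tmp)

identical(line2, character(0))  # FALSE, so a loop would not stop here
identical(line4, character(0))  # TRUE, so a loop would stop here
```

A blank line has length 1 and the end-of-file result has length 0, which is why comparing against character(0) is a safe stopping condition.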

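On each iteration of the loop, we use the identical() function to check whether the line is identical to character(0).

If it is, we break; otherwise, we process the line. In the example, we print it.

According to the documentation, the last line of the text file should have an end-of-line marker for readLines() to work consistently.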
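One way to check whether a file ends with an end-of-line marker is to inspect its last byte. The sketch below is an assumption-laden illustration, not part of the original example: ends_with_newline() is a hypothetical helper, and both sample files are created in a temporary location.

```r
# Hypothetical sample files in a temporary location.
with_eol <- tempfile(fileext = ".txt")
without_eol <- tempfile(fileext = ".txt")

writeLines(c("a", "b"), with_eol)   # writeLines() ends every line with an EOL
cat("a\nb", file = without_eol)     # no EOL after the final "b"

# Hypothetical helper: TRUE if the last byte of the file is a newline (0x0a).
ends_with_newline <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  length(bytes) > 0 && bytes[length(bytes)] == as.raw(10)
}
has_eol <- ends_with_newline(with_eol)
lacks_eol <- !ends_with_newline(without_eol)

# Reading the whole file still returns both lines; warn = FALSE
# suppresses the warning about the incomplete final line.
lines_read <- readLines(without_eol, warn = FALSE)

unlink(c(with_eol, without_eol))
```

If the check fails for a file we control, appending a final newline before reading it line by line avoids relying on how an incomplete last line is handled.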

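First, create a plain text file with the following text, and save it with UTF-8 encoding and a suitable file name (the file name in the code must be replaced accordingly).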

The first line.
The second line.
The third line.

Last line.

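In the example code, we do the following.

Create a connection and open it.
Loop through the file one line at a time, printing each line.
Exit the loop when the line is character(0).
Close the connection. A connection that is opened explicitly must be closed explicitly.

Example code: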

# Explicitly create and open a connection.
# REPLACE THE FILE NAME AS PER YOUR FILE.
myCon <- file(description = "filename.txt", open = "r", blocking = TRUE)

# The position in the connection advances to the next line on each iteration.
# Loop until readLines() returns the empty vector, character(0).
repeat {
  pl <- readLines(myCon, n = 1) # Read one line from the connection.
  if (identical(pl, character(0))) break # No more lines (end of file): exit.
  print(pl) # Otherwise, print the line and go to the next iteration.
}

# An explicitly opened connection needs to be explicitly closed.
close(myCon)
rm(myCon) # Removes the connection object from memory.
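The same loop can be wrapped in a function so that the connection is closed even if processing a line fails. This is a sketch of one possible variation, not the method of the example above: process_lines() and its handler argument are hypothetical names, and length(pl) == 0L is used as an equivalent end-of-file test.

```r
# Hypothetical helper: call handler() on each line of the file at path.
process_lines <- function(path, handler) {
  con <- file(description = path, open = "r", blocking = TRUE)
  on.exit(close(con), add = TRUE)   # close the connection however we exit
  n <- 0L
  repeat {
    pl <- readLines(con, n = 1)     # read one line
    if (length(pl) == 0L) break     # character(0) means end of file
    handler(pl)
    n <- n + 1L
  }
  invisible(n)                      # number of lines processed
}

# Usage on a temporary copy of the sample text from this article.
tmp <- tempfile(fileext = ".txt")
writeLines(c("The first line.", "The second line.", "The third line.",
             "", "Last line."), tmp)

seen <- character(0)
count <- process_lines(tmp, function(x) seen <<- c(seen, x))
unlink(tmp)
```

Using on.exit() keeps the close() call next to the file() call, which makes it harder to leave a connection open by accident.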

References and Help

See section 7 of the R Data Import/Export manual.

Also see the documentation for connections (or file) and readLines.

Author: Jesse John

Jesse is passionate about data analysis and visualization. He uses the R statistical programming language for all aspects of his work.
